-
dolly
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
Python ★ 11k 3y ago
-
dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
Python ★ 475 1d ago
-
dbx
🧱 Databricks CLI eXtensions (aka dbx) is a CLI tool for development and advanced Databricks workflow management.
Python ★ 462 2mo ago
-
dqx
Databricks framework to validate the data quality of PySpark DataFrames and tables
Python ★ 424 2d ago
-
tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation
Jupyter Notebook ★ 342 2mo ago
-
mosaic
An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.
Jupyter Notebook ★ 325 1mo ago
-
ucx
Automated migrations to Unity Catalog
Python ★ 308 8d ago
-
dlt-meta
Metadata-driven Spark Declarative Pipelines framework for bronze/silver pipelines
Python ★ 264 2d ago
-
overwatch ▣
THIS PROJECT IS DEPRECATED. Capture deep metrics on one or all assets within a Databricks workspace
Scala ★ 230 5mo ago
-
cicd-templates ▣
Manage your Databricks deployments and CI with code.
Python ★ 203 3y ago
-
migrate
Tools to migrate Databricks assets between environments
Python ★ 198 2y ago
-
automl-toolkit ▣
Toolkit for Apache Spark ML: feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
HTML ★ 191 5y ago
-
ontos
A Business Catalog for Unity Catalog
Python ★ 186 17m ago
-
ontobricks
OntoBricks turns Databricks tables into a materialized knowledge graph with ontology design, R2RML mapping, reasoning, and auto-generated GraphQL APIs.
Python ★ 157 57m ago
-
lakebridge
Accelerates migrations to Databricks by automating key migration activities
Python ★ 148 17h ago
-
dataframe-rules-engine
Extensible Rules Engine for custom Dataframe / Dataset validation
Scala ★ 141 2y ago
-
discoverx
A Swiss-Army-knife for your Data Intelligence platform administration.
Python ★ 141 2mo ago
-
pytester
Python Testing for Databricks
Python ★ 135 8d ago
-
geoscan ▣
Geospatial clustering at massive scale
Scala ★ 111 2mo ago
-
mcp
No description.
Python ★ 92 11mo ago
-
brickster
R Toolkit for Databricks
R ★ 81 2d ago
-
kasal
No description.
Python ★ 80 1h ago
-
sandbox
Experimental labs projects
Python ★ 75 2d ago
-
jupyterlab-integration ▣
DEPRECATED: Integrating Jupyter with Databricks via SSH
HTML ★ 70 4y ago
-
blueprint
Baseline for Databricks Labs projects written in Python
Python ★ 69 17d ago
-
feature-factory
Accelerator to rapidly deploy customized features for your business
Python ★ 57 2y ago
-
lakeflow-community-connectors
No description.
Python ★ 53 2d ago
-
doc-qa
No description.
Python ★ 52 2y ago
-
databricks-sync
An experimental tool to synchronize a source Databricks deployment with a target Databricks deployment.
Python ★ 50 2y ago
-
transpiler ▣
SIEM-to-Spark Transpiler
Scala ★ 44 2mo ago
-
delta-oms
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
Scala ★ 41 2y ago
-
lsql
Lightweight SQL execution wrapper built only on top of the Databricks SDK
Python ★ 36 2d ago
-
coding-agents-databricks-apps
Run coding agents on Databricks Apps 🚀
Python ★ 33 6d ago
-
pylint-plugin
Databricks Plugin for PyLint
Python ★ 33 2mo ago
-
splunk-integration
Databricks Add-on for Splunk
Python ★ 29 5mo ago
-
databricks-sdk-r
Databricks SDK for R (Experimental)
R ★ 24 2y ago
-
tika-ocr
No description.
Rich Text Format ★ 22 1y ago
-
arcuate ▣
Delta Sharing + MLflow for ML model & experiment exchange (arcuate delta: a fan-shaped river delta)
Python ★ 22 4mo ago
-
impulse
Large-scale time-series measurement data analytics on Apache Spark
Python ★ 17 1d ago
-
delta-sharing-java-connector
A Java connector for delta.io/sharing/ that allows you to easily ingest data on any JVM.
Java ★ 15 2y ago
-
partner-connect-api
No description.
Scala ★ 13 1y ago
-
geobrix
GeoBrix is a high-performance spatial processing library.
Jupyter Notebook ★ 12 2d ago
-
chatx
No description.
Python ★ 12 9mo ago
-
waterbear ▣
Automated provisioning of an industry Lakehouse with enterprise data model
Python ★ 9 2mo ago
-
access-insights
No description.
Python ★ 7 6mo ago
-
firefly
No description.
TypeScript ★ 5 1mo ago
-
meta-conversions-api-app
A companion Databricks App for the Meta Conversions API marketplace listing. Provides a guided setup experience for connecting your Databricks lakehouse to Meta's Conversions API (CAPI).
TypeScript ★ 0 13d ago