Added example notebook for scan using source data formats
souravg-db committed Dec 28, 2023
1 parent e0fa1ca commit 346d498
Showing 2 changed files with 57 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -29,6 +29,7 @@ Operations are applied concurrently across multiple tables
* **Custom**
* [Arbitrary SQL template execution across multiple tables](docs/Arbitrary_multi-table_SQL.md)
* Create Mlflow gateway routes for MosaicML and OpenAI ([example notebook](examples/mlflow_gateway_routes_examples.py))
* Scan using User Specified Data Source Formats ([example notebook](examples/scan_with_user_specified_data_source_formats.py))

## Getting started

56 changes: 56 additions & 0 deletions examples/scan_with_user_specified_data_source_formats.py
@@ -0,0 +1,56 @@
# Databricks notebook source
# MAGIC %md
# MAGIC # Scan Tables with User Specified Data Source Formats

# COMMAND ----------

# MAGIC %md
# MAGIC ### Declare Variables

# COMMAND ----------

dbutils.widgets.text("catalogs", "*", "Catalogs")
dbutils.widgets.text("schemas", "*", "Schemas")
dbutils.widgets.text("tables", "*", "Tables")

# COMMAND ----------

catalogs = dbutils.widgets.get("catalogs")
schemas = dbutils.widgets.get("schemas")
tables = dbutils.widgets.get("tables")
from_table_statement = ".".join([catalogs, schemas, tables])

# COMMAND ----------

# MAGIC %md
# MAGIC ### Initialize discoverx

# COMMAND ----------

from discoverx import DX

dx = DX()

# COMMAND ----------

# MAGIC %md
# MAGIC ### DiscoverX will scan all delta tables by default

# COMMAND ----------

dx.from_tables(from_table_statement).scan()

# COMMAND ----------

# MAGIC %md
# MAGIC ### Users can specify data source formats as follows

# COMMAND ----------

(dx.from_tables(from_table_statement)
.with_data_source_formats(["DELTA","JSON"])
.scan())

# COMMAND ----------

