Add Support to Export to DBML (datacontract#215)
* Add Support to Export to DBML

- Create basic project info from what the datacontract has
- Map all the models into tables, _not_ taking nested fields into account (as there is no way to express them anyway)
- Also create references, when they are given for a field, so the connections between tables become visible

* Adapt CHANGELOG and README

* Add generated info to make clear this is a generated file

* Add support to convert to a specific server's data types

- Support selecting a server using --server
- The data types are then converted to the selected server's specific types
jpraetorius authored May 22, 2024
1 parent 26c9acf commit 2908e2f
Showing 7 changed files with 512 additions and 26 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -9,12 +9,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- `datacontract export --format dbml`: Export to [Database Markup Language (DBML)](https://dbml.dbdiagram.io/home/) (#135)

## [0.10.4] - 2024-05-17

### Added

- `datacibtract catalog` Search
- `datacontract catalog` Search
- `datacontract publish`: Publish the data contract to the Data Mesh Manager
- `datacontract import --format bigquery`: Import from BigQuery format (#110)
- `datacontract export --format bigquery`: Export to BigQuery format (#111)
60 changes: 35 additions & 25 deletions README.md
@@ -557,31 +557,33 @@ models:

Convert data contract to a specific format. Prints to stdout or to the specified output file.

╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml]
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * --format [jsonschema|pydantic-model|sodacl|dbt|dbt-sources|dbt-staging The export format. [default: None] [required]
│ -sql|odcs|rdf|avro|protobuf|great-expectations|terraform|avro │
│ -idl|sql|sql-query|html|bigquery|go]
│ --output PATH Specify the file path where the exported data will be saved. │
│ If no path is provided, the output will be printed to stdout. │
[default: None]
│ --server TEXT The server name to export. [default: None]
│ --model TEXT Use the key of the model in the data contract yaml file to │
│ refer to a model, e.g., `orders`, or `all` for all models │
│ (default). │
[default: all]
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ RDF Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --rdf-base TEXT [rdf] The base URI used to generate the RDF graph. [default: None]
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ SQL Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --sql-server-type TEXT [sql] The server type to determine the sql dialect. By default, it uses 'auto' to automatically detect the sql │
│ dialect via the specified servers in the data contract. │
[default: auto]
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml]
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * --format [jsonschema|pydantic-model|sodacl|dbt|dbt-sources|db The export format. [default: None] [required]
│ t-staging-sql|odcs|rdf|avro|protobuf|great-expectati │
│ ons|terraform|avro-idl|sql|sql-query|html|go|bigquer │
│ y|dbml]
│ --output PATH Specify the file path where the exported data will be │
│ saved. If no path is provided, the output will be │
│ printed to stdout. │
[default: None]
│ --server TEXT The server name to export. [default: None]
│ --model TEXT Use the key of the model in the data contract yaml │
│ file to refer to a model, e.g., `orders`, or `all`
│ for all models (default). │
[default: all]
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ RDF Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --rdf-base TEXT [rdf] The base URI used to generate the RDF graph. [default: None]
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ SQL Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --sql-server-type TEXT [sql] The server type to determine the sql dialect. By default, it uses 'auto' to automatically │
│ detect the sql dialect via the specified servers in the data contract. │
[default: auto]
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

```
@@ -611,6 +613,7 @@ Available export options:
| `bigquery` | Export to BigQuery Schemas ||
| `go` | Export to Go types ||
| `pydantic-model` | Export to pydantic models ||
| `dbml` | Export to a DBML diagram description ||
| Missing something? | Please create an issue on GitHub | TBD |

#### Great Expectations
Expand Down Expand Up @@ -651,6 +654,13 @@ Having the data contract inside an RDF Graph gives us access the following use c
- Apply graph algorithms on multiple data contracts (Find similar data contracts, find "gatekeeper"
data products, find the true domain owner of a field attribute)

#### DBML

The export function converts the data contract's logical data types into the types of a concrete database
when a server is selected via the `--server` option; the target dialect is determined by that server's `type`.
If no server is selected, the logical data types are exported unchanged.
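
As a rough illustration of this behaviour, the sketch below uses a small, hypothetical mapping table. `SERVER_TYPE_MAP` and `resolve_type` are made up for this example and are not part of datacontract-cli; the real conversion is done by `convert_to_sql_type`, whose actual mappings may differ.

```python
from typing import Optional

# Hypothetical mapping from logical types to server-specific types.
# Illustrative only; this is not the tool's real mapping.
SERVER_TYPE_MAP = {
    "postgres": {"string": "text", "timestamp": "timestamptz"},
    "snowflake": {"string": "STRING", "timestamp": "TIMESTAMP_TZ"},
}


def resolve_type(logical_type: str, server_type: Optional[str]) -> str:
    """Return the server-specific type, or the logical type if no server is selected."""
    if server_type is None:
        return logical_type
    # Fall back to the logical type when this sketch has no mapping for it.
    return SERVER_TYPE_MAP.get(server_type, {}).get(logical_type, logical_type)


print(resolve_type("string", None))        # no server: logical type is kept
print(resolve_type("string", "postgres"))  # server selected: mapped via the hypothetical table
```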


### import

```
1 change: 1 addition & 0 deletions datacontract/cli.py
@@ -162,6 +162,7 @@ class ExportFormat(str, Enum):
    html = "html"
    go = "go"
    bigquery = "bigquery"
    dbml = "dbml"


@app.command()
4 changes: 4 additions & 0 deletions datacontract/data_contract.py
@@ -15,6 +15,7 @@
from datacontract.export.avro_converter import to_avro_schema_json
from datacontract.export.avro_idl_converter import to_avro_idl
from datacontract.export.bigquery_converter import to_bigquery_json
from datacontract.export.dbml_converter import to_dbml_diagram
from datacontract.export.dbt_converter import to_dbt_models_yaml, \
to_dbt_sources_yaml, to_dbt_staging_sql
from datacontract.export.great_expectations_converter import \
@@ -334,6 +335,9 @@ def export(self, export_format, model: str = "all", rdf_base: str = None, sql_se
            if found_server.type != 'bigquery':
                raise RuntimeError(f"Export to {export_format} requires selecting a bigquery server from the data contract.")
            return to_bigquery_json(model_name, model_value, found_server)
        if export_format == "dbml":
            found_server = data_contract.servers.get(self._server)
            return to_dbml_diagram(data_contract, found_server)
        else:
            print(f"Export format {export_format} not supported.")
            return ""
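
The new `dbml` branch appears to rely on `dict.get` returning `None` when no server name was given, which `to_dbml_diagram` then treats as "use logical types". A minimal, self-contained sketch of that lookup follows; the `Server` dataclass and the `find_server` helper are stand-ins invented for this example, not the project's classes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Server:
    # Stand-in for spec.Server; only the attribute used here.
    type: str


def find_server(servers: Dict[str, Server], name: Optional[str]) -> Optional[Server]:
    # dict.get returns None for a missing or unspecified key instead of raising.
    return servers.get(name)


servers = {"production": Server(type="postgres")}
selected = find_server(servers, None)
dialect = "Logical Datacontract" if selected is None else selected.type
print(dialect)
```

One consequence, as far as this diff shows: a misspelled server name also yields `None`, so the export would fall back to logical types rather than report an error.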
111 changes: 111 additions & 0 deletions datacontract/export/dbml_converter.py
@@ -0,0 +1,111 @@
from datetime import datetime
from importlib.metadata import version
import pytz
from datacontract.export.sql_type_converter import convert_to_sql_type
import datacontract.model.data_contract_specification as spec
from typing import Tuple


def to_dbml_diagram(contract: spec.DataContractSpecification, server: spec.Server) -> str:

    result = ''
    result += add_generated_info(contract, server) + "\n"
    result += generate_project_info(contract) + "\n"

    for model_name, model in contract.models.items():
        table_description = generate_table(model_name, model, server)
        result += f"\n{table_description}\n"

    return result

def add_generated_info(contract: spec.DataContractSpecification, server: spec.Server) -> str:
    tz = pytz.timezone("UTC")
    now = datetime.now(tz)
    formatted_date = now.strftime("%b %d %Y")
    datacontract_cli_version = get_version()
    dialect = 'Logical Datacontract' if server is None else server.type

    generated_info = """
Generated at {0} by datacontract-cli version {1}
for datacontract {2} ({3}) version {4}
Using {5} Types for the field types
""".format(formatted_date, datacontract_cli_version, contract.info.title, contract.id, contract.info.version, dialect)

    comment = """/*
{0}
*/
""".format(generated_info)

    note = """Note project_info {{
'''
{0}
'''
}}
""".format(generated_info)

    return """{0}
{1}
""".format(comment, note)

def get_version() -> str:
    try:
        return version("datacontract_cli")
    except Exception:
        return ""

def generate_project_info(contract: spec.DataContractSpecification) -> str:
    return """Project "{0}" {{
Note: "{1}"
}}\n
""".format(contract.info.title, ' '.join(contract.info.description.splitlines()))

def generate_table(model_name: str, model: spec.Model, server: spec.Server) -> str:
    result = """Table "{0}" {{
Note: "{1}"
""".format(model_name, ' '.join(model.description.splitlines()))

    references = []

    # Add all the fields
    for field_name, field in model.fields.items():
        ref, field_string = generate_field(field_name, field, model_name, server)
        if ref is not None:
            references.append(ref)
        result += "{0}\n".format(field_string)

    result += "}\n"

    # and if any: add the references
    if len(references) > 0:
        for ref in references:
            result += "Ref: {0}\n".format(ref)

        result += "\n"

    return result

def generate_field(field_name: str, field: spec.Field, model_name: str, server: spec.Server) -> Tuple[str, str]:

    field_attrs = []
    if field.primary:
        field_attrs.append('pk')

    if field.unique:
        field_attrs.append('unique')

    if field.required:
        field_attrs.append('not null')
    else:
        field_attrs.append('null')

    if field.description:
        field_attrs.append('Note: "{0}"'.format(' '.join(field.description.splitlines())))

    field_type = field.type if server is None else convert_to_sql_type(field, server.type)

    field_str = '"{0}" "{1}" [{2}]'.format(field_name, field_type, ','.join(field_attrs))
    ref_str = None
    if (field.references) is not None:
        # we always assume many to one, as datacontract doesn't really give us more info
        ref_str = "{0}.{1} > {2}".format(model_name, field_name, field.references)
    return (ref_str, field_str)
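
To make the field output format concrete, here is a small, self-contained sketch that mirrors the logic of `generate_field` for the no-server case. The `Field` dataclass and the `field_to_dbml` helper are simplified stand-ins written for this example (assumptions, not the project's `spec.Field` or its API), and the model and field names are invented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Field:
    # Stand-in for spec.Field with only the attributes used below.
    type: str
    primary: bool = False
    unique: bool = False
    required: bool = False
    description: Optional[str] = None
    references: Optional[str] = None


def field_to_dbml(model_name: str, field_name: str, field: Field) -> Tuple[Optional[str], str]:
    """Build the DBML column line and, if present, a many-to-one reference."""
    attrs = []
    if field.primary:
        attrs.append("pk")
    if field.unique:
        attrs.append("unique")
    attrs.append("not null" if field.required else "null")
    if field.description:
        # Collapse multi-line descriptions into a single line.
        attrs.append('Note: "{0}"'.format(" ".join(field.description.splitlines())))
    field_str = '"{0}" "{1}" [{2}]'.format(field_name, field.type, ",".join(attrs))
    ref_str = None
    if field.references is not None:
        ref_str = "{0}.{1} > {2}".format(model_name, field_name, field.references)
    return ref_str, field_str


ref, line = field_to_dbml(
    "orders", "customer_id",
    Field(type="string", required=True, references="customers.id"),
)
print(line)           # "customer_id" "string" [not null]
print("Ref: " + ref)  # Ref: orders.customer_id > customers.id
```

Note that the reference string can be absent, so a return annotation of `Tuple[Optional[str], str]` describes this behaviour more precisely than `Tuple[str, str]`.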