[196] Add support for MSSQL (datacontract#204)
* [196] Add support for MSSQL

* [196] Correct soda config serializer

* [196] Move requirements for MSSql into dev reqs

* [196] Correct usage of regex in datacontract to be pattern instead

* [196] Correct pyproject toml

* [196] Formalize support for SQLServer
- Add required packages
- Add required connection details
- Semi complete tests

* feat(export/jsonschema): supports array type (datacontract#200)

* Support logical Types in Avro Export (datacontract#199)

* Support logical Types in Avro Export

- Map Data Contract date-type fields to Avro logical types (see the sketch below)
- date: `int/date`
- timestamp, timestamp_tz: `long/timestamp-millis`
- timestamp_ntz: `long/local-timestamp-millis`
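
For illustration, a minimal sketch of what an exported Avro schema could look like once these mappings are applied — the record and field names here are made up; only the type/logicalType pairs come from the mapping above:

```json
{
  "type": "record",
  "name": "orders",
  "fields": [
    {"name": "order_date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "processed_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    {"name": "processed_at_local", "type": {"type": "long", "logicalType": "local-timestamp-millis"}}
  ]
}
```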

* Update CHANGELOG

* Add ability to export to go types (datacontract#195)

* Add ability to export to go types

* add test

* rename to types

* updated naming

* update docs

* Update boto3 requirement from <1.34.99,>=1.34.41 to >=1.34.41,<1.34.104 (datacontract#189)

Updates the requirements on [boto3](https://github.com/boto/boto3) to permit the latest version.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](boto/boto3@1.34.41...1.34.103)

---
updated-dependencies:
- dependency-name: boto3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update botocore requirement (datacontract#190)

Updates the requirements on [botocore](https://github.com/boto/botocore) to permit the latest version.
- [Changelog](https://github.com/boto/botocore/blob/develop/CHANGELOG.rst)
- [Commits](boto/botocore@1.34.41...1.34.103)

---
updated-dependencies:
- dependency-name: botocore
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jochenchrist <[email protected]>

* Update snowflake-connector-python[pandas] requirement (datacontract#172)

Updates the requirements on [snowflake-connector-python[pandas]](https://github.com/snowflakedb/snowflake-connector-python) to permit the latest version.
- [Release notes](https://github.com/snowflakedb/snowflake-connector-python/releases)
- [Commits](snowflakedb/snowflake-connector-python@v3.6.0...v3.10.0)

---
updated-dependencies:
- dependency-name: snowflake-connector-python[pandas]
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* 91 JSON Schema (datacontract#201)

* add import for jsonschemas and extend export

* fix tests

* remove unused import

* Update changelog

* First support for server-specific types in a config map.

Resolves datacontract#150

* Issues/193 Fix all TODOs in HTML export (datacontract#203)

* Make Examples in Fields work

- we have to declare them in the model to make them show up at all :)

* Add Definitions in HTML Export

- add examples to the model, so we can render them
- create new tables akin to what we do for the models

* Add examples

* Handle nested fields in HTML Export

We only go one level deep, but add an additional set of rows for fields contained in a model's field.

* Update CHANGELOG

* Update Tests for breaking and changelog

Now that we include the `example` property in the field, more things are being pointed out, so adjust the tests accordingly

* Handle Model Fields and their nesting through partials

- added jinja partials as dependency
- extracted the model and the nesting handling out to its own partial

* Update definitions

- move them to their own partial
- move enum into the content column
- try to highlight the different optional aspects a tad

* Move some more blocks into partials

* Add partials to manifest

* Remove the nested headline

---------

Co-authored-by: jochen <[email protected]>

[196] Formalize support for SQLServer
- Add required packages
- Add required connection details
- Semi complete tests

[196] Add SQLServer type serializer

* [196] Add msodbcsql18 to docker file

* [196] Apply ruff formatting

* Apply @simonharrer's PR suggestion to make naming more consistent

* [196] Add changes to changelog

* [196] Update readme with new SQLServer information

* [196] Add CI/CD step to install msodbcsql driver

* [196] Skip test if outside of CI/CD environment

* [196] Add mssql package back

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Robert DeRienzo <[email protected]>
Co-authored-by: jochen <[email protected]>
Co-authored-by: JAEJIN LEE <[email protected]>
Co-authored-by: Joachim Praetorius <[email protected]>
Co-authored-by: Mark Olliver <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: jochenchrist <[email protected]>
Co-authored-by: Mark Olliver <[email protected]>
Co-authored-by: Simon Harrer <[email protected]>
10 people authored May 24, 2024
1 parent 479c72e commit 702ba50
Showing 17 changed files with 322 additions and 58 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/ci.yaml
@@ -28,6 +28,10 @@ jobs:
        uses: actions/setup-python@v5
        with:
          python-version: ${{matrix.python-version}}
      - name: Install msodbcsql18
        run: |
          sudo apt-get update
          sudo apt-get install -y msodbcsql18
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
11 changes: 6 additions & 5 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]

### Added
- Added support for `sqlserver` (#196)

- `datacontract export --format dbml`: Export to [Database Markup Language (DBML)](https://dbml.dbdiagram.io/home/) (#135)

@@ -43,7 +44,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added support for `delta` tables on S3 (#24)
- Added new command `datacontract catalog` that generates a data contract catalog with an `index.html` file.
- Added field format information to HTML export

### Fixed
- RDF Export: Fix error if owner is not a URI/URN

@@ -70,13 +71,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Added export format **html** (#15)
- Added descriptions as comments to `datacontract export --format sql` for Databricks dialects
- Added import of arrays in Avro import

## [0.9.8] - 2024-04-01

### Added
- Added export format **great-expectations**: `datacontract export --format great-expectations`
- Added gRPC support to OpenTelemetry integration for publishing test results
- Added AVRO import support for namespace (#121)
- Added handling for optional fields in avro import (#112)
@@ -158,7 +159,7 @@ We start with JSON messages and avro, and Protobuf will follow.
## [0.9.0] - 2024-01-26 - BREAKING

This is a breaking change (we are still on a 0.x.x version).
The project migrated from Golang to Python.
The Golang version can be found at [cli-go](https://github.com/datacontract/cli-go)

### Added
2 changes: 1 addition & 1 deletion Dockerfile
@@ -22,7 +22,7 @@ RUN python -c "import duckdb; duckdb.connect().sql(\"INSTALL httpfs\");"

FROM ubuntu:22.04 AS runner-image

-RUN apt-get update && apt-get install --no-install-recommends -y python3.11 python3.11-venv && \
+RUN apt-get update && apt-get install --no-install-recommends -y python3.11 python3.11-venv msodbcsql18 && \
apt-get clean && rm -rf /var/lib/apt/lists/*

COPY --from=builder-image /opt/venv /opt/venv
118 changes: 78 additions & 40 deletions README.md
@@ -19,7 +19,7 @@ It uses data contract YAML files to lint the data contract, connect to data sour

## Getting started

Let's look at this data contract:
[https://datacontract.com/examples/orders-latest/datacontract.yaml](https://datacontract.com/examples/orders-latest/datacontract.yaml)

We have a _servers_ section with endpoint details for the S3 bucket, _models_ for the structure of the data, and _servicelevels_ and _quality_ attributes that describe the expected freshness and number of rows.
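
For orientation, here is a minimal sketch of how those sections fit together in a datacontract.yaml — abridged, with illustrative values rather than a verbatim copy of the linked example:

```yaml
dataContractSpecification: 0.9.3
id: urn:datacontract:checkout:orders-latest
info:
  title: Orders Latest
  version: 1.0.0
servers:
  production:
    type: s3
    location: s3://datacontract-example-orders-latest/data/{model}/*.json
    format: json
models:
  orders:
    type: table
    fields:
      order_id:
        type: varchar
        required: true
      order_timestamp:
        type: timestamp
servicelevels:
  freshness:
    threshold: 25h
    timestampField: orders.order_timestamp
quality:
  type: SodaCL
  specification:
    checks for orders:
      - row_count >= 5
```
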
@@ -191,11 +191,11 @@ Commands

### init

```
Usage: datacontract init [OPTIONS] [LOCATION]
Download a datacontract.yaml template and write it to file.
╭─ Arguments ──────────────────────────────────────────────────────────────────────────────────╮
│ location [LOCATION] The location (url or path) of the data contract yaml to create. │
│ [default: datacontract.yaml] │
@@ -213,10 +213,10 @@ Commands
### lint

```
Usage: datacontract lint [OPTIONS] [LOCATION]
Validate that the datacontract.yaml is correctly formatted.
╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml] │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
@@ -230,10 +230,10 @@ Commands
### test

```
Usage: datacontract test [OPTIONS] [LOCATION]
Run schema and quality tests on configured servers.
╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ location [LOCATION] The location (url or path) of the data contract yaml. [default: datacontract.yaml] │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
@@ -266,11 +266,11 @@ Data Contract CLI connects to a data source and runs schema and quality tests to
$ datacontract test --server production datacontract.yaml
```

To connect to the databases, the `server` block in the datacontract.yaml is used to set up the connection.
In addition, credentials, such as usernames and passwords, may be defined with environment variables.

The application uses different engines, based on the server `type`.
Internally, it connects with DuckDB, Spark, or a native connection and executes most tests with _soda-core_ and _fastjsonschema_.

Credentials are provided with environment variables.

@@ -456,7 +456,7 @@ dbutils.library.restartPython()
from datacontract.data_contract import DataContract
data_contract = DataContract(
    data_contract_file="/Volumes/acme_catalog_prod/orders_latest/datacontract/datacontract.yaml",
    spark=spark)
run = data_contract.test()
run.result
@@ -481,7 +481,7 @@ servers:
models:
  my_table_1: # corresponds to a table
    type: table
    fields:
      my_column_1: # corresponds to a column
        type: varchar
```
@@ -539,7 +539,7 @@ servers:
models:
  my_table_1: # corresponds to a table
    type: table
    fields:
      my_column_1: # corresponds to a column
        type: varchar
```
@@ -553,9 +553,47 @@ models:




### SQL Server

Data Contract CLI can test data in Microsoft SQL Server databases.

#### Example

datacontract.yaml
```yaml
servers:
  production:
    type: sqlserver
    host: localhost
    port: 1433
    database: tempdb
    schema: dbo
    driver: ODBC Driver 18 for SQL Server
models:
  my_table_1: # corresponds to a table
    type: table
    fields:
      my_column_1: # corresponds to a column
        type: varchar
```

#### Environment Variables

| Environment Variable | Example | Description |
|----------------------------------|--------------------|-------------|
| `DATACONTRACT_SQLSERVER_USERNAME` | `root` | Username |
| `DATACONTRACT_SQLSERVER_PASSWORD` | `toor` | Password |
| `DATACONTRACT_SQLSERVER_TRUSTED_CONNECTION` | `True` | Use Windows authentication instead of username/password login |
| `DATACONTRACT_SQLSERVER_TRUST_SERVER_CERTIFICATE` | `True` | Trust self-signed certificate |
| `DATACONTRACT_SQLSERVER_ENCRYPTED_CONNECTION` | `True` | Use SSL |
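
A minimal usage sketch (credential values are placeholders): set the variables from the table above in the shell, then run the tests against the `sqlserver` server defined in the datacontract.yaml above.

```bash
# Credentials for the SQL Server connection (placeholder values)
export DATACONTRACT_SQLSERVER_USERNAME=root
export DATACONTRACT_SQLSERVER_PASSWORD=toor
# Only needed when the server uses a self-signed certificate
export DATACONTRACT_SQLSERVER_TRUST_SERVER_CERTIFICATE=True

# Run schema and quality tests against the server named "production" above
datacontract test --server production datacontract.yaml
```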



### export

```

Usage: datacontract export [OPTIONS] [LOCATION]

Convert data contract to a specific format. Prints to stdout or to the specified output file.
@@ -599,9 +637,9 @@ Available export options:

| Type | Description | Status |
|----------------------|---------------------------------------------------------|--------|
| `html`                | Export to HTML                                            | ✅ |
| `jsonschema`          | Export to JSON Schema                                     | ✅ |
| `odcs`                | Export to Open Data Contract Standard (ODCS)              | ✅ |
| `sodacl`              | Export to SodaCL quality checks in YAML format            | ✅ |
| `dbt`                 | Export to dbt models in YAML format                       | ✅ |
| `dbt-sources`         | Export to dbt sources in YAML format                      | ✅ |
@@ -621,11 +659,11 @@ Available export options:

#### Great Expectations

The export function transforms a specified data contract into a comprehensive Great Expectations JSON suite.
If the contract includes multiple models, you need to specify the name of the model you wish to export.

```shell
datacontract export datacontract.yaml --format great-expectations --model orders
```

The export creates a list of expectations by utilizing:
@@ -635,7 +673,7 @@ The export creates a list of expectations by utilizing:

#### RDF

The export function converts a given data contract into an RDF representation. You have the option to
add a `base_url` which will be used as the default prefix to resolve relative IRIs inside the document.

```shell
@@ -688,7 +726,7 @@ In this case there's no need to specify `source` but instead `bt-project-id`, `b

For providing authentication to the Client, please see [the google documentation](https://cloud.google.com/docs/authentication/provide-credentials-adc#how-to) or the one [about authorizing client libraries](https://cloud.google.com/bigquery/docs/authentication#client-libs).

Example:
```bash
# Example import from SQL DDL
datacontract import --format sql --source my_ddl.sql
@@ -722,10 +760,10 @@ Available import options:
### breaking

```
Usage: datacontract breaking [OPTIONS] LOCATION_OLD LOCATION_NEW
Identifies breaking changes between data contracts. Prints to stdout.
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required] │
│ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required] │
@@ -738,10 +776,10 @@ Available import options:
### changelog

```
Usage: datacontract changelog [OPTIONS] LOCATION_OLD LOCATION_NEW
Generate a changelog between data contracts. Prints to stdout.
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required] │
│ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required] │
@@ -754,10 +792,10 @@ Available import options:
### diff

```
Usage: datacontract diff [OPTIONS] LOCATION_OLD LOCATION_NEW
PLACEHOLDER. Currently works as 'changelog' does.
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * location_old TEXT The location (url or path) of the old data contract yaml. [default: None] [required] │
│ * location_new TEXT The location (url or path) of the new data contract yaml. [default: None] [required] │
@@ -889,14 +927,14 @@ Create a data contract based on the requirements from use cases.
```bash
$ datacontract init
```

2. Add examples to the `datacontract.yaml`. Do not start with the data model, although you are probably tempted to do that. Examples are the fastest way to get feedback from everybody and not lose someone in the discussion.

3. Create the model based on the examples. Test the model against the examples to double-check whether the model matches the examples.
```bash
$ datacontract test --examples
```

4. Add quality checks and additional type constraints one by one to the contract and make sure the examples and the actual data still adhere to the contract. Check against examples for a very fast feedback loop.
```bash
$ datacontract test --examples
4 changes: 2 additions & 2 deletions datacontract/data_contract.py
@@ -391,7 +391,7 @@ def _get_examples_server(self, data_contract, run, tmp_dir):
        )
        run.log_info(f"Using {server} for testing the examples")
        return server

    def _check_models_for_export(self, data_contract: DataContractSpecification, model: str, export_format: str) -> typing.Tuple[str, str]:
        if data_contract.models is None:
            raise RuntimeError(f"Export to {export_format} requires models in the data contract.")
@@ -412,7 +412,7 @@ def _check_models_for_export(self, data_contract: DataContractSpecification, mod
                raise RuntimeError(
                    f"Model {model_name} not found in the data contract. Available models: {model_names}"
                )

        return model_name, model_value

    def import_from_source(self, format: str, source: typing.Optional[str] = None, bigquery_tables: typing.Optional[typing.List[str]] = None, bigquery_project: typing.Optional[str] = None, bigquery_dataset: typing.Optional[str] = None) -> DataContractSpecification:
5 changes: 5 additions & 0 deletions datacontract/engines/soda/check_soda_execute.py
@@ -9,6 +9,7 @@
from datacontract.engines.soda.connections.kafka import create_spark_session, read_kafka_topic
from datacontract.engines.soda.connections.postgres import to_postgres_soda_configuration
from datacontract.engines.soda.connections.snowflake import to_snowflake_soda_configuration
from datacontract.engines.soda.connections.sqlserver import to_sqlserver_soda_configuration
from datacontract.export.sodacl_converter import to_sodacl_yaml
from datacontract.model.data_contract_specification import DataContractSpecification, Server
from datacontract.model.run import Run, Check, Log
@@ -69,6 +70,10 @@ def check_soda_execute(
        read_kafka_topic(spark, data_contract, server, tmp_dir)
        scan.add_spark_session(spark, data_source_name=server.type)
        scan.set_data_source_name(server.type)
    elif server.type == "sqlserver":
        soda_configuration_str = to_sqlserver_soda_configuration(server)
        scan.add_configuration_yaml_str(soda_configuration_str)
        scan.set_data_source_name(server.type)

    else:
        run.checks.append(
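
For orientation, this is the kind of Soda Core configuration YAML such a `to_sqlserver_soda_configuration(server)` serializer might emit. The keys shown are assumptions based on Soda Core's documented `sqlserver` data source options and on the server fields and environment variables above — not necessarily the exact output of this implementation:

```yaml
# Hypothetical serializer output — keys and values are illustrative
data_source production:
  type: sqlserver
  host: localhost
  port: 1433
  username: root        # resolved from DATACONTRACT_SQLSERVER_USERNAME
  password: toor        # resolved from DATACONTRACT_SQLSERVER_PASSWORD
  database: tempdb
  schema: dbo
  driver: ODBC Driver 18 for SQL Server
  trusted_connection: false
  encrypt: true
  trust_server_certificate: false
```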