Handle dbt-spark incremental on_schema_change behaviour #780

Open
sp-cveeragandham opened this issue Aug 28, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@sp-cveeragandham

sp-cveeragandham commented Aug 28, 2024

Describe the bug

We use the merge incremental strategy in dbt and set on_schema_change = "sync_all_columns" to handle added or removed columns.
However, the spark__alter_relation_add_remove_columns macro in dbt-spark raises an exception whenever a column deletion is detected: "Delta Lake does not support dropping columns from tables". Delta Lake does support dropping columns, provided the table has the required table properties.
To work around this problem, we created an override macro that does the following:

  1. Checks if the required tblproperties are set. If not, it sets them.
  2. Builds an alter query to drop the column(s) and runs it if a column deletion is detected.
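
For reference, the manual equivalent of these two steps can be sketched in Databricks SQL. The schema name my_schema is a placeholder, the table and column names are borrowed from the reproduction model in this issue, and the property values are the ones listed under Expected behavior. This is an illustration of the statements involved, not the exact queries our macro emits.

```sql
-- Step 1: enable column mapping, which Delta requires before a column can be dropped.
-- Assumption: the table is still on a lower protocol version and has no column mapping yet.
ALTER TABLE my_schema.test_on_schema_change SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
);

-- Step 2: drop the column that was removed from the model.
ALTER TABLE my_schema.test_on_schema_change DROP COLUMNS (name);
```

Note that raising the reader and writer versions changes the table protocol, so older Delta clients may no longer be able to read or write the table.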

I am filing this bug report so that the error can be handled, perhaps with an override macro in the dbt-databricks incremental materialization.

Steps To Reproduce

  1. Create a dbt incremental model with at least two columns and set on_schema_change = "sync_all_columns" in the model config:
{{
    config(
        materialized="incremental",
        unique_key="id",
        on_schema_change="sync_all_columns",
    )
}}
with
    sample_data as (
        select 1 as id, 'name1' as name
        union
        select 2 as id, 'name2' as name
    )
select *
from sample_data
  2. Run the dbt model to materialize it in Databricks.
  3. Remove a column from the model and run it again:
{{
    config(
        materialized="incremental",
        unique_key="id",
        on_schema_change="sync_all_columns",
    )
}}
with
    sample_data as (
        select 1 as id
        union
        select 2 as id
    )
select *
from sample_data

Expected behavior

When a column is removed from an incremental model, we expect the column to be dropped from the target table.
To achieve this, implement a macro in the dbt-databricks incremental materialization that overrides the default behaviour of spark__alter_relation_add_remove_columns in the dbt-spark repo:

  1. Remove the exception for Delta Lake (line 406 in the above repo).
  2. Set the required tblproperties (delta.minReaderVersion: 2, delta.minWriterVersion: 5, and delta.columnMapping.mode: name).
  3. Add an alter statement for removing columns.
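
A minimal sketch of what such an override could look like is shown below. It is not the macro from our project and not an existing dbt-databricks macro: the name databricks__alter_relation_add_remove_columns assumes that dbt's adapter dispatch picks up a databricks__ prefixed macro ahead of the spark__ one, and the sketch assumes the relation is a Delta table. It also does not handle tables that are already on a higher protocol version without column mapping, where setting the lower versions may fail.

```sql
{% macro databricks__alter_relation_add_remove_columns(relation, add_columns, remove_columns) %}

  {% if remove_columns %}
    {# Read the current table properties to see whether column mapping is already enabled. #}
    {% set props = run_query('show tblproperties ' ~ relation) %}
    {% set state = namespace(mapping_enabled=false) %}
    {% for row in props.rows %}
      {% if row[0] == 'delta.columnMapping.mode' and row[1] == 'name' %}
        {% set state.mapping_enabled = true %}
      {% endif %}
    {% endfor %}

    {# Set the required properties only when column mapping is not enabled yet. #}
    {% if not state.mapping_enabled %}
      {% set enable_mapping_sql %}
        alter table {{ relation }} set tblproperties (
          'delta.minReaderVersion' = '2',
          'delta.minWriterVersion' = '5',
          'delta.columnMapping.mode' = 'name'
        )
      {% endset %}
      {% do run_query(enable_mapping_sql) %}
    {% endif %}

    {# Drop the removed columns instead of raising a compiler error. #}
    {% set drop_sql %}
      alter table {{ relation }} drop columns (
        {%- for column in remove_columns %}
          {{ column.name }}{{ ',' if not loop.last }}
        {%- endfor %}
      )
    {% endset %}
    {% do run_query(drop_sql) %}
  {% endif %}

  {# Adding columns stays in place; this version assumes the relation is a table. #}
  {% if add_columns %}
    {% set add_sql %}
      alter table {{ relation }} add columns (
        {%- for column in add_columns %}
          {{ column.name }} {{ column.data_type }}{{ ',' if not loop.last }}
        {%- endfor %}
      )
    {% endset %}
    {% do run_query(add_sql) %}
  {% endif %}

{% endmacro %}
```

Checking the properties first keeps the macro from re-issuing the set tblproperties statement on every run once column mapping is on.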

Screenshots and log output

Error when trying to drop a column.
Compilation Error in model test_on_schema_change (models/stage/br/test_on_schema_change.sql)
Delta Lake does not support dropping columns from tables

in macro spark__alter_relation_add_remove_columns (macros/adapters.sql)
called by macro alter_relation_add_remove_columns (macros/adapters/columns.sql)
called by macro sync_column_schemas (macros/materializations/models/incremental/on_schema_change.sql)
called by macro process_schema_changes (macros/materializations/models/incremental/on_schema_change.sql)
called by macro materialization_incremental_databricks (macros/materializations/incremental/incremental.sql)
called by model test_on_schema_change (models/stage/br/test_on_schema_change.sql)

System information

Core:

  • installed: 1.8.5
  • latest: 1.8.5 - Up to date!

Plugins:

  • databricks: 1.8.5 - Up to date!
  • spark: 1.8.0 - Up to date!

Additional context

I found a PR that fixes this in dbt-spark, but I am not sure what the lead time is for getting it merged into the dbt-spark repo.

@sp-cveeragandham sp-cveeragandham added the bug Something isn't working label Aug 28, 2024
@sp-cveeragandham sp-cveeragandham changed the title Override dbt-spark on_schema_change behaviour Handle dbt-spark incremental on_schema_change behaviour Aug 28, 2024
@AlexVialaBellander

Is this still a thing?

@benc-db
Collaborator

benc-db commented Sep 12, 2024

Can you submit a PR for your macro override? I was unaware this limitation has been lifted. I wonder which Databricks runtime version it's compatible with.

@github-staff github-staff deleted a comment from robykartis Oct 22, 2024