[BUG][Spark] delta-spark allows reading column mapping when missing from table features #3890

Open · zachschuermann opened this issue Nov 19, 2024 · 0 comments
Labels: bug (Something isn't working)
Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Describe the problem

TL;DR: you can relatively easily create a table that, according to the protocol, shouldn't allow column mapping, but delta-spark still reads it with column mapping.

I think there are two pieces to this issue:

  1. [bug] delta-spark uses column mapping to read a table whose reader features do not include column mapping.
  2. [API sharp edge?] Delta's upgradeTableProtocol will upgrade from reader version 2 to reader version 3 without adding any table features. This is a problem because it effectively and silently turns off column mapping: column mapping is enabled/supported at reader version 2, but at reader version 3 it requires the columnMapping table feature to be present. See the sketch after this list.
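
For reference, a minimal sketch for inspecting what the upgrade actually writes to the log (latest_protocol is a hypothetical helper; delta_path is the table root from the repro below):

import json
from pathlib import Path

# Hypothetical helper: scan the commit files in _delta_log and return the
# most recent protocol action, i.e. what upgradeTableProtocol last wrote.
def latest_protocol(table_root: str) -> dict:
    protocol = None
    for commit in sorted(Path(table_root, "_delta_log").glob("*.json")):
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "protocol" in action:
                protocol = action["protocol"]
    return protocol

print(latest_protocol(delta_path))
# Per this report, after upgradeTableProtocol(3, 7) the protocol has
# minReaderVersion 3 and no "readerFeatures" key at all, e.g.:
#   {"minReaderVersion": 3, "minWriterVersion": 7,
#    "writerFeatures": ["columnMapping", "icebergCompatV1"]}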

Steps to reproduce

See the example below for code implementing these steps:

  1. The table is created with reader version 2 and writer version 7, with "writerFeatures":["columnMapping","icebergCompatV1"] and delta.columnMapping.mode = name.
  2. Then upgradeTableProtocol(3, 7) yields reader version 3 with no reader features, which effectively turns off column mapping.
  3. When reading the table, it is nevertheless read with columnMapping = name.
# Using PySpark. get_sample_data and case come from the reporter's test
# harness; any DataFrame and an empty table directory will do.
from pathlib import Path
from delta.tables import DeltaTable

df = get_sample_data(spark)
delta_path = str(Path(case.delta_root).absolute())
# Create the table at version 0. delta.enableIcebergCompatV1 implies column
# mapping, so the new table has reader version 2, writer version 7, and
# delta.columnMapping.mode = name.
delta_table: DeltaTable = (
    DeltaTable.create(spark)
    .location(delta_path)
    .addColumns(df.schema)
    .property("delta.enableIcebergCompatV1", "true")
    .execute()
)
# Upgrade to reader version 3 / writer version 7; this writes a protocol
# action with no readerFeatures.
delta_table.upgradeTableProtocol(3, 7)
df.repartition(1).write.format("delta").mode("append").save(case.delta_root)
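
To observe the behavior, read the table back in the same session; a minimal check, assuming spark and delta_path from above:

# Per the protocol, a reader-version-3 table whose readerFeatures do not
# include columnMapping should not be read with column mapping, yet
# delta-spark resolves the logical (mapped) column names without complaint.
df_read = spark.read.format("delta").load(delta_path)
df_read.printSchema()
df_read.show()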

Observed results

The table is read with column mapping (columnMapping = name).

Expected results

The table should not be read with column mapping, since its reader-version-3 protocol does not list the columnMapping reader feature.

Further details

Environment information

  • Delta Lake version: 3.2.1
  • Spark version: 3.5?
  • Scala version:

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.