You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am trying to load an Excel file (suppliers_testing.xlsx) in Permissive mode. This file has empty cells at the end of a row. The dataframe columns for these cells allow nulls.
When I attempt this, the rows are marked as invalid with a comment going into my nominated columnNameOfCorruptRecord column.
I looked into the underlying xml for suppliers_testing.xlsx and could see that there was no mention at all of the empty columns in the specific row.
Rename suppliers_testing.xlsx to suppliers_testing.zip
Double click on suppliers_testing.zip in windows explorer
Browse to xl\worksheets\sheet1.xml
Look for the second row (<row r="2") and observe the columns. There are no H2, I2 or J2 cells.
I edited the excel file and added a border around the empty cells. I can now see the previously missing cells in the underlying xml and the file does load using the code above.
Has anyone come across this issue before? Is there some setting I can use to programmatically deal with. I would prefer not to use the adding a border manual workaround.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
Hi,
I am trying to load an Excel file (suppliers_testing.xlsx) in Permissive mode. This file has empty cells at the end of a row. The dataframe columns for these cells allow nulls.
When I attempt this, the rows are marked as invalid with a comment going into my nominated columnNameOfCorruptRecord column.
The PySpark code I use is:
I looked into the underlying xml for suppliers_testing.xlsx and could see that there was no mention at all of the empty columns in the specific row.
I edited the excel file and added a border around the empty cells. I can now see the previously missing cells in the underlying xml and the file does load using the code above.
Has anyone come across this issue before? Is there some setting I can use to programmatically deal with. I would prefer not to use the adding a border manual workaround.
Versions:
Thanks,
Eoin.
suppliers_testing.xlsx
Beta Was this translation helpful? Give feedback.
All reactions