[BUG] Cannot read/write dataframe after loading file in Databricks 12.1 Runtime 3.3.1 Spark
Is there an existing issue for this?
Current Behavior
When running the PySpark code below in Databricks Runtime 12.1 (Spark 3.3.1):
# Read a single sheet, starting at the given cell, using the streaming reader
df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("dataAddress", "'" + param_excel_sheet + "'!" + param_excel_row_start)
    .option("header", False)
    .option("treatEmptyValueAsNulls", True)
    .option("maxRowsInMemory", 20)
    .option("inferSchema", "false")
    .load(param_mountPoint + param_in_adls_raw_path + param_in_file_name)
)
df.show(truncate=False)
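For reference, the dataAddress option above is just the sheet name and starting cell concatenated. With the illustrative values listed under Steps To Reproduce (Sheet1 and A2), it resolves roughly as in this sketch:
data_address = "'" + "Sheet1" + "'!" + "A2"   # illustrative sheet/cell from Steps To Reproduce
print(data_address)                           # prints: 'Sheet1'!A2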
Running the code above, I received the following error:
An error occurred while calling o3150.showString.
: com.github.pjfanning.xlsx.exceptions.ParseException: Error reading XML stream
at com.github.pjfanning.xlsx.impl.StreamingRowIterator.getRow(StreamingRowIterator.java:126)
at com.github.pjfanning.xlsx.impl.StreamingRowIterator.hasNext(StreamingRowIterator.java:627)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
I also tried writing the dataframe to a Delta table and received the error below:
An error occurred while calling o3063.save.
: com.github.pjfanning.xlsx.exceptions.ParseException: Error reading XML stream
at com.github.pjfanning.xlsx.impl.StreamingRowIterator.getRow(StreamingRowIterator.java:126)
at com.github.pjfanning.xlsx.impl.StreamingRowIterator.hasNext(StreamingRowIterator.java:627)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
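The write call itself isn't included in the report; based on the o3063.save error it presumably looked roughly like the sketch below, where the target path is a hypothetical placeholder:
# Sketch only: the actual write statement is not shown in the report.
# "param_delta_out_path" is a hypothetical placeholder for the Delta target path.
df.write.format("delta").save(param_delta_out_path)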
The Excel file has 11 sheets; I'm trying to read only one sheet, which has 389,862 rows.
Expected Behavior
The resulting dataframe should display and write to a Delta table correctly.
Steps To Reproduce
Set the following parameters to your desired values (example assignments are sketched after this list):
param_excel_sheet = the Excel sheet name, e.g. Sheet1
param_excel_row_start = the starting cell, e.g. A2
param_mountPoint + param_in_adls_raw_path + param_in_file_name = the folder path including the file name
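For illustration only, the parameter assignments might look like the sketch below; the mount point, folder path, and file name are placeholders rather than the actual values from the report:
param_excel_sheet = "Sheet1"          # example sheet name from the list above
param_excel_row_start = "A2"          # example starting cell from the list above
param_mountPoint = "/mnt/raw"         # hypothetical mount point
param_in_adls_raw_path = "/excel/"    # hypothetical ADLS folder path
param_in_file_name = "input.xlsx"     # hypothetical file name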
Then run the code below.
df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("dataAddress", "'" + param_excel_sheet + "'!" + param_excel_row_start)
    .option("header", False)
    .option("treatEmptyValueAsNulls", True)
    .option("maxRowsInMemory", 20)
    .option("inferSchema", "false")
    .load(param_mountPoint + param_in_adls_raw_path + param_in_file_name)
)
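Finally, display the dataframe, as in the Current Behavior section, to trigger the error:
df.show(truncate=False)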
Environment
Anything else?
No response