[BUG] When Read Excel Files, Several Errors Using Java #837

Open
1 task done
yumble opened this issue Feb 28, 2024 · 2 comments

yumble commented Feb 28, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Several problems occur when reading Excel files with Spark from Java.

I'm currently building a service that cannot specify the schema of a file in advance: users upload their own files, so the file format is arbitrary.

Screenshot of the Excel file:
image

Screenshot of the format that should be displayed:
image

Screenshot of the current, incorrect output:
image

The problems with the current Excel file are as follows.

  1. The first column is not read with the date format applied (it should be displayed as in the second screenshot, but the format is not recognized). I want it displayed in the "yyyy-MM-dd'T'HH:mm:ss.SSSSZ" format.

  2. Although the second and third columns have the same cell format, the second column is read as a string prefixed with "₩ ", and the third column is read in scientific notation.

     I want both to be read as plain numbers, e.g. ₩ 100,000 -> 100000.

     In the second column, the cells hold plain numeric values, and the cell format adds the currency symbol, a space, and thousands separators, e.g. 3000000 -> ₩ 3,000,000.
     In the third column, the cell values come from an addition formula, e.g. =C3+(C3*0.35).

  3. Columns without headers should also be displayed if they contain data, but they are currently ignored.

Code used to read the file:

String workSheet = String.format("'%s'!A1", excel.getWorkSheet());

Dataset<Row> df = sparkSession.read()
        .format("com.crealytics.spark.excel")
        .option("dataAddress", workSheet)
        .option("header", excel.isDefaultHeader())
        .option("maxColumns", 1000) //todo GUARDRAILS
        .option("columnNameOfCorruptRecord", "true")
        .option("columnNameOfRowNumber", "true")
        .option("inferSchema", "false")
        .option("enforceSchema", "false")
        .option("dateFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSSZ")
        .option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSSZ")
        .load(paths.left);
df.show();
df.schema();

I've tried changing the values of the options above, but the result is the same.

Please let me know if you know how to solve any of these problems.
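
To make problem 2 concrete, here is a minimal plain-Java sketch of the conversion being asked for, written as a post-processing step on the cell text. It is not part of spark-excel and not a confirmed fix; `CellNumberNormalizer` and `toPlainNumber` are hypothetical names, and the sketch assumes the currency marker is a symbol such as "₩" rather than a letter code containing "e" or "E".

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of spark-excel: turns the cell text that is
// currently returned into a plain number.
final class CellNumberNormalizer {

    // Drops every character that cannot be part of a decimal literal
    // (currency symbol, spaces, thousands separators), then parses the rest.
    // BigDecimal accepts scientific notation such as "4.05E6" directly.
    static BigDecimal toPlainNumber(String cellText) {
        String cleaned = cellText.replaceAll("[^0-9eE+\\-.]", "");
        return new BigDecimal(cleaned);
    }

    public static void main(String[] args) {
        // Second column: formatted currency string.
        System.out.println(toPlainNumber("₩ 3,000,000").toPlainString()); // 3000000
        // Third column: result of =C3+(C3*0.35) rendered in scientific notation.
        System.out.println(toPlainNumber("4.05E6").toPlainString());      // 4050000
    }
}
```

`toPlainString()` is what avoids the exponent in the output; `toString()` on the same value may print scientific notation again.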

Expected Behavior

  1. The first column should be displayed in the "yyyy-MM-dd'T'HH:mm:ss.SSSSZ" format.

  2. Numbers should be displayed as plain numbers, without cell formats (styles) and without scientific notation.

  3. Columns without headers should also be displayed if they contain data.
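
For item 1, the following plain-Java sketch shows what the requested pattern produces when interpreted by `java.time.format.DateTimeFormatter`. It only illustrates the target output; whether spark-excel interprets the `timestampFormat` option the same way is an assumption. The class name, the sample date, and the +09:00 offset are made up for illustration, since an Excel date cell carries no time zone of its own.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical demo class: formats a timestamp with the pattern from the issue.
final class TimestampPatternDemo {

    // SSSS = four fraction-of-second digits, Z = offset without a colon.
    static final DateTimeFormatter TARGET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSZ");

    static String format(OffsetDateTime value) {
        return TARGET.format(value);
    }

    public static void main(String[] args) {
        // Illustrative value only; the offset must be supplied by the application.
        OffsetDateTime sample =
                OffsetDateTime.of(2024, 2, 28, 9, 30, 0, 0, ZoneOffset.ofHours(9));
        System.out.println(format(sample)); // 2024-02-28T09:30:00.0000+0900
    }
}
```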

Steps To Reproduce

error.xlsx

Environment

- Spark version:

implementation group: 'org.apache.spark', name: 'spark-yarn_2.12', version: '3.5.0'
implementation group: 'org.apache.spark', name: 'spark-core_2.12', version: '3.5.0'
implementation group: 'org.apache.spark', name: 'spark-sql_2.12', version: '3.5.0'

- Spark-Excel version:

implementation group: 'com.crealytics', name: 'spark-excel_2.12', version: '0.14.0'

- OS: Spring boot 2.7.6

- Cluster environment

Anything else?

No response

Repository owner deleted a comment from github-actions bot Feb 28, 2024
nightscape (Owner) commented

Please always use the newest version when reporting bugs. Some things might already have been fixed in the meantime.
