-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unable to read 250MB file even with 100G driver memory and 100G executor memory #732
Open
1 task done
Comments
Why not try .option("maxRowsInMemory", 20) // Optional, default None. If set, uses a streaming reader which can help with big files (will fail if used with xls format files) |
I tried with this option and got
|
Ok, so it fails during schema inference. Are you able to specify a schema manually? |
Oh, we have many different files so specifying a schema isn't possible right now. We also tried without inferring the schema and it failed with Stackoverflow exception. |
Did you try the combination of specifying a schema and using |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Is there an existing issue for this?
Current Behavior
When we try to use the jar to read excel data from S3, the shell comes out with OOM issue. Unfortunately, I cannot share the file here.
Below is the code being used
As soon as I run the command, it comes out of
spark-shell
with the OOM error as shown below.Please suggest. Thanks.
Expected Behavior
No response
Steps To Reproduce
No response
Environment
Anything else?
No response
The text was updated successfully, but these errors were encountered: