Cannot import large csv tables #2995
Comments
We've discussed making inference (i.e. guessing column types) optional during the import process; I believe that should fix this.
In #3050 I made some changes which I think might help with the problem you're having. This change will be available in Mathesar 0.1.4, which should be released in the next few weeks.

Before my change, importing large amounts of data required significant computational time to determine the best Postgres type for each column. (We refer to this process as "column type inference".) I'm fairly certain that the error you observed was due to timeouts during the inference process.

After my change, column type inference is optional. You can disable it during import here: With column type inference disabled, Mathesar will use the "Text" type for all columns by default, but you can still manually configure column types during import, as shown here:

This "optional column type inference" approach is somewhat of a stopgap measure, intended to offer a quick fix to this problem so that people like you can import large CSV data sets. In the future we'd like to make more improvements to the inference process as well, and we've opened #2346 to track them.

I'd love to hear back from you once you've had a chance to try your import again with column type inference manually disabled. Does this fix your problem? Do you have other thoughts or feedback about how we can improve this import functionality? We'd really appreciate it!

I'm going to leave this ticket open while we wait to hear back from you, but I'm moving it out of our 0.1.4 milestone because we don't plan to make any additional changes to the import process for 0.1.4.
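To give a rough sense of why inference is expensive for a ~200k-row file, here is a minimal, hypothetical Python sketch of what column type inference has to do in general. This is not Mathesar's actual code; the candidate types, function names, and strategy are invented for illustration only.

```python
import csv
from datetime import date

# Minimal sketch (NOT Mathesar's implementation) of why inference is costly:
# for each column, candidate types are tried from narrowest to broadest, and
# a candidate only wins if every value in the column parses as it, so the
# work grows with rows x columns x candidate types.

def parses_as(value, caster):
    try:
        caster(value)
        return True
    except (ValueError, TypeError):
        return False

# Hypothetical candidate order; a real system considers more types.
CANDIDATES = [
    ("integer", int),
    ("numeric", float),
    ("date", date.fromisoformat),
]

def infer_column_types(csv_path):
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames
        rows = list(reader)  # every row may be scanned once per candidate type
    types = {}
    for column in columns:
        inferred = "text"  # fallback when no narrower type fits
        for name, caster in CANDIDATES:
            if all(parses_as(row[column], caster) for row in rows):
                inferred = name
                break
        types[column] = inferred
    return types
```

With 200k rows, 30 columns, and several candidate types, this adds up to millions of parse attempts, which is why skipping inference (or deferring it) makes such a difference for large imports.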
Hello @seancolsen, thank you for taking the time to look at this problem! Unfortunately I'm on vacation right now and don't have access to the original very large CSV that showed the bad behavior. I tried it with another CSV and it worked great; hopefully the other CSV will also work fine when importing all columns as text. I suggest you close this issue; if I run into any more problems related to this I'll open a new one. Kind regards,
Thanks @spapas! Closing now.
Description
I am trying to import a large CSV file (>20 MB): a table with ~200k rows and ~30 columns per row. After the data is uploaded it says "Please wait while we prepare a preview for you" and then displays the table preview with only the column names (without data). Then, if I visit my tables, I see a new table with the name of the CSV and the comment "Needs Import confirmation". When I click it I get the same problem with the preview.
I did some debugging on the network requests, and it seems that the import page (URL at /db/mathesar_data/20/import/83/) tries to fetch the URL /api/db/v0/tables/83/type_suggestions/?columns_might_have_defaults=false via AJAX, but this call takes too long and is killed by Gunicorn (i.e. it takes more than 30 seconds), so I get a "Failed to load preview" error. See the images below for more info.
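As a side note, since the request is being killed by Gunicorn's default 30-second worker timeout, one possible stopgap (if you control how Gunicorn is started) is to raise that timeout. Whether and how a given Mathesar deployment exposes Gunicorn settings is an assumption here; the snippet below only shows standard Gunicorn configuration.

```python
# gunicorn.conf.py -- standard Gunicorn settings, not Mathesar-specific.
# Raising the worker timeout gives the slow type_suggestions request more
# time to finish before the worker is killed. How Gunicorn is launched in
# your deployment may differ, so treat this as an illustration only.
bind = "0.0.0.0:8000"
workers = 3
timeout = 120  # seconds; Gunicorn's default is 30
```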
Expected behavior
To be able to import the data. My understanding of this issue is that Mathesar tries to be smart by inferring and altering the column types; this takes far too long when there's a lot of data in the table.
To Reproduce
Unfortunately I cannot provide the data I used to get the error because it is internal; however, I believe you'd get similar behavior with any sufficiently large file. Note that I tried importing the same columns with only a couple of rows and it worked fine.
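For anyone trying to reproduce this without the internal data, a throwaway script along these lines could generate a CSV of roughly the same size; the column names and value shapes are invented and should be adjusted to resemble the real file's mix of types.

```python
import csv
import random

# Generate a synthetic CSV (~200k rows x 30 columns) to reproduce the
# timeout without sharing internal data. Column names and values are
# made up; tweak them to resemble the real file.
ROWS = 200_000
COLS = 30

with open("large_sample.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([f"col_{i}" for i in range(COLS)])
    for _ in range(ROWS):
        writer.writerow([
            random.choice([
                str(random.randint(0, 10_000)),      # integer-like
                f"{random.random():.4f}",            # numeric-like
                f"value_{random.randint(0, 99)}",    # text
            ])
            for _ in range(COLS)
        ])
```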
Environment
Additional context
It waits in this state for a long time (until the request is killed):
The AJAX request that gets killed because of the 30-second Gunicorn limit:
The error I get: