exceed 2^31-1 bytes #64
Comments
I also have this issue, with no solution yet. It seems to be specific to using import_src on the 2.15 GB data_float_h.csv.gz file from SICdb; all other datasets worked fine. Some things I've tried:
Any other suggestions to try would be much appreciated. |
The configuration files under The error message posted stems from the fact that in ricu/inst/extdata/config/data-sources.json Line 9740 in 7f2cc42
the rawdata column is specified to be of type col_double. The database documentation here: https://www.sicdb.com/Documentation/Signal_Data clearly states that this column is a binary data column compressing up to 60 floats into a single cell of the csv table, to keep the row count of the table manageable while still providing up to minute-level resolution for some variables. 60 compressed floats naturally do not cast well to a col_double, so one gets a full error message for every single line of the entire data_float_h table. These error messages are all concatenated by ricu in the function report_problems, and concatenating this many error messages exceeds the R string size limit of 2^31-1 bytes, which explains the error message.
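To make the size argument concrete, here is a rough back-of-the-envelope sketch in Python. The 2^31-1 byte limit comes from the error message itself; the figure of about 100 bytes per parse-failure message is purely an assumption for illustration, not something measured from ricu or readr output.

```python
# Rough estimate: how many concatenated parse-failure messages fit
# into a single R string before the 2^31-1 byte limit is hit?

R_MAX_STRING_BYTES = 2**31 - 1  # limit quoted in the error message


def max_messages(bytes_per_message: int) -> int:
    """Return how many messages of the given size fit under the limit."""
    if bytes_per_message <= 0:
        raise ValueError("bytes_per_message must be positive")
    return R_MAX_STRING_BYTES // bytes_per_message


# Hypothetical figure: roughly 100 bytes per message
# (row number, column name, expected type, offending value).
ASSUMED_BYTES_PER_MESSAGE = 100

print(max_messages(ASSUMED_BYTES_PER_MESSAGE))
```

Under that assumption the limit is reached after roughly 21 million failing rows, so a table that produces one message per row only needs a row count in the tens of millions to trigger the error.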
Interestingly there's a second Potential fix is:
Hope this helps |
@manuelburger Thank you so much for the clear explanation. I've removed the redundant 'report_problems' function and changed the rawdata column from Probably this has to do with the changes you mentioned in #30, which are not merged into the main branch. Is there any particular reason that these changes are not available? Or am I the only one for whom SICdb 1.0.6 is not working in ricu? |
So short update, I've taken the branch mentioned in #30 as created by @prockenschaub and recompiled the ricu package (the older 0.5.5 version that is) and tried with this to add SICdb. The previous error does not occur, however after importing 86% a new one does: I've tried tracing back the code to see if there was an obvious explanation, but could not find one. It is not clear to me what function res should be. Is there anyone with a working SICdb environment? And could they tell me which codebase they used? |
@mcr1213 I originally meant to work with SICdb when it was released but this has been pushed back repeatedly, so I haven't touched the code in a while. I originally thought that SICdb was fully integrated in Since there appears to be increased interest in SICdb, maybe now is a good time to look at it again. I will try to find some time in the coming days to look at your error and see what's wrong / how we can bring the code into the latest version of Edit: I had a quick look. |
@prockenschaub Thanks for your suggestion. Unfortunately, I'm no expert in debugging R-packages and it does not yet work for me. At the moment my hypothesis is that the mentioned 'sic_data_float_h' cannot be found. When doing ls("package:ricu") this function does not show up in the available functions. I do know that this function is placed in the new (compared to the original release) file "./R/callback-tb-R". Searches in google/chatgpt suggested mentioning the file in the main DESCRIPTION file, but the other files are not referenced there. I've also tried to 'Reoxygenize' the package to recreate NAMESPACE, but no luck. Can you tell me if I'm on the right track? Does the sicdb work for you? |
I will resolve this issue in the next version (i.e., in June). In the meantime, if this is an urgent matter for anyone, my suggestion is to simply perform manual conversion to First, I split the
And then all tables can be converted to
Once the |
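For readers who want to try the splitting step, the following is a minimal sketch and not the snippet originally posted in this thread. It assumes the compressed CSV has a single header line and no line breaks inside quoted fields; the function name `split_csv_gz`, the `part_N.csv` naming, and the default chunk size are all hypothetical choices for illustration.

```python
import gzip
import itertools
from pathlib import Path


def split_csv_gz(src, out_dir, rows_per_part=10_000_000):
    """Split a gzipped CSV into plain CSV parts, repeating the header in each.

    Assumes one header line and no embedded newlines in quoted fields.
    Returns the list of part paths that were written.
    """
    if rows_per_part < 1:
        raise ValueError("rows_per_part must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    # Text mode with newline="" leaves line endings untouched.
    with gzip.open(src, "rt", newline="") as fin:
        header = fin.readline()
        idx = 0
        while True:
            first = fin.readline()
            if not first:  # end of input reached
                break
            idx += 1
            part = out_dir / f"part_{idx}.csv"  # hypothetical naming scheme
            with open(part, "w", newline="") as fout:
                fout.write(header)
                fout.write(first)
                # Stream the remaining rows of this part without
                # holding the whole chunk in memory.
                fout.writelines(itertools.islice(fin, rows_per_part - 1))
            parts.append(part)
    return parts
```

Each part can then be read and converted separately, which keeps any single read well below the size of the full data_float_h table.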
Thanks for the help everyone! The tables can now be successfully imported. |
Happy to see active interest in the issue. @dplecko I ran the Python and R code snippets and was able to generate the data_float_h in parts; however, when moving them inside Thank you in advance for your help and active maintenance of the repository. |
@partizanos I'm afraid I had to do some combination of all the solutions provided. I'm not exactly sure which step was crucial to result in a working sicdb. The script above I used to unpack the data. In the end I ended up with a single data_float_h.fst file that worked. I guess that multiple .fst files in dir data_float_h should work too, as other datasets use the same structure. |
@partizanos here is how the tables are organized for me in the The folder should be called A proper fix for all of this will happen some time this summer in |
Hello, I tried quite a few of the combinations and made some progress, but still no luck. I list my experience below in case it helps the investigation. The investigation below was done using R version 4.3.3 and ricu 0.5.5.
Let me know if you have any suggestions on how to debug this further and thank you for the useful comments and active support of this repository. |
Hello, happy update: I managed to replicate the pipeline you described using ricu 0.6.1 and by reading/writing with the splitting method. I rewrote the splitting function in C++, since my Python script was taking a long time. Uploaded here:
Successful compilation should give you a ./split_csv binary to execute.
Thanks everyone for the help! |
Hello, I am trying to use ricu with the SICdb dataset, but I face the issue below. Any ideas?