Cannot write DataFrames with more than 2^31 rows #87

Open
Para7etamol opened this issue Jul 25, 2024 · 2 comments
Comments

Para7etamol commented Jul 25, 2024

Sadly, JDF.jl depends on Blosc v1, which only supports 32-bit addressing.

So this is impossible:

using DataFrames, JDF
JDF.save(joinpath("/tmp", "test.jdf"), DataFrame(bytes=rand(Int8, 3_000_000_000)))

Please add an option to disable compression or to choose a different algorithm, or a way to split JDF files into partitions with fewer than 2^31 rows.
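As a rough illustration of the partitioning idea, here is a minimal sketch that only computes the row ranges for each partition. The helper name `partition_ranges` and the default cap of 2^31 - 1 rows are assumptions made for this example and are not part of JDF.jl. The cap is expressed in rows, which only matches a byte-based limit for 1-byte element types such as `Int8`; wider columns would presumably need a smaller `max_rows`.

```julia
# Hypothetical helper (not part of JDF.jl): split 1:nrows into consecutive
# row ranges that each contain at most max_rows rows.
function partition_ranges(nrows::Integer, max_rows::Integer = 2^31 - 1)
    max_rows > 0 || throw(ArgumentError("max_rows must be positive"))
    # Each range starts max_rows after the previous one; the last is clipped to nrows.
    return [start:min(start + max_rows - 1, nrows) for start in 1:max_rows:nrows]
end

# Usage sketch (assumption: each slice is written as its own .jdf file,
# reusing the JDF.save call from the report above):
#
#   for (i, r) in enumerate(partition_ranges(nrow(df)))
#       JDF.save(joinpath("/tmp", "test_part$(i).jdf"), df[r, :])
#   end
```

Slicing with `df[r, :]` copies the rows, so a workaround like this temporarily needs extra memory for each partition.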

Greetings
Para

@PallHaraldsson

There's already https://juliahub.com/ui/Packages/General/Blosc2

Blosc2 is a new compression format and library (it also retains the old API), and it is 64-bit. The older one is indeed 32-bit, though I think it should be able to go up to 4 GB without changing the file format (that would need a fix in Blosc.jl; there's an issue about the limitation there).

@xiaodaigh
Owner

Working on this now. It can write such files now but cannot read them yet, so hopefully I will find some time to get the reading working.
