-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT CLOSE OR MERGE] Rti 12596 compress to file #32
base: main
Are you sure you want to change the base?
Conversation
static int save_file(const char* fileName, const void* buff, size_t buffSize) | ||
{ | ||
FILE* const oFile = fopen(fileName, "a"); | ||
if (!oFile) { | ||
return 0; | ||
} | ||
size_t const wSize = fwrite(buff, 1, buffSize, oFile); | ||
if (wSize != (size_t)buffSize) { | ||
return 0; | ||
} | ||
if (fclose(oFile)) { | ||
return 0; | ||
} | ||
return 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this part is IO bound and it's not marked as such, then opening and closing are expensive operations on top of that doing it constantly is going to be slow. There is no fsync
to ensure that the data is written to the disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I highly recommend reading the I/O Queues
part of the erl_nif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also probably it should be scheduled with ERL_NIF_DIRTY_JOB_IO_BOUND
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding fsync
: I tried to use fsync or any variation like fdatasync, but they are not compliant with C99 : error: implicit declaration of function 'fsync' is invalid in C99 [-Werror,-Wimplicit-function-declaration] fsync(); ^ 1 error generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding I/O Queues, that is a good idea, I just found this very good example, we could replicate it: https://github.com/erlang/otp/blob/master/erts/emulator/nifs/common/zlib_nif.c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also probably it should be scheduled with
ERL_NIF_DIRTY_JOB_IO_BOUND
I'm not sure about that as compressing uses a lot of cpu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because fsync is under unistd.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if we have a way with IO queues, then I would rather follow that approach rather than directly writing to disk.
static ERL_NIF_TERM zstd_nif_compress_to_file(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[]) { | ||
ErlNifBinary bin; | ||
if(!enif_inspect_iolist_as_binary(env, argv[0], &bin)) return enif_make_badarg(env); | ||
if (bin.size > MAX_BYTES_TO_NIF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data should be splitted and handle in shorter nif calls. I recommend reading Yielding NIF
and Long-Running NIFs
https://www.erlang.org/doc/man/erl_nif.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should do the same as zlib
#include <zstd.h> | ||
|
||
#define MAX_BYTES_TO_NIF 20000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How and Why this value was chosen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same value that the erts uses for their nifs: https://github.com/erlang/otp/blob/f83f3a77d0ee73627b7a45d559dc0a50f931e9fa/lib/crypto/c_src/mac.c
You can build some NIF functions that do the actual compression and then do the rest using erlang in the same module if you do not feel comfortable with C. |
You also forgot to have options for the compression level and the "long distance matching with given window log" or |
I was trying to do the compression and writing in C in order to be able to release memory as soon as possible. It also should be more efficient. |
The |
@raz-adroll I suggest merging this change as it is now and work on the improvements whenever the next ticket is prioritized. |
This shouldn't be merged as it is. |
What do you propose to do as the minimum for merging? I believe adding the fsync should be enough |
The minimum would be not breaking trino querying |
Sure, then this PR will be kept unmerged until the next ticket is prioritized. |
No description provided.