Use qs2 serialization #162

Open
wlandau opened this issue Oct 31, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@wlandau

wlandau commented Oct 31, 2024

As we have discussed, crew cannot set a serialization method through mirai because the mechanism relies on everywhere(). As a compromise, would it be okay to have the option to disable built-in serialization and leave it to the user? That would allow crew to leverage the compression capabilities of qs::qserialize()/qs::qdeserialize() without incurring duplicate overhead from serializing twice.

@shikokuchuo
Owner

I'm copying @traversc as we need to have a joined-up discussion here.

The only meaningful integration I can see is at the C level with nanonext, if qs2 can be used as a drop-in replacement for R serialization.

@traversc, for qs2, do you plan, or would you consider, exposing functions with the following signatures (or equivalent) by registering them as C callables?

void qs2_serialize(unsigned char *, size_t *, SEXP);
SEXP qs2_unserialize(unsigned char *, size_t);

For serialization, I'm thinking it could be 2-pass, where passing NULL for the buffer returns the size required. But you might do all the memory management in-package, so I won't presuppose anything.

It should be fairly straightforward to wrap your C++ functions with an extern "C" for these exports.
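As a rough illustration of the 2-pass idea above, here is a minimal sketch in plain C. It assumes a toy stand-in serializer that simply copies a C string; `toy_serialize` and `serialize_two_pass` are hypothetical names and are not part of qs2, nanonext, or R.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a serializer: "serializes" a C string by
 * copying its bytes, terminator included. Not a qs2 or nanonext function.
 * If buf is NULL, only the required size is written to *size. */
void toy_serialize(unsigned char *buf, size_t *size, const char *object) {
    size_t needed = strlen(object) + 1;
    if (buf != NULL) {
        memcpy(buf, object, needed);  /* assumes buf holds at least `needed` bytes */
    }
    *size = needed;
}

/* Two-pass usage: query the size, allocate, then serialize for real.
 * Returns a malloc'd buffer the caller must free(), or NULL on failure. */
unsigned char *serialize_two_pass(const char *object, size_t *size) {
    size_t needed = 0;
    toy_serialize(NULL, &needed, object);        /* pass 1: size only */
    unsigned char *buf = (unsigned char *)malloc(needed);
    if (buf == NULL) return NULL;
    toy_serialize(buf, size, object);            /* pass 2: write the bytes */
    assert(*size == needed);                     /* both passes must agree */
    return buf;
}
```

The cost of this pattern is that the object is traversed twice, once to measure and once to write.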

@traversc

traversc commented Nov 1, 2024

A two-pass approach would be pretty inefficient. In C++, returning a std::vector would make sense; what would be the C way of doing something like that? Returning a buffer that the caller would then need to manage?

unsigned char * qs2_serialize(size_t * size, SEXP object) {
   *size = ...  /* length of the serialized output */
   unsigned char * result = (unsigned char*)malloc(*size);
   return result;  /* the caller is responsible for calling free() */
}
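One possible C analogue of returning a std::vector is a buffer that grows during a single pass and is then handed to the caller. The sketch below is only an illustration under that assumption: `byte_buf`, `byte_buf_append`, and `toy_serialize_once` are hypothetical names, and the "serializer" just concatenates string chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer, a rough C analogue of std::vector. */
typedef struct {
    unsigned char *data;
    size_t len;
    size_t cap;
} byte_buf;

/* Append n bytes, doubling capacity as needed. Returns 0 on success.
 * Capacity overflow is not handled in this sketch. */
static int byte_buf_append(byte_buf *b, const void *src, size_t n) {
    if (n == 0) return 0;
    if (n > b->cap - b->len) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap - b->len < n) cap *= 2;
        unsigned char *p = (unsigned char *)realloc(b->data, cap);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Toy single-pass "serializer": appends each chunk in turn without knowing
 * the total size up front. Returns a malloc'd buffer owned by the caller and
 * stores its length in *size. Returns NULL on failure; an empty result is
 * also NULL, with *size set to 0. */
unsigned char *toy_serialize_once(const char *const *chunks, size_t n_chunks,
                                  size_t *size) {
    assert(size != NULL);
    byte_buf b = {NULL, 0, 0};
    *size = 0;
    for (size_t i = 0; i < n_chunks; i++) {
        if (byte_buf_append(&b, chunks[i], strlen(chunks[i])) != 0) {
            free(b.data);
            return NULL;
        }
    }
    *size = b.len;
    return b.data;
}
```

The caller releases the result with `free()`, which only works if both sides use the same allocator.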

@shikokuchuo
Owner

Yeah the 2-pass is a kind of C way of thinking of things. I think you should just do the serialization and store it in whatever C++ object or class.

Then have an extern "C" wrapper which returns the raw pointer and the size, which can be consumed by a C program. Don't worry about this step - I can have a look at it when you get there.

Just wanted to check conceptually you plan it to be a drop-in for R serialization?
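To make the wrapper idea above concrete, here is a hedged sketch of the C-facing side only: the producer keeps ownership of the bytes behind a handle and exposes the raw pointer and size through accessors, plus a release function. All names (`toy_result` and its functions) are hypothetical, and the "serialization" is a plain string copy rather than anything qs2 does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle owning a serialized result. In a real C++ package it
 * would wrap whatever object holds the bytes; here it is a plain struct. */
typedef struct toy_result {
    unsigned char *bytes;
    size_t size;
} toy_result;

/* Produce a result by copying a C string, terminator included.
 * Returns NULL on allocation failure. */
toy_result *toy_result_create(const char *object) {
    size_t n = strlen(object) + 1;
    toy_result *r = (toy_result *)malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->bytes = (unsigned char *)malloc(n);
    if (r->bytes == NULL) {
        free(r);
        return NULL;
    }
    memcpy(r->bytes, object, n);
    r->size = n;
    return r;
}

/* Accessors a C consumer would call to read the raw pointer and the size. */
const unsigned char *toy_result_data(const toy_result *r) {
    assert(r != NULL);
    return r->bytes;
}

size_t toy_result_size(const toy_result *r) {
    assert(r != NULL);
    return r->size;
}

/* Release function: memory is freed by the side that allocated it. */
void toy_result_free(toy_result *r) {
    if (r != NULL) {
        free(r->bytes);
        free(r);
    }
}
```

With this shape the consumer never frees the bytes directly, so the producer is free to change how it allocates them.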

@traversc

traversc commented Nov 1, 2024

Yes, it should be a drop-in replacement since I'm using R_serialize. The one difference is that I'm not passing a callback function, which I haven't had a use for.

Question for you: do you see a use for in-memory compression? Or is the data generally small enough that it doesn't matter? We can get on average ~8x compression with some overhead, but that can be tuned based on the zstd compression level.

@wlandau
Author

wlandau commented Nov 1, 2024

> Question for you, do you see a use for in-memory compression?

From my end, in-memory compression would be extremely valuable, and it's the reason why I am interested in this thread and qsbase/qs2#4.

@traversc

traversc commented Nov 4, 2024

> Yeah the 2-pass is a kind of C way of thinking of things. I think you should just do the serialization and store it in whatever C++ object or class.
>
> Then have an extern "C" wrapper which returns the raw pointer and the size, which can be consumed by a C program. Don't worry about this step - I can have a look at it when you get there.
>
> Just wanted to check conceptually you plan it to be a drop-in for R serialization?

Can you have a look at the latest commit and example here? qsbase/qs2#4

@shikokuchuo
Owner

I haven't been able to run it yet, but it looks promising! Sorry I was not totally up to date - I thought qs2 was still an early stage proof of concept, didn't realise it had already reached CRAN. Will let you know once I've tested.

As for compression, yes it would be useful as creating the serialised object (buffer) necessarily creates a copy. So I think this would be useful even on the same machine. But regardless, minimising bytes transferred over the wire makes sense in a networked scenario e.g. HPC cluster.

@traversc

traversc commented Nov 8, 2024

@shikokuchuo Have you had a chance to check it out? I am hoping to submit a CRAN update soon.

@shikokuchuo
Owner

> @shikokuchuo Have you had a chance to check it out? I am hoping to submit a CRAN update soon.

Hi Travers, sorry I didn't get the chance today. I'll see if I have time over the weekend, otherwise please go ahead if you don't hear from me by Monday.

shikokuchuo changed the title from "Turn off serialization?" to "Use qs2 serialization" on Nov 14, 2024
shikokuchuo added the enhancement (New feature or request) label on Nov 14, 2024
3 participants