-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API to allow incremental updates #7
Comments
What can I do to help you finish this up and release this update? Thanks! |
Sorry, I was off for a couple days, hence the delay in responding. I think this needs a bit of a research to (a) quantify how expensive the de-serialization and serialization for every increment is an issue in practice. OTOH we can't do much about this anyway, because the digest is already a plain varlena buffer anyway, so most of the overhead will come from pglz and toasting. (b) see if compacting after every value has negative impact on accuracy of the t-digest, or whether we need to batch it somehow (which I think we could without breaking the on-disk format by using a flag). I think the easiest way to evaluate this is to compare tdigests built incrementally (by using the WIP patch) and using the regular aggregates. |
I've significantly reworked/improved this feature, cleaned up the code and added docs and tests. The API that I propose has three new functions:
I'm not sure whether to rename the third function to There's one more thing - by default, serializing the in-memory representation into t-digest forces compaction. But forcing it after adding each individual value does not seem like a great idea, so I've added a parameter that allows disabling it. Not sure if that's a good idea, though. |
I've polished the dev branch a bit and pushed it as 9d34031. |
Started as an e-mail discussion with @sporty81, who asked how to do incremental updates, i.e. being able to do something like this:
or something like that. A very early WIP version of this is implemented in #6 which essentially adds
tdigest_add()
andtdigest_union()
that are conceptually equivalent tohll_add
andhll_union
. I'm sure the current code is rather inefficient but hopefully it's good enough for some basic feedback.I'm more worried about accuracy, though. Essentially, every
tdigest_add
has to deserialize the value, adds the new value and then compacts the t-digest. The (de)serialization might be fairly expensive, but I'm sure there are ways to improve that, to some extent (one of the limiting factors will be that Postgres does not allow partial / in-place updates, so any change to the tdigest has to store a new copy of the whole value).But I suspect compacting the digests after adding every single value may have negative impact on accuracy of the digest, so maybe we'll need to reconsider this strategy and invoke compaction less frequently.
The text was updated successfully, but these errors were encountered: