Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds test cases for sum and delta system macros #137

Merged
merged 3 commits into from
Nov 21, 2024

Conversation

popematt
Copy link
Contributor

Issue #, if available:

#88

Description of changes:

Adds test cases for sum and delta.

As I was writing the test cases, it seemed really odd that sum can have zero-to-many argument values, but delta cannot. So, I made a decision to change that, and in these test cases, delta accepts zero-to-many argument values, and the implicit initial value is zero. (Just like sum.)

If there's disagreement, I can revert back to the currently spec'd behavior. If we agree, then I can update the spec accordingly.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

(binary "EF 12 02 07 FF FF FF 01")
(text "(:delta -1 -1 -1)")
(produces -1 -2 -3))
(each "4 arguments"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you throw in one or two simple tests showing that delta-of-delta works too?

@zslayton
Copy link
Contributor

zslayton commented Nov 21, 2024

EDIT: Per offline discussion, I was confusing delta with a past version of sum that took a bias and a stream of numbers to encode with that bias. The new version of delta should work fine!

You can disregard the below, but I'll leave it for historians.


The tests look good, so I've gone ahead and approved this. We should talk about the change to delta's arguments, though.

As I was writing the test cases, it seemed really odd that sum can have zero-to-many argument values, but delta cannot. So, I made a decision to change that, and in these test cases, delta accepts zero-to-many argument values, and the implicit initial value is zero. (Just like sum.)

For others' reference, here's the original description of delta:
image

I think the reason we had the initial value specified separately was in case the first value in the sequence was an outlier. If the first value is -8 and every value that follows it is 1_000_000_000, the encoding will be really suboptimal. Ideally, the seed value would be the median (or mode?) of the sequence.

The version you propose is reasonable if the writer can't/won't do any analysis of the sequence before writing. A writer aiming for compactness would probably take the time to sample the data before encoding it.

The original version of delta (with a distinct seed) can be used to define a new macro that sets 0 as the initial value.

I think modifying the definition so the second parameter (deltas) was * instead of + would be ok. If you set the seed and deltas is empty, it expands to the empty stream.

What do you think?

@popematt
Copy link
Contributor Author

popematt commented Nov 21, 2024

Summary of offline conversation in which we decided to keep delta as it is defined in this PR.

The version in this PR is aligned with the commonly known concept of directed delta encoding. It is intended to be a compact representation of groups of numbers that are relatively close to each other.

In practice, pre-analysis of the numbers to select a median/mode as the initial seed is unlikely to result in any significant savings over this method since delta encoding is designed to take advantage of things that have small differences. (As an aside, pre-analysis can be useful to determine whether delta encoding is actually beneficial—this strategy is used in some audio codecs.)

The "difference from initial seed" function can be constructed using a template macro such as this:

(macro biased_ints (initial biases*) (.for (b (%biases)) (.sum (%initial) (%b))))

@popematt
Copy link
Contributor Author

As per offline conversation about the usefulness of allowing an arbitrary number of operands for sum, I have simplified sum to accept exactly 2 operands instead of an arbitrary number of operands.

@popematt popematt merged commit 14e8ce8 into amazon-ion:main Nov 21, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants