Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialization: implement new serialization traits for the currently supported types #862

Merged
merged 29 commits into from
Dec 5, 2023

Conversation

piodul
Copy link
Collaborator

@piodul piodul commented Nov 20, 2023

The new SerializeCql and SerializeRow traits are supposed to eventually replace Value and ValueList. Currently, the new traits are implemented with a blanket implementation based on the old traits - this was just done in order to parallelize work on adapting the whole codebase to the new traits; the blanket implementations are inefficient and not type safe. This PR gets rid of them and, for each type for which we implement Value or ValueList, introduces proper implementations of SerializeCql and SerializeRow.

All types now check whether the CQL type serialized to matches, and most of them serialize exactly the same as they used to with the old traits. However, some slight differences were made in order to utilize the information available to the new traits and make it more robust:

  • CqlValue::UserDefinedType now doesn't assume that the order of the fields is the same as in the CQL type - it sorts the fields according to the CQL type.
  • impl SerializeRow for BTreeMap/HashMap doesn't serialize in the "named values" format anymore as it is no longer supported by the new traits. Instead, it uses provided information about the column names to properly order the values before sending them to the database.

Existing tests in value_tests.rs are adapted so that they perform serialization using both old and new traits and compare the results (with a small number of exceptions where serialization behavior was changed).

Fixes: #821

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@piodul piodul requested a review from Lorak-mmk November 20, 2023 02:32
@piodul piodul force-pushed the implement-for-new-serialization branch 3 times, most recently from fa08790 to a60de4b Compare November 24, 2023 00:43
@piodul
Copy link
Collaborator Author

piodul commented Nov 24, 2023

v1.1: more complete docstrings for the error types

@Lorak-mmk
Copy link
Collaborator

Depends on #851

Isn't this the other way around?

@piodul
Copy link
Collaborator Author

piodul commented Nov 29, 2023

Depends on #851

Isn't this the other way around?

Yeah, my mistake. Updated the description.

@piodul piodul force-pushed the implement-for-new-serialization branch from a60de4b to 14a67aa Compare November 30, 2023 08:24
@piodul
Copy link
Collaborator Author

piodul commented Nov 30, 2023

v2:

@piodul piodul force-pushed the implement-for-new-serialization branch from 14a67aa to 66db312 Compare December 1, 2023 05:04
@piodul
Copy link
Collaborator Author

piodul commented Dec 1, 2023

v3:

  • adjusted the builder interface to make it safer/easier to handle CQL value size overflows
  • adjusted the impl SerializeCql blocks to the new interface - new errors are handled by unwraps for fixed sized types, and by properly returning an error for dynamically sized ones
  • got rid of the .expect calls in map/list/set serialization code

@piodul
Copy link
Collaborator Author

piodul commented Dec 1, 2023

Looks like the CI failed due to a known issue: #813

@piodul
Copy link
Collaborator Author

piodul commented Dec 1, 2023

Now it's #864 ...

@Lorak-mmk Lorak-mmk mentioned this pull request Dec 1, 2023
8 tasks
Comment on lines +263 to +269
impl SerializeRow for SerializedValues {
fallback_impl_contents!();
}

impl<'b> SerializeRow for Cow<'b, SerializedValues> {
fallback_impl_contents!();
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a implementation created to make transition to new API easier? If so, let's add a TODO / create and issue so that we don't forget to remove it later.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create an issue about removing the legacy API after we release the new one. I assure you that we won't forget about this piece of code - it will stop compiling after we remove SerializedValues.

scylla-cql/src/types/serialize/value.rs Show resolved Hide resolved
piodul added 13 commits December 4, 2023 15:50
Some conversions between time-related types return `ValueTooBig` in case
when one representation doesn't fit into the other. This is a
misinterpretation of the `ValueTooBig`'s semantics: it is supposed to be
returned when the value's size in bytes is longer than the maximum size
of `i32::MAX`, not when the value overflows the allowed range (e.g.
doesn't fit in a 64bit number).

Introduce `ValueOverflow` error in order to differentiate the cases when
a value is "too big" and use it in conversions between time-related
types.

Adjust relevant `Value` implementations which use the conversions to map
the `ValueOverflow` to `ValueTooBig`. This is done in order not to
introduce breaking changes to the soon-to-be-legacy interface.
Adjust the existing tests for the `Value` trait to also do serialization
via `SerializeCql` and compare results produced by both traits.

The tests pass at the moment because the new trait is implemented using
the old one. In the commits that follow, we will introduce proper
implementations of the new trait for the existing types. In nearly all
cases we will be interested in preserving compatibility, so the exising
tests will be very helpful.
Not all types that implement `Value` have a corresponding test in
`value_tests`. Add more tests to improve coverage and help make sure
that implementations of `SerializeCql` that will be introduced in
further commits will be correct.
The CQL format imposes the following limits:

- At most u16::MAX values to be provided to a query,
- At most i32::MAX bytes in a single value.

Currently, the interface defined in the types::serialize::writers module
checks those limits and panics if they are exceeded during
serialization.

While those limits are quite large, they sometimes can accidentally be
exceeded, and the legacy serialization code handles such situations
properly (i.e. it does not panic). We should not regress and make it
possible/easy to handle it in the new interface.

The following changes are made:

- `CellWriter::set_value` and `CellValueBuilder::finish` now return a
  Result, indicating whether there was an overflow or not.
- In order to account for the interface changes, `CountingWriter` is
  split into multiple separate types, for each of the traits that the
  original type implemented.
- The limit for u16::MAX values in `RowWriter` is lifted, and an `usize`
  is used to track the value count. The driver code is supposed to check
  whether the limit is exceeded or not before sending the query, and
  deferring the overflow check to that point should make the
  serialization code slightly more performant.
Currently, there is a blanket implementation of `SerializeCql` for all
types that implement `Value`. Replace it with an `impl SerializeCql`
block for every type that currently implement `Value`.

This is just a preparation - the blocks that were introduced still
utilize the same fallback logic as the blanket impl. They will be
replaced with proper impls in the following commits.
piodul added 15 commits December 4, 2023 15:56
Currently, the `MaybeUnset` wrapper requires the inner type to implement
`Value`. As we want to use the type parameter both with the old and the
new serialization framework, this restriction doesn't make much sense in
the case of the new framework, so remove it. The impls for `Value` and
`SerializeCql` on the `MaybeUnset` itself should themselves require
appropriate traits to be implemented on the inner type.
Some but not all CQL types support an empty, zero-bytes value, in
addition to their natural representation - for example, `int` is usually
4 bytes but can also be empty. Introduce a function which tells whether
given CQL type supports the additional empty value; it will be used in
`impl SerializeCql for CqlValue`, the `CqlValue::Empty` case.
`CqlValue` was the last type for which `SerializeCql` was supposed to be
implemented, so remove the obsolete `fallback_impl_contents` macro as
well.
The behavior of UDTs represented via CqlValue has changed to a safer
one: they no longer need their fields to be in correct order to be
serialized properly. Add a test for it.
Modifies the existing tests for `ValueList` to also perform
serialization via `SerializeRow` and compare the results. It will help
in getting the hand-written implementation of `SerializeRow` for the
currently supported types correct.
Like it was done for `SerializeCql`, remove the blanket `ValueList` ->
`SerializeRow` impl and replace with explicit `impl` blocks for all
types for which we currently implement `ValueList`.

The newly introduced `impl SerializeRow` blocks use the same fallback
logic as the original blanket implementation - it will be replaced in
further commits.
@piodul piodul force-pushed the implement-for-new-serialization branch from 66db312 to 7def91e Compare December 4, 2023 17:06
@piodul
Copy link
Collaborator Author

piodul commented Dec 4, 2023

v4:

  • Rebased
  • Implemented two declarative macros that allows to generate a quick and dirty implementation of SerializeRow/SerializeCql based on ValueList/Value

Introduces two procedural macros:

- impl_serialize_cql_via_value implements SerializeCql based on an
  existing Value impl,
- impl_serialize_row_via_value_list implements SerializeRow based on an
  existing ValueList impl.

The macros are intended to be used to help people gradually migrate
their codebase to the new serialization API in case it is not easy to
write a SerializeCql/SerializRow impl right from the beginning.
@piodul piodul force-pushed the implement-for-new-serialization branch from 7def91e to 0642eb5 Compare December 4, 2023 17:23
@piodul piodul merged commit bdcf349 into scylladb:main Dec 5, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants