Implement timestamp generators #1128

Open
wants to merge 6 commits into
base: main
from

Conversation

smoczy123

@smoczy123 smoczy123 commented Nov 20, 2024

To achieve parity with cpp-driver we need to implement client-side timestamp generators.
This pull request adds a TimestampGenerator trait and a MonotonicTimestampGenerator that implements it,
together with an extension to SessionBuilder that provides an ability to set a TimestampGenerator in Session
and use it to generate timestamps.

Fixes #1032

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.


github-actions bot commented Nov 20, 2024

cargo semver-checks found no API-breaking changes in this PR! 🎉🥳
Checked commit: 149844f

@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch 2 times, most recently from 80fa6c4 to 3c0da90 Compare December 2, 2024 23:09
@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch 3 times, most recently from 811f0bf to 1d70873 Compare December 3, 2024 23:36
@github-actions github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Dec 3, 2024
@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch from 1d70873 to 0f151f1 Compare December 4, 2024 00:05
@github-actions github-actions bot removed the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label Dec 4, 2024
@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch 2 times, most recently from 6d1943a to 876d544 Compare December 5, 2024 00:14
@smoczy123 smoczy123 marked this pull request as ready for review December 5, 2024 00:20
@smoczy123 smoczy123 requested a review from wprzytula December 5, 2024 00:20
Collaborator

@Lorak-mmk Lorak-mmk left a comment

Do other drivers (besides cpp-driver) have timestamp generators? If so, how do they implement them?
I'm asking because a cpp-compatible implementation does not have to reside in Rust Driver: cpp-rust-driver can implement it in its own codebase. We should think about what implementation(s) we want to provide to our users, not what cpp-rust-driver needs here.

Did you maybe test how long a call to MonotonicTimestampGenerator::next_timestamp() takes? The clock has microsecond resolution, and a microsecond is a long time. If we call the generator often (like in your unit test), it may always take the last + 1 branch, because we are still in the same microsecond as the previous call. In that case, after many calls the returned value may be far in the future compared to the system clock. If there are multiple clients, this may cause issues.
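
The drift concern above can be illustrated with a small simulation. This is only a sketch: `compute_next` and `simulated_drift_us` are hypothetical names, and the rule "follow the clock if it moved forward, otherwise return last + 1" is assumed from the discussion, not copied from the PR. The clock is simulated, so no real time source is involved.

```rust
/// Hypothetical helper: picks the next timestamp (in microseconds) from the
/// previously returned value and the current clock reading.
fn compute_next(last: i64, now_us: i64) -> i64 {
    if now_us > last {
        now_us // the clock moved past the last value: follow it
    } else {
        last + 1 // same microsecond (or clock went backwards): bump by one
    }
}

/// Calls the generator `calls` times while the simulated clock advances by
/// one microsecond every `calls_per_us` calls. Returns how many microseconds
/// the last generated timestamp is ahead of the simulated clock.
fn simulated_drift_us(calls: i64, calls_per_us: i64) -> i64 {
    let mut last = 0;
    let mut now = 0;
    for i in 0..calls {
        now = 1 + i / calls_per_us;
        last = compute_next(last, now);
    }
    last - now
}
```

With ten calls per simulated microsecond, `simulated_drift_us(1000, 10)` ends up 900 µs ahead of the clock, while one call per microsecond (`simulated_drift_us(1000, 1)`) produces no drift at all.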

@smoczy123
Author

I've also checked the Java and Python drivers; they implement it in exactly the same way, taking the time since the epoch in microseconds. Unfortunately, next_timestamp() is quite fast: the last + 1 branch was taken in around 500 out of 1000 iterations in the unit test. However, I think such behavior is expected, which is why the default settings give the system clock 1 second to catch up. I think switching to a nanosecond timestamp would be manageable (i64 would support timestamps until around 2262), but I'm not sure we would want to provide behavior different from all the other drivers implementing this generator.
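
The nanosecond range mentioned above can be checked with a little integer arithmetic. This is a sketch with a hypothetical helper name, and it assumes an average Gregorian year of 365.2425 days.

```rust
/// Approximate number of whole years that a non-negative i64 counter can
/// cover when it counts `ticks_per_second` ticks per second.
fn representable_years(ticks_per_second: i64) -> i64 {
    const SECONDS_PER_YEAR: i64 = 31_556_952; // 365.2425 days
    i64::MAX / ticks_per_second / SECONDS_PER_YEAR
}
```

`representable_years(1_000_000_000)` gives 292, so a nanosecond counter starting at the Unix epoch (1970) runs out around 2262. For microseconds, `representable_years(1_000_000)` gives 292277 years.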

@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch from 876d544 to feba1aa Compare December 5, 2024 14:50
@wprzytula wprzytula requested a review from Lorak-mmk December 8, 2024 08:22
Collaborator

@Lorak-mmk Lorak-mmk left a comment

  • I don't think connection is the right layer to utilize the generator, imo it is more fitting for the session layer. Look into session.rs file, into functions like execute or run_query (cc: @wprzytula )

  • Nit: the commit messages have too long first lines, which makes github not render them correctly. Avoid having any lines longer than 70 characters, especially the first line (which should ideally have at most 50 characters).

  • The MonotonicTimestampGenerator struct could use more explanation in its doc comment. Please describe what it guarantees, how it behaves (errors, drifting etc).

  • On that front, the documentation book should also be updated with info about timestamp generators. It should either be a new file in the queries folder, or a new folder. @wprzytula @muzarski wdyt is the better place?

Comment on lines 1053 to 1058
let mut timestamp = None;
if query.get_timestamp().is_none() {
    if let Some(x) = self.config.timestamp_generator.clone() {
        timestamp = Some(x.next_timestamp().await);
    }
}
Collaborator

No need to clone here, a reference should be enough to generate a timestamp.
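
A minimal sketch of the borrowing variant follows. The trait here is a simplified synchronous stand-in (the PR's `next_timestamp` is async), and `ConnectionConfig`, `FixedGenerator`, `resolve_timestamp` and the `Option<Arc<dyn TimestampGenerator>>` field type are assumptions made for illustration, not the driver's actual definitions.

```rust
use std::sync::Arc;

/// Simplified stand-in for the PR's trait (assumed shape; the real one is async).
trait TimestampGenerator {
    fn next_timestamp(&self) -> i64;
}

/// Hypothetical generator that always returns the same value, for the demo.
struct FixedGenerator(i64);

impl TimestampGenerator for FixedGenerator {
    fn next_timestamp(&self) -> i64 {
        self.0
    }
}

/// Hypothetical config holding an optional, shared generator.
struct ConnectionConfig {
    timestamp_generator: Option<Arc<dyn TimestampGenerator>>,
}

/// Returns the timestamp to send: a user-provided one wins, otherwise the
/// generator (if any) is asked. `as_ref()` borrows the Arc instead of
/// cloning it, so no reference count is touched.
fn resolve_timestamp(config: &ConnectionConfig, user_timestamp: Option<i64>) -> Option<i64> {
    user_timestamp.or_else(|| {
        config
            .timestamp_generator
            .as_ref()
            .map(|generator| generator.next_timestamp())
    })
}
```

Cloning an `Arc` is cheap, but it still performs an atomic increment and decrement on every request; borrowing avoids both.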

Comment on lines 23 to 56
    warning_threshold_us: i64,
    warning_interval_ms: i64,
}
Collaborator

Currently you use raw integers for those durations, and you treat "0" as a special value.
The less error prone and more Rusty way is to use std::time::Duration to store thresholds / intervals, and use Option<std::time::Duration> instead of using a special value.
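
One way this suggestion could look is sketched below. `DriftWarningConfig` and `should_warn` are hypothetical names, and treating a missing threshold as "warnings disabled" is an assumption about what the special value 0 was meant to express.

```rust
use std::time::Duration;

/// Hypothetical replacement for the raw integer fields.
struct DriftWarningConfig {
    /// Drift above which a warning is logged; `None` disables warnings
    /// (instead of a magic value such as 0).
    warning_threshold: Option<Duration>,
    /// Minimum time between two consecutive warnings.
    warning_interval: Duration,
}

impl DriftWarningConfig {
    /// Decides whether to warn, given the current drift and the time elapsed
    /// since the previous warning (`None` if no warning was emitted yet).
    fn should_warn(&self, drift: Duration, since_last_warning: Option<Duration>) -> bool {
        let threshold = match self.warning_threshold {
            Some(threshold) => threshold,
            None => return false, // warnings are switched off
        };
        if drift <= threshold {
            return false;
        }
        match since_last_warning {
            None => true,
            Some(elapsed) => elapsed >= self.warning_interval,
        }
    }
}
```

With `Duration`, the unit no longer has to be encoded in the field name (`_us`, `_ms`), and the disabled state is visible in the type.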

@muzarski
Contributor

* On that front, the documentation book should also be updated with info about timestamp generators. It should either be a new file in the `queries` folder, or a new folder. @wprzytula @muzarski wdyt is the better place?

Ideally, we could have a new directory with a file for each generator. However, if that results in multiple very short .md files, I don't see a problem with merging them into one file and putting it under queries.

@smoczy123
Author

My first thought was to put the timestamp generation in the session layer, however the functions I've changed in connection.rs are being called from multiple user-facing functions in the session layer directly and from even more indirectly. Generating the timestamps in the session layer would require us to generate a timestamp in every single one of those functions. Furthermore, if we decide to add more functions in the session layer, we would have to remember to add timestamp generation to them. I think this approach would be really bug-prone.

@Lorak-mmk
Collaborator

My first thought was to put the timestamp generation in the session layer, however the functions I've changed in connection.rs are being called from multiple user-facing functions in the session layer directly and from even more indirectly. Generating the timestamps in the session layer would require us to generate a timestamp in every single one of those functions. Furthermore, if we decide to add more functions in the session layer, we would have to remember to add timestamp generation to them. I think this approach would be really bug-prone.

For normal queries, modifying Session::run_query should be enough. The problem is with QueryPager, which has multiple constructors (QueryPager::new_for_prepared_statement, QueryPager::new_for_connection_execute_iter etc). I admit we lack a good place for this, so let's leave it in connection for now - improving this seems out of scope for this PR.

@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch from feba1aa to 4ad3cd7 Compare December 11, 2024 21:57
@smoczy123 smoczy123 requested a review from Lorak-mmk December 11, 2024 21:57
@wprzytula wprzytula requested a review from muzarski December 12, 2024 06:05
@wprzytula wprzytula added this to the cpp-rust-driver milestone Dec 12, 2024
@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch 2 times, most recently from 69b7789 to a66210e Compare December 12, 2024 18:39
Comment on lines 110 to 129
async fn next_timestamp(&self) -> i64 {
    loop {
        let last = self.last.load(Ordering::SeqCst);
        let cur = self.compute_next(last).await;
        if self
            .last
            .compare_exchange(last, cur, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
        {
            return cur;
        }
    }
}
Collaborator

@Lorak-mmk Lorak-mmk Dec 13, 2024

🤔 When the client is under high load, won't this approach be a problem?
We should benchmark this, @muzarski could you help @smoczy123 with that?

The alternative approach would be to just put last under mutex, avoiding the retries. The added benefit of that is that you can store Instant under Mutex, which would simplify the code in compute_next.
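
A rough sketch of the mutex-based alternative is shown below. It is synchronous and uses only the standard library; `MutexTimestampGenerator`, `State`, the field names and the 1-second defaults are all assumptions made for illustration, not code from this PR.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Everything the generator mutates lives behind a single lock.
struct State {
    last: i64,                     // last returned timestamp, in microseconds
    last_warning: Option<Instant>, // when a drift warning was last printed
}

/// Hypothetical mutex-based generator; names are illustrative only.
pub struct MutexTimestampGenerator {
    state: Mutex<State>,
    warning_threshold: Duration,
    warning_interval: Duration,
}

impl MutexTimestampGenerator {
    pub fn new() -> Self {
        Self {
            state: Mutex::new(State { last: 0, last_warning: None }),
            warning_threshold: Duration::from_secs(1),
            warning_interval: Duration::from_secs(1),
        }
    }

    pub fn next_timestamp(&self) -> i64 {
        let now_us = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_micros() as i64)
            .unwrap_or(0); // clock set before the epoch: fall back to 0

        // One lock acquisition replaces the compare-exchange retry loop.
        let mut state = self.state.lock().unwrap();
        let next = if now_us > state.last { now_us } else { state.last + 1 };
        state.last = next;

        // Rate-limited drift warning, tracked with the Instant kept under the lock.
        let drift_us = next - now_us;
        let threshold_us = self.warning_threshold.as_micros() as i64;
        let may_warn = state
            .last_warning
            .map_or(true, |at| at.elapsed() >= self.warning_interval);
        if drift_us > threshold_us && may_warn {
            eprintln!("timestamp generator is {drift_us} us ahead of the system clock");
            state.last_warning = Some(Instant::now());
        }
        next
    }
}
```

Whether this is faster or slower than the compare-exchange loop under contention would need the benchmark requested above; the sketch only shows that the bookkeeping becomes simpler when all state sits behind one lock.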

Contributor

I prepared a branch that uses rust-driver version from this PR: https://github.com/muzarski/cql-stress/tree/timestamp-gen.

@smoczy123, you can set the timestamp generator for c-s frontend in src/bin/cql-stress-cassandra-stress/main.rs in prepare_run function. The Session object is created in this function.

Commands for some simple workloads:

cql-stress-cassandra-stress write n=1000000 -pop seq=1..1000000 -rate threads=20 -node <ip addresses>
cql-stress-cassandra-stress read n=1000000 -pop seq=1..1000000 -rate threads=20 -node <ip addresses>

The first one will insert 1M rows into the database, while the second reads the rows and validates them. You can play around with the run parameters and options. You can also try running multiple loads simultaneously to simulate a multi-client scenario.

To run Scylla locally, see https://hub.docker.com/r/scylladb/scylla/. Then you can replace <ip addresses> in the commands above with your Scylla nodes' IPs (a comma-delimited list).

If you stumble upon any problems, feel free to ping me.

Author

I've run write and read workloads with different numbers of threads in both single-client and multi-client scenarios, and I've seen no measurable difference in speed between using a monotonic timestamp generator and not using one. It seems that this does not cause latency issues.

@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch from a66210e to 569168c Compare December 21, 2024 23:12
@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch 4 times, most recently from 095bbfe to f0e5493 Compare December 22, 2024 11:13
Added TimestampGenerator trait and MonotonicTimestampGenerator
based on c++ driver's implementation
Also added an ability to set it through Session Builder
The timestamp generator in ConnectionConfig is set in Session::Connect()
Generated timestamp is only set if user did not provide one
@smoczy123 smoczy123 force-pushed the implement-timestamp-generator branch from f0e5493 to 149844f Compare December 22, 2024 11:38
@smoczy123 smoczy123 requested a review from Lorak-mmk December 22, 2024 11:49
@Lorak-mmk
Collaborator

@smoczy123 I see you requested a re-review. If you addressed my comments in the new version of the code, please mark them as resolved, so that I know which ones to expect to be fixed.

@smoczy123
Author

All of your comments should be addressed now. I've left two conversations open, as I'm not sure about those two.

Development

Successfully merging this pull request may close these issues.

Implement timestamp generators
4 participants