Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Significantly decrease startup times for WAL #25643

Merged
merged 2 commits into from
Dec 12, 2024

Conversation

mgattozzi
Copy link
Contributor

@mgattozzi mgattozzi commented Dec 11, 2024

feat: add startup time to logging output

This change adds a startup time counter to the output when starting up
a server. The main purpose of this is to verify whether the impact of
changes actually speeds up the loading of the server.

feat: Significantly decrease startup times for WAL

This commit does a few important things to speedup startup times:

  1. We avoid changing an Arc to a String with the series key as the
    From impl will call with_column which will then turn it into
    an Arc again. Instead we can just call with_column directly
    and pass in the iterator without also collecting into a Vec
  2. We switch to using bitcode as the serialization format for the WAL.
    This significantly reduces startup time as this format is faster to
    use instead of JSON, which was eating up massive amounts of time.
    Part of this change involves not using the tag feature of serde as
    it's currently not supported by bincode
  3. We also parallelize reading and deserializing the WAL files before
    we then apply them in order. This reduces time waiting on IO and we
    eagerly evaluate each spawned task in order as much as possible.

This gives us about a 189% speedup over what we were doing before.

Closes #25534

@mgattozzi mgattozzi force-pushed the mgattozzi/startup-times branch from 5e83ae9 to a32801b Compare December 11, 2024 18:49
@mgattozzi mgattozzi changed the title feat: Significantly increase startup times for WAL feat: Significantly decrease startup times for WAL Dec 11, 2024
@hiltontj
Copy link
Contributor

Curious why you chose bitcode. One of their project non-goals is:

Stable format across major versions

i.e., they do not guarantee stability of the format across major versions. I know we have a bit of leeway with WAL files since they are only kept around for relatively short amount of time, but I think we would need to have some kind of versioning and upgrade process in place if we ever need to update our bitcode version, based on that statement.

If any of the other serde-compatible formats offer stability and similar performance gains, it might be worth considering those (Avro, MSGpack, etc.).

@mgattozzi
Copy link
Contributor Author

Bitcode was the fastest according to these benchmarks which was the main draw. I think we can get around versioning issues by adding a version into the header that we check already to verify the wal is okay and then check which version to use to deserialize the code. We would need this also for breaking changes to the file format anyways if we wanted to do that and handle it without the user needing to know

@hiltontj
Copy link
Contributor

Bitcode was the fastest according to these benchmarks which was the main draw.

Seems like others that perform similarly in their benchmarks are a bit too obscure.

We would need this also for breaking changes to the file format anyways if we wanted to do that and handle it without the user needing to know

That is true. Also, looking at some others, e.g., bincode and apache_avro, it seems like they have similar statements about stability, so something we probably need to handle regardless of the format we choose.

Copy link
Contributor

@hiltontj hiltontj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had one comment on the profile change. Otherwise, the only thing to mention is that it would be good to track an issue for versioning the WAL file in the header.

Cargo.toml Outdated
@@ -182,6 +183,7 @@ inherits = "release"
codegen-units = 16
lto = false
incremental = true
debug = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the quick-bench below provides this profile, i.e., quick-release+debug=1, would you be okay to leave this out in favour of that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah no problem I meant to take this out

Copy link
Contributor

@praveen-influx praveen-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, maybe one for the future - is there any zero copy alternatives that could be used for specifically serializing/deserializing wal?

@pauldix
Copy link
Member

pauldix commented Dec 12, 2024

@praveen-influx Flatbuffers would be the thing, but I've found those generally very difficult to work with and not usually worth the effort. But we could certainly prototype it at some point in the future. But I don't think it's worth doing before the alpha.

@mgattozzi
Copy link
Contributor Author

@praveen-influx and @pauldix there's rkyv for zero copy which I've been keeping an eye on for awhile

https://docs.rs/rkyv/latest/rkyv/

The developer of it did a talk at RustConf this year as well -> https://www.youtube.com/watch?v=ON4z2LbTD-4

@mgattozzi
Copy link
Contributor Author

@hiltontj looks like we already have the version in the header! No need for a new issue

/// The first bytes written into a wal file to identify it and its version.
const FILE_TYPE_IDENTIFIER: &[u8] = b"idb3.001";

This change adds a startup time counter to the output when starting up
a server. The main purpose of this is to verify whether the impact of
changes actually speeds up the loading of the server.
This commit does a few important things to speedup startup times:
1. We avoid changing an Arc<str> to a String with the series key as the
   From<String> impl will call with_column which will then turn it into
   an Arc<str> again. Instead we can just call `with_column` directly
   and pass in the iterator without also collecting into a Vec<String>
2. We switch to using bitcode as the serialization format for the WAL.
   This significantly reduces startup time as this format is faster to
   use instead of JSON, which was eating up massive amounts of time.
   Part of this change involves not using the tag feature of serde as
   it's currently not supported by bincode
3. We also parallelize reading and deserializing the WAL files before
   we then apply them in order. This reduces time waiting on IO and we
   eagerly evaluate each spawned task in order as much as possible.

This gives us about a 189% speedup over what we were doing before.

Closes #25534
@mgattozzi mgattozzi force-pushed the mgattozzi/startup-times branch from a32801b to c9f4286 Compare December 12, 2024 16:19
@mgattozzi
Copy link
Contributor Author

I'll merge once CI passes again as all I needed to do is remove the debug option in quick-release.

@mgattozzi mgattozzi merged commit 9292a32 into main Dec 12, 2024
13 checks passed
@mgattozzi mgattozzi deleted the mgattozzi/startup-times branch December 12, 2024 16:27
Copy link
Member

@pauldix pauldix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, but not a blocker

@@ -206,6 +212,9 @@ where
let hybrid_make_service = hybrid(rest_service, grpc_service);

let addr = AddrIncoming::from_listener(server.listener)?;
let timer_end = Instant::now();
let startup_time = timer_end.duration_since(startup_timer);
info!("Server Startup Time: {}ms", startup_time.as_millis());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to output this using duration's formatting? I think it'll do things like 1m32s...

Copy link
Contributor Author

@mgattozzi mgattozzi Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pauldix it does not have a formatting available to it by default. It does not impl Display. This is why you need to specify how you want it, and mostly it seems to do fractional (e.g. 2.7 seconds) rather than absolute (e.g. 2s 700ms) for the various methods. I figured ms was the most accurate, but I'm willing to do this method instead to get something like 2.7s or 800.5s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make WAL load files in parallel on startup
4 participants