feat: Significantly decrease startup times for WAL #25643
Conversation
Force-pushed from 5e83ae9 to a32801b
Curious why you chose
i.e., they do not guarantee stability of the format across major versions. I know we have a bit of leeway with WAL files since they are only kept around for a relatively short amount of time, but I think we would need to have some kind of versioning and upgrade process in place if we ever need to update our If any of the other
Bitcode was the fastest according to these benchmarks, which was the main draw. I think we can get around versioning issues by adding a version into the header that we already check to verify the WAL is okay, and then checking which version to use to deserialize the contents. We would need this anyway for breaking changes to the file format if we wanted to handle them without the user needing to know.
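The versioning idea above can be sketched roughly as follows. This is a hypothetical illustration, not the actual serialize.rs code: the `WalVersion` enum, `decode_version` function, and byte values are all made-up names, assuming a single version byte in the header selects the decoder.

```rust
// Hypothetical sketch: a version byte in the WAL header selects the decoder.
#[derive(Debug, PartialEq)]
enum WalVersion {
    V1Json,
    V2Bitcode,
}

fn decode_version(header_byte: u8) -> Option<WalVersion> {
    match header_byte {
        1 => Some(WalVersion::V1Json),
        2 => Some(WalVersion::V2Bitcode),
        _ => None, // unknown or corrupt file: refuse to load
    }
}

fn main() {
    // A file written by the new code would carry the newer version byte:
    let header = [2u8];
    match decode_version(header[0]) {
        Some(WalVersion::V2Bitcode) => println!("deserialize with bitcode"),
        Some(WalVersion::V1Json) => println!("deserialize with the old JSON path"),
        None => eprintln!("unrecognized WAL version"),
    }
}
```

Old files keep loading through their original path, and a future format break only needs a new variant plus its decoder arm.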
Seems like others that perform similarly in their benchmarks are a bit too obscure.
That is true. Also, looking at some others, e.g.,
I had one comment on the profile change. Otherwise, the only thing to mention is that it would be good to track an issue for versioning the WAL file in the header.
Cargo.toml (Outdated)

@@ -182,6 +183,7 @@ inherits = "release"
codegen-units = 16
lto = false
incremental = true
debug = 1
I think that the quick-bench profile below provides this, i.e., quick-release + debug = 1. Would you be okay to leave this out in favour of that?
oh yeah no problem I meant to take this out
Looks good. Maybe one for the future: are there any zero-copy alternatives that could be used specifically for serializing/deserializing the WAL?
@praveen-influx Flatbuffers would be the thing, but I've found them generally very difficult to work with and not usually worth the effort. We could certainly prototype it at some point in the future, but I don't think it's worth doing before the alpha.
@praveen-influx and @pauldix there's rkyv for zero copy, which I've been keeping an eye on for a while: https://docs.rs/rkyv/latest/rkyv/ The developer of it gave a talk at RustConf this year as well: https://www.youtube.com/watch?v=ON4z2LbTD-4
@hiltontj looks like we already have the version in the header! No need for a new issue.

influxdb/influxdb3_wal/src/serialize.rs, lines 32 to 33 in b2f6c1a
feat: add startup time to logging output

This change adds a startup time counter to the output when starting up a server. The main purpose is to verify whether changes actually speed up the loading of the server.
This commit does a few important things to speed up startup times:

1. We avoid converting the series key's Arc<str> to a String, as the From<String> impl will call with_column, which will then turn it into an Arc<str> again. Instead we can call `with_column` directly and pass in the iterator without also collecting into a Vec<String>.

2. We switch to using bitcode as the serialization format for the WAL. This significantly reduces startup time, as this format is much faster to work with than JSON, which was eating up massive amounts of time. Part of this change involves not using the tag feature of serde, as it's currently not supported by bitcode.

3. We parallelize reading and deserializing the WAL files before applying them in order. This reduces time waiting on IO, and we eagerly evaluate each spawned task in order as much as possible.

This gives us about a 189% speedup over what we were doing before.

Closes #25534
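The parallel-read idea in point 3 can be sketched with plain threads. Everything here is an illustrative placeholder, not the actual WAL implementation: `read_and_deserialize` stands in for the real file I/O plus bitcode decode, and the op strings stand in for decoded WAL operations.

```rust
use std::thread;

// Stand-in for: open the file, verify the header, decode the ops.
fn read_and_deserialize(file_id: usize) -> Vec<String> {
    vec![format!("op-from-file-{file_id}")]
}

// Spawn a reader per WAL file so the I/O waits overlap, then join the
// handles in the original file order so the decoded ops are still
// applied strictly sequentially.
fn load_in_order(file_ids: Vec<usize>) -> Vec<String> {
    let handles: Vec<_> = file_ids
        .into_iter()
        .map(|id| thread::spawn(move || read_and_deserialize(id)))
        .collect();
    handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect()
}

fn main() {
    for op in load_in_order(vec![0, 1, 2]) {
        println!("applying {op}");
    }
}
```

The key property is that concurrency is confined to the read/decode phase; application order is fixed by the order the handles are joined, regardless of which thread finishes first.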
Force-pushed from a32801b to c9f4286
I'll merge once CI passes again, as all I needed to do was remove the debug option in quick-release.
One small comment, but not a blocker
@@ -206,6 +212,9 @@ where
let hybrid_make_service = hybrid(rest_service, grpc_service);

let addr = AddrIncoming::from_listener(server.listener)?;
let timer_end = Instant::now();
let startup_time = timer_end.duration_since(startup_timer);
info!("Server Startup Time: {}ms", startup_time.as_millis());
Maybe better to output this using Duration's formatting? I think it'll do things like 1m32s...
@pauldix it does not have formatting available to it by default; it does not impl Display. This is why you need to specify how you want it, and the various methods mostly give fractional values (e.g. 2.7 seconds) rather than absolute ones (e.g. 2s 700ms). I figured ms was the most accurate, but I'm willing to do this method instead to get something like 2.7s or 800.5s.
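For context on the two formatting options being discussed: std's Duration has no Display impl, but its Debug impl does print the fractional human-readable form, while as_millis gives the absolute count. A minimal illustration:

```rust
use std::time::Duration;

fn main() {
    let startup_time = Duration::from_millis(2700);

    // Absolute milliseconds, as in the current patch:
    println!("Server Startup Time: {}ms", startup_time.as_millis()); // 2700ms

    // Duration's Debug impl picks a fractional unit automatically:
    println!("Server Startup Time: {:?}", startup_time); // 2.7s
}
```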