Optimize the node store writing perf #547

Merged
jianoaix merged 5 commits into Layr-Labs:master on May 8, 2024

@jianoaix jianoaix commented May 8, 2024

Why are these changes needed?

About half of the node store writing time was spent on unnecessary serialization of chunks.

See "chunk serialization duration" vs. "write batch duration" in the screenshot below: these are the two dominant components of DB write time, and they are roughly equal. With this PR, the "chunk serialization duration" becomes negligible.

Screenshot 2024-05-07 at 8 55 40 PM
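
The PR description does not say how the redundant serialization is avoided, so the following is only a minimal sketch of one common approach: keep the bytes that arrived over the wire next to the decoded chunk and write those bytes directly, instead of re-encoding the decoded form. The names `Chunk`, `ReceivedChunk`, `valuesBySerializing`, and `valuesByReusing` are hypothetical and are not taken from the node store code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Chunk is a hypothetical stand-in for a decoded chunk.
type Chunk struct {
	Coeffs []uint64
}

// Serialize encodes the chunk as big-endian uint64 values.
// This is the per-chunk work that is wasted when the bytes already exist.
func (c *Chunk) Serialize() []byte {
	out := make([]byte, 8*len(c.Coeffs))
	for i, v := range c.Coeffs {
		binary.BigEndian.PutUint64(out[8*i:], v)
	}
	return out
}

// ReceivedChunk (hypothetical) keeps the wire bytes alongside the decoded form.
type ReceivedChunk struct {
	Raw     []byte
	Decoded *Chunk
}

// valuesBySerializing re-encodes every chunk before the batch write.
func valuesBySerializing(chunks []ReceivedChunk) [][]byte {
	values := make([][]byte, 0, len(chunks))
	for _, c := range chunks {
		values = append(values, c.Decoded.Serialize())
	}
	return values
}

// valuesByReusing passes the already-serialized bytes through unchanged.
func valuesByReusing(chunks []ReceivedChunk) [][]byte {
	values := make([][]byte, 0, len(chunks))
	for _, c := range chunks {
		values = append(values, c.Raw)
	}
	return values
}

func main() {
	chunk := &Chunk{Coeffs: []uint64{1, 2, 3}}
	received := []ReceivedChunk{{Raw: chunk.Serialize(), Decoded: chunk}}

	slow := valuesBySerializing(received)
	fast := valuesByReusing(received)

	// Both paths produce the same value to write; only the second skips encoding.
	fmt.Println(string(slow[0]) == string(fast[0]))
}
```

In this sketch the reuse path is only valid if the stored format is identical to the received format; if the two differ, some encoding step is still required.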

Checks

  • I've made sure the lint is passing in this PR.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

pschork commented May 8, 2024

Nice optimization @jianoaix

Wrt Fiona, do you think that their problematic node had degraded IO that was amplified with redundant serialization?

Also, I feel like some of the profiling code that was removed in a65c410 (i.e., encode duration, serialization duration) could be useful as Prometheus gauges in the operator's Grafana dashboard for future debugging needs.
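
As a rough illustration of that suggestion, the sketch below times a step and records the result in a gauge. It assumes a hypothetical `Gauge` interface with a single `Set(float64)` method (the gauge type in the Prometheus Go client exposes a method with that signature) and uses an in-memory `memGauge` as a stand-in so the example runs without external dependencies. The helper `observeDuration` is also hypothetical and is not the profiling code that was removed.

```go
package main

import (
	"fmt"
	"time"
)

// Gauge is a hypothetical minimal interface for this sketch.
type Gauge interface {
	Set(float64)
}

// memGauge is an in-memory stand-in for a real metrics gauge.
type memGauge struct {
	value float64
	sets  int
}

func (g *memGauge) Set(v float64) {
	g.value = v
	g.sets++
}

// observeDuration runs fn and records its wall-clock duration in milliseconds.
func observeDuration(g Gauge, fn func() error) error {
	start := time.Now()
	err := fn()
	g.Set(float64(time.Since(start).Microseconds()) / 1000.0)
	return err
}

func main() {
	serialization := &memGauge{}
	writeBatch := &memGauge{}

	// Placeholder workloads standing in for the two steps being compared.
	_ = observeDuration(serialization, func() error {
		time.Sleep(2 * time.Millisecond)
		return nil
	})
	_ = observeDuration(writeBatch, func() error {
		time.Sleep(2 * time.Millisecond)
		return nil
	})

	fmt.Printf("serialization: %.3f ms, write batch: %.3f ms\n",
		serialization.value, writeBatch.value)
}
```

Reporting the two durations side by side would make a regression like the one fixed here visible on a dashboard without re-adding ad hoc profiling.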

jianoaix commented May 8, 2024

It is possible, but not certain. I think we could run another test after v0.7.0 (I'm waiting for the reachability check as an additional tool for this).
The metrics were added separately, so they were removed here to avoid conflicts.

@jianoaix jianoaix merged commit ea558f7 into Layr-Labs:master May 8, 2024
6 checks passed