RONDB-820: Added parallelism in invalidating node from LCP in table #610
Open
mronstro wants to merge 9 commits into logicalclocks:24.10-main from mronstro:RONDB-820
mronstro force-pushed the RONDB-820 branch 2 times, most recently from 2f41a2c to 888e999 on December 25, 2024 19:57
…to 1 hour using new -X parameter
- Added debugging for understanding table handling during recovery
- Added parallelism in invalidating node from LCP in table

With support for hundreds of thousands of tables, it is important to speed up the handling of many tables by parallelising some parts of node recovery. This patch parallelises the invalidation of a failed node from the table files. This phase happens as part of node recovery in the live node, before the node is permitted to continue the start process.

Added parallelism for removing a node from tables at node failure (NF)

This patch adds parallelism to the code that removes a failed node from the table files. This code is executed every time a node fails, and parallelising it decreases the wait before our node id is ready for restart.

- Enable removal of massive amounts of log output
- Local optimisation of COPY_TABREQ handling
- Simplified result of the rondb_big test case
- Added restart log message about multi transporter setup
- Move back to 300 tables with 11 indexes in rondb_big.ndb_many_tables

A major part of node recovery is waiting for LCP, so avoid delaying LCP while node recovery is ongoing.

Removed delays in LCP processing when node recovery is ongoing; optimised writing tables into pages

Ensured table was initialised

Improved sending DIH metadata during NR

During NR we read the metadata in the master DIH and copy it over to the starting node using the signal COPY_TABREQ. This was previously done one table at a time. Now this is parallelised, sending 8 tables over in parallel. Older nodes can also handle parallel COPY_TABREQs, but only 4 at a time.
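The windowed COPY_TABREQ scheme can be sketched as a bounded number of outstanding requests. This is a minimal single-threaded model under stated assumptions: `CopyTabDispatcher`, `start`, `onCopyTabConf` and the `sent()` log are hypothetical names that stand in for DBDIH's signal handling, not RonDB's actual code. Only the limits (8 for current nodes, 4 for older nodes) come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical model of windowed COPY_TABREQ dispatch. The real DBDIH block
// sends signals; here "sending" just appends the table id to sent_.
class CopyTabDispatcher {
 public:
  // Window sizes from the commit message: 8 for current nodes, 4 for older.
  static constexpr uint32_t kNewPeerWindow = 8;
  static constexpr uint32_t kOldPeerWindow = 4;

  CopyTabDispatcher(const std::vector<uint32_t>& tables, bool peerIsOlder)
      : pending_(tables.begin(), tables.end()),
        window_(peerIsOlder ? kOldPeerWindow : kNewPeerWindow) {}

  // Fill the window with the first batch of requests.
  void start() { pump(); }

  // Called when the starting node confirms one table. The counter is only
  // decremented here, once the reply has actually arrived.
  void onCopyTabConf(uint32_t /*tableId*/) {
    assert(outstanding_ > 0);
    --outstanding_;
    pump();
  }

  // Finished only when nothing is queued and nothing is still in flight.
  bool done() const { return pending_.empty() && outstanding_ == 0; }
  uint32_t outstanding() const { return outstanding_; }
  const std::vector<uint32_t>& sent() const { return sent_; }

 private:
  // Start new requests only while fewer than window_ are outstanding.
  void pump() {
    while (outstanding_ < window_ && !pending_.empty()) {
      const uint32_t tableId = pending_.front();
      pending_.pop_front();
      ++outstanding_;
      sent_.push_back(tableId);  // stands in for sending COPY_TABREQ
    }
  }

  std::deque<uint32_t> pending_;
  std::vector<uint32_t> sent_;
  uint32_t window_;
  uint32_t outstanding_ = 0;
};
```

With 20 tables and a current-version peer, `start()` leaves 8 requests outstanding. Each confirmation frees one slot, so the dispatcher never exceeds the window. Because `done()` also checks the outstanding count, the loop cannot finish while replies are still in flight.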
Don't start new COPY_TABREQs when enough are already outstanding

Fixes to avoid having outstanding signals after the outstanding counter was already decreased

Finishing loop in copyNodeLab does not finish an outstanding request

Added more debugging and jam around queued LCP write info in DBDIH

Needed to loop until c_end_tab_queued

Initialised variables controlling delay of LCPs

Delayed decrement of the outstanding counter to avoid race conditions

Needed to track all outstanding signals to ensure we don't quit too early

Fix for compiling on GCC 13

Missed call to unreservePages when there is no need to remove the table from the node; added a bit of debugging

Fixed test case ndbinfo_plans

Fixed compiler warning

Added debugging around Pause LCP and LCP ongoing flag

- Fixed problem with c_lcp_id_while_copy_meta_data

With parallel copying of tables using COPY_TABREQ, it is no longer OK to use a common variable c_lcp_id_while_copy_meta_data to keep track of the current LCP id for the COPY_TABREQ. Moving this variable to the table record ensures that we can handle any future change in parallelism.

Disable debugging

Fixed optimisation of check of LCP completion

- Only print logs about individual fragments if full restart logs are enabled
- Optimise checkSchemaStatus
- More printouts on stages in restarts
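The c_lcp_id_while_copy_meta_data fix can be illustrated by storing the LCP id per table instead of in one shared variable. This is a minimal sketch assuming a simplified single-threaded model. `TableRecord`, `MetaDataCopier`, `startCopy`, `finishCopy` and the field `lcpIdWhileCopyMetaData` are hypothetical names, and the `std::unordered_map` stands in for DBDIH's table record pool.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical simplified table record (not RonDB's actual layout).
struct TableRecord {
  // Per-table replacement for a block-global variable: the LCP id that was
  // current when this table's COPY_TABREQ was started.
  uint32_t lcpIdWhileCopyMetaData = 0;
  bool copyOngoing = false;
};

class MetaDataCopier {
 public:
  // Store the current LCP id in the table's own record. A concurrent copy
  // of another table therefore cannot overwrite it.
  void startCopy(uint32_t tableId, uint32_t currentLcpId) {
    TableRecord& rec = tables_[tableId];
    assert(!rec.copyOngoing);
    rec.copyOngoing = true;
    rec.lcpIdWhileCopyMetaData = currentLcpId;
  }

  // When one table's copy completes, read back the id saved for that table.
  uint32_t finishCopy(uint32_t tableId) {
    TableRecord& rec = tables_.at(tableId);
    assert(rec.copyOngoing);
    rec.copyOngoing = false;
    return rec.lcpIdWhileCopyMetaData;
  }

 private:
  std::unordered_map<uint32_t, TableRecord> tables_;
};
```

Suppose table 1 is copied during LCP 10 and table 2 during LCP 11, and the two copies overlap. Each `finishCopy` then returns the id stored for its own table. With a single shared variable, the second `startCopy` would have overwritten the first value.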
… fixed compilation issue with number of fragments