
RONDB-820: Added parallelism in invalidating node from LCP in table #610

Open
wants to merge 9 commits into base: 24.10-main
Conversation

mronstro
Collaborator

With support for hundreds of thousands of tables, it is important to speed up the handling of many tables by parallelising parts of node recovery. This patch parallelises the invalidation of a failed node in the table files. This phase happens as part of node recovery in the live node, before the failed node is permitted to continue its start process.
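As a rough illustration of the idea, the per-table invalidation work can be split across several independent chains, each striding over a disjoint subset of table ids. This is a minimal sketch with illustrative names (`kParallelChains`, `invalidate_node_in_table`), not the actual RonDB/DBDIH identifiers:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: invalidate a failed node's entries across many
// table files using several parallel "chains" instead of a single
// sequential pass. Names are illustrative, not RonDB's own.
static const unsigned kParallelChains = 4;

// Stand-in for the per-table invalidation work on the table file.
inline void invalidate_node_in_table(std::vector<bool>& table_has_node,
                                     unsigned table_id) {
  table_has_node[table_id] = false;  // drop failed node from this table
}

// Chain c handles table ids c, c + P, c + 2P, ... so the chains never
// touch the same table and could safely run concurrently.
inline void invalidate_failed_node(std::vector<bool>& table_has_node) {
  for (unsigned chain = 0; chain < kParallelChains; chain++) {
    for (unsigned t = chain; t < table_has_node.size();
         t += kParallelChains) {
      invalidate_node_in_table(table_has_node, t);
    }
  }
}
```

The strided partitioning means no coordination is needed between chains beyond joining at the end, which fits the signal-driven execution model described in the commits below.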

@mronstro mronstro force-pushed the RONDB-820 branch 2 times, most recently from 2f41a2c to 888e999 Compare December 25, 2024 19:57
- Added debugging for understanding table handling during recovery
- Added parallelism in invalidating node from LCP in table


Added parallelism for remove node from table at NF

This patch adds parallelism to the code that removes a failed node from the table files. This code is executed every time a node fails, and parallelising it decreases the wait for our node id to be ready for restart.

- Enabled removal of massive amounts of log output
- Local optimisation of COPY_TABREQ handling
- Simplified result of rondb_big test case
- Added restart log message about multi transporter setup
- Moved back to 300 tables with 11 indexes in rondb_big.ndb_many_tables

A major part of node recovery is waiting for the LCP, so the LCP should not be delayed while node recovery is ongoing. Removed delays in LCP processing when node recovery is ongoing and optimised writing tables into pages.

Ensured table was initialised
Improved sending DIH metadata during NR

During NR we read the metadata in the master DIH and copy it over to the
starting node using the COPY_TABREQ signal. This was previously done one
table at a time. Now this is parallelised, sending over 8 tables in
parallel. Older nodes can also handle parallel COPY_TABREQs, but
only 4 at a time.

Don't start new COPY_TABREQs when enough are already outstanding
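The cap on outstanding COPY_TABREQs can be pictured as a simple send window: send until the cap is reached, and only send the next table when a confirmation comes back. This is a sketch under assumed, illustrative names (`CopyTabWindow`, `pump`, `on_conf`), not the real DBDIH signal-handling code:

```cpp
#include <cassert>
#include <queue>

// Sketch of capping outstanding COPY_TABREQ signals. The cap is 8 for
// new nodes and 4 for older nodes, per the description above; all other
// names here are illustrative.
struct CopyTabWindow {
  unsigned max_outstanding;      // 8 for new nodes, 4 for older nodes
  unsigned outstanding = 0;      // COPY_TABREQs in flight
  std::queue<unsigned> pending;  // table ids still to copy

  explicit CopyTabWindow(unsigned cap) : max_outstanding(cap) {}

  // Start more COPY_TABREQs up to the cap; returns how many were sent.
  unsigned pump() {
    unsigned sent = 0;
    while (outstanding < max_outstanding && !pending.empty()) {
      pending.pop();  // "send" COPY_TABREQ for this table id
      outstanding++;
      sent++;
    }
    return sent;
  }

  // Called on COPY_TABCONF: decrement only after all handling of the
  // confirmation is done, so the counter never reads zero while a
  // signal is still being processed.
  void on_conf() {
    assert(outstanding > 0);
    outstanding--;
    pump();
  }

  bool done() const { return outstanding == 0 && pending.empty(); }
};
```

Tracking every outstanding signal in one counter, and decrementing it late, is what lets the loop decide safely when the whole copy phase has finished rather than quitting while signals are still in flight.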

Fixes to avoid having outstanding signals after the outstanding counter has already been decremented
Ensured the finishing loop in copyNodeLab does not finish an outstanding request
Added more debugging and jam around queued LCP write info in DBDIH
Needed to loop until c_end_tab_queued
Initialised variables controlling delay of LCPs
Delayed the decrement of the outstanding counter to avoid race conditions
Needed to track all outstanding signals to ensure we don't quit too early

Fix for compiling on GCC 13
Added a missed call to unreservePages when there is no need to remove the table from the node; added a bit of debugging
Fixed test case ndbinfo_plans
Fixed compiler warning

Added debugging around Pause LCP and LCP ongoing flag

- Fixed problem with c_lcp_id_while_copy_meta_data

With parallel copying of tables using COPY_TABREQ it is no longer
correct to use a single shared variable c_lcp_id_while_copy_meta_data to
keep track of the current LCP id for the COPY_TABREQ. By moving this
variable to the table record we ensure that we can handle any
parallelism change that might occur in the future.

Disable debugging
Fixed optimisation of the check of LCP completion

- Only print logs about individual fragments if full restart logs are enabled
- Optimise checkSchemaStatus
- More printouts on stages in restarts