
fix(resharding): fix Chain::get_shards_to_state_sync() #12617

Merged 12 commits into near:master from resharding-sync-shards on Jan 6, 2025

Conversation

@marcelo-gonzalez (Contributor) commented Dec 13, 2024

This function was left unimplemented for the resharding v3 case, so here we implement it. For each shard ID, we say that we want to state sync it if we'll be tracking it next epoch and don't currently track it in this or the previous epoch. If we don't track it in the current epoch but did track it in the previous, then there's no need to state sync, since we can just keep applying chunks to stay caught up.

This logic doesn't make reference to resharding since it should be the same in either case, but it fixes the bug where both state sync and the resharding code might believe they're responsible for generating the child shard state after a resharding if we're in this rare case where we track the parent in epoch T-1, don't track the child in T, and will track the child in T+1. In that case, start_resharding() will be called at the end of epoch T-1 and will initiate the child shard splitting (this is true even if we don't track either child in the current epoch). And then resharding will proceed as usual and we'll start applying chunks for the child when it's ready, and no state sync has to be involved.
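As a rough illustration of that rule, here is a hedged sketch, not the exact code in this PR: the standalone needs_state_sync wrapper and the exact signatures are my own shorthand, and the real code maps a child shard to its parent across a resharding for the previous-epoch check, which is omitted here.

fn needs_state_sync(
    shard_tracker: &ShardTracker,
    me: &Option<AccountId>,
    // Last block of the previous epoch, i.e. the parent of the first block of the new epoch.
    parent_hash: &CryptoHash,
    // The block before that, used to query what we tracked in the previous epoch.
    prev_prev_hash: &CryptoHash,
    shard_id: ShardId,
) -> bool {
    // Will we track this shard in the next epoch?
    let will_track_next =
        shard_tracker.will_care_about_shard(me.as_ref(), parent_hash, shard_id, true);
    // Do we track it in the current epoch?
    let tracks_now =
        shard_tracker.care_about_shard(me.as_ref(), parent_hash, shard_id, true);
    // Did we track it in the previous epoch? If so we can just keep applying chunks
    // to stay caught up instead of state syncing.
    let tracked_before =
        shard_tracker.care_about_shard(me.as_ref(), prev_prev_hash, shard_id, true);
    will_track_next && !tracks_now && !tracked_before
}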

This function was left unimplemented for the resharding v3 case, so here we implement it. For each shard ID, if there hasn't been a resharding, we say that we want to state sync it if we'll be tracking it next epoch and don't currently track it. If there has been a resharding, we state sync it if the previous condition is true and we aren't currently generating the shard itself as part of resharding.

Since this shards-to-state-sync calculation is made when we preprocess the first block of an epoch, there are 3 relevant epochs here: prev_epoch, current_epoch, next_epoch. When we process the last block of prev_epoch, start_resharding() will be called and will initiate the child shard splitting if we were tracking the parent of the split shards during prev_epoch (this is true even if we don't track either child in current_epoch). So if we aren't tracking a child shard during current_epoch, we still actually don't want to state sync it if we were tracking the parent previously.
@marcelo-gonzalez requested a review from a team as a code owner December 13, 2024 08:21
@marcelo-gonzalez (Contributor Author):

A few things:

  • After landing fix(resharding): wait until child flat storages are split to take snapshots #12589, we can add a test that fails without this PR: just set the epoch length to 10 in test_resharding_v3_shard_shuffling_slower_post_processing_tasks. That makes it fail because one of the nodes (account6) tracks shard 6 (the parent of the split shards) in the first 2 epochs, then switches to tracking shard 3 in the next epoch (but still performs the resharding), and then wants to track shard 7 in the epoch after that.
  • In theory we could just have such a node not do the resharding and do state sync instead, but it feels kind of silly. We have the parent state, so we might as well calculate the state of a child we know we're going to track in the future.
  • On that note, we probably shouldn't do the resharding at all if we know we're not going to track the children in the next or next-next epoch. And maybe the same goes for flat storage and memtries? Basically, on the last block of the epoch, do we need to do any of this work if we're not going to track any involved shards in the future?
  • Do we have a canonical way of asking "is this shard the child of a resharding?" In this PR, that's basically what I'm trying to get at with the logic at the end of should_catch_up_shard() (see the sketch just below).
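To make that last question concrete, the check I'm getting at looks roughly like this (a hedged sketch: the is_split_child wrapper is hypothetical and the signatures are approximate, but the layout helpers are the ones used in this PR's diff):

fn is_split_child(
    epoch_manager: &dyn EpochManagerAdapter,
    epoch_id: &EpochId,
    prev_block_hash: &CryptoHash,
    shard_id: ShardId,
) -> Result<bool, Error> {
    let shard_layout = epoch_manager.get_shard_layout(epoch_id)?;
    let prev_epoch_id = epoch_manager.get_prev_epoch_id_from_prev_block(prev_block_hash)?;
    let prev_shard_layout = epoch_manager.get_shard_layout(&prev_epoch_id)?;
    // No resharding between the two epochs, so nothing is a split child.
    if shard_layout == prev_shard_layout {
        return Ok(false);
    }
    // A shard is a split child if it has a parent in the new layout and that parent
    // is a different shard ID, i.e. the parent was actually split.
    let Some(parent_shard_id) = shard_layout.try_get_parent_shard_id(shard_id)? else {
        return Ok(false);
    };
    Ok(parent_shard_id != shard_id)
}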

codecov bot commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 86.11111% with 10 lines in your changes missing coverage. Please review.

Project coverage is 70.38%. Comparing base (445ee6f) to head (4f6067b).
Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
chain/chain/src/chain.rs 86.11% 2 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12617      +/-   ##
==========================================
- Coverage   70.53%   70.38%   -0.15%     
==========================================
  Files         847      847              
  Lines      172839   172855      +16     
  Branches   172839   172855      +16     
==========================================
- Hits       121904   121656     -248     
- Misses      45833    46116     +283     
+ Partials     5102     5083      -19     
Flag Coverage Δ
backward-compatibility 0.16% <0.00%> (-0.01%) ⬇️
db-migration 0.16% <0.00%> (-0.01%) ⬇️
genesis-check ?
linux 69.12% <83.33%> (-0.10%) ⬇️
linux-nightly 70.14% <86.11%> (+0.02%) ⬆️
pytests 0.16% <0.00%> (-1.51%) ⬇️
sanity-checks ?
unittests 70.38% <86.11%> (+0.01%) ⬆️
upgradability ?

Flags with carried forward coverage won't be shown.


@wacban (Contributor) left a comment:

Requesting changes because I think it can be simplified; let me know if I'm missing something.

chain/chain/src/chain.rs (resolved)
@@ -2424,56 +2426,69 @@ impl Chain {
shard_tracker: &ShardTracker,
me: &Option<AccountId>,
parent_hash: &CryptoHash,
Contributor:

<unrelated rant> Why do we have two names - prev and parent - for the same thing? 😢 </unrelated rant>

shard_tracker.will_care_about_shard(me.as_ref(), parent_hash, shard_id, true);
let does_care_about_shard =
shard_tracker.care_about_shard(me.as_ref(), parent_hash, shard_id, true);
) -> Result<bool, Error> {
Contributor:

I agree that the old condition is no longer relevant, but the logic here seems too complicated.

In V2 we needed to get a second copy of the parent, even if the node tracks it, to perform the preprocessing.
will_care_about_shard && (will_shard_layout_change || !does_care_about_shard)

In V3 this shouldn't be needed, so it feels like the following should be sufficient:
will_care_about_shard && !does_care_about_shard

Please keep in mind that will_care_about_shard is resharding aware. We should aim to keep the resharding specific part of this logic in one place if possible.

@marcelo-gonzalez (Contributor Author) commented Dec 15, 2024

In V3 this shouldn't be needed, so it feels like the following should be sufficient: will_care_about_shard && !does_care_about_shard

So, relative to the current PR, something like:

diff --git a/chain/chain/src/chain.rs b/chain/chain/src/chain.rs
index 931f711f8..ac6b3d650 100644
--- a/chain/chain/src/chain.rs
+++ b/chain/chain/src/chain.rs
@@ -2465,32 +2465,7 @@ impl Chain {
         if shard_tracker.care_about_shard(me.as_ref(), parent_hash, shard_id, true) {
             return Ok(false);
         }
-        // Now we need to state sync it unless this is a post-resharding child shard whose parent we were tracking in the
-        // previous epoch, in which case we don't need to state sync because we'll generate the child when we do the resharding
-
-        let shard_layout = epoch_manager.get_shard_layout(epoch_id)?;
-        let prev_epoch_id = epoch_manager.get_prev_epoch_id_from_prev_block(parent_hash)?;
-        let prev_shard_layout = epoch_manager.get_shard_layout(&prev_epoch_id)?;
-
-        let resharded = shard_layout != prev_shard_layout;
-        if !resharded {
-            return Ok(true);
-        }
-        let Some(parent_shard_id) = shard_layout.try_get_parent_shard_id(shard_id)? else {
-            return Ok(true);
-        };
-        let was_split = parent_shard_id != shard_id;
-        if !was_split {
-            return Ok(true);
-        }
-
-        // Note that here passing `prev_prev_hash` to care_about_shard() will have us check whether we were tracking it in
-        // the previous epoch, because the current block is the first block of an epoch, so prev_prev_hash is the "parent_hash"
-        // of the last block of the previous epoch. TODO: consider refactoring these ShardTracker functions to accept an epoch_id
-        // to make this less tricky.
-        let splitting_child =
-            shard_tracker.care_about_shard(me.as_ref(), prev_prev_hash, parent_shard_id, true);
-        Ok(!splitting_child)
+        Ok(true)
     }
 
     /// Check if any block with missing chunk is ready to be processed and start processing these blocks

Now that the other PR is merged, we can try the modification to test_resharding_v3_shard_shuffling_slower_post_processing_tasks that I mentioned caused a crash:

diff --git a/integration-tests/src/test_loop/tests/resharding_v3.rs b/integration-tests/src/test_loop/tests/resharding_v3.rs
index c04d89ba6..457db7723 100644
--- a/integration-tests/src/test_loop/tests/resharding_v3.rs
+++ b/integration-tests/src/test_loop/tests/resharding_v3.rs
@@ -985,6 +985,7 @@ fn test_resharding_v3_shard_shuffling_slower_post_processing_tasks() {
         .track_all_shards(false)
         .all_chunks_expected(false)
         .delay_flat_state_resharding(2)
+        .epoch_length(10)
         .build();
     test_resharding_v3_base(params);
 }

Now unfortunately this passes because something about the order of the test loop task processing for state sync and resharding catchup has changed between when I wrote that and the current head of master. Without trying to figure it out, we can just hack it to run in the same order as before:

diff --git a/chain/chain/src/flat_storage_resharder.rs b/chain/chain/src/flat_storage_resharder.rs
index 83bd481f3..a401c51ce 100644
--- a/chain/chain/src/flat_storage_resharder.rs
+++ b/chain/chain/src/flat_storage_resharder.rs
@@ -609,6 +609,16 @@ impl FlatStorageResharder {
         if self.controller.is_cancelled() {
             return FlatStorageReshardingTaskResult::Cancelled;
         }
+        let head = chain_store.head().unwrap();
+        let Some(sync_hash) = chain_store.get_current_epoch_sync_hash(&head.epoch_id).unwrap() else {
+            return FlatStorageReshardingTaskResult::Postponed;
+        };
+        let key = borsh::to_vec(&near_primitives::state_sync::StatePartKey(sync_hash, shard_uid.shard_id(), 0)).unwrap();
+        let part = chain_store.store().get(near_store::DBCol::StateParts, &key).unwrap();
+        if !part.is_some() {
+            return FlatStorageReshardingTaskResult::Postponed;
+        }
+
         info!(target: "resharding", ?shard_uid, "flat storage shard catchup task started");
         let metrics = FlatStorageReshardingShardCatchUpMetrics::new(&shard_uid);
         // Apply deltas and then create the flat storage.

Then (with the above epoch length change), run it and you get:

thread 'test_loop::tests::resharding_v3::test_resharding_v3_shard_shuffling_slower_post_processing_tasks' panicked at chain/chain/src/resharding/resharding_actor.rs:93:17:
impossible to recover from a flat storage shard catchup failure!

with this error message at the end of the logs:

 5.661s ERROR resharding: flat storage shard catchup delta application failed! shard_uid=s7.v3 err=Other("unexpected resharding catchup flat storage status for s7.v3: Ready(FlatStorageReadyStatus { flat_head: BlockInfo { hash: Gposh9D271WMUy3wt4Re1pXtFViNEdiMEgWyjm8eoYHH, height: 24, prev_hash: 7d7uMfUA8dXHjGcixM88gVVnt7jaak8N5y7gfv7C58af } })")

What's happening here is the problem with account6 I mentioned in the other comment. It's not tracking the child in the current epoch, will track it in the next, but is performing the split because it tracked the parent in the previous epoch.

Please keep in mind that will_care_about_shard is resharding aware. We should aim to keep the resharding specific part of this logic in one place if possible.

Yeah it's a bit messy... maybe we should split something out of this?

@wacban (Contributor) commented Dec 19, 2024

will initiate the child shard splitting if we were tracking the parent of the split shards during prev_epoch (this is true even if we don't track either child in current_epoch). So if we aren't tracking a child shard during current_epoch, we still actually don't want to state sync it if we were tracking the parent previously

Just for clarity I think you're talking about the following scenario:
T-1 - prev - tracking parent
T+0 - curr - tracking neither of the children
T+1 - next - tracking one or both of the children

That is an interesting case to think about, and it's the same even without resharding actually.

The two options we have are:

  1. Discard the state from T-1 and state sync for T+1 as usual. This is the same as if the "hole" of not tracking was longer than 1 epoch. In this case we don't look back and only ever consider curr and next epochs.
  2. Reuse the state that we have from T-1. It feels a bit risky because, as you said, we need to look at the prev epoch as well. This is a less typical pattern in our code and it may potentially lead to some subtle issues imo.

What is your intuition about how those compare? Approach 2 is what you've implemented here. What would it take to make 1) work? Are there any drawbacks to this approach that I overlooked? Which approach would have simpler and cleaner code?

@marcelo-gonzalez (Contributor Author):

That is an interesting case to think about, and it's the same even without resharding actually.

Wait, that is a good point, I actually didn't even realize that. What do you think about just changing this to not care about the resharding case specifically, but just in general checking whether we were tracking the parent (usually the same shard ID) in the prev epoch? It gets a lot simpler that way and I think it just makes more sense (see below). Something like this relative to the current PR:

diff --git a/chain/chain/src/chain.rs b/chain/chain/src/chain.rs
index f712c6081..b2c0c5abb 100644
--- a/chain/chain/src/chain.rs
+++ b/chain/chain/src/chain.rs
@@ -2472,28 +2472,17 @@ impl Chain {
         // previous epoch, in which case we don't need to state sync because we'll generate the child when we do the resharding
 
         let shard_layout = epoch_manager.get_shard_layout(epoch_id)?;
-        let prev_epoch_id = epoch_manager.get_prev_epoch_id_from_prev_block(parent_hash)?;
-        let prev_shard_layout = epoch_manager.get_shard_layout(&prev_epoch_id)?;
-
-        let resharded = shard_layout != prev_shard_layout;
-        if !resharded {
-            return Ok(true);
-        }
         let Some(parent_shard_id) = shard_layout.try_get_parent_shard_id(shard_id)? else {
             return Ok(true);
         };
-        let was_split = parent_shard_id != shard_id;
-        if !was_split {
-            return Ok(true);
-        }
 
         // Note that here passing `prev_prev_hash` to care_about_shard() will have us check whether we were tracking it in
         // the previous epoch, because the current block is the first block of an epoch, so prev_prev_hash is the "parent_hash"
         // of the last block of the previous epoch. TODO: consider refactoring these ShardTracker functions to accept an epoch_id
         // to make this less tricky.
-        let splitting_child =
+        let cared_before =
             shard_tracker.care_about_shard(me.as_ref(), prev_prev_hash, parent_shard_id, true);
-        Ok(!splitting_child)
+        Ok(!cared_before)
     }
 
     /// Check if any block with missing chunk is ready to be processed and start processing these blocks

What would it take to make 1) work?

I could be missing something, but I think really all we would have to do would be to fix the inconsistency between the state sync and the resharding code, since we don't want both of them to believe they should be the one setting the child shard's state. The fact that 1) is basically already implemented is sort of why this crash happens at all.

What is your intuition about how those compare?
...
Are there any drawbacks to this approach that I overlooked?

Yeah, in my opinion it's a bit strange to discard state we know we're going to want, then ask others for that state, then proceed as we would have if we had never discarded it (i.e. apply chunks for that shard again post-state sync).

So suppose we're in this situation where we're going to track a shard in T+1 that we don't track in T but whose parent we tracked in T-1. And suppose we're taking this approach of discarding the state from T-1 and state syncing for T+1 as usual. It's not even really super relevant whether there was a resharding, as you pointed out. In either case, when we see the first block of epoch T and notice that we're going to be tracking the shard in T+1, we start state syncing for it. We also stop applying chunks until state sync is finished. Then when state sync is done, we begin applying chunks again so that we're all caught up by the time T+1 starts.

So there are three points of interest here:

  1. start of T when we stop requesting/applying chunks for the shard
  2. state finalization after state sync is finished, after which we start applying chunks again.
  3. Start of epoch T+1, when we must be applying chunks

Suppose we're on a localnet with small state and, say, an epoch length of 100, and that the starting block heights are T-1: 100, T: 200, T+1: 300. Then concretely, what we're doing is this: Apply chunks from blocks 100 to 199 since we track the shard. At height 200, stop applying chunks and start state sync. State sync is super fast, done by block 202. Then apply chunks for blocks 202 and onward.

On mainnet the picture is similar except that instead of not applying chunks for 2 blocks in favor of asking our peers to give us the state, we stop applying chunks for an hour or so.

Feels to me like we don't even gain a whole lot by intentionally ceasing to apply state updates just to start doing it again an hour later. Like, we have the state and the means to keep it updated, and we know we'll need it to be updated in 12 hours, but we decide to stop updating it ourselves for one small part of the beginning of the epoch? It feels like sort of bad behavior to place a burden on your peers and ask them to provide state parts (and in the case of resharding, perform it) for you when you could have just been applying chunks yourself for that hour or so. Not to mention that it also introduces this dependency on state sync in a case where it shouldn't have been necessary for the node to keep up.

Which approach would have simpler and cleaner code?

Actually I think maybe 2! Especially with the diff I posted above, since we actually don't really care about whether there's a resharding or not. If we go with 1), we still have to figure out how to not perform the resharding if state sync is going to take care of downloading the child shard's state.

This is less typical of a pattern in our code and it may potentially lead to some subtle issues imo.

Yeah, this is clear here in the usage of the ShardTracker functions, where the prev_hash/prev_prev_hash arguments feel very error-prone. I think a refactor there in the future to make it more foolproof would be nice.

@wacban (Contributor) commented Jan 3, 2025

Feels to me like we don't even gain a whole lot by intentionally ceasing to apply state updates just to start doing it again an hour later.

I totally agree that 2) is the less wasteful implementation in terms of actual resources. What I'm asking is whether it's premature optimization, and if there is any tradeoff in terms of code simplicity and robustness that we have to consider. If 2) is as easy and nice as (or easier and nicer than) 1), then by all means we should do it.

Which approach would have simpler and cleaner code?

Actually I think maybe 2

Awesome, let's do it then. Please update the PR and re-request review and I'll have a look. I still think it's less future-proof, but there isn't any reason not to do it now. We can always fix it later if we actually find any issues.

@marcelo-gonzalez (Contributor Author):

Updated to get rid of the logic related to resharding, as discussed above, and updated the PR description. Something to pay attention to: in the code that says let parent_shard_id = if shard_layout == prev_shard_layout { ... } else { ... };, I'm trying to get the parent whether there was a resharding or not, i.e. just whatever the predecessor was in the prev epoch. Feels like there are maybe a few other places where we want this logic? Let me know if this is correct (the parent is the same shard if the shard layouts are the same; otherwise get_parent_shard_id() had better return Ok()).
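For anyone reading along, the elided arms (the full snippet appears in one of the review threads below) read roughly like this:

let shard_layout = epoch_manager.get_shard_layout(epoch_id)?;
let prev_epoch_id = epoch_manager.get_epoch_id(epoch_last_block)?;
let prev_shard_layout = epoch_manager.get_shard_layout(&prev_epoch_id)?;
let parent_shard_id = if shard_layout == prev_shard_layout {
    // No resharding at this epoch boundary: the shard is its own predecessor.
    shard_id
} else {
    // A resharding happened: map the child shard to the parent it was split from.
    // This had better return Ok() for any shard in the new layout.
    shard_layout.get_parent_shard_id(shard_id)?
};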

In addition to the integration tests, I ran this:

diff --git a/pytest/tests/sanity/transactions.py b/pytest/tests/sanity/transactions.py
index bdc72f68f..05de6f799 100755
--- a/pytest/tests/sanity/transactions.py
+++ b/pytest/tests/sanity/transactions.py
@@ -24,8 +24,8 @@ nodes = start_cluster(
     config=None,
     extra_state_dumper=True,
     genesis_config_changes=[["min_gas_price",
-                             0], ["max_inflation_rate", [0, 1]],
-                            ["epoch_length", 10],
+                             0], ["max_inflation_rate", [0, 1]], ['shuffle_shard_assignment_for_chunk_producers', True],
+                            ["epoch_length", 20],
                             ["block_producer_kickout_threshold", 70]],
     client_config_changes={
         0: {
@@ -146,16 +146,16 @@ for height, hash in utils.poll_blocks(nodes[4], timeout=TIMEOUT):
         last_balances = [x for x in ctx.expected_balances]
         ctx.send_moar_txs(hash, 10, use_routing=True)
         sent_height = height
-    else:
-        assert height <= sent_height + 10, ('Balances before: {before}\n'
-                                            'Expected balances: {expected}\n'
-                                            'Current balances: {current}\n'
-                                            'Sent at height: {sent_at}\n'
-                                            'Current height: {height}').format(
-                                                before=last_balances,
-                                                expected=ctx.expected_balances,
-                                                current=ctx.get_balances(),
-                                                sent_at=sent_height,
-                                                height=height)
-    if height >= 100:
+    # else:
+    #     assert height <= sent_height + 10, ('Balances before: {before}\n'
+    #                                         'Expected balances: {expected}\n'
+    #                                         'Current balances: {current}\n'
+    #                                         'Sent at height: {sent_at}\n'
+    #                                         'Current height: {height}').format(
+    #                                             before=last_balances,
+    #                                             expected=ctx.expected_balances,
+    #                                             current=ctx.get_balances(),
+    #                                             sent_at=sent_height,
+    #                                             height=height)
+    if height >= 150:
         break

which ends up being a decent sanity check because test2 goes through this sequence of tracked shards for each epoch: 2 -> 2 -> 1 -> 2 -> 1 -> 0 -> 1 -> 3 -> 3.

@wacban (Contributor) left a comment:

LGTM
Can you add an integration test for the interesting case? I think @staffik added some tools to configure which shards it should track in which epoch, in the resharding testloop.

Comment on lines 2389 to 2391
if self.epoch_manager.is_next_block_epoch_start(prev_hash)? {
debug!(target: "chain", %block_hash, "block is the first block of an epoch");
if !self.prev_block_is_caught_up(prev_prev_hash, prev_hash)? {
Contributor:

mini nit: Maybe consider inverting the logic and using early returns instead of elses?

Contributor Author:

done

debug!(target: "chain", block_hash=?header.hash(), "block is the first block of an epoch");
if !self.prev_block_is_caught_up(&prev_prev_hash, &prev_hash)? {
if self.epoch_manager.is_next_block_epoch_start(prev_hash)? {
debug!(target: "chain", %block_hash, "block is the first block of an epoch");
Contributor:

mini nit: The log is not wrong but the message is a bit out of place. It's very generic but it's logged from within get_catchup_and_state_sync_infos. Maybe log something that mentions get_catchup_and_state_sync_infos instead.

Contributor Author:

Moved it to the bottom to log which shards we're going to state sync.

Comment on lines +2437 to +2438
let mut shards_to_sync = Vec::new();
for shard_id in epoch_manager.shard_ids(&epoch_id)? {
Contributor:

+1 to for loop

Comment on lines 2455 to 2456
/// beginning after the block `epoch_last_block`. If that epoch is epoch T, the logic is: (will
/// track the shard in epoch T+1) && (not tracking it in T) && (didn't track it in T-1).
Contributor:

nit: Can you multiline the condition?

Contributor Author:

done

Comment on lines 2483 to 2490
let shard_layout = epoch_manager.get_shard_layout(epoch_id)?;
let prev_epoch_id = epoch_manager.get_epoch_id(epoch_last_block)?;
let prev_shard_layout = epoch_manager.get_shard_layout(&prev_epoch_id)?;
let parent_shard_id = if shard_layout == prev_shard_layout {
shard_id
} else {
shard_layout.get_parent_shard_id(shard_id)?
};
Contributor:

There is a method called something like get_prev_shard_id that's used when going back the chain and collecting receipts. It should do this for you if my memory serves me well.

Contributor Author:

Oh good call, I didn't notice that. Looks like that API is a little more general than what we want here, since we're just interested in one shard ID, and it's more direct to only deal with that one than to look through the Vec return value. It feels a little error-prone and adds some burden on the caller to check that the Vec only has one element. I'll follow up with another PR adding/modifying the API for that simpler case.

Contributor:

If you need just one shard get_prev_shard_id_from_prev_hash should do the trick.

Contributor Author:

Ohhh true, embarrassing that I missed that function right below the other one :D
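For reference, the follow-up would collapse the manual layout comparison into something like the fragment below. The exact return type of get_prev_shard_id_from_prev_hash is an assumption on my part (a previous layout / shard id / shard index tuple), so check the EpochManagerAdapter definition before copying this:

// epoch_last_block is the last block of the previous epoch, so it can serve as the
// prev_hash argument here.
let (_prev_shard_layout, parent_shard_id, _prev_shard_index) =
    epoch_manager.get_prev_shard_id_from_prev_hash(epoch_last_block, shard_id)?;
// parent_shard_id is now the shard's predecessor in the previous epoch, whether or
// not a resharding happened at the boundary.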

@staffik (Contributor) commented Jan 6, 2025

LGTM Can you add an integration test for the interesting case? I think @staffik added some tools to configure which shards it should track in which epoch, in the resharding testloop.

Yes, you can find it in test_resharding_v3_state_cleanup. Although that does not change chunk producer assignment (on the protocol level), I think it would still allow testing this case. Especially, please test whether memtries work correctly (are they unloaded if we do not track a shard for just 1 epoch?).

@marcelo-gonzalez (Contributor Author):

Can you add an integration test for the interesting case?

Added test_resharding_v3_shard_shuffling_untrack_then_track. This test passes on master because of what I said above, where the order of events is not what it used to be, but it fails with "impossible to recover from a flat storage shard catchup failure!" with this diff (modified from the one I posted above, which is not quite right; still hacky but this is more correct):

diff --git a/chain/chain/src/flat_storage_resharder.rs b/chain/chain/src/flat_storage_resharder.rs
index 83a943411..14cfc013b 100644
--- a/chain/chain/src/flat_storage_resharder.rs
+++ b/chain/chain/src/flat_storage_resharder.rs
@@ -597,6 +597,23 @@ impl FlatStorageResharder {
         Ok(iter)
     }
 
+    fn state_sync_first(&self, shard_uid: ShardUId, chain_store: &ChainStore) -> bool {
+        let head = chain_store.head().unwrap();
+        let mut syncing = false;
+        for _ in chain_store.store().iter(near_store::DBCol::StateDlInfos) {
+            syncing = true;
+        }
+        if !syncing {
+            return false;
+        }
+        let Some(sync_hash) = chain_store.get_current_epoch_sync_hash(&head.epoch_id).unwrap() else {
+            return true;
+        };
+        let key = borsh::to_vec(&near_primitives::state_sync::StatePartKey(sync_hash, shard_uid.shard_id(), 0)).unwrap();
+        let part = chain_store.store().get(near_store::DBCol::StateParts, &key).unwrap();
+        !part.is_some()
+    }
+
     /// Task to perform catchup and creation of a flat storage shard spawned from a previous
     /// resharding operation. May be a long operation time-wise. This task can't be cancelled
     /// nor postponed.
@@ -609,6 +626,12 @@ impl FlatStorageResharder {
         if self.controller.is_cancelled() {
             return FlatStorageReshardingTaskResult::Cancelled;
         }
+
+        if self.state_sync_first(shard_uid, chain_store) {
+            tracing::info!("catchup wait for state sync");
+            return FlatStorageReshardingTaskResult::Postponed;
+        }
+
         info!(target: "resharding", ?shard_uid, "flat storage shard catchup task started");
         let metrics = FlatStorageReshardingShardCatchUpMetrics::new(&shard_uid);
         // Apply deltas and then create the flat storage.

On top of this PR, the test still fails, but with a different error: No trie comparison checks made for account account0 epoch FxXWgXiBzsnj58G8Bd72EYmiQghnfjSYub7LZ5WSDU8W shard s8.v3. Not sure if that's because this hacky thing is not quite right, but will investigate

@marcelo-gonzalez (Contributor Author):

On top of this PR, the test still fails, but with a different error: No trie comparison checks made for account account0 epoch FxXWgXiBzsnj58G8Bd72EYmiQghnfjSYub7LZ5WSDU8W shard s8.v3. Not sure if that's because this hacky thing is not quite right, but will investigate

Yeah, it's not a real bug. That diff postpones the catchup if there's any state sync going on, and account0 state syncs shard 3 during the post-resharding epoch, so it postpones it unnecessarily and then isn't finished until too late.

@marcelo-gonzalez (Contributor Author):

Especially, please test whether memtries work correctly (are they unloaded if we do not track a shard for just 1 epoch?).

Yeah, I think it works here because we only unload a memtrie if we're tracking the shard in neither the current nor the next epoch:

self.runtime_adapter.get_tries().retain_mem_tries(&shards_cares_this_or_next_epoch);
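For anyone curious how that retained set gets built, here is a hedged sketch of the "track it this or next epoch" filter (not the exact nearcore code; the surrounding variable names are assumptions):

// Keep a memtrie as long as we track the shard in the current epoch or will track it
// in the next one, so a one-epoch tracking gap does not unload it.
let mut shards_cares_this_or_next_epoch = Vec::new();
for shard_uid in shard_layout.shard_uids() {
    let shard_id = shard_uid.shard_id();
    let cares_now = shard_tracker.care_about_shard(me.as_ref(), &prev_hash, shard_id, true);
    let cares_next = shard_tracker.will_care_about_shard(me.as_ref(), &prev_hash, shard_id, true);
    if cares_now || cares_next {
        shards_cares_this_or_next_epoch.push(shard_uid);
    }
}
runtime_adapter.get_tries().retain_mem_tries(&shards_cares_this_or_next_epoch);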

@marcelo-gonzalez added this pull request to the merge queue Jan 6, 2025
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 6, 2025
@marcelo-gonzalez added this pull request to the merge queue Jan 6, 2025
Merged via the queue into near:master with commit 2d9de9b Jan 6, 2025
27 of 28 checks passed
@marcelo-gonzalez deleted the resharding-sync-shards branch January 6, 2025 21:36