Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FileCopy Protocol] Bootstrap Controller #2974

Merged
merged 10 commits into from
Dec 20, 2024
Merged

Conversation

DevenAhluwalia
Copy link
Contributor

@DevenAhluwalia DevenAhluwalia commented Dec 19, 2024

This PR implements this design doc :- https://docs.google.com/document/d/1u57C0BU8oMMuMHC6Fh_B-6R612EaQb2AxtnBDlN6sAs/edit?tab=t.0

Change log :-

  1. Base init for BootstrapController
  2. BootstrapController registered as a StatePartitionlistner
  3. Test cases for BootstrapController
  4. Some utils like isAnyLogSegmentExists

(cherry picked from commit 1ce57b3)
(cherry picked from commit e991679)
@DevenAhluwalia DevenAhluwalia marked this pull request as ready for review December 20, 2024 06:19
listenerToInvoke.onPartitionBecomeBootstrapFromOffline(partitionName);
if (listenerToInvoke == fileCopyManagerListener) {
try {
storeManager.getPrimaryClusterParticipant().getReplicaSyncUpManager().waitForFileCopyCompleted(partitionName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pass replica synup manager object to this class directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

@DevenAhluwalia DevenAhluwalia Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replicaSyncUpManager is a prop of HelixParticipant class.
And we need to invoke replicaSyncUpManager. waitForFileCopyCompleted() for the storeManager's ClusterParticipant.
How can we pass replicaSyncUpManager object to BootstrapController class?

Update; as part of another comment, Iv currently removed getPrimaryClusterParticipant from StoreManager interface. I am now using the cluster-participant sent from AmbryServer and I am using cluster-participant.getReplicaSyncUpManager() in the latest diff.

// An optimisation could be explored to only delete incomplete datasets
// More about this can be found in this comment :-
// https://docs.google.com/document/d/1u57C0BU8oMMuMHC6Fh_B-6R612EaQb2AxtnBDlN6sAs/edit?disco=AAABZyj2rHo
storeManager.removeBlobStore(partitionId);
Copy link
Contributor

@aga9900 aga9900 Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot delete the directory in this case. This will break complete flow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is FC.Incomplete -> R case as we had discussed in this doc :- https://docs.google.com/document/d/1u57C0BU8oMMuMHC6Fh_B-6R612EaQb2AxtnBDlN6sAs/edit?tab=t.0#heading=h.seiobqi3qdu3

If we don't remove the BlobStore, should we then instead explicitly delete FC_files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with @aga9900
We will need to evaluate the impact of removeBlobStore call here.
Added a TODO.

Copy link
Contributor

@aga9900 aga9900 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass of review.

StateModelListenerType.BootstrapControllerListener, new BootstrapControllerImpl());

Map<StateModelListenerType, PartitionStateChangeListener> partitionStateChangeListeners =
storeManager.getPrimaryClusterParticipant().getPartitionStateChangeListeners();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add null handling for storeManager.getPrimaryClusterParticipant()

Copy link
Contributor Author

@DevenAhluwalia DevenAhluwalia Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of another comment, Iv currently removed getPrimaryClusterParticipant from StoreManager interface. I am now using the cluster-participant sent from AmbryServer.

Have added another generic-exception catch block anyway and we now throw StateTransitionException with type BootstrapControllerFailure.

* https://docs.google.com/document/d/1u57C0BU8oMMuMHC6Fh_B-6R612EaQb2AxtnBDlN6sAs
* The decision branches can be better understood with the doc.
*/
if (null == replica) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: replica==null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storeManager.removeBlobStore(partitionId);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant newline at the EoF?

}

@Override
public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this method?

@@ -449,6 +450,22 @@ public boolean controlCompactionForBlobStore(PartitionId id, boolean enabled) {
return diskManager != null && diskManager.controlCompactionForBlobStore(id, enabled);
}

@Override
public ClusterParticipant getPrimaryClusterParticipant() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pass cluster participant object to Bootstrap Controller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
While AmbryServer is passing cluster-participant while initiating BootstrapController, it passed clusterParticipants to StorageManager and a lot of other classes.
Then, StorageManager extracts primaryClusterParticipant exactly the same way as AmbryServer does.

Looks like, it should be fine to just use the primaryClusterParticipant from AmbryServer directly.

Change incorporated.



// Permits Replica to be constructed with a null Partition
public class TestReplica extends Replica {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info] Needed this to be able to mock this from BootstrapController tests.

@@ -0,0 +1,73 @@
package com.github.ambry.store;

public enum ReplicationProtocolTransitionType {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Info] Proposing this way to thoroughly test BootstrapController's logic.
Tests are making use of these enum values and asserting if the logic takes an unexpected path.

Copy link
Collaborator

@justinlin-linkedin justinlin-linkedin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some minor stuff.

Copy link
Contributor

@aga9900 aga9900 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DevenAhluwalia DevenAhluwalia merged commit 31f2dc3 into master Dec 20, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants