Remove the medusa standalone pod #1304
Conversation
I've left a few comments where I'm unclear how the logic is supposed to work. I think there are a few problems in there, but I'd like to be mistaken.
More generally, is there a test for this functionality? I can see a lot of stuff removed from the tests, but a straightforward (ideally e2e) test showing that we can indeed retrieve a restore mapping would be nice to see.
if medusaClient, err := r.ClientFactory.NewClient(ctx, addr); err != nil {
	request.Log.Error(err, "Failed to create Medusa client", "address", addr)
} else {
	if err != nil {
Issue: This logic doesn't appear to make very much sense. You're checking for an error, then logging the error, then checking for the error again and returning it?? What am I missing here?
Thought: this all really needs a refactor too; e.g. why is this logger sitting on the request struct rather than the reconciler struct?
changed the logger and the method signature 👍
	return nil, err
}
for _, pod := range pods {
	addr := net.JoinHostPort(pod.Status.PodIP, fmt.Sprint(shared.BackupSidecarPort))
Issue: I'm not sure how this is supposed to work. The spec in the ticket calls for logic that sequentially tries different pods until it finds one whose Medusa container it can connect to.
But that isn't what is happening here. If I'm not mistaken, you're going to error out if you can't connect to the first pod you try. Moreover, if the ordering of the pods is deterministic, then every time you hit this loop you'll obtain the same error, and you'll never progress to trying other sidecar containers in other pods which may be available.
If you have to do it this way (rather than having some real retry logic) at least consider randomising the order of the pods in this slice so that you're trying more than just the first one.
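A minimal sketch of that suggestion, reusing the names from the quoted diff (`pods`, `r.ClientFactory`, `logger`, and `shared.BackupSidecarPort` are assumed from context, not a verified build); the shuffle plus the `continue` are the two proposed changes:

```go
// Sketch only: randomise pod order so a persistently unreachable first pod
// doesn't block every reconcile attempt.
rand.Shuffle(len(pods), func(i, j int) { pods[i], pods[j] = pods[j], pods[i] })

for _, pod := range pods {
	addr := net.JoinHostPort(pod.Status.PodIP, fmt.Sprint(shared.BackupSidecarPort))
	medusaClient, err := r.ClientFactory.NewClient(ctx, addr)
	if err != nil {
		logger.Error(err, "Failed to create Medusa client", "address", addr)
		continue // try the next pod instead of giving up on the first failure
	}
	// ... use medusaClient and return on success ...
}
```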
The first check of `err != nil` in the `else` clause needs to be removed for this to work.
Not sure how it ended up there, good catch 👍
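For reference, a minimal sketch of the corrected shape (assuming the `logger` rename mentioned above); the inner check disappears because `err` is already nil in the `else` branch:

```go
if medusaClient, err := r.ClientFactory.NewClient(ctx, addr); err != nil {
	logger.Error(err, "Failed to create Medusa client", "address", addr)
} else {
	// err is nil here by construction, so no second err != nil check is needed
	// ... proceed with medusaClient ...
}
```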
Sounds about right. Suggestion: I think you should have a test here with a mock where the first pod fails and the logic has to move on to another pod. Otherwise we aren't checking that this actually meets the specification.
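A rough sketch of such a test double; `flakyClientFactory` and `fakeMedusaClient` are hypothetical names, and the return type stands in for whatever client interface the real factory returns:

```go
import (
	"context"
	"fmt"
)

// fakeMedusaClient stands in for the real Medusa client interface (hypothetical).
type fakeMedusaClient struct{}

// flakyClientFactory fails its first NewClient call and succeeds afterwards,
// letting a test assert that the lookup loop moves on to a second pod.
type flakyClientFactory struct {
	calls int
}

func (f *flakyClientFactory) NewClient(ctx context.Context, addr string) (*fakeMedusaClient, error) {
	f.calls++
	if f.calls == 1 {
		return nil, fmt.Errorf("connection refused: %s", addr)
	}
	return &fakeMedusaClient{}, nil
}
```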
Quality Gate failed: Failed conditions
Done with the required changes, could you do another round @Miles-Garnsey?
A few more issues here, I think you might want to test more thoroughly, and also think more through when you want to return and when you want to move to the next pod.
if medusaClient, err := r.ClientFactory.NewClient(ctx, addr); err != nil {
	logger.Error(err, "Failed to create Medusa client", "address", addr)
} else {
	restoreHostMap, err := medusa.GetHostMap(request.Datacenter, *request.RestoreJob, medusaClient, ctx)
Probably another issue here. If GetHostMap fails due to network issues, you surely want to continue on to the next pod again, don't you? Right now this returns...
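For illustration, a sketch of the continue-instead-of-return shape being described (the log message and control flow are assumptions, built on the names in the quoted diff):

```go
restoreHostMap, err := medusa.GetHostMap(request.Datacenter, *request.RestoreJob, medusaClient, ctx)
if err != nil {
	logger.Error(err, "Failed to get host map", "address", addr)
	continue // a failure on this pod shouldn't abort the whole lookup
}
return restoreHostMap, nil
```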
We're doing some refactoring later, so I'm approving this for now.
What this PR does:
Remove the Medusa standalone pod, which serves no real purpose today and can be replaced by the Medusa containers running in the Cassandra pods.
Which issue(s) this PR fixes:
Fixes #1066