-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve handling of out of order messages from direct connected hosts #305
Improve handling of out of order messages from direct connected hosts #305
Conversation
/retest |
7ef4b15
to
2e0501c
Compare
func createRecord(ctx context.Context, tx *gorm.DB, toCreate []db.RunHost) error { | ||
|
||
successOrFailure := clause.OrConditions{Exprs: []clause.Expression{ | ||
clause.Eq{Column: "run_hosts.status", Value: db.RunStatusSuccess}, | ||
clause.Eq{Column: "run_hosts.status", Value: db.RunStatusFailure}, | ||
}} | ||
|
||
notMarkedAsComplete := clause.Where{Exprs: []clause.Expression{clause.Not(successOrFailure)}} | ||
|
||
createResult := tx.Model(db.RunHost{}). | ||
Clauses(clause.OnConflict{ | ||
Where: notMarkedAsComplete, | ||
Columns: []clause.Column{{Name: "run_id"}, {Name: "host"}}, | ||
DoUpdates: clause.AssignmentColumns([]string{"status", "log"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dehort Looking good! Any idea if this notMarkedAsComplete
check might be redundant, since we are checking for it through the where clause in the main function?
Also, any idea if we need to do the same for satellite requests? I think satellite payloads have a sat_sequence
number, which kind of protects us against out of order messages, but might be good to double check. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the notMarkedAsComplete
where clause is needed here because we are only passing the transaction object to the createRecord()
method. We don't actually pass the query object that we built in the calling function.
Yeah, the sat_sequence
field protect us from out of order messages at the host level. But... I think the change in this PR protects us from marking a top level run as complete
or failed
and then marking it as running
if we received a satellite message out of order. I added another test for this.
Thank you, @tahmidefaz !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thank you!
2e0501c
to
50b9ba3
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
What?
Improve handling of out of order messages from direct connected hosts
RHCLOUD-28028
Why?
It is possible that the response consumer receives a run complete message (success or failure) before receiving a
run incomplete
message. When this happens, then run will go from having a status ofsuccess
orfailure
(run complete) to having a status ofrunning
. The job will eventually timeout and data (run output, status complete, etc) is lost.How?
Describe how the change is implemented. Any noteable new libaries, APIs, or features.
Testing
A couple of unit tests were added.
Anything Else?
Any other notes about the PR that would be useful for the reviewer.
Secure Coding Practices Checklist Link
Secure Coding Checklist