plugin: Handle panics from agent requests #785

Merged
merged 1 commit into from
Feb 12, 2024

@sharnoff sharnoff commented Feb 2, 2024

Closes #760.

AFAIK this hasn't been an issue in the past, but as we're trying to
improve reliability, it's good to get this out of the way before it
becomes an issue.

Note that this PR is quite minimal - expanding the existing tech debt we
have around how the scheduler plugin handles HTTP requests.
It's probably ok *enough* for now. I don't expect we'll be making too
many changes to it in the near future. See also: #13.

Tested locally by forcing it to panic on every request:

diff --git a/pkg/plugin/run.go b/pkg/plugin/run.go
index 007554a..6da7728 100644
--- a/pkg/plugin/run.go
+++ b/pkg/plugin/run.go
@@ -262,8 +262,10 @@ func (e *AutoscaleEnforcer) handleAgentRequest(
 		}
 	}

-	pod.vm.mostRecentComputeUnit = &e.state.conf.ComputeUnit
-	return &resp, 200, nil
+	panic(errors.New("test panic!"))
+
+	// pod.vm.mostRecentComputeUnit = &e.state.conf.ComputeUnit
+	// return &resp, 200, nil
 }

 // getComputeUnitForResponse tries to return compute unit that the agent supports

The change appears to work as intended.

logger.Error(msg, zap.String("error", fmt.Sprint(err)))
finalStatus = 500
w.WriteHeader(finalStatus)
_, _ = w.Write([]byte(msg))
Contributor

How about sending not just msg here, but "msg: err"?

Member Author

@sharnoff sharnoff Feb 5, 2024

Hm, yeah. My impression was that it was better from a security perspective to not include the full error message (especially considering it's already visible in the logs), but tbh maybe it doesn't matter.
wdyt?

Contributor

Hmm, I see your concern. That said, this is our internal API, so the risk is not high.

But I guess it can be left as is; it's easy to grep the logs. Maybe make the message more distinct, like "permit handler panicked" or similar.

@areyou1or0 do you have an opinion on whether we should return the full error message from a catch-all exception handler?

@Omrigan somehow I didn't get the notification for the mention, just seeing this now.
Yes, it's best practice not to reveal non-customised error messages, as this can disclose details of the underlying technology. Low severity but valid.

@sharnoff sharnoff merged commit d46b308 into main Feb 12, 2024
16 checks passed
@sharnoff sharnoff deleted the sharnoff/plugin-panic-handling branch February 12, 2024 16:32
Successfully merging this pull request may close these issues.

plugin panic handling (convert panics to handled errors)