How does this work with startup, readiness and liveness probes? #34
Comments
Thank you for checking it out!
Yes, this is exactly what happens currently. The probes are sent by the kubelet, so they wake up the process. Since Zeropod redirects the app's port while it is scaled down, probes could be intercepted and answered. The tricky part is that the shim would then need to be aware of very specific pod configuration to know how to respond to these probes. The current way of passing configuration via annotations could also be used for this, but I fear it would get pretty hairy for more complex probes. Additionally, you would mostly be checking that the shim is still running while the app is scaled down, and containerd already has some ways to ensure that. That's why it hasn't been a top priority so far.
Yeah, I also had similar thoughts and already did some prototyping with this a while back. Would be pretty cool but it would also expand the scope of the project quite a bit 😄
Thanks for the reply. Is there an example of intercepting the kubelet traffic and responding to the probes from the kubelet? If not, and you have pointers, I could try to contribute if I am able to get it working.
So all traffic that goes to the container(s) that Zeropod manages is being intercepted (redirected) already in scaled down state. Zeropod never drops any traffic to an app but it will "delay" it if the app is scaled down. You can see how that works in the activation sequence. I gave this some more thought and I think first we would need to define what probes are useful to us.
If the goal is to just be compatible with existing Pod manifests that define readiness/liveness probes and essentially fake them to be successful, I'm not really sure that's worth implementing.
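To illustrate the "delay, don't drop" behavior described above, here is a minimal Go sketch of a TCP activator. It is not Zeropod's actual implementation: the `activator` type, its `restore` callback, and the `startEcho` stand-in for restoring the app are all hypothetical names made up for this example. The idea is only that the first connection triggers the restore, waits for it to finish, and is then proxied to the running app.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
)

// activator is a hypothetical stand-in for the component that accepts
// traffic while the app is scaled down.
type activator struct {
	restore func() (string, error) // hypothetical: wakes the app, returns its address
	once    sync.Once
	addr    string
	err     error
}

// handle blocks the client until the app is restored, then proxies bytes in
// both directions. Nothing is dropped; the connection is only delayed.
func (a *activator) handle(client net.Conn) {
	defer client.Close()
	// Only the first connection pays the restore cost; later ones reuse the result.
	a.once.Do(func() { a.addr, a.err = a.restore() })
	if a.err != nil {
		return
	}
	backend, err := net.Dial("tcp", a.addr)
	if err != nil {
		return
	}
	defer backend.Close()
	done := make(chan struct{}, 2)
	go func() { io.Copy(backend, client); done <- struct{}{} }()
	go func() { io.Copy(client, backend); done <- struct{}{} }()
	<-done // one direction finished; the deferred Close calls stop the other
}

func (a *activator) serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		go a.handle(conn)
	}
}

// startEcho simulates "restoring" the app by starting a TCP echo server.
func startEcho() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func() { defer c.Close(); io.Copy(c, c) }()
		}
	}()
	return ln.Addr().String(), nil
}

// roundTrip sends msg through a fresh activator and returns the echoed reply.
func roundTrip(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go (&activator{restore: startEcho}).serve(ln)

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	reply, err := roundTrip("ping")
	fmt.Println(reply, err)
}
```

Note that a proxy like this cannot tell a kubelet probe from a real client at the TCP level, which is why every probe currently counts as traffic and wakes the app.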
I understand. However, in my opinion the usefulness would be reduced, as most apps are configured with liveness and readiness probes at sub-minute intervals, which sort of defeats the purpose of using Zeropod. Would you be comfortable with letting users configure Zeropod to fake the responses to k8s probes? We could use the user agent set in the requests to distinguish internal from external traffic, and fake the response by replaying the last response sent. Could it be a configuration here?
The problem is that Zeropod needs to be aware of the probes that are configured in order to reply to them as expected. As already mentioned, we don't have access to the Pod spec, so the user would still need to change the Pod to tell Zeropod how to reply to their probes. This means adding a bunch of annotations to the Pod. At that point, wouldn't it be just easier to adjust the probes themselves?
We could probably support HTTP probes with this (detected by the user agent), but for raw TCP it would be much harder to differentiate probes from normal traffic. It would also mean that the activator needs to support HTTP when a probe is configured. Currently the activator is just a TCP proxy, so HTTP support would need to be added while making sure not to break the existing functionality of waking raw TCP apps. In addition, I'd like to keep the shim as slim as possible. I already had some issues in the past where memory usage exploded just from importing some simple things, so that's another reason I'm a bit careful about adding more functionality to the shim itself.
What if we have the probes hitting the configured ports? Wouldn't this simulate traffic and hence keep the pod up at all times? If this could work while bypassing the probe checks, that would be great!
Also, is the checkpoint saved on the same node? What if, when the pod tries to come back up, it is scheduled onto a different node due to resource contention? Can the checkpointed image be pushed to, say, ECR so that it can be pulled onto any node?
Btw, great project and great idea!