How does this work with startup, readiness and liveness probes? #34

anoop2811 · 2024-12-06T19:35:52Z

What if we have the probes hitting the configured ports? Wouldn't this sort of simulate traffic and hence keep the pod up at all times? If this could work by bypassing the probe checks, that would be great!
Also, is the checkpoint saved on the same node? What if, when the pod tries to come back up, it is scheduled onto a different node due to resource contention? Can the checkpointed image be pushed to, say, ECR so that it can be pulled onto any node?

Btw, great project and great idea!

ctrox · 2024-12-07T07:55:58Z

Thank you for checking it out!

> What if we have the probes hitting the configured ports? Wouldn't this sort of simulate traffic and hence keep the pod up at all times?

Yes, this is exactly what happens currently. The probes are sent by the kubelet, so they wake up the process. Since Zeropod redirects the app's port while it is scaled down, probes could be intercepted and replied to, but the tricky part is that the shim would then need to be aware of very specific pod configuration to know how to respond to them. The current mechanism of passing configuration via annotations could also be used for this, but I fear it would get pretty hairy for more complex probes. Additionally, while the app is scaled down you would mostly be checking that the shim is still running, and containerd already has ways to ensure that. That's why it hasn't really been a top priority so far.
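To make the "redirect and wake" behaviour concrete, here is a minimal Python sketch of an activator that accepts a connection while the app is scaled down, holds it until a restore finishes, and then forwards bytes in both directions. It is not Zeropod's implementation: the `Activator` class, the `restore` coroutine and the echo backend are illustrative assumptions. It also shows why a kubelet probe is indistinguishable from any other client at this layer: every connection simply triggers the restore.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class Activator:
    """Sketch of a "hold, restore, then forward" TCP proxy.

    `restore` is a hypothetical stand-in for restoring the checkpointed
    container; the first connection starts it and every connection awaits it.
    """

    def __init__(self, backend_host: str, backend_port: int,
                 restore: Callable[[], Awaitable[None]]) -> None:
        self.backend_host = backend_host
        self.backend_port = backend_port
        self._restore = restore
        self._restoring: Optional[asyncio.Future] = None

    async def _ensure_restored(self) -> None:
        # No await between the check and the assignment, so only the first
        # connection starts a restore; later ones wait on the same future.
        if self._restoring is None:
            self._restoring = asyncio.ensure_future(self._restore())
        await self._restoring

    async def handle(self, client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        # The connection is accepted, but nothing is forwarded until the app
        # is back: traffic is delayed, not dropped.
        await self._ensure_restored()
        backend_reader, backend_writer = await asyncio.open_connection(
            self.backend_host, self.backend_port)
        await asyncio.gather(
            _pipe(client_reader, backend_writer),
            _pipe(backend_reader, client_writer),
        )


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass
    finally:
        writer.close()


async def demo() -> tuple:
    """Two clients connect at the same time to a "scaled-down" echo app."""
    restore_calls = []

    async def echo(reader, writer):  # stands in for the restored app
        writer.write(await reader.read(100))
        await writer.drain()
        writer.close()

    async def restore():  # hypothetical restore that takes 50 ms
        restore_calls.append("restore")
        await asyncio.sleep(0.05)

    backend = await asyncio.start_server(echo, "127.0.0.1", 0)
    activator = Activator("127.0.0.1", backend.sockets[0].getsockname()[1], restore)
    proxy = await asyncio.start_server(activator.handle, "127.0.0.1", 0)
    proxy_port = proxy.sockets[0].getsockname()[1]

    async def client(msg: bytes) -> bytes:
        reader, writer = await asyncio.open_connection("127.0.0.1", proxy_port)
        writer.write(msg)
        await writer.drain()
        reply = await reader.read()  # read until the proxy closes the connection
        writer.close()
        return reply

    replies = await asyncio.gather(client(b"first"), client(b"second"))
    proxy.close()
    backend.close()
    return list(replies), restore_calls
```

Because all waiting connections share one restore future, two simultaneous clients trigger a single restore. A real activator would also have to scale the app back down after an idle period, which this sketch leaves out.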

> Also, is the checkpoint saved on the same node? What if, when it tries to come back up, it is scheduled onto a different node due to resource contention? Can the checkpointed image be pushed to, say, ECR so that it can be pulled onto any node?

Yeah, I also had similar thoughts and already did some prototyping on this a while back. It would be pretty cool, but it would also expand the scope of the project quite a bit 😄

anoop2811 · 2024-12-08T02:58:34Z

Thanks for the reply. Is there an example of intercepting the kubelet traffic and responding to its probes? If not, and you have pointers, I could try to contribute if I can get it working.

ctrox · 2024-12-08T09:36:02Z

So all traffic that goes to the container(s) that Zeropod manages is already intercepted (redirected) in the scaled-down state. Zeropod never drops any traffic to an app, but it will "delay" it if the app is scaled down. You can see how that works in the activation sequence. I gave this some more thought and I think we would first need to define which probes are useful to us.

Startup probe: This already works and I use it in the e2e tests. Since it's only checked during startup, we don't need to worry about it waking the application after the initial success.
Liveness/Readiness probes: These also technically already work, but of course they keep the container constantly running, depending on the interval. They are usually used to determine whether the container is still ready/healthy after starting, but this gets tricky for us since we can't know whether the container will be healthy before we actually restore it. Also, we can't really determine whether incoming traffic is a probe check from the kubelet or just a normal request to the app. But even if we could tell, what would be the point of essentially faking these probes? IMO the one thing that is useful right now is a liveness probe that checks at relatively large intervals, e.g. setting periodSeconds: 300 will restore the container every 5 minutes to ensure restoring still works.

If the goal is to just be compatible with existing Pod manifests that define readiness/liveness probes and essentially fake them to be successful, I'm not really sure that's worth implementing.
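The trade-off behind the periodSeconds: 300 suggestion can be put into numbers. The sketch below assumes, hypothetically, that every probe counts as activity and that Zeropod checkpoints the container only after a fixed idle period (the scale-down duration); the 60-second value in the usage note is illustrative, not a documented Zeropod default.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400


def restores_per_day(period_seconds: int, scaledown_seconds: int) -> Optional[float]:
    """Estimate how often a probe wakes a scaled-down container per day.

    Assumes (hypothetically) that any probe counts as activity, so the
    container is only checkpointed after `scaledown_seconds` without traffic.
    Returns None when probes arrive at least that often, i.e. the container
    never gets scaled down at all.
    """
    if period_seconds <= scaledown_seconds:
        return None
    return SECONDS_PER_DAY / period_seconds


def running_fraction(period_seconds: int, scaledown_seconds: int) -> float:
    """Rough share of time the app stays up (ignores checkpoint/restore time)."""
    return min(1.0, scaledown_seconds / period_seconds)
```

With a 60 s scale-down duration, restores_per_day(300, 60) gives 288 restores per day with the app up roughly 20% of the time, whereas the Kubernetes default probe period of 10 s returns None: the container would never scale down.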

anoop2811 · 2024-12-09T01:26:01Z

I understand. However, IMO the usefulness would be reduced, as most apps are configured with liveness and readiness probes at sub-minute intervals, which sort of defeats the purpose of using Zeropod. Would you be comfortable letting users configure Zeropod to fake the responses to k8s probes? We could use the User-Agent set in requests to distinguish internal from external traffic, and fake the response by reusing the last response sent. Could this be made configurable?
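A rough Python sketch of this proposal: detect kubelet HTTP probes by their User-Agent (the kubelet sends kube-probe/<version>), remember the app's last reply per probe path while it runs, and replay it while the app is scaled down. `ProbeReplyCache` and its methods are hypothetical names, and the sketch works on already-parsed requests rather than raw connections.

```python
from typing import Dict, Optional, Tuple

# Kubelet HTTP probes identify themselves with a User-Agent like "kube-probe/1.31".
KUBE_PROBE_UA_PREFIX = "kube-probe/"

Reply = Tuple[int, bytes]  # (HTTP status code, body)


def is_kubelet_probe(headers: Dict[str, str]) -> bool:
    """True if the request's User-Agent looks like a kubelet HTTP probe."""
    for name, value in headers.items():
        if name.lower() == "user-agent":  # header names are case-insensitive
            return value.startswith(KUBE_PROBE_UA_PREFIX)
    return False


class ProbeReplyCache:
    """Hypothetical helper: remember the app's last answer to each probe path
    while it is running, and replay it while the app is scaled down."""

    def __init__(self) -> None:
        self._last: Dict[str, Reply] = {}

    def record(self, path: str, headers: Dict[str, str], reply: Reply) -> None:
        # Only cache responses to kubelet probes, never to regular traffic.
        if is_kubelet_probe(headers):
            self._last[path] = reply

    def fake_reply(self, path: str, headers: Dict[str, str]) -> Optional[Reply]:
        """Cached reply for a probe, or None meaning "wake the app as usual"."""
        if not is_kubelet_probe(headers):
            return None
        return self._last.get(path)
```

The replayed reply only describes the app as it was before the checkpoint; it says nothing about whether the restored container would be healthy, which is the concern raised earlier in the thread.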

ctrox · 2024-12-11T07:54:45Z

> However, the usefulness IMO would be reduced, as most apps are configured with liveness and readiness probes at sub-minute intervals, which sort of defeats the purpose of using Zeropod.

The problem is that Zeropod needs to be aware of the probes that are configured in order to reply to them as expected. As already mentioned, we don't have access to the Pod spec, so the user would still need to change the Pod to tell Zeropod how to reply to their probes. This means adding a bunch of annotations to the Pod. At that point, wouldn't it just be easier to adjust the probes themselves?
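To illustrate what "a bunch of annotations" could look like, here is a hypothetical sketch of the shim reading per-probe reply settings from Pod annotations. The annotation prefix and key names are invented for this example and are not part of Zeropod.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical annotation prefix, invented for illustration; Zeropod does not define these keys.
PREFIX = "probes.example.com/"


@dataclass(frozen=True)
class ProbeReply:
    path: str    # must match the probe's httpGet path in the Pod spec
    status: int  # status code the shim should answer with


def probe_reply_from_annotations(annotations: Dict[str, str], probe: str) -> Optional[ProbeReply]:
    """Read the reply config for one probe ("liveness", "readiness", ...).

    Returns None if the user did not annotate this probe, in which case the
    connection would simply wake the app as it does today.
    """
    path = annotations.get(f"{PREFIX}{probe}-path")
    if path is None:
        return None
    return ProbeReply(path=path, status=int(annotations.get(f"{PREFIX}{probe}-status", "200")))
```

Each value duplicates information that already lives in the probe definition of the Pod spec, which the shim cannot see, so users would have to keep both in sync by hand, and TCP or gRPC probes would need yet more keys.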

> Would you be comfortable letting users configure the behavior to fake the responses to k8s probes (we could use the User-Agent set in requests to determine internal vs. external traffic) and fake the response by using the last response sent?

We could probably support HTTP probes with this (detected by the User-Agent), but for raw TCP it would be much harder to differentiate probes from normal traffic. It would also mean that the activator needs to support HTTP when a probe is configured. Currently the activator is just a TCP proxy, so HTTP support would need to be added while making sure not to break the existing functionality of waking raw TCP apps. In addition, I'd like to keep the shim as slim as possible; I already had some issues in the past where memory usage exploded just from importing some simple things. That's another reason I'm a bit careful about adding more functionality to the shim itself.
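Supporting HTTP probes without breaking raw TCP apps roughly means sniffing the first bytes a client sends before deciding how to handle the connection. The sketch below classifies those bytes; the function name and the method list are assumptions, and it only covers plain-text HTTP/1.x.

```python
# Request methods a plain-text HTTP/1.x request line can start with.
HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ",
                b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ")


def classify(first_bytes: bytes) -> str:
    """Classify a new connection from the bytes the client sent first.

    Returns "kubelet-http-probe", "http" or "raw-tcp". Anything that does not
    look like an HTTP/1.x request falls back to "raw-tcp", so non-HTTP apps
    keep the existing pass-through behaviour.
    """
    if not first_bytes.startswith(HTTP_METHODS):
        return "raw-tcp"
    head = first_bytes.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, colon, value = line.partition(b":")
        if colon and name.strip().lower() == b"user-agent":
            return "kubelet-http-probe" if value.strip().startswith(b"kube-probe/") else "http"
    return "http"
```

A kubelet tcpSocket probe only opens a connection and sends no bytes, so it looks exactly like any other client at this point, which is why raw TCP probes are so much harder to tell apart. A real proxy would also need a read timeout and would have to replay the sniffed bytes to the backend, since some protocols expect the server to speak first.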
