-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CFP-30984 : Add CFP for DNS proxy HA - V2 #54
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Hemanth Malla <[email protected]>
**Cilium Release:** 1.17 | ||
|
||
**Authors:** Hemanth Malla <[email protected]>, Vipul Singh <[email protected]> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Administrative point:
If we think we agree on the basics here, but would like more of the design to be fleshed out:
Status: Provisional | |
(If provisional, please highlight anything that still needs to be expanded with the text "UNRESOLVED: description of unresolved design point").
Or alternatively if we think the design is satisfactorily complete:
Status: Implementable | |
See status in the readme for more details. https://github.com/cilium/design-cfps?tab=readme-ov-file#status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the point I made at the end of the discussion with @cilium/sig-policy today. Whatever we agree upon, we should be able to merge in. If there are points left to iterate on, we can highlight those points. If we've resolved the points of difference we can mark it implementable. Goal is to make iteration easier so we don't have to overthink an entire design that is subject to change during implementation, but rather we can more incrementally change the design in-tree as we discuss, debate, implement and learn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a status field once we close out the open discussion items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also looks like we haven't documented Provisional
in the readme yet ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also looks like we haven't documented Provisional
in the readme yet ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, provisional is just under discussion in #48 but not documented yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's mark this as implementable.
Signed-off-by: Hemanth Malla <[email protected]>
811a464
to
1980cad
Compare
@squeed / @joestringer we addressed your comments and appreciate if you can another look. Haven't decided on status of CFP though. Do you think we can set to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be fine with merging in "Provisional" status with these changes. Given how this feature interacts with availability during upgrade, I think we'd need to align on what upgrade should look like before really considering this as implementable.
The danger I see for not aligning around upgrade would be that this feature could either block initiatives from @cilium/sig-datapath / @cilium/sig-policy to optimize the dataplane, or alternatively such initiatives could end up breaking this feature which we don't really want either.
Either this or we say that figuring out upgrade is part of moving this feature from alpha->beta or beta->stable. We haven't outlined criteria for alpha/beta/stable features in the project yet but I'd be open to discussions about how we set expectations around that.
Agreed. we can create separate doc/CFP just to brainstorm about how upgrade works with SDP and in-agent dnsproxy. we can note
That's true. For version compatibility, if required, we can follow same path as envoy by tying SDP version with cilium versions https://github.com/cilium/proxy#version-compatibility-matrix. The motive should be try to support upgrade without breaking but it may become necessary to break in some cases where we can come up with something like version compatibility. |
AFAIK cilium-proxy doesn't try to solve upgrade, the link is just stating which version of Cilium ships with which Envoy version. |
Signed-off-by: Hemanth Malla <[email protected]>
@joestringer hope you are talking about minor upgrades. In my opinion, upgrade should be backward compatible atleast with immediate previous minor. Old SDP <-> New cilium or New SDP <-> old cilium should work. It might require for us to support both old/new api/fields in new version for some time. Subsequently in later releases, we can remove deprecated api/fields after we make wide announcement. |
Signed-off-by: Hemanth Malla <[email protected]>
e915d0d
to
dadacae
Compare
Signed-off-by: Hemanth Malla <[email protected]>
11fee60
to
8d7459c
Compare
@@ -23,10 +23,11 @@ Users rely on toFQDN policies to enforce network policies against traffic to des | |||
* Introduce a streaming gRPC API for exchanging FQDN policy related information. | |||
* Introduce a standalone DNS proxy (SDP) that binds on the same port as built-in proxy with SO_REUSEPORT. | |||
* Enforce L7 DNS policy via SDP. | |||
* When an endpoint's DNS traffic is selected by an L7 policy, DNS requests and responses will be forwarded to their destinations via SDP even if cilium-agent is not running. So, clients re-resolving DNS to establish new connections will not be blocked anymore if the IP addresses from the new resolution are unchanged. Note that the L3/L4 policy for the resolved names should have already been plumbed when the agent was running. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a goal, this is implementation detail. Should this be dropped or moved under the proposal section?
|
||
* SDP will not listen on the DNS proxy port until a connection is established with cilium agent and initial L7 DNS policy rules are received. Meanwhile, built-in DNS proxy will continue to serve requests. SDP relies on cilium agent for initial bootstrap. In future, we could make SDP retrieve initial policy information from other sources, but this is not in scope for this CFP. | ||
|
||
### Handling Upgrades |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't feel that this section adequately explains upgrades. The text is capturing some of the key discussions we have had though, which helps. What I have in mind is something a bit more like a test plan that outlines the bringing up / shutting down of different components and exactly what's expected in terms of forwarding state around.
Overall though I don't feel that continuing this path by debating back and forth in this design doc is really helping drive this upgrades discussion. I think that at this point it makes sense to just accept that some aspects of this CFP are underspecified / not agreed upon, but the overall design is still "implementable" and ultimately subject to review of the actual implementation. As we learn more through implementation we can iron out any remaining differences, and figure out how to best test the functionality.
EDIT 2024-11-23: Better wording, more helpful suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestions:
- Add the status
- Move the new goal into the proposal section
Then let's move forward with this. I think that developing and testing the implementation will do more practical good than continuing discussion on the CFP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the late comments. As Joe said, let's discuss more on an actual PR - please do include me on review.
} | ||
``` | ||
|
||
Method : SubscribeToDNSPolicies ( Invoked from agent to SDP via bi-directional stream ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invoked from agent to SDP
Should probably be "invoked from SDP to agent" right?
Otherwise I think this doesn't match how the RPC is specified, and also would mean that the SDP exposes a gRPC API itself, which seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's correct. its invoked by SDP to agent and that opens up a stream which used by Cilium Agent to send updates
|
||
*Note: `dns_pattern` follows the same convention used in CNPs. See https://docs.cilium.io/en/stable/security/policy/language/#dns-based for more details* | ||
|
||
*Note: `DNSPolicies` is a snapshot of the latest known policy information for all endpoints on the host. Sending a snapshot allows for dealing with deletions automatically* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you intend to send a full snapshot each time? It seems possible to match k8s with a ListAndWatch
approach to avoid sending a full snapshot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we intend to send the full snapshot of the policies applied. Not familiar with the ListAndWatch
approach, just curious if that will require permissions for SDP to read the CRD ? Then SDP will map the policy rules (labels<>identity) too ?
Add CFP for cilium/cilium#30984
Follow up of #32