server: Support denying serving Ignition to active nodes and pods #784
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters. |
To test this, I did:
And in the MCS logs:
(force-pushed from 8f0585d to 3693948)
From a disaster recovery perspective, I think if you have e.g. a master node with a static IP that you want to reprovision, then in order to allow access you'd need to edit (or delete) the corresponding node object. Or we could add a config flag to allow this. In practice I think most people are going to start with trying to recover a master in-place rather than completely re-set it, and that path won't be affected by this. |
OK, passing the main tests now; upgrade looks stuck, but I doubt that's related. I also verified that scaling up the worker machineset still works, i.e. it doesn't deny the request. |
Maybe a better architecture would be for the MCS to make an HTTP request to the MCC asking if a given IP is OK? That would increase the availability requirement for the MCC, though. Eh, this is probably fine. If there's some transient apiserver issue it will mean a node's Ignition request will fail, but Ignition will keep retrying. |
/retest |
Do we have concrete data from the DR team on this? Seems like Amazon taking a VM away might also be common... How do we know that the interface that acts as the source IP is the one the node is reporting as its internal/public IP? |
Bear in mind this PR is not claiming to be a complete solution to the problem. It's adding a layer of defense, much like the other layers we added. For example, it does nothing about external access. The auth key approach would be much stronger. I'll check with the network team about this but remember the primary thing we're trying to prevent with this is in-cluster pods accessing the endpoint. I have some confidence that that access will appear to come from the IP address associated with the node that the kubelet reports. But again let's see what the SDN team says.
Right, instance retirement definitely happens. As I said, in that case if a newly provisioned master happens to get the same IP you'd need to explicitly delete the node object. Or alternatively, we could tweak this PR to only disallow reachable nodes, which would be pretty easy. |
Assuming the pod isn't using an egress IP or egress router, then traffic from a pod on node A addressed to node B's primary node IP will appear to come from node A's primary node IP. But we didn't change the MCS to not listen on Also, in some environments, if nodes have multiple IPs, then a connection from a pod on node A to a non-primary IP on node B might appear to come from node A's non-primary IP rather than its primary IP.
VMware at least definitely leaves stale node objects around when dynamically scaling nodes. (I don't know if that would ever affect masters, though.) |
Ah, OK. Hmm though I'm not sure we can get that from the golang side HTTP request... but thinking about this, why don't we just add iptables rules to the masters that require that the destination IP for the MCS is not |
iptables doesn't see the packet because it's delivered by OVS |
/hold |
(force-pushed from 3693948 to 1cc5062)
(force-pushed from 1cc5062 to d0fcc92)
/hold cancel OK, updated 🆕 to also deny requests coming from
We also now only deny nodes that are ready (per the NodeReady condition). And I verified with this patch that scaling up a machineset still works. |
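To make the idea above concrete, here is a minimal sketch (not the actual patch in this PR) of an MCS-side check that compares the request's source IP against the addresses of nodes whose NodeReady condition is true, skipping NotReady nodes so they can be reprovisioned. The helper name and how it would be wired into the request handler are assumptions:

```go
package server

import (
	"net"

	corev1 "k8s.io/api/core/v1"
)

// isActiveNodeIP is a hypothetical helper illustrating the check discussed
// above: report whether the request's source address belongs to a node whose
// NodeReady condition is true. NotReady nodes are skipped on purpose so that
// they can still be reprovisioned.
func isActiveNodeIP(remoteAddr string, nodes *corev1.NodeList) (bool, error) {
	// remoteAddr is typically http.Request.RemoteAddr, i.e. "ip:port".
	host, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return false, err
	}
	for _, node := range nodes.Items {
		ready := false
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady && cond.Status == corev1.ConditionTrue {
				ready = true
				break
			}
		}
		if !ready {
			// Ignore NodeReady=false nodes; this allows reprovisioning them.
			continue
		}
		for _, addr := range node.Status.Addresses {
			if addr.Address == host {
				return true, nil
			}
		}
	}
	return false, nil
}
```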
(force-pushed from d0fcc92 to d408e61)
pkg/server/cluster_server.go (outdated):

```go
	for _, node := range nodes.Items {
		// Ignore NodeReady=false nodes; this allows reprovisioning them.
```
Node not ready doesn't mean the node is getting re-provisioned. Where did we get that assumption? This is still not solved. |
Right, but...again, not claiming a comprehensive solution here. The issue is important enough to have layers of defenses. That said...one approach that we could take (derived from an approach @ericavonb mentioned) is run a daemonset service on each node (probably as part of the MCD) that listens on a well-known port and provides a static reply. Then we could have the MCS "call back" to the requesting IP - if it gets a reply it knows to deny the request. |
Or...maybe way simpler, just try to connect to port 22. |
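As a rough sketch of that idea (the helper name and timeout are arbitrary choices for illustration, not part of this PR), the MCS could dial back to port 22 on the requesting IP and treat a successful connection as evidence that the requester is an already-provisioned machine:

```go
package server

import (
	"net"
	"time"
)

// respondsOnSSH sketches the "call back" idea above: dial the requesting IP
// on port 22 and report whether anything answers. A machine that is already
// running sshd has presumably been provisioned, so its Ignition request could
// be denied (or at least flagged). The 2s timeout is an arbitrary choice.
func respondsOnSSH(ip string) bool {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(ip, "22"), 2*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}
```

As discussed further down in the thread, a probe like this only distinguishes requesters that cannot control their own firewall, which is exactly the in-cluster, unprivileged-pod case this change is aimed at.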
OK this came up in a conversation again; rebased 🏄 - it's only compile-tested for now, though. |
Since this is now informational only, I think what we need to do next is export a Prometheus metric for denials, and then roll that up into telemetry? |
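For illustration, such a metric could be as small as the following sketch using the Prometheus Go client; the metric name and label are placeholders, not something this PR or the MCO currently defines:

```go
package server

import "github.com/prometheus/client_golang/prometheus"

// mcsDeniedRequests is a hypothetical counter for Ignition requests that the
// MCS flagged (or denied), labeled by the reason for the denial.
var mcsDeniedRequests = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "mcs_ignition_requests_denied_total",
		Help: "Ignition config requests flagged or denied by the MCS.",
	},
	[]string{"reason"},
)

func init() {
	prometheus.MustRegister(mcsDeniedRequests)
}

// At a denial site one would then do something like:
//	mcsDeniedRequests.WithLabelValues("active_node").Inc()
```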
I think the way this works is broken. All I (as an attacker) need to do to get the ignition is to block SSH on my local machine, then request the ignition. |
This isn't intended to block access external to the cluster. I think one should use IaaS firewalling for that to start with, and the default OpenShift installs do so. But I know this is a "gotcha" in various UPI scenarios - we should absolutely consider a way to address that better. Maybe a simple "provisioning enabled" boolean that would turn off the MCS entirely - in non-machineAPI scenarios booting new machines is an unusual event. (Only privileged code running on the cluster could block SSH; that's why it works in-cluster.) |
What I think we should do is secure the endpoint with authentication. A firewall or iptables rule is not a substitute for authentication. The machine-api will be available in bare-metal clusters soon, as well as VMware, and there's no telling what the network topology of those is going to look like. It's entirely possible that the MCS is behind a routable IP. |
Yep, that was #736, but where it died was that requiring it would be a huge UX issue for many bare-metal flows. That said... I'm increasingly feeling like a "good enough" mitigation would be something like switching to requiring an auth token after the cluster is initialized. |
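As a sketch of what that could look like on the server side (the header, the source of the expected token, and the helper name are all assumptions rather than anything this PR implements), the MCS could wrap its handler and compare a bearer token against a secret generated at install time; newer Ignition spec versions can attach custom HTTP headers to config fetches, which is how a client could present such a token:

```go
package server

import (
	"crypto/subtle"
	"net/http"
	"strings"
)

// requireAuthToken is a hypothetical wrapper showing what "require an auth
// token after the cluster is initialized" could look like on the MCS side.
// The Authorization header and the source of expectedToken are assumptions.
func requireAuthToken(expectedToken string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		if subtle.ConstantTimeCompare([]byte(got), []byte(expectedToken)) != 1 {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```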
Consider the reverse of optionally disabling it for baremetal flows. |
We can access the Ignition files from any PC in our network that doesn't belong to the vSphere UPI OKD 4 cluster. The curl command described there doesn't work out of the box in pods running in the cluster, but I'm not sure whether a potential attacker could enable this somehow. I assume the vSphere credentials are also inside the Ignition files? That would be a major security leak. As proposed earlier in this PR: is it possible to secure the API endpoint? If that's not possible in the short term, what is the proposed workaround to secure the Ignition files with a firewall? Could you provide a best-practice network layout for that? Our load balancer for port 22623 is in a different network than our cluster VMs, and it might be a little bit cumbersome to configure that, so any best-practice setup hint is welcome. |
/hold This needs an enhancement. Casually skimming the history, it's clear that there are still open questions. |
So in the middle of this epic PR discussion, the change turned from "deny" into "warn, and add an opt-in mechanism to deny". I forgot to change the PR title, which probably led to a lot of confusion. I completely agree we need an enhancement if we try to do anything that would deny (and particularly anything that ties together machineAPI and the MCO, or affects the bare metal provisioning flow, disaster recovery, etc.). I'm less in agreement that we need an enhancement to log by default. If it helps I can remove the ability to deny. (But probably, instead of logging to the pod, we really want saner observability like an event and Prometheus metrics; I need to look at that.) |
Is there any hint in the docs that a firewall should be set up to prevent anyone from pulling the Ignition files with the cloud credentials contained in them? Or can this deny switch be configured during installation of the cluster, by providing a switch in the install-config.yaml for example? |
After openshift/enhancements#368 lands we'll be in a better place to enforce an auth token for MAO managed setups. I do agree with Alex we want an enhancement for this but it should basically be:
It's harder to do better than that unless we go to per machine user data, but this would suffice to start. In MAO managed scenarios we should be able to iteratively upgrade later. But if we start requiring this token e.g. to be specified at the PXE commandline on UPI metal then it will become a bit of an "API". |
Working on an enhancement for this: https://hackmd.io/k7Mfb1lpSIWTzRFvo6o9ig |
See openshift/machine-config-operator#784 The Ignition configuration can contain secrets, and we want to avoid it being accessible both inside and outside the cluster.
Will be obsoleted by #2223 |
Ignition may contain secret data; pods running on the cluster
shouldn't have access.
This PR closes off access to any IP that responds on port 22, as that
is a port that is:
A previous attempt at this was to have an auth token, but this fix
doesn't require changing the installer and people's PXE setups.
In the future we may reserve a port in the 9xxx range and have the
MCD respond on it, so that admins who disable or firewall SSH don't
end up with indirectly reduced security.