
server: Support denying serving Ignition to active nodes and pods #784

Closed
wants to merge 1 commit

Conversation

@cgwalters (Member) commented May 21, 2019

Ignition may contain secret data; pods running on the cluster
shouldn't have access.

This PR closes off access to any IP that responds on port 22, since that
is a port that is:

  • Known to be active by default
  • Not firewalled

A previous attempt at this used an auth token, but this fix doesn't
require changing the installer or people's PXE setups.

In the future we may reserve a port in the 9xxx range and have the
MCD respond on it, so that admins who disable or firewall SSH don't
end up with indirectly reduced security.
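To make the mechanism concrete, here is a minimal sketch in Go of the kind of reachability probe described above; the helper name and timeout are illustrative, not the actual MCS code:

package main

import (
    "fmt"
    "net"
    "time"
)

// hasOpenSSHPort reports whether the requesting IP accepts TCP connections on
// port 22. A host that is already running sshd is assumed to be a provisioned
// node, so its Ignition request can be denied (or just logged).
func hasOpenSSHPort(ip string) bool {
    conn, err := net.DialTimeout("tcp", net.JoinHostPort(ip, "22"), 2*time.Second)
    if err != nil {
        // Connection refused or timed out: treat the host as not yet provisioned.
        return false
    }
    conn.Close()
    return true
}

func main() {
    fmt.Println(hasOpenSSHPort("10.0.131.197"))
}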

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 21, 2019
@openshift-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2019
@cgwalters (Member, author) commented May 21, 2019

To test this, I ran oc debug node/<worker>, then ran chroot /host iptables -I OPENSHIFT-BLOCK-OUTPUT -p tcp --dport 22623 -j ACCEPT to undo the SDN filtering. Then, from that same pod, I see:

# curl -k --head https://10.0.131.197:22623/config/master
HTTP/2 403 
content-length: 0
date: Tue, 21 May 2019 15:34:35 GMT

And in the MCS logs:

oc logs pods/machine-config-server-6snf2
I0521 15:27:25.572401       1 start.go:37] Version: 4.0.0-alpha.0-437-g307e6bea-dirty (307e6bea9ed1025202d431be21184fe9ea4f6066)
I0521 15:27:25.574374       1 api.go:52] launching server
I0521 15:27:25.574434       1 api.go:52] launching server
I0521 15:27:42.202818       1 api.go:106] Denying unauthorized request: Node ip-10-0-134-102.us-east-2.compute.internal with address 10.0.134.102 is already provisioned

@ashcrow (Member) commented May 21, 2019

 func manifestsMachineconfigserverClusterroleYamlBytes() ([]byte, error) {
make: *** [verify] Error 1
hack/../pkg is out of date. Please run make update

@cgwalters cgwalters force-pushed the mcs-check-machines branch from 8f0585d to 3693948 Compare May 21, 2019 15:43
@cgwalters (Member, author):

From a disaster recovery perspective, I think if you have e.g. a master node with a static IP that you want to reprovision, then in order to allow access you'd need to edit the node object to drop its status/addresses data.

Or we could add a config flag to allow this.

In practice I think most people are going to start by trying to recover a master in-place rather than completely reset it, and that path won't be affected by this.

@cgwalters (Member, author):

OK, this is passing the main tests now; the upgrade job looks stuck, but I doubt that's related.

I also verified that scaling up the worker machineset still works, i.e. it doesn't deny the request.

@cgwalters (Member, author):

Maybe a better architecture would be for the MCS to make an HTTP request to the MCC asking whether a given IP is OK? That would increase the availability requirements for the MCC, though. Eh, this is probably fine as it is: if there's some transient apiserver issue, a node's Ignition request will fail, but Ignition will keep retrying.

@cgwalters (Member, author):

/retest

@abhinavdahiya (Contributor):

In practice I think most people are going to start by trying to recover a master in-place rather than completely reset it, and that path won't be affected by this.

Do we have concrete data from the DR team on this? It seems like Amazon taking a VM away might also be common...

How do we know that the interface that acts as the source IP is the one the node is reporting as its internal/public IP?

@cgwalters (Member, author):

How do we know that the interface that acts as the source IP is the one the node is reporting as its internal/public IP?

Bear in mind this PR is not claiming to be a complete solution to the problem. It's adding a layer of defense, much like the other layers we added. For example, it does nothing about external access. The auth key approach would be much stronger.

I'll check with the network team about this but remember the primary thing we're trying to prevent with this is in-cluster pods accessing the endpoint. I have some confidence that that access will appear to come from the IP address associated with the node that the kubelet reports. But again let's see what the SDN team says.

Seems like Amazon taking a VM away might also be common....

Right, instance retirement definitely happens. As I said, in that case if a newly provisioned master happens to get the same IP, you'd need to explicitly drop out the node object.

Or alternatively, we could tweak this PR to only disallow nodes that are reachable, which would be pretty easy.

@danwinship (Contributor):

the primary thing we're trying to prevent with this is in-cluster pods accessing the endpoint. I have some confidence that that access will appear to come from the IP address associated with the node that the kubelet reports.

Assuming the pod isn't using an egress IP or egress router, then traffic from a pod on node A addressed to node B's primary node IP will appear to come from node A's primary node IP.

But we didn't change the MCS to not listen on 0.0.0.0, did we? So a pod on node A could connect to node B's tun0 IP instead, and that would appear to come from the pod's IP directly. (So you'd want to filter out all connections with source IPs in the pod network as well. Or more simply, filter out connections if the destination IP is the tun0 IP.)

Also, in some environments, if nodes have multiple IPs, then a connection from a pod on node A to a non-primary IP on node B might appear to come from node A's non-primary IP rather than its primary IP.
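A rough sketch in Go of those two filters, with the pod network CIDR and tun0 address hard-coded purely for illustration:

package main

import (
    "fmt"
    "net"
)

// denyByNetwork applies the two checks suggested above: reject a request whose
// source IP is inside the pod network CIDR, or whose destination IP is the
// tun0 (SDN) address of the serving node. Both values here are illustrative.
func denyByNetwork(srcIP, dstIP string) (bool, string) {
    _, podCIDR, _ := net.ParseCIDR("10.128.0.0/14")
    tun0 := net.ParseIP("10.128.0.1")

    src := net.ParseIP(srcIP)
    dst := net.ParseIP(dstIP)
    if src != nil && podCIDR.Contains(src) {
        return true, fmt.Sprintf("source %s is within pod network CIDR %s", src, podCIDR)
    }
    if dst != nil && dst.Equal(tun0) {
        return true, fmt.Sprintf("destination %s is the tun0 address", dst)
    }
    return false, ""
}

func main() {
    fmt.Println(denyByNetwork("10.131.0.1", "10.0.137.206"))
}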

if a newly provisioned master happens to get the same IP you'd need to explictly drop out the node object.

VMware at least definitely leaves stale node objects around when dynamically scaling nodes. (I don't know if that would ever affect masters, though.)

@cgwalters (Member, author):

(So you'd want to filter out all connections with source IPs in the pod network as well. Or more simply, filter out connections if the destination IP is the tun0 IP.)

Ah, OK. Hmm, though I'm not sure we can get the destination IP from the Go HTTP request side... but thinking about this, why don't we just add iptables rules on the masters requiring that the destination IP for the MCS is not tun0?
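(For what it's worth, the Go standard library does expose the connection's local, i.e. destination, address to handlers via http.LocalAddrContextKey; a minimal sketch, with an illustrative handler:)

package main

import (
    "fmt"
    "net"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    // The local (destination) address the connection arrived on; this is how a
    // handler could tell whether the request came in via the tun0 IP.
    localAddr, _ := r.Context().Value(http.LocalAddrContextKey).(net.Addr)
    fmt.Fprintf(w, "remote=%s local=%v\n", r.RemoteAddr, localAddr)
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8080", nil)
}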

@danwinship (Contributor):

iptables doesn't see the packet because it's delivered by OVS

@cgwalters (Member, author):

/hold
Per discussion above, this needs more work to implement the suggestions in #784 (comment)

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 22, 2019
@cgwalters cgwalters force-pushed the mcs-check-machines branch from 3693948 to 1cc5062 Compare May 24, 2019 18:12
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 24, 2019
@cgwalters cgwalters force-pushed the mcs-check-machines branch from 1cc5062 to d0fcc92 Compare May 24, 2019 18:33
@cgwalters (Member, author):

/hold cancel

OK, updated 🆕 to also deny requests coming from tun0. I verified both checks work, by making a request to the master's main IP directly as well as by targeting its tun0 address.

oc logs pods/machine-config-server-l7wd2
I0523 22:27:09.952731       1 start.go:37] Version: 4.0.0-alpha.0-444-g9eebd1d4-dirty (9eebd1d4a17eb2d26ae74709252cf6ea77330703)
I0523 22:27:09.955394       1 api.go:52] launching server
I0523 22:27:09.955474       1 api.go:52] launching server
I0524 18:00:24.919941       1 api.go:99] Pool master requested by 10.0.137.206:32870
I0524 18:00:24.936254       1 api.go:106] Denying unauthorized request: Node ip-10-0-137-206.us-east-2.compute.internal with address 10.0.137.206 is already provisioned
I0524 18:06:39.321439       1 api.go:99] Pool master requested by 10.131.0.1:59360
I0524 18:06:39.330104       1 api.go:106] Denying unauthorized request: Requesting host 10.131.0.1 is within pod network CIDR 10.128.0.0/14

We also now only deny nodes that are NodeReady=true.

And I verified with this patch that scaling up a machineset still works.
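The NodeReady filter amounts to something like the following sketch, using the upstream corev1 types; the helper name is illustrative, not the actual MCS code:

package main

import (
    "fmt"

    corev1 "k8s.io/api/core/v1"
)

// isNodeReady reports whether the node has the Ready condition set to True.
// Nodes that are not Ready are skipped by the deny check, which is what
// allows them to be reprovisioned.
func isNodeReady(node *corev1.Node) bool {
    for _, cond := range node.Status.Conditions {
        if cond.Type == corev1.NodeReady {
            return cond.Status == corev1.ConditionTrue
        }
    }
    return false
}

func main() {
    node := &corev1.Node{}
    node.Status.Conditions = []corev1.NodeCondition{
        {Type: corev1.NodeReady, Status: corev1.ConditionTrue},
    }
    fmt.Println(isNodeReady(node))
}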

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2019
@cgwalters cgwalters force-pushed the mcs-check-machines branch from d0fcc92 to d408e61 Compare May 24, 2019 18:39
}

for _, node := range nodes.Items {
// Ignore NodeReady=false nodes; this allows reprovisioning them.
Inline review comment (Contributor):

Node not ready doesn't mean the node is getting reprovisioned. Where did we get that assumption?

@abhinavdahiya (Contributor):

Also, in some environments, if nodes have multiple IPs, then a connection from a pod on node A to a non-primary IP on node B might appear to come from node A's non-primary IP rather than its primary IP.

This is still not solved.

@cgwalters (Member, author):

[multi-NIC] is still not solved.

Right, but...again, not claiming a comprehensive solution here. The issue is important enough to have layers of defenses.

That said...one approach we could take (derived from an approach @ericavonb mentioned) is to run a daemonset service on each node (probably as part of the MCD) that listens on a well-known port and provides a static reply. Then we could have the MCS "call back" to the requesting IP - if it gets a reply, it knows to deny the request.
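A minimal sketch of that call-back idea, assuming a hypothetical well-known port and reply (neither is defined by this PR):

package main

import (
    "net"
    "net/http"
    "time"
)

// mcdPort is a hypothetical well-known port the MCD daemonset would listen on.
const mcdPort = "9999"

// runStaticReplyServer is the MCD side: any request gets a fixed reply, which
// simply proves that a provisioned node lives at this IP.
func runStaticReplyServer() error {
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("provisioned"))
    })
    return http.ListenAndServe(":"+mcdPort, mux)
}

// isProvisioned is the MCS-side call-back: if anything answers on the
// well-known port, the requesting IP already hosts a provisioned node and its
// Ignition request should be denied.
func isProvisioned(ip string) bool {
    conn, err := net.DialTimeout("tcp", net.JoinHostPort(ip, mcdPort), 2*time.Second)
    if err != nil {
        return false
    }
    conn.Close()
    return true
}

func main() {
    go runStaticReplyServer()
    time.Sleep(100 * time.Millisecond) // give the listener a moment to start
    println(isProvisioned("127.0.0.1"))
}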

@cgwalters (Member, author):

Or...maybe way simpler, just try to connect to port 22.

@cgwalters (Member, author):

OK, this came up in a conversation again; rebased 🏄 - only compile-tested so far, though.

@cgwalters (Member, author):

Since this is now informational only, I think what we need to do next is export a Prometheus metric for denials and then roll that up into telemetry?
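For reference, a sketch of what such a metric could look like with client_golang; the metric name and label are made up here, not something the MCO exports today:

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// mcsDeniedRequests counts Ignition requests that the MCS would have denied.
// The metric name and label are illustrative only.
var mcsDeniedRequests = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "machine_config_server_denied_requests_total",
        Help: "Ignition requests flagged as coming from already-provisioned nodes or pods.",
    },
    []string{"reason"},
)

func main() {
    prometheus.MustRegister(mcsDeniedRequests)

    // Somewhere in the request handler, when a deny condition is hit:
    mcsDeniedRequests.WithLabelValues("already_provisioned").Inc()

    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":9090", nil)
}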

@jamescassell:

This PR closes off access to any IP that responds on port 22, since that is a port that is:

Known to be active by default
Not firewalled

I think the way this works is broken. All I (as an attacker) need to do to get the Ignition config is block SSH on my local machine, then request it.

@cgwalters (Member, author):

All I (as an attacker) need to do to get the Ignition config is block SSH on my local machine, then request it.

This isn't intended to block access from outside the cluster. I think one should start with IaaS firewalling for that, and the default OpenShift installs do so.

But I know this is a "gotcha" in various UPI scenarios - we should absolutely consider a way to address that better. Maybe a simple "provisioning enabled" boolean that would turn off the MCS entirely; in non-machineAPI scenarios, booting new machines is unusual.

(Only privileged code running on the cluster could block SSH; that's why this works in-cluster.)

@michaelgugino (Contributor):

What I think we should do is secure the endpoint with authentication. A firewall or iptables rule is not a substitute for authentication. The machine-api will be available in bare metal clusters soon, as well as VMware, and there's no telling what the network topology of those is going to look like. It's entirely possible that the MCS will be behind a routable IP.

@cgwalters (Member, author):

What I think we should do is secure the endpoint with authentication.

Yep, that was #736, but it died because requiring it would be a huge UX issue for many bare metal flows.

That said...I'm increasingly feeling like a "good enough" mitigation would be something like switching to requiring an auth token after the cluster is initialized.

@michaelgugino (Contributor):

What I think we should do is secure the endpoint with authentication.

Yep, that was #736, but it died because requiring it would be a huge UX issue for many bare metal flows.

That said...I'm increasingly feeling like a "good enough" mitigation would be something like switching to requiring an auth token after the cluster is initialized.

Consider the reverse: optionally disabling it for bare metal flows.

@jomeier commented May 13, 2020

We can access the ignition files from any PC in our network that doesn't belong to the vSphere UPI OKD 4 cluster:

okd-project/okd#176

The curl command described there doesn't work out of the box in pods running in the cluster, but I'm not sure whether a potential attacker could somehow enable it.

I assume the vSphere credentials are also inside the Ignition files? That would be a major security leak. As proposed earlier in this PR: is it possible to secure the API endpoint?

If that's not possible in the short term, what is the proposed workaround to secure the Ignition files with a firewall? Could you provide a best-practice network layout for that? Our load balancer for port 22623 is in a different network than our cluster VMs, and it might be a bit cumbersome to configure that, so any best-practice setup hint is welcome.

@crawford (Contributor):

/hold

This needs an enhancement. Casually skimming the history, it's clear that there are still open questions.

@cgwalters cgwalters changed the title server: Deny serving Ignition to provisioned nodes server: Support denying serving Ignition to active nodes and pods May 20, 2020
@cgwalters (Member, author):

So in the middle of this epic PR discussion, the change turned from "deny" into "warn, plus an opt-in mechanism to deny". I forgot to change the PR title, which probably led to a lot of confusion.

I completely agree we need an enhancement if we try to do anything that would deny (and particularly anything that ties together machineAPI and the MCO, or affects bare metal provisioning flows, disaster recovery, etc.). I'm less convinced that we need an enhancement just to log by default. If it helps, I can remove the ability to deny.

(But instead of logging to the pod, we probably want saner observability, like an event and Prometheus metrics; I need to look at that.)

@jomeier commented May 20, 2020

Is there any hint in the docs that a firewall should be set up to prevent anyone from pulling the Ignition files with the cloud credentials contained in them? Or can this deny switch be configured during installation of the cluster, for example by providing a switch in install-config.yaml?

@cgwalters (Member, author):

After openshift/enhancements#368 lands we'll be in a better place to enforce an auth token for MAO managed setups.

I do agree with Alex that we want an enhancement for this, but it should basically be:

  • installer generates a token: oc -n openshift-config create secret generic provisioning --from-literal=token=<random token>
  • installer injects that into its user data
  • MCO also does so for the pointer configs it manages
  • MCO denies requests which don't have a header with the token

It's harder to do better than that unless we go to per-machine user data, but this would suffice to start. In MAO-managed scenarios we should be able to iteratively upgrade later. But if we start requiring this token to be specified e.g. at the PXE command line on UPI metal, then it will become a bit of an "API".
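A minimal sketch of the last step, with a made-up header name and a hard-coded token; the real scheme would be defined by the enhancement:

package main

import (
    "crypto/subtle"
    "net/http"
)

// requireToken wraps an Ignition-serving handler and rejects requests that
// don't carry the expected token. The header name and the way the token is
// sourced are illustrative.
func requireToken(expected string, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        got := r.Header.Get("X-Provisioning-Token")
        if subtle.ConstantTimeCompare([]byte(got), []byte(expected)) != 1 {
            http.Error(w, "unauthorized", http.StatusForbidden)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    ignition := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("{}")) // placeholder Ignition config
    })
    // 22623 is the MCS port seen elsewhere in this thread; the wiring is illustrative.
    http.ListenAndServe(":22623", requireToken("random-token", ignition))
}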

@cgwalters (Member, author):

Working on an enhancement for this: https://hackmd.io/k7Mfb1lpSIWTzRFvo6o9ig

cgwalters added a commit to cgwalters/enhancements that referenced this pull request Aug 19, 2020
See openshift/machine-config-operator#784

The Ignition configuration can contain secrets, and we want to avoid it being accessible both inside and outside the cluster.
@openshift-ci-robot (Contributor):

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-etcd-quorum-loss 3d295414e006ce429707e0abb37b254be87162b2 link /test e2e-etcd-quorum-loss
ci/prow/e2e-aws-disruptive fc2b74b6a3f41d1a80c7d37e2c5a6ebe781de532 link /test e2e-aws-disruptive
ci/prow/e2e-vsphere 24cecd593e9e31843dddd35db1381928b695271e link /test e2e-vsphere
ci/prow/e2e-aws-proxy 75db5c2 link /test e2e-aws-proxy
ci/prow/e2e-upgrade 75db5c2 link /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

cgwalters added a commit to cgwalters/enhancements that referenced this pull request Sep 24, 2020
See openshift/machine-config-operator#784

The Ignition configuration can contain secrets, and we want to avoid it being accessible both inside and outside the cluster.
cgwalters added a commit to cgwalters/enhancements that referenced this pull request Oct 6, 2020
See openshift/machine-config-operator#784

The Ignition configuration can contain secrets, and we want to avoid it being accessible both inside and outside the cluster.
@openshift-merge-robot (Contributor):

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-agnostic-upgrade 75db5c2 link /test e2e-agnostic-upgrade
ci/prow/e2e-aws-serial 75db5c2 link /test e2e-aws-serial

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cgwalters (Member, author):

Will be obsoleted by #2223

@cgwalters cgwalters closed this Nov 12, 2020
cgwalters added a commit to cgwalters/enhancements that referenced this pull request Nov 25, 2020
See openshift/machine-config-operator#784

The Ignition configuration can contain secrets, and we want to avoid it being accessible both inside and outside the cluster.
cgwalters added a commit to cgwalters/enhancements that referenced this pull request Nov 30, 2020
See openshift/machine-config-operator#784

The Ignition configuration can contain secrets, and we want to avoid it being accessible both inside and outside the cluster.