Certificate support #241

benbromhead · 2019-08-14T19:06:04Z

I cannot figure out how to make a private team discussion topic visible or share it. So I'm going to move the certificate management discussion to a repo issue.

From @alourie:

At the moment we have a PR that allows providing TLS certificates and some scripts to set up secure node-to-node and client-to-node connections. The issue here is that to set it up properly, the user of the operator will have to perform the certificate generation for every single node manually, add the certificate to the k8s secret that we mount inside the Cassandra container and somehow hint the operator to use that certificate for the specific node. This is even worse in the case of scaling up, where prior to the scaling operation the user would have to remember to create an appropriate number of new certificates for all the new nodes and add them to the secret again. Also, it would mean we volume-mount all nodes' secrets inside all the Cassandra containers, which is not the most secure solution.

Here I'd like to discuss some better potential solutions for this feature.

The better solution I currently have in mind would work in the following fashion:

1. The user decides they want to start a secure Cassandra environment. They can either provide the operator with a root CA or allow the operator to use self-generated certificates (for example with a CRD parameter RootCA or something similar).

2. The operator generates the truststore using either the provided RootCA or by generating a new RootCA and mounts them inside the Cassandra container.

3. The operator also generates a "client" keystore/certificate for client encryption, creates a k8s secret with the certificate, and mounts it inside the container.

4. The entry-point script checks a specific known location on the node for the presence of the root CA and truststore and, if found, performs the following operations:
   a) generates a node certificate for that node
   b) configures Cassandra to use the truststore and keystore
   c) configures cqlsh to use the certificates.

5. The certificates might be placed on the data PV so that they're not rebuilt on node replacement...or they can actually be rebuilt/replaced.

6. Then, in order to connect to the cluster, the user would download the client certificate from the k8s secret and use it in their application.

This solution allows providing only the rootCA for it to work, and will handle scaling automatically.

Any feedback is appreciated.

benbromhead · 2019-08-14T19:07:27Z

From @smiklosovic:

For 5), I would opt for regenerating / rebuilding rather than storing it externally on a PV we don't necessarily have 100% control over.

If I understand this correctly, as long as we create the root CA and similar as a secret, we can place it into the config volume, create the node-specific material on the fly in a container upon its initialisation, and set up all needed configs (secured JMX would be nice to have too).

I am not sure what 3) means. Do we really need to mount the client certificate inside the container? It would just be generated and a secret would be created so that the operator (as a person) could fetch it from there to set up their tooling to connect to the cluster, but there is actually no need to mount this into the containers themselves ...

benbromhead · 2019-08-14T19:08:04Z

Taking a step back here, I don't think managing a full PKI is something we want to do. I also know that customers won't want to have to pass a root CA's private key to our operator (it's generally something that lives in an HSM).

Cassandra requires two things to run TLS between nodes and clients. One is a public certificate it can build a chain of trust from, which lives in the truststore. The second is a private key and matching public certificate that it will use to encrypt traffic, signed by the public certificate (or an issued signing cert) in the truststore.

Ideally we want to support whatever certificate mechanism the end customer uses, or failing that support a certificate request mechanism that is open and extensible.

In terms of a workflow, most users will know the public certificate that will go in the truststore before provisioning a cluster. This is something we could define in the CRD (e.g. pointing to a certificate object or secret).

The second stage is to generate a certificate pair for each node. This is likely something that will need to be done at pod run time. Keep in mind that Cassandra does NOT pin certificates to token ranges or node identity, so certificates are only used to establish data protection (privacy and integrity) and (very, very broadly) trust. Knowing this, we can arbitrarily generate certificates as often as we want. We probably want this to happen often as well to support certificate rotation.
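Since certificates are not tied to node identity, deciding when to rotate comes down to a simple time check. A minimal sketch, assuming a certificate is renewed once a fixed fraction of its validity period has elapsed; the function name and the two-thirds default are illustrative assumptions, not something agreed in this thread:

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_before, not_after, now, fraction=2 / 3):
    """Return True once `fraction` of the validity period has elapsed.

    The 2/3 default is an illustrative assumption, not a value from this thread.
    """
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


if __name__ == "__main__":
    issued = datetime(2019, 8, 1, tzinfo=timezone.utc)
    expires = issued + timedelta(days=30)
    # Check a certificate 25 days into a 30-day validity period.
    print(needs_renewal(issued, expires, issued + timedelta(days=25)))
```

An init-container or sidecar could run a check like this against the node certificate's validity dates and request a new certificate when it returns true.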

With this in mind, I would look to support some sort of certificate request mechanism, whereby an init-container or sidecar generates a private/public key pair and a CSR. Then, leveraging an established key signing mechanism supported in K8s (either https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/ or https://github.com/jetstack/cert-manager), it would request and wait for a certificate. The init-container would then drop the certificate in the right spot for Cassandra.
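To make the request step concrete, here is a rough sketch of how an init-container might wrap an already generated PEM CSR in a Kubernetes CertificateSigningRequest object. It assumes the `certificates.k8s.io/v1` shape (which is newer than this discussion and requires `signerName`); the function name, the object name and the signer name `example.com/cassandra-node` are hypothetical. Generating the key pair and CSR (for example with openssl or keytool) is not shown.

```python
import base64
import json


def build_csr_manifest(name, csr_pem, signer_name):
    """Wrap a PEM-encoded PKCS#10 CSR (bytes) in a CertificateSigningRequest object."""
    return {
        "apiVersion": "certificates.k8s.io/v1",
        "kind": "CertificateSigningRequest",
        "metadata": {"name": name},
        "spec": {
            # The API expects the PEM text to be base64-encoded.
            "request": base64.b64encode(csr_pem).decode("ascii"),
            "signerName": signer_name,
            # A node is both a TLS server and a TLS client to its peers.
            "usages": [
                "digital signature",
                "key encipherment",
                "server auth",
                "client auth",
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder PEM bytes; a real CSR would come from the key generation step.
    pem = b"-----BEGIN CERTIFICATE REQUEST-----\n...\n-----END CERTIFICATE REQUEST-----\n"
    manifest = build_csr_manifest("cassandra-0", pem, "example.com/cassandra-node")
    print(json.dumps(manifest, indent=2))
```

The resulting object would be submitted to the API server, after which the init-container waits for `status.certificate` to be populated.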

Another way to do this would be to have the certificate generation, CSR, signing workflow etc. be kicked off by the operator. The only problem with this is that you can't create pod-specific secrets within a StatefulSet, so it would be hard to inject the cert into the right pod. I probably wouldn't head down this path.

I think we should raise this with the Kinvolk team.

alourie · 2019-08-15T05:49:11Z

@benbromhead @zegelin Okay, this sounds good to me. So if we go down the path of sending a request from the pod to the API for a cert from k8s... how are we going to do it from within the pod (the init container)? Does the pod have access to the k8s API?
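For reference, a pod normally reaches the API server through its mounted service account: the token and CA bundle live under `/var/run/secrets/kubernetes.io/serviceaccount`, and the API address is exposed via the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables. A minimal sketch of collecting those pieces follows; the function name is hypothetical, and the service account would still need RBAC permission to create CSR objects.

```python
import os
from pathlib import Path

# Standard mount point for the pod's service account credentials.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def in_cluster_config(env=None, sa_dir=SA_DIR):
    """Return (base_url, headers, ca_path) for calling the API from inside a pod."""
    env = os.environ if env is None else env
    host = env["KUBERNETES_SERVICE_HOST"]
    port = env["KUBERNETES_SERVICE_PORT"]
    # The token file is read on each call; it may be rotated by the kubelet.
    token = (Path(sa_dir) / "token").read_text().strip()
    base_url = f"https://{host}:{port}"
    headers = {"Authorization": f"Bearer {token}"}
    return base_url, headers, str(Path(sa_dir) / "ca.crt")
```

The returned CA path is what an HTTPS client in the init container would use to verify the API server's certificate.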

johananl · 2019-08-16T12:26:43Z

I'm still studying this proposal and would like to look at it in more depth before coming up with concrete suggestions. In the meantime, I'd like to provide some initial feedback:

I'm getting the impression the main question here is the amount of responsibility we want the operator to have. We should keep in mind that the operator's main responsibility is managing Cassandra clusters. While there could be many topics that are somehow related to the core functionality (e.g. TLS), we should carefully evaluate the amount of responsibility we give the operator so that it doesn't turn into a piece of software which tries to do everything.

That said, I'm completely with @benbromhead on this. I think we should ask ourselves what the minimal part of the PKI process that actually needs to be implemented inside the operator is, and add support for that. In my experience, the desire (with which I sympathize!) to handle everything on behalf of the user tends to lead to inflexible software which does too much and therefore can't be used in many cases (Ben's example about the HSM demonstrates exactly that).

I admit I still don't understand the low-level details of the TLS use case well enough to provide a concrete recommendation regarding which parts of the process to handle in the operator's code. I would have to look at it in more detail first.

alourie · 2019-08-19T05:45:50Z

@benbromhead @johananl I agree. The operator doesn't have to do everything, but in this case we need to take care of security, mostly because we can, but also because asking the user to do so would be too cumbersome and complicated.

That being said, we have 2 options:

1. We generate the entire required chain on the node (C* container or init container).
2. We generate the entire required chain within the operator and propagate it to the C* container.

I do think the 1st option is better, and it indeed seems to be the consensus, so we need an agreement on this and agreement about basic implementation details. Then we can actually implement it.

benbromhead · 2019-08-19T19:09:41Z

@johananl - Cassandra supports TLS encryption on two traffic paths:

- client to node
- node to node

Cassandra will accept a connection (client or node) if it can build a chain of trust back to a cert that it trusts. This is defined by two or more files (a truststore and a keystore) and some configuration.
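As an illustration of the "some configuration" part, the two traffic paths map to the `server_encryption_options` and `client_encryption_options` sections of `cassandra.yaml`. A rough sketch of building both sections from one keystore/truststore pair is below; the function name is hypothetical, and turning on `require_client_auth` for node-to-node traffic is an assumption rather than something decided here.

```python
def encryption_options(keystore, truststore, keystore_password, truststore_password):
    """Build the two cassandra.yaml sections that enable TLS on both paths."""
    stores = {
        "keystore": keystore,
        "keystore_password": keystore_password,
        "truststore": truststore,
        "truststore_password": truststore_password,
    }
    return {
        # Node to node: encrypt all internode traffic and (as an assumption)
        # require peers to present a certificate chaining to the truststore.
        "server_encryption_options": {
            "internode_encryption": "all",
            "require_client_auth": True,
            **stores,
        },
        # Client to node: enable TLS on the native transport.
        "client_encryption_options": {"enabled": True, **stores},
    }
```

Whatever places the certificates in the container would also need to merge a structure like this into the node's `cassandra.yaml`.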

To support this workflow we need to distribute a public root cert that all nodes trust, and that can sign private certs for each individual node. We also need to issue certificates to each node so that it can authenticate and encrypt traffic.

Ideally we could leverage some existing k8s signing capability.

@alourie - I would start on the path of having each node generate a key pair, a signing request, it then sends that signing request off "somewhere" and gets a signed certificate back "somehow"

alourie · 2019-08-20T04:55:01Z

@benbromhead sure thing, I'll start working on that one.

johananl · 2019-08-22T12:51:15Z

EDIT: I've just realized I repeated more or less exactly what was said in this comment. Sorry :-)

> Ideally we could leverage some existing k8s signing capability.

> I would start on the path of having each node generate a key pair, a signing request, it then sends that signing request off "somewhere" and gets a signed certificate back "somehow"

This may help.

"Somewhere" could be the k8s API. An init container in each Cassandra pod could, for example, create a CertificateSigningRequest object.

"Somehow" could be a "signer" implementation which watches CertificateSigningRequest objects, signs CSRs automatically and then puts the certificates in a place reachable by the pod.
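A rough sketch of the selection step such a signer might perform: given CertificateSigningRequest objects as plain dicts, pick the ones that are approved, not yet issued and addressed to this signer. The function name is hypothetical, the `signerName` field assumes the `certificates.k8s.io/v1` shape, and the watching and signing themselves are not shown.

```python
def csrs_ready_to_sign(csrs, signer_name):
    """Return names of approved, not-yet-issued CSRs addressed to `signer_name`."""
    ready = []
    for csr in csrs:
        if csr.get("spec", {}).get("signerName") != signer_name:
            continue
        status = csr.get("status", {})
        conditions = {c["type"] for c in status.get("conditions", [])}
        if "Approved" not in conditions:
            continue  # still pending approval
        if conditions & {"Denied", "Failed"}:
            continue  # rejected or already failed
        if status.get("certificate"):
            continue  # certificate already issued
        ready.append(csr["metadata"]["name"])
    return ready
```

Keeping approval separate from signing means the policy for who may obtain a node certificate can be swapped without touching the signer.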

If desired, I can help figure out the signing part while @alourie is working on the CSR generation part.

WDYT @benbromhead?

EDIT: Using cert-manager for signing sounds like a natural solution. However, we should think carefully about whether we want to officially depend on a 3rd party project that isn't a part of upstream k8s here. Maybe some users wouldn't want to run cert-manager. Also, maybe some users are actually fine with a simpler, more manual solution which doesn't require a dedicated component for signing certificates.

benbromhead · 2019-08-28T15:42:25Z

If we could leverage k8s CSRs that would be super duper, it feels like a naturally pluggable point

alourie mentioned this issue Aug 20, 2019

Support user configs, user secrets and separate environments for cassandra and sidecar #218

Merged

smiklosovic added the security label Aug 29, 2019
