-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write sections on invalid commits and access control #261
Conversation
4c0a063
to
05dcc13
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a large late addition whose main impact is to replace general guidance with half-formed more-specific solutions. For problems like this where full solutions involve substantial complexity, it is better for this document to focus on accurately describing the problem (as it does now).
The reason for this PR was discussed on the mailing list: https://mailarchive.ietf.org/arch/msg/mls/iH61yiILUASWyqt-4cbOwZsUpUQ/ In almost every section, this document describes a.) problems that need to be solved to deploy MLS, and b.) the general space of solutions and trade-offs between them. The issue with this section is that is presents problems with no guidance on solutions |
I concur with Richard. I don't disagree that this section doesn't provide clear guidance, but that's because the WG did not come down on specific solutions when it designed MLS. This PR does not change that situation, and I don't think we should land it. |
Is there different text we could add to this document that addresses my concerns? |
Assuming by "addressing my concerns" you mean adding detailed guidance for how to handle invalid commits, I think the answer is "no". The details of how you solve this problem are so tied up with application details that I don't really think there's a way to map out the space of solutions in a succinct enough way to fit in this document. This document's job is to describe the overall design and deployment environment for MLS, not to provide solutions for every problem. If folks wanted to start up an "implementation considerations" document where we can go into some of these things in depth, that would be fine; in fact, it has been discussed at some past WG meetings. But we don't need to weigh the architecture document down with this. |
Repeating what I said on the list, I don't think it makes any sense at all to publish this document (which, per the intro, "provides guidance on how to deploy MLS along with discussions of privacy/security trade-offs") without discussing this. These problems are core to every MLS deployment, and many of the solutions implementors have chosen have subtle and/or unexpected effects on the security & privacy properties of their deployment |
At this point, I think we just disagree, so I think we need the chairs to figure out how to resolve this. |
The architecture document publishes guidance on things we had consensus to provide guidance on. I proposed guidance in a number of areas that also ended up "on the cutting room floor". |
I don't know why your past proposals were rejected, but this text addresses substantial issues with the way MLS has been used in practice:
|
Regarding the concern about PCS-negation: The consequences of "resync" operations to PCS are well known and documented at the end of Section 12.4.3.2 of the specification. If an application wants to mitigate the resulting risk it can follow the recommendations described in that section, or indeed regularly rotate the signature key. It might be worth referencing that text in the architecture document, though. As far as I'm aware (and as explored in this paper by @cascremers), Applications that use Signal commonly have a similar problem, where they allow the re-establishment of sessions (to recover from loss of state). The result is the same as with a resync in MLS. The advantage of MLS is that it allows the rotation of signature keys to mitigate this problem. |
The text that you link to discusses signature key compromise; the "external rejoin" approach makes it impossible to recover from key schedule compromise. No signature key compromise is required |
Then I don't understand the compromise scenario you allude to when you say that rejoining negates PCS. I also re-read the text you propose in this PR and it doesn't make it any clearer. Could you elaborate? |
Note that while the GroupInfo is signed, it is not necessarily signed by the member that was compromised, so users might not have a way to revoke that key such that the GroupInfo signature would no longer verify. |
Why would Bob accept a GroupInfo from epoch i if he's already successfully processed a commit from epoch i+1? Is there a scenario where it would be necessary (from an operational standpoint) for Bob to accept epoch downgrades? |
I don't know of a reason that it would be necessary to accept epoch downgrades, but only accepting epoch upgrades doesn't really prevent this issue (and nobody has mentioned that they do this anyway). That's because every client is essentially a signing oracle for a GroupInfo with epoch n+1:
|
Uh, if they are accepting epoch downgrades they are not in compliance with RFC 9420, AFAIK. Here is the first bullet item of Section 12.4.2 on Processing a Commit:
You seem to be changing your story about this attack between yesterday and today. If Alice has "healed" when she sends a Commit with UpdatePath, the external_pub will have changed and all the ratchet_tree secrets above Alice. If Bob processes the "healing commit", the attacks you describe in your example attack yesterday are no longer possible in the new epoch. Regarding Eve and Frank not getting Alice's commit in today's attack, even if the DS has a GroupInfo and external_pub, the clients Eve and Frank would need to validate the credentials of all the LeafNodes in the GroupInfo (which only the AS could forge, not the DS). There is a separate whole section in the Security Considerations section about the compromise of the AS. Also, the application still needs a reason to accept the external commit. There may be app-level authorization. In MIMI, for example, Alice would have needed to add the DS to the roster (an active attack of Alice and a malicious DS), otherwise the clients of Eve and Frank would never authorize the presence of the DS in the GroupInfo. Finally, there is also the |
I don't think Brendan was suggesting the DS literally becomes a group member other than by compromising a client, but I agree with the last part in particular. I wonder if the whole attack scenario can be collapsed to the following:
If that's the case, an attack where the DS isolates members and keeps them on a compromised epoch and subsequently a fork is a trivial one. That's not even specific to MLS, but specific to the fact that all communication is relayed over the DS. As Rohan points out, you need an out-of-band verification mechanism in that case. I'd suggest writing down the exact attack, including all assumptions, in a document so that folks can give better feedback and iterations become possible. |
You are conflating two issues, @raphaelrobert. Yes, if an epoch is compromised while a member is offline, then when the member comes online, the DS can withhold PCS-achieving commits that would end the compromise. The only way to detect that is with out-of-band communication. I only discussed the potential of "group is compromised while member is offline" to address Konrad's question of "why don't externally joining members just check that the epoch field of the GroupInfo is higher than before?" -- Frank and Eve were always going to be vulnerable to a malicious DS. The client that should NOT be vulnerable is Bob, because Bob already saw and processed the PCS-achieving commit. However, Bob IS vulnerable, and did not actually achieve PCS by processing the commit (described why here). Even if Bob follows Konrad's suggestion of "check that the epoch counter in GroupInfo is higher than before", Bob still does not achieve PCS (described why here). Although, to be clear, a client performing an external join has no obligation under RFC 9420 to do Konrad's suggested check, so the existence of Frank and Eve is unecessary. |
Thanks for elaborating @Bren2010, I understand the attack now. Two observations:
|
To circle back to the top though, the question is whether we should include this text in the document. Given that these decisions impair security and that this consequence isn't immediately apparent to people, I think that's a very strong argument for including!
Like I said, in a pure RFC 9420 implementation the epoch checking doesn't happen, so Frank and Eve are unnecessary. The DS can rollback the group as far as it likes (bounded maybe only by credential expirations) |
It's an argument for including the analysis, but much of the proposed text is actually additional technical guidance. If you'd like to propose some text that only has additional analysis, I'd be glad to take a look |
This text has no guidance / recommendations / "should"s. It is only discussing potential solutions and their trade-offs. Not sure which text you're referring to. |
Most of the section entitled "Invalid Commits" in which you discuss a bunch of high-level designs. |
I'm sorry, I don't understand what you're asking |
Hi @Bren2010 , If a member of a group has a key schedule compromise, a malicious DS can prevent healing of the compromise by blocking Commits from the compromised member or preventing them from being fanned out, or by not forwarding a Remove proposal from the compromised member. (The DS maintains the group in an epoch containing a compromised member, possibly isolating the compromised member into a fork of the group). This risk can be partially mitigated by having a policy requiring each member to periodically update its keying material or be removed, by revoking the compromised client's credentials, and/or by using a mechanism like the epoch_authenticator to make sure all members have the same view of the group. A partially compromised client might choose to reinitialize or create a new group instead of merely updating its key material. In a variation of this attack, a malicious DS may present a GroupInfo to external joiners from a compromised epoch instead of the correct epoch. Clients are reminded to validate the LeafNodes of all members (including their credentials) of the ratchet tree in an external join. Members who are rejoining members with an external commit can also make sure that the epoch does not decrease (increases?) during a rejoin. (A malicious DS might even send an invalid commit to other members attempting to trigger a rejoin). Small very privacy sensitive groups may choose to to disable external joins. |
That's generally right, although the discussion above is about how the "check that the epoch counter increases on rejoin" solution isn't really effective. I'm not aware of any way we could modify the "DS chooses which commit is right" approach to prevent malicious members from bricking the group, while also not breaking MLS's security. We talked about a lot of solutions that would create various wrinkles for an attacker but nothing robust. |
I already proposed for Bob to instead check that the new transcript is an extension of his last known good one. @Bren2010 do you think that still doesn't solve the security downgrade caused by your attack? Then the main question is still what we want in this document and I think here we agree that a description of the problem and an analysis of the risk is appropriate. What we don't seem to have agreement on is the inclusion of concrete, technical solutions. Or at least not which ones. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working with folks on this @Bren2010, this text looks good to me.
Out for a one-week consensus call; see email. |
draft-ietf-mls-architecture.md
Outdated
messages in the same order. This would allow clients to only apply the first | ||
valid Commit for an epoch and ignore subsequent ones. Clients that send a Commit | ||
would then wait to apply it until it's broadcast back to them by the Delivery | ||
Service, assuming they don't receive another Commit first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is broadcast...
they do not receive...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine modulo my editorial comments.
draft-ietf-mls-architecture.md
Outdated
Such “desynchronization” problems can arise even when the Delivery Service takes | ||
no stance on which Commit is "correct" for an epoch. The DS can enable clients | ||
to choose between Commits, for example by providing Commits in the order | ||
received when there are multiple, and allow clients to reject any Commits that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
received when there are multiple, and allow clients to reject any Commits that | |
received when and allow clients to reject any Commits that |
I don't think "when there are multiple" does anything here.
Co-authored-by: Eric Rescorla <[email protected]>
No description provided.