Extend security model against host compromise threats (a.k.a. "`append-only` and its friends") #5041

makkarpov · 2024-09-02T00:00:06Z

Currently, restic's threat model assumes that the host being backed up is fully trusted; that assumption is reasonable. Certain backends support an append-only mode, which helps prevent compromised hosts from deleting their own uncompromised backups (e.g. during ransomware attacks). However, the current implementation of append-only mode has several (documented) drawbacks:

Enforcing an append-only policy on the backend does not provide security by itself: you also have to notice the lines in the documentation about how to use certain commands properly.
You need to have a separate (trusted) host to perform repository management such as backup rotation. This host needs to know your repository password, and therefore has full access to your data. It is impossible to run a shared server to perform management, but still keep your data unreadable to that server.

This feature request aims to improve such append-only scenarios by shifting enforcement of certain invariants further to the backend servers, and by separating encryption keys so that the server can manage the repository without seeing the actual data. It also allows making a snapshot without knowing the decryption keys, so that a malicious client cannot retroactively steal files that have since been deleted from the host system but are still present in backups. It also naturally addresses the problem of master key rotation.

The restic security model is extended with the following properties:

Assuming that the backend server is running in a secure mode, an adversary who compromises the host system cannot retroactively publish a snapshot that tricks restic forget into deleting valid data.
An adversary who has access to the management key can read only basic snapshot metadata such as creation time. Stored data is not readable.

I understand that this is a very fundamental change. It complicates restic's key management model considerably and adds another corner case to the restic code, all in an attempt to solve the ill-posed problem of obtaining secure backups from untrusted hosts. This issue is more of a discussion than a direct feature request, at least for the time being. I would like to hear the maintainers' feedback on the whole idea, and whether they consider these costs and complications justified at all.

[outdated and misleading technical details removed, new details are to be presented in a separate document]

Did restic help you today? Did it make you happy in any way?

Sure it did. restic is the best backup tool that I know.


MichaelEischer · 2024-09-07T17:32:24Z

The suggested changes require a major rework of the repository format. That definitely won't happen before the changes planned for restic 0.18 and probably also 0.19 are done.

It complicates restic's key management model considerably and adds another corner case to the restic code, all in an attempt to solve the ill-posed problem of obtaining secure backups from untrusted hosts.

This problem is fundamentally impossible to solve. The best you can manage is to prevent compromised hosts from gaining access to data from older snapshots (confidentiality) and damaging/deleting the older snapshots (availability). I'll comment on those aspects in that order.

Note that all aspects include "older snapshots". After a compromise an attacker is free to add arbitrary garbage to new snapshots. Manipulating old snapshots (integrity) without causing validation errors is not possible as long as the snapshot file cannot be overwritten.

Confidentiality

After reading through the description several times, I'm still confused how the management key and master key work and which access they grant. Based on the name, the management key would grant access to everything necessary for pruning snapshots, that is everything except the data blobs. And the master key also grants access to all data blobs and all management keys.

According to the description the management key also encrypts the pack file header, which now includes some asymmetrically encrypted key derivation data that can only be decrypted using the private part of the master key. That seems to be the part that ensures that clients lose access to pack file contents once the backup is complete, hence, ensuring confidentiality even if the host is compromised later on. However, this is also a major headache for pruning. It would only allow shrinking pack files, but no longer allow blobs to be merged into new pack files. In addition, an attacker monitoring files at the server-side would learn about some blob boundaries as blobs are no longer re-encrypted during prune.

Although not described, tree packs cannot use that encryption scheme as speeding up backups using parent snapshots wouldn't be possible otherwise. Similarly, it's unclear which blob types are encrypted using the management key. What about config, index and snapshots?

When a management key (or master key) is rotated, a fresh key is generated and old keys are included in the "outdated keys" field.

"outdated keys" field of what? I don't understand how management key rotation is supposed to work or what problem it is supposed to solve. prune and many other commands need full metadata access. This means that they also need access to all management keys. That is, unless every single file in a repository is updated to use the new management key, which obviously doesn't scale.

To support management key rotation, a concept of key selector is introduced. [...] Note that presence of random data makes these key selectors indistinguishable from the outside.

What is the purpose of hiding which key was used? What does an attacker learn by being able to easily tell whether key A or B was used? Without access to the keys, that information seems to be pretty useless for an attacker.

Master key becomes an asymmetric keypair (X25519 or X448). Private key is stored under passphrase protection, public key is stored in plaintext (encrypted by the management key only).

That should also include a post-quantum cipher.

Alternatives

#187 has already suggested using asymmetric encryption to prevent an attacker from reading data.

There's also a suggestion on the rest-server side to add a write-only mode (restic/rest-server#192). That could provide similar guarantees, but would only work for rest-server.

Having some cryptographic solution to this problem could be useful, but will definitely require a lot more thought to do it properly.

Availability

The described changes only work with rest-server. The server would in some way need to get hold of the management key for each repository. Enforcing that new snapshots are created using an up-to-date timestamp would solve the particular forget issue, but would also prevent commands like rewrite or backup --time ... from working. The latter sounds like an acceptable tradeoff though.

Assuming that the backend server is running in a secure mode, an adversary who compromises the host system cannot retroactively publish a snapshot that tricks restic forget into deleting valid data.

Good luck with that. The encryption changes above create a new avenue for attacks. Currently, prune always decrypts, unpacks and hashes each blob before moving it to a new pack file. That means if multiple copies of a blob exist but all but one were written by an attacker, then prune would still find and retain the correct copy. With inaccessible blobs, prune would have to randomly pick a blob and would likely retain a broken copy, thus damaging the backup.

This host needs to know your repository password, and therefore has full access to your data. It is impossible to run a shared server to perform management, but still keep your data unreadable to that server.

As just mentioned, this is necessary to avoid damaging the repository during prune (I think there's something called verifiable encryption, but I have no idea what its overhead is or whether it could solve this problem).

Similarly, you have to trust the server running prune anyways as it could just fill up new data packs with garbage. That is, it is impossible to prevent a compromised host running prune from damaging older snapshots. This host could also just delete all snapshot files to make restoring harder. rest-server could probably try to validate everything, but at that point it's easier to run restic on the same host as rest-server (which must be trusted anyways).

In fact you always need a trusted host that from time to time verifies that new snapshots have been created and contain valid data. Although, I see some value in being able to split the prune and check roles onto separate hosts.

Alternatives

Several S3 storage providers offer an object lock feature which prevents changes to files: #3195. This can be used to prevent attackers from permanently deleting data from a bucket. If files are locked for 30 days, then it's possible to (virtually) revert the bucket state to any point in time during that interval, thereby granting access to the last snapshot before an attack. That also gives you time to verify that the result of running prune on an untrusted host is valid.

As the object lock also applies to snapshots, it would also make snapshot manipulations detectable.

Using the server-side modification timestamps it is possible to detect timestamp manipulations, although that depends on the server-side not allowing the client to pick a timestamp.

Relying on object locks and server-side metadata seems to be a more universal and less complex solution.

Integrity

The integrity of snapshots relies on an attacker being unable to modify the content of a snapshot file. Enforcing that the timestamp of a snapshot can be trusted would prevent attackers from modifying older snapshots.

However, the forget command also needs the permission to delete snapshots. As rest-server does not know the used policy, it would simply have to allow any snapshot deletion. I mean it would be possible to add extra handling to only allow deleting snapshots older than X. But then rest-server could just implement object locks, which provide the same guarantees but are much more flexible.

Summary

Preventing an attacker from reading data in a repository has already been asked for in #187 and would be nice to have in general. The current proposal might be a starting point for that, but there are still a lot of problems to solve.

The probably more pressing problem is that of availability/integrity. Object locks plus periodic integrity checks are IMHO a better approach than the proposed rest-server changes.

MichaelEischer · 2024-09-07T17:35:20Z

Before discussing further solutions in more detail, we should first make sure we understand which problems we're actually trying to solve. Right now, those feel rather mixed up.

makkarpov · 2024-09-07T18:17:21Z

Implementation comments

After reading through the description several times, I'm still confused how the management key and master key work and which access they grant. Based on the name, the management key would grant access to everything necessary for pruning snapshots, that is everything except the data blobs. And the master key also grants access to all data blobs and all management keys.

According to the description the management key also encrypts the pack file header, which now includes some asymmetrically encrypted key derivation data that can only be decrypted using the private part of the master key. That seems to be the part that ensures that clients lose access to pack file contents once the backup is complete, hence, ensuring confidentiality even if the host is compromised later on.

That is correct. The management key grants access to all metadata (that is, anything except blob contents inside pack files); the master key also grants access to blob contents. The current PoC implementation that I'm preparing in the background also allows selecting which specific blobs are encrypted with which specific key.

However, this is also a major headache for pruning. It would only allow shrinking pack files, but no longer allow blobs to be merged into new pack files.

Pack files are free to be re-arranged without re-encryption (it just means that key derivation data must be preserved). But the attack vector that involves tricking prune into retaining an incorrect blob is valid. Prune can still run in "decrypt-and-reencrypt" mode if the master key is provided, and I will think further about pruning in management-only mode. My first thought is simply not to attempt to merge/deduplicate blobs in such a mode, and to retain everything that is referenced by at least one snapshot.

Although not described, tree packs cannot use that encryption scheme as speeding up backups using parent snapshots wouldn't be possible otherwise. Similarly, it's unclear which blob types are encrypted using the management key. What about config, index and snapshots?

All of these are encrypted with the management key. Only blob data inside pack files is encrypted with the master key. But the requirement for trees to be readable is something that I missed. Previously, I assumed that deduplication is done at the SHA-256 layer by just skipping already existing blobs.

What is the purpose of hiding which key was used? What does an attacker learn by being able to easily tell whether key A or B was used? Without access to the keys, that information seems to be pretty useless for an attacker.

No particular purpose from my side and I'm OK with storing them in the clear. But restic docs already include lines like "attacker would be able to deduce X from file modification times" (implying that even such small metadata leaks are at least worth mentioning), so I felt that it is better not to introduce another one.

That should also include a post-quantum cipher.

It should be relatively simple to include it retrospectively, given that the key format would be extensible. "Post-quantum ciphers" sometimes turn out to be breakable on a single CPU core in a few hours, and I feel that it is better to wait. Also, Kyber (which is, AFAIK, the only NIST-approved DH alternative) is still not present in the Go crypto library at this time (golang/go#64537).

Servers and trust

This problem is fundamentally impossible to solve. The best you can manage is to prevent compromised hosts from gaining access to data from older snapshots (confidentiality) and damaging/deleting the older snapshots (availability). I'll comment on those aspects in that order.

That's why I called the problem ill-posed. The risk of an adversary first silently breaking the backup process, then waiting for old snapshots to be deleted, and then deleting the current data is always a concern.

The key concept of the whole proposal is that I can more or less trust the remote server not to delete data. Risk of rogue (or compromised) servers could be mitigated e.g. by storing data in multiple places. Risks of data exposure are much harder to mitigate, since once data is exposed, it is exposed forever; the internet never forgets.

The high-level goal of this proposal is to be able to run a shared server for backup management, and to trust this server only to store your data without deleting or manipulating it.

MichaelEischer · 2024-09-07T18:45:42Z

Pack files are free to be re-arranged without re-encryption (it just means that key derivation data must be preserved).

So essentially a pack header would end up containing multiple key derivation data sets? The main task of prune is to combine blobs from partially used pack files into fewer large pack files.

It should be relatively simple to include it retrospectively, given that the key format would be extensible. "Post-quantum ciphers" sometimes turn out to be breakable on a single CPU core in a few hours, and I feel that it is better to wait.

That's why it's always recommended to combine both a traditional cipher and a post-quantum one. That way both ciphers would have to fail to cause a problem. But yes, we're not going to implement one ourselves.

The current PoC implementation that I'm preparing in the background also allows selecting which specific blobs are encrypted with which specific key.

I haven't spent much time thinking about the lower-level details of the proposal. So be prepared for things to look very different in the end. That said, discussing this in detail will probably have to wait until after restic 0.18 and 0.19, which will take a long while.

Risk of rogue (or compromised) servers could be mitigated e.g. by storing data in multiple places.

That is assuming a compromised client cannot break all backups. But from what I've understood, your idea is to move pruning to the servers?

makkarpov · 2024-09-07T19:19:37Z

So essentially a pack header would end up containing multiple key derivation data sets? The main task of prune is to combine blobs from partially used pack files into fewer large pack files.

Yes, a pack header could contain an arbitrary number of wrapped blob keys. Newly created packs start with a single key; merging packs also merges their keys.
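As a rough illustration of what "merging also merges keys" could look like, here is a Go sketch. Every type and field name in it (`WrappedKey`, `BlobEntry`, `PackHeader`, `mergeHeaders`) is hypothetical and not part of restic's repository format; the wrapped keys are treated as opaque bytes that the merging party never unwraps.

```go
package main

import "fmt"

// WrappedKey is opaque key derivation data that only the master key can unwrap.
type WrappedKey []byte

// BlobEntry describes one blob and which wrapped key protects it.
type BlobEntry struct {
	ID       string
	KeyIndex int // index into PackHeader.Keys
}

// PackHeader lists the wrapped keys and the blobs of one pack file.
type PackHeader struct {
	Keys  []WrappedKey
	Blobs []BlobEntry
}

// mergeHeaders concatenates two headers without unwrapping any key.
// Blob entries taken from b are re-pointed at the shifted key positions.
func mergeHeaders(a, b PackHeader) PackHeader {
	out := PackHeader{
		Keys:  append(append([]WrappedKey{}, a.Keys...), b.Keys...),
		Blobs: append([]BlobEntry{}, a.Blobs...),
	}
	offset := len(a.Keys)
	for _, e := range b.Blobs {
		out.Blobs = append(out.Blobs, BlobEntry{ID: e.ID, KeyIndex: e.KeyIndex + offset})
	}
	return out
}

func main() {
	a := PackHeader{Keys: []WrappedKey{WrappedKey("k1")}, Blobs: []BlobEntry{{ID: "blob-a", KeyIndex: 0}}}
	b := PackHeader{Keys: []WrappedKey{WrappedKey("k2")}, Blobs: []BlobEntry{{ID: "blob-b", KeyIndex: 0}}}

	m := mergeHeaders(a, b)
	fmt.Println(len(m.Keys), m.Blobs[1].KeyIndex)
}
```

In this sketch the header grows by one wrapped key per source pack, so repeated merges would keep accumulating key derivation data unless something deduplicates it.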

But from what I've understood, your idea is to move pruning to the servers?

Ideally. At the very least, the server could perform some conservative pruning by deleting all blobs that are definitely unused. But now I realize that pruning is quite a challenge when you cannot be sure that the declared hash matches the data actually stored.
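A minimal Go sketch of such conservative pruning, assuming the server can list stored blob IDs and the blob IDs referenced by each snapshot. The `unusedBlobs` helper and its inputs are hypothetical simplifications and do not reflect restic's index or tree structures.

```go
package main

import "fmt"

// unusedBlobs returns the stored blob IDs that no snapshot references.
// snapshots maps a snapshot name to the blob IDs reachable from it.
func unusedBlobs(stored []string, snapshots map[string][]string) []string {
	referenced := make(map[string]bool)
	for _, ids := range snapshots {
		for _, id := range ids {
			referenced[id] = true
		}
	}

	var unused []string
	for _, id := range stored {
		if !referenced[id] {
			unused = append(unused, id)
		}
	}
	return unused
}

func main() {
	stored := []string{"a", "b", "c"}
	snapshots := map[string][]string{
		"snap1": {"a"},
		"snap2": {"a", "c"},
	}
	fmt.Println(unusedBlobs(stored, snapshots))
}
```

The sketch reasons purely about declared IDs. It cannot tell whether the bytes stored under a referenced ID are the right ones, which is exactly the difficulty described above.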

That said, discussing this in detail will probably have to wait until after restic 0.18 and 0.19, which will take a long while.

Well, I think nothing prevents discussions and intermediate code reviews. All in all, mergeable code won't appear out of thin air.

makkarpov · 2024-09-07T19:36:05Z

Another approach to "blind prunes" is to make actual blob encryption deterministic and compute blob IDs from ciphertext hashes instead. These hashes are easily verifiable by the server, and we could trust that non-compromised clients have computed their IDs correctly. This also solves the index poisoning problem I mentioned earlier. This, however, would imply that key derivation must also be deterministic, and each blob must carry its own key derivation data as a part of the hashed ciphertext. Otherwise, malicious clients could submit undecryptable blobs with defective key derivation data, and the server would have no way of checking this.

A downside of this approach is that KDF data could consume a measurable amount of space (especially with PQ algorithms), which could incur a storage cost if you have a lot of small blobs, and also a performance cost since an X25519 key exchange is required for every operation. Another downside is that compression (if used) must be deterministic as well, and different compression methods/levels would result in different blob IDs and thus duplicated data.

The only remaining conceptual issue (from what has been discussed) is tree blob readability.

MichaelEischer · 2024-09-08T20:10:39Z

Well, I think nothing prevents discussions and intermediate code reviews

Each of those requires a considerable amount of time and therefore conflicts with other work on restic. To give a concrete example: I had planned to work on #5021 this weekend but instead spent several hours on this and other discussions.

Another approach to "blind prunes" is to make actual blob encryption deterministic and compute blob IDs from ciphertext hashes instead.

That will also require a complete replacement of the currently used MAC. There are probably other problems beyond those already mentioned.

A downside of this approach is that KDF data could consume a measurable amount of space (especially with PQ algorithms), which could incur a storage cost if you have a lot of small blobs, and also a performance cost since an X25519 key exchange is required for every operation.

An overhead of several milliseconds per blob is too much. Even for 1MB blocks, it would be a major bottleneck when trying to restore at 100MB/s per CPU core. That will only get worse for smaller files.

Another downside is that compression (if used) must be deterministic as well, and different compression methods/levels would result in different blob IDs and thus duplicated data.

I'm strictly against introducing such a requirement. We use https://github.com/klauspost/compress, which with every release typically optimizes the compression a bit. Pinning the library to a specific version is also not an option.

The only remaining conceptual issue (from what has been discussed) is tree blob readability.

The approach is still vague enough to me that I can't comment on that.

smiller255 · 2024-11-20T18:56:05Z

To confirm my understanding of what is discussed here: The main benefit of the proposed change, in contrast to the currently available append-only option in rest-server or rclone, would be to safely allow management and pruning of backup data?

If I do not care about pruning/editing past backups there wouldn't be any advantages from this proposal, is that understanding correct? With storage space as cheap as it is (for most of the world) not being able to prune/delete old backups is likely an acceptable compromise for most users.

makkarpov · 2024-11-20T20:10:15Z

I'm preparing a completely new technical draft, so please disregard all technical details in the messages above. But the goals remain intact.

Yes, this is one of the goals. It stems from the fact that once you set up a backup of your machine, and if you worry about this machine getting hacked, you need another trusted machine to do the pruning. A Raspberry Pi? This could quickly get out of control, especially if you want to avoid a single point of exposure in the form of one machine knowing the decryption keys of everyone else's machines and servers.

There are still useful side effects, some fundamental and some technical. Fundamentally, it would also allow for safer and more secure multi-machine backups into a single repository (and therefore multi-machine dedup). Technically, it will also (via underlying code refactors) allow things like encrypting your repository with a GPG key instead of passwords, and not having that GPG key present during the backups.

The current password-based encryption system is going to remain, as just another implementation of a generic crypto interface.

Also, please don't mix actual storage prices with big tech companies overselling you stuff they don't expect you to fully utilize.

MichaelEischer added type: discussion undecided topics needing supplementary input misc: repository format issues requiring repository format changes labels Sep 7, 2024
