Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluate caching of decrypted data #17

Open
nerdstein opened this issue Sep 30, 2015 · 15 comments
Open

Evaluate caching of decrypted data #17

nerdstein opened this issue Sep 30, 2015 · 15 comments

Comments

@nerdstein
Copy link
Contributor

The whole point of encryption would be flawed if the caching system stored a decrypted field value (e.g. rendering/render array).

How do we solve this problem?

@damontgomery
Copy link
Contributor

@nerdstein, when I was working on this module, the stored values were encrypted and the decryption occurred after the entities are loaded into memory. Did you find a scenario where they were not?

@nerdstein
Copy link
Contributor Author

It came up in discussion, I think it was more or less trying to understand exactly what the behavior is here. I couldn't answer it intelligently

@svendecabooter
Copy link
Contributor

In my installation, the unencrypted data can still be found in the "cache_render" table.

Wim Leers gave some feedback on IRC on how to avoid render caching for specific fields:
WimLeers: svendecabooter: set max-age=0 on those fields' formatters
WimLeers: svendecabooter: you could do interesting things with decorating the cache back-end and automatically writing cache entries with a CID matching a certain prefix to /dev/null — that way you could disable render caching for particular things only

To be investigated futher, as I'm not (yet) closely familiar with the inner workings of render caching.

@svendecabooter
Copy link
Contributor

I don't think we can one-sidedly decide whether to cache the rendered output of encrypted field or not.

For big sites it will have a huge performance cost if all the encrypted fields have to be decrypted on the fly, and pages containing those values can never be cached. I would assume a site might decide it wants the rendered output cached, and save that into memcache or something like that . That way the unencrypted data would not be found if the database got compromised, and the performance impact would be kept to a minimum.

Whether the rendered output gets cached might also differ on a field-by-field basis I can imagine. Then it would be up to the site maintainer to set up a sensible caching strategy, depending on their security vs performance needs.

I'm not sure what tools / functionalities we could set in place to make it easy for site maintainers to decide on these things, and if we need to provide anything at all, that core doesn't offer.

Could we add support for BigPipe, to add the possibility to load encrypted fields as the dynamically loaded content on the page via BigPipe?

@damontgomery
Copy link
Contributor

I will defer since I'm not actively involved. My recommendation is that if you enable encryption, unencrypted content is never stored in the database anywhere. This would disable caching for all of those entities and with cache tags, only certain entities would need to be loaded on each section.

I guess what I was unclear at the time was whether cache tags cached the entity and then still rendered it later or if it stored the rendered entity. It sounds like it's a combination of both which was an oversight on my part.

TLDR: keep it simple, if you install the module, you expect the data to never show up in the database, so have it never show up.

@wimleers
Copy link

If you follow @damontgomery's rationale, then, when the field_encrypt module is installed, you:

  • should never be able to access the keys used for encryption, otherwise all this effort is in vain (= keys safety)
  • expect that all responses that contain any data decrypted by field_encrypt should never be cached by Drupal's dynamic_page_cache, page_cache or even any proxies. i.e. you want Cache-Control: private, max-age=0, no-cache or something like that (= response level)
  • expect that any fragments of the page that contain any data decrypted by field_encrypt should never be cached (= fragment level)
  • expect that any entity data loaded while answering a request that contain any data decrypted by field_encrypt should never be cached, except perhaps the static caching (= entity level)
  • expect that any exception that contains any data decrypted by field_encrypt can ever be logged with said decrypted data (= exception handling)

Is that what you actually want? Should it be configurable?

Without answers to those high-level objects, we can go in circles without end. So, first we need answers to those questions

@svendecabooter
Copy link
Contributor

Thanks for your feedback Wim.

Here's some feedback:

Key management is handled by the Key module (https://www.drupal.org/project/key). It provides KeyProvider plugins, which would allow to store the keys in various locations. This can be offsite or in another safe place. So within the context of field_encrypt module we can assume that the keys will be stored safely.

This module seems (to me) focussed on securing the permanent storage of the field data in the Drupal storage layer. For the caching layers on top it's a lot less clear how far this module should go. At some point the data will need to be decrypted anyway before being sent to the end user, and theoretically there will probably always be a way to capture the data in between those two steps.

We could focus on making sure no unencrypted data is found in the database cache, but people could also use Memcache, Redis, etc... as a caching backend. To be consistent, we would then also need to make sure these systems don't contain unencrypted data. So then we should avoid it entering the cache in the first place. That would certainly be the most secure option.
But that will probably clash with people wanting to use the module on High Availability sites, who might want to prefer having unencrypted data in memory, rather than having a huge performance hit.

@nerdstein: do you have input on what the most common scenario's are regarding Wim's questions in the D7 install base, and thus were we should probably focus on?

@damontgomery
Copy link
Contributor

Wim, it makes sense to me. I don't see the point of the module if it isn't black and white, although I'm not a lawyer. In reality it is likely a gray area.

@arknoll
Copy link

arknoll commented Mar 24, 2016

  • should never be able to access the keys used for encryption, otherwise all this effort is in vain (= keys safety)

The responsibility for this is the key module

  • expect that all responses that contain any data decrypted by field_encrypt should never be cached by Drupal's dynamic_page_cache, page_cache or even any proxies. i.e. you want Cache-Control: private, max-age=0, no-cache or something like that (= response level)

True: we want no record anywhere of the decrypted data. It gets decrypted on the fly in memory when requested by an authorized user, and is never stored in a permanent fashion.

  • expect that any fragments of the page that contain any data decrypted by field_encrypt should never be cached (= fragment level)

True: no caching of the data, anywhere.

  • expect that any entity data loaded while answering a request that contain any data decrypted by field_encrypt should never be cached, except perhaps the static caching (= entity level)

The rule is to never cache any encrypted data. If that means that we can't cache any piece of an entity if some of its fields are encrypted, then this statement is true.

  • expect that any exception that contains any data decrypted by field_encrypt can never be logged with said decrypted data (= exception handling)

True.

@arknoll
Copy link

arknoll commented Mar 24, 2016

The way we are using this module is that fields will be encrypted when unpublished and then we'll go through a process that will decrypt and store decrypted once the underlying entity is published. That way, in theory, we can limit the performance implications to only unpublished entities.

@nerdstein
Copy link
Contributor Author

In principle, I agree with this entire thread. In fact, I think we've made great strides to do just that. In practice, I exercise on the side of configurability for use cases we may not be aware of. Caching is a system that is wide reaching.

There is a classic example of this with the render cache. Encrypted data is decrypted in memory upon load. But, it goes through a (potential) mediation process of rendering. This isn't as cut and dry as "don't cache encrypted data" because the render cache has been processed by code and the result may or may not reflect the decrypted data.

At one point, we discussed having a checkbox for "Should this be encrypted in the render cache?" ( @svendecabooter please educate me if this feature currently exists because I know we discussed it at length). This allowed discretionally encrypting or decrypting the render cache (again, there is "gray area" where the render cache does not actually store the decrypted data). I'd prefer to understand comprehensively where this data might be stored (in cache or otherwise), and understand if we want to provide configuration discretionally, always, or never. This discussion should include metadata as well.

@wimleers
Copy link

The way we are using this module is that fields will be encrypted when unpublished and then we'll go through a process that will decrypt and store decrypted once the underlying entity is published. That way, in theory, we can limit the performance implications to only unpublished entities.

Interesting! I didn't expect this to be possible.

If I get full confirmation of what I wrote from everybody (and it looks like this is going to be the case), then the answer is fairly simple: max-age=0 for ALL THE THINGS (see https://www.drupal.org/developing/api/8/cache/max-age). But we'll want test coverage for each of the levels (I should've said layers) I mentioned above. And there will likely be some trickiness with getting that to work.

I can help with that trickiness. Others can already write the test coverage with those expectations (yay TDD).

@nerdstein
Copy link
Contributor Author

@wimleers can you work with @svendecabooter to come up with a proposed strategy for this?

@svendecabooter
Copy link
Contributor

First attempt at making the caching of a particular encrypted field configurable:
#46

Some review of what I'm attempting would be awesome, since I have the feeling I'm only scratching the surface of understanding the whole Drupal 8 caching system...

@dawehner
Copy link

What would happen if you set max-age=0 but also provide a lazy-loader all the time. Then all the rest of the render array can be cached and just the encrypted parts could be rendered always on runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants