Add ability for aggressive cache cleanup to only consider Kraken's disk usage #385

Anton-Kalpakchiev · 2024-11-22T16:22:42Z

Why
Currently, origin, agent, proxy, and build-index all have caches, which store different data. These caches have different management policies, such as LRU or a simple TTL + TTI.

In the past, the TTL + TTI policy failed: during peak hours, the origin's cache filled up before the TTL/TTI expired. To address this, an aggressive cache cleanup policy was added (#335). It works as follows: use the configured TTL + TTI policy until the disk utilization of the filesystem Kraken is mounted on rises above a certain % threshold. Then use a more aggressive TTL and TTI, provided by the .yaml config, until disk utilization falls below the threshold.

This policy works well when Kraken's disk is not shared with other services, e.g. when each service has its own host. However, the policy doesn't work when the Kraken service shares a disk with non-Kraken services. In that scenario, Kraken decides how to manage its cache based on the disk usage of other services on the same host/filesystem. For instance, the agent could clear its own cache just because another service on the host is using a lot of disk. This is not intended: while the agent shouldn't use too much disk on a host, it should not try to compensate for another service using too much disk.

To address this, a second option can be added for the aggressive cache cleanup. Instead of using the filesystem's disk utilization, we could use only Kraken's disk utilization. For example, with this option, if the filesystem is 1GB and the threshold is 20%, Kraken will aggressively clean its cache only when its size is more than 200MB, regardless of whether the other 800MB are used or not. This way Kraken would manage its cache only based on its own usage patterns. This option will be configurable through the .yaml config.
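
The 1GB / 20% example above can be checked with a small sketch. Everything named here (`UtilizationSource`, `FilesystemUsage`, `KrakenUsage`, `shouldCleanAggressively`) is hypothetical and only illustrates the proposal. It also assumes the threshold is always measured against the filesystem's total size, as in the example, and it uses 1GB = 1000MB to match the numbers in the text.

```go
package main

import "fmt"

// UtilizationSource selects which usage figure is compared to the threshold.
// Hypothetical type, not part of Kraken's code base.
type UtilizationSource int

const (
	// FilesystemUsage counts everything stored on the filesystem
	// (the current behavior described in this issue).
	FilesystemUsage UtilizationSource = iota
	// KrakenUsage counts only the bytes Kraken itself stores (proposed).
	KrakenUsage
)

// shouldCleanAggressively reports whether the selected usage figure is
// strictly above thresholdPct percent of the filesystem's total size.
// Integer arithmetic avoids floating-point rounding; the multiplication
// cannot overflow uint64 for realistic disk sizes.
func shouldCleanAggressively(src UtilizationSource, fsTotal, fsUsed, krakenUsed uint64, thresholdPct uint64) bool {
	if fsTotal == 0 {
		return false
	}
	used := fsUsed
	if src == KrakenUsage {
		used = krakenUsed
	}
	return used*100 > fsTotal*thresholdPct
}

func main() {
	const mb = uint64(1000 * 1000)
	fsTotal := 1000 * mb   // 1GB filesystem
	fsUsed := 900 * mb     // other services plus Kraken
	krakenUsed := 100 * mb // Kraken's own share

	fmt.Println("filesystem mode:", shouldCleanAggressively(FilesystemUsage, fsTotal, fsUsed, krakenUsed, 20))
	fmt.Println("kraken mode:    ", shouldCleanAggressively(KrakenUsage, fsTotal, fsUsed, krakenUsed, 20))
}
```

With these inputs, filesystem mode triggers aggressive cleanup because other services fill the disk, while Kraken-only mode does not, since Kraken's 100MB is below the 200MB limit.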

What

Add an option for aggressive cache cleanup to trigger when Kraken's own disk usage, rather than the whole filesystem's, passes a threshold.
Keep the option to take into account the whole filesystem's disk utilization.
The choice between the two options above should be configurable through the .yaml config.
If no option is specified, the default should be the filesystem's utilization (the current implementation).
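
One possible way to handle the default is sketched below. The option values `"filesystem"` and `"kraken"` and the function `resolveUtilizationSource` are assumptions made for illustration; the issue does not specify the config key or its accepted values.

```go
package main

import "fmt"

// Hypothetical option values; the real .yaml key and values are not
// defined in this issue.
const (
	sourceFilesystem = "filesystem"
	sourceKraken     = "kraken"
)

// resolveUtilizationSource maps the raw config value to the mode to use.
// An empty value (option not specified) falls back to the filesystem's
// utilization, which matches the current implementation.
func resolveUtilizationSource(raw string) (string, error) {
	switch raw {
	case "", sourceFilesystem:
		return sourceFilesystem, nil
	case sourceKraken:
		return sourceKraken, nil
	default:
		return "", fmt.Errorf("unknown utilization source %q", raw)
	}
}

func main() {
	for _, raw := range []string{"", "filesystem", "kraken", "bogus"} {
		mode, err := resolveUtilizationSource(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("config %q -> mode %q\n", raw, mode)
	}
}
```

Rejecting unknown values instead of silently defaulting is a design choice in this sketch: it surfaces typos in the config at startup.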

Anton-Kalpakchiev self-assigned this Nov 22, 2024
