
Cache re-validation strategy to avoid cache stampede #15

Open
Benoss opened this issue Nov 20, 2024 · 3 comments

Benoss commented Nov 20, 2024

I am trying to find a library that can handle caching and cache re-validation.

For example, when a cache entry is not available, I would like the first request to execute the function while concurrent requests wait until the first has finished, instead of executing the function themselves.

This is to avoid a cache stampede: https://en.wikipedia.org/wiki/Cache_stampede
A good resource on the subject: https://grantjenks.com/docs/diskcache/case-study-landing-page-caching.html

Here is a simple example where I would expect only one concurrent API call at a time:

import cachebox
from cachebox import TTLCache
import httpx
from concurrent import futures
import time
import logging

logging.basicConfig(level=logging.DEBUG)

mycache = TTLCache(0, ttl=3)  # maxsize=0 means unbounded; entries expire after 3s

@cachebox.cached(mycache)
def sync_call() -> dict:
    logging.info("Httpx Call")
    res = httpx.get("https://fakeresponder.com/?sleep=2000")  # ~2s response
    data = res.json()
    return data


if __name__ == "__main__":
    with futures.ThreadPoolExecutor(max_workers=5) as executor:
        for _ in range(9):
            # 10 concurrent calls; on a cache miss, every one of them
            # triggers its own HTTP request (the stampede)
            future_list = [executor.submit(sync_call) for _ in range(10)]
            for future in futures.as_completed(future_list):
                logging.info(f"got result: {future.result()}")

            time.sleep(1)
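To illustrate the desired behavior, here is a minimal single-flight sketch using only the standard library (a plain dict plus threading.Lock; this is not cachebox's API, just the pattern being asked for): on a miss, the first thread computes and caches the value while concurrent threads block on the lock and then find the fresh entry on re-check.

```python
import threading
import time
from concurrent import futures
from typing import Any, Callable

def single_flight(ttl: float) -> Callable:
    """Decorator sketch: only one thread computes on a cache miss; the rest wait."""
    def decorator(fn: Callable) -> Callable:
        cache: dict[Any, tuple[float, Any]] = {}  # key -> (expires_at, value)
        lock = threading.Lock()

        def wrapper(*args):
            key = args
            entry = cache.get(key)
            if entry and entry[0] > time.monotonic():
                return entry[1]  # fresh hit, no lock needed
            with lock:  # only one thread recomputes; others block here
                entry = cache.get(key)  # re-check after acquiring the lock
                if entry and entry[0] > time.monotonic():
                    return entry[1]
                value = fn(*args)
                cache[key] = (time.monotonic() + ttl, value)
                return value
        return wrapper
    return decorator

calls = 0

@single_flight(ttl=3)
def slow() -> str:
    global calls
    calls += 1
    time.sleep(0.2)  # stand-in for the slow HTTP call
    return "data"

with futures.ThreadPoolExecutor(max_workers=5) as ex:
    results = [f.result() for f in [ex.submit(slow) for _ in range(10)]]
print(calls)  # 1: only the first thread executed slow()
```

Note the double-check after acquiring the lock: without it, every waiting thread would recompute the value as soon as the lock was released.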
ecarrara (Contributor) commented:

Hey @Benoss, I built a library called yapcache that does what you're looking for: it makes sure only one request executes the underlying function when the cache is empty, to prevent stampedes. It uses the cachebox library under the hood.

# ... (imports for logging, httpx, and a Redis client omitted)
from yapcache import memoize
from yapcache.caches import InMemoryCache
from yapcache.distlock import RedisDistLock

cache = InMemoryCache(maxsize=2_000)   # uses cachebox TTLCache

@memoize(
    cache,
    ttl=60,
    cache_key=lambda n: f"fn1-{n}",
    # redis_client is assumed to be a Redis client created elsewhere
    lock=lambda key: RedisDistLock(redis_client, key),
)
async def fn1(n: int):
    logging.info("Httpx Call")
    # use an async client so the coroutine doesn't block the event loop
    async with httpx.AsyncClient() as client:
        res = await client.get("https://fakeresponder.com/?sleep=2000")
    return res.json()

awolverp (Owner) commented:

Thank you for your issue ❤️
This will take a while; I'll work on it whenever I get the chance.

Benoss (Author) commented Nov 21, 2024

> Hey @Benoss, I built a library called yapcache that does what you're looking for: it makes sure only one request executes the underlying function when the cache is empty, to prevent stampedes. It uses the cachebox library under the hood.

Thanks, I will give it a go. Love the name of the lib, BTW.
