
Cache re-validation strategy to avoid cache stampede #15

Open
Benoss opened this issue Nov 20, 2024 · 3 comments

Benoss commented Nov 20, 2024

I am trying to find a library that can handle caching and cache re-validation.

For example, when a cache entry is not available, I would like the first request to execute the function while concurrent requests wait until the first has finished, instead of executing the function themselves.

This is to avoid a cache stampede: https://en.wikipedia.org/wiki/Cache_stampede
A good resource on the subject: https://grantjenks.com/docs/diskcache/case-study-landing-page-caching.html

Here is a simple example where I would expect only one concurrent API call at a time:

import cachebox
from cachebox import TTLCache
import httpx
from concurrent import futures
import time
import logging

logging.basicConfig(level=logging.DEBUG)

mycache = TTLCache(0, ttl=3)  # maxsize=0 means unbounded; entries expire after 3s

@cachebox.cached(mycache)
def sync_call() -> dict:
    logging.info("Httpx Call")
    res = httpx.get("https://fakeresponder.com/?sleep=2000")  # ~2s response
    data = res.json()
    return data


if __name__ == "__main__":
    with futures.ThreadPoolExecutor(max_workers=5) as executor:
        for _ in range(9):
            # 10 concurrent calls; on a cache miss, every one of them
            # triggers its own HTTP request (the stampede)
            future_list = [executor.submit(sync_call) for _ in range(10)]
            for future in futures.as_completed(future_list):
                logging.info(f"got result: {future.result()}")

            time.sleep(1)
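To illustrate the desired behavior, here is a minimal single-flight sketch using only the standard library (a plain dict plus threading.Lock; this is not cachebox's API, just the pattern being asked for): on a miss, the first thread computes and caches the value while concurrent threads block on the lock and then find the fresh entry on re-check.

```python
import threading
import time
from concurrent import futures
from typing import Any, Callable

def single_flight(ttl: float) -> Callable:
    """Decorator sketch: only one thread computes on a cache miss; the rest wait."""
    def decorator(fn: Callable) -> Callable:
        cache: dict[Any, tuple[float, Any]] = {}  # key -> (expires_at, value)
        lock = threading.Lock()

        def wrapper(*args):
            key = args
            entry = cache.get(key)
            if entry and entry[0] > time.monotonic():
                return entry[1]  # fresh hit, no lock needed
            with lock:  # only one thread recomputes; others block here
                entry = cache.get(key)  # re-check after acquiring the lock
                if entry and entry[0] > time.monotonic():
                    return entry[1]
                value = fn(*args)
                cache[key] = (time.monotonic() + ttl, value)
                return value
        return wrapper
    return decorator

calls = 0

@single_flight(ttl=3)
def slow() -> str:
    global calls
    calls += 1
    time.sleep(0.2)  # stand-in for the slow HTTP call
    return "data"

with futures.ThreadPoolExecutor(max_workers=5) as ex:
    results = [f.result() for f in [ex.submit(slow) for _ in range(10)]]
print(calls)  # 1: only the first thread executed slow()
```

Note the double-check after acquiring the lock: without it, every waiting thread would recompute the value as soon as the lock was released.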
ecarrara (Contributor) commented:

Hey @Benoss, I built a library called yapcache that does what you're looking for: it makes sure only one request executes the underlying function when the cache is empty, to prevent stampedes. It uses the cachebox library under the hood.

# ... (imports for logging, httpx, and a Redis client omitted)
from yapcache import memoize
from yapcache.caches import InMemoryCache
from yapcache.distlock import RedisDistLock

cache = InMemoryCache(maxsize=2_000)   # uses cachebox TTLCache

@memoize(
    cache,
    ttl=60,
    cache_key=lambda n: f"fn1-{n}",
    # redis_client is assumed to be a Redis client created elsewhere
    lock=lambda key: RedisDistLock(redis_client, key),
)
async def fn1(n: int):
    logging.info("Httpx Call")
    # use an async client so the coroutine doesn't block the event loop
    async with httpx.AsyncClient() as client:
        res = await client.get("https://fakeresponder.com/?sleep=2000")
    return res.json()

awolverp (Owner) commented:

Thank you for your issue ❤️
This will take a while; I'll work on it whenever I get the chance.

Benoss (Author) commented Nov 21, 2024

> Hey @Benoss, I built a library called yapcache that does what you're looking for: it makes sure only one request executes the underlying function when the cache is empty, to prevent stampedes. It uses the cachebox library under the hood.

Thanks, I will give it a go. Love the name of the lib, BTW.
