
Add disk caching #90

Merged
merged 1 commit into from
Sep 19, 2023

Conversation

@benjeffery (Member) commented Sep 12, 2023

WIP. Currently the test tree sequence has no UUID, so I'm trying to find a way to get one.

@jeromekelleher (Member)

LGTM - does it work?

We'll want to integrate with the CLI; have you thought about how to configure the cache dir? (I was thinking appdirs might be a good approach, ultimately?)

@jeromekelleher (Member) left a comment

see comment about cache version

model.py (review comment, resolved)
@jeromekelleher (Member)

Easiest way to get a uuid is to save to a tempfile and read it back in.
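The round-trip might be sketched like this. It's a stdlib-only simulation of the pattern: in the real code the writer and reader would be tskit's `ts.dump` and `tskit.load`, and the UUID is stamped by tskit at write time; `dump_with_uuid` and the JSON payload here are purely illustrative stand-ins.

```python
import json
import tempfile
import uuid
from pathlib import Path

def dump_with_uuid(data, path):
    # Stand-in for a writer (like ts.dump) that stamps a fresh
    # UUID into the file at write time.
    payload = {"file_uuid": str(uuid.uuid4()), "data": data}
    Path(path).write_text(json.dumps(payload))

def load(path):
    # Stand-in for tskit.load.
    return json.loads(Path(path).read_text())

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.trees"
    dump_with_uuid({"num_trees": 3}, path)  # save to a temp file...
    ts = load(path)                         # ...and read back in
    # the reloaded copy now carries a file_uuid
```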

@benjeffery (Member, Author)

I've modified this to use appdirs, and to key the cache on a hash of the source files of the functions in the call tree of the cached function. This means that changes to the pages code won't trigger an invalidation, which will help a lot with dev.

@jeromekelleher (Member) left a comment

I think we should merge this so we have it, but it's probably overkill in a few different ways.

  1. diskcache is really quite complex. After spending 10 mins with the docs, it looks like it's using a relational DB plus a bunch of stuff around timing out keys. We really don't need this, and it also means we'd have to be much more careful about managing opening and closing the cache than we are here. We could just store the files (as CSVs, even) based on their keys and it would be much simpler and easier to maintain.

  2. The key generation policy is clever, but expensive to run and hard to test. Just incrementing a number when we change the dataframe format isn't that hard, and it'll make key generation fast.
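The simpler alternative described in point 1 might look roughly like this. A stdlib sketch, not the PR's code; `CACHE_VERSION` and the key layout are illustrative, and it folds in the incremented-number idea from point 2:

```python
# Store each cached result as a plain CSV file keyed by the input's UUID
# plus a hand-bumped version number.
import csv
from pathlib import Path

CACHE_VERSION = 1  # bump when the dataframe format changes

def cache_path(cache_dir, file_uuid, name):
    return Path(cache_dir) / f"{file_uuid}-v{CACHE_VERSION}-{name}.csv"

def get_or_compute(cache_dir, file_uuid, name, compute):
    path = cache_path(cache_dir, file_uuid, name)
    if path.exists():
        # Cache hit: just read the rows back.
        with path.open(newline="") as f:
            return list(csv.reader(f))
    # Cache miss: compute, then persist for next time.
    rows = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

Invalidation is then just bumping `CACHE_VERSION`, and inspecting the cache is an `ls` on the directory.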

cache.py (outdated):

    try:
        os.makedirs(cache_dir)
        logger.info(f"Set cache_dir to {cache_dir}")
    except OSError:
Member

This could also happen, e.g. if there weren't sufficient permissions, or if it already existed and was a file, right? So the log error will be misleading.

Why not use the pathlib API directly?

    cache_dir.mkdir(exist_ok=True, parents=True)

I think this takes care of all the required corner cases and raises a sensible error on weirdnesses.
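For what it's worth, those corner cases can be checked quickly (stdlib only; the `cache_dir`/`blocker` names are just for this demo):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "a" / "b" / "cache"
    cache_dir.mkdir(parents=True, exist_ok=True)  # creates missing parents
    cache_dir.mkdir(parents=True, exist_ok=True)  # already a dir: no error
    created = cache_dir.is_dir()

    blocker = Path(tmp) / "occupied"
    blocker.write_text("a file, not a directory")
    raised = False
    try:
        blocker.mkdir(parents=True, exist_ok=True)
    except FileExistsError:
        raised = True  # exist_ok only forgives existing *directories*
```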

Member Author

Nice, much simpler. Fixed.

cache.py (review comment, resolved)
    @property
    def file_uuid(self):
        return self.ts.file_uuid

    @cached_property
Member

Right, so there are two layers of caching here. I had to think a bit about what this means: the first time the property is accessed, we fall through to the disk_cache version, and afterwards (within that process) it'll be cached by functools.
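A stdlib sketch of those two layers, with a plain pickle file standing in for the diskcache store (the `Model` class and attribute names here are illustrative, not the PR's):

```python
import pickle
from functools import cached_property
from pathlib import Path

class Model:
    def __init__(self, cache_dir, file_uuid):
        self.cache_dir = Path(cache_dir)
        self.file_uuid = file_uuid
        self.computed = 0  # instrumentation for this sketch only

    def _disk_cached(self, key, compute):
        # Layer 2: the on-disk store, shared across processes.
        path = self.cache_dir / f"{self.file_uuid}-{key}.pkl"
        if path.exists():
            return pickle.loads(path.read_bytes())
        value = compute()
        path.write_bytes(pickle.dumps(value))
        return value

    @cached_property  # Layer 1: in-memory, per process
    def mutations_df(self):  # illustrative name
        def compute():
            self.computed += 1
            return {"rows": 10}
        return self._disk_cached("mutations_df", compute)
```

Within one process the second access never touches disk; a fresh process (simulated below by a fresh instance) finds the value on disk instead of recomputing.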

@benjeffery (Member, Author)

  1. diskcache is really quite complex. After spending 10 mins with the docs, it looks like it's using a relational DB plus a bunch of stuff around timing out keys. We really don't need this, and it also means we'd have to be much more careful about managing opening and closing the cache than we are here. We could just store the files (as CSVs, even) based on their keys and it would be much simpler and easier to maintain.

From the diskcache docs: "Cache objects are thread-safe and may be shared between threads. Two Cache objects may also reference the same directory from separate threads or processes. In this way, they are also process-safe and support cross-process communication." So I don't think we need to worry about opening and closing.

  2. The key generation policy is clever, but expensive to run and hard to test. Just incrementing a number when we change the dataframe format isn't that hard, and it'll make key generation fast.

Yeah, fair point. I replaced it with a version string.

@jeromekelleher (Member)

Did you push the changes here @benjeffery?

@benjeffery (Member, Author)

Whoops, should be there now!

@jeromekelleher jeromekelleher merged commit e85d3b7 into tskit-dev:main Sep 19, 2023
3 checks passed
@jeromekelleher (Member)

From the diskcache docs: "Cache objects are thread-safe and may be shared between threads. Two Cache objects may also reference the same directory from separate threads or processes. In this way, they are also process-safe and support cross-process communication." So I don't think we need to worry about opening and closing

In principle. In practice, sqlite hasn't worked well in my experience without careful handling of the context managers. Let's try it out and see anyway.

@benjeffery benjeffery deleted the caching branch September 20, 2023 06:29