
Caching results from common API calls

Jonathan A Rees edited this page Jan 25, 2016 · 6 revisions

The Open Tree webapps make frequent calls to a few API methods, often requesting the same data (e.g. arguson views of major clades). For performance reasons, and to relieve stress on the API servers, we've opted to cache these results using web2py.

Absent caching, a web client (such as a webapp) makes a request to Apache on the API server, which in turn makes an HTTP request to one of several subsystems (e.g. treemachine). Caching could be done on the client side (using HTTP client caching, supported by server cache headers such as ETag) or in Apache (using its built-in proxy caching features). In the solution described here, the subsystem response is instead cached by web2py, which communicates with Apache.

This solution assumes that all cached APIs (currently treemachine and taxomachine) are in the same domain as phylesystem-api, as in our standard configuration. Systems that distribute APIs across multiple domains should either use non-caching method URLs or modify the cached action below to work across domains.

Using web2py's @cache decorator

Initially, cached values are stored in RAM and set to never expire. To clear all cached values (after each synthesis release or other change in source data), simply restart web2py.

Web2py uses a @cache decorator to designate controller actions whose responses will be cached. Arguments to this decorator are evaluated on each request, including one that lets us define a unique cache key for each method call and its arguments, for example:

  taxomachine/v1/getContextsJSON
  treemachine/v1/getSyntheticTree?format=arguson&maxDepth=3&subtreeNodeID=170042&treeID=otol.draft.22

The "query string" portion of this key is reconstructed from request.vars, so it captures all arguments, whether originally sent via GET or POST. This is needed, for example, to distinguish calls to getSyntheticTree with different arguments, which would otherwise all return a single cached response (a single arguson view).
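As a rough illustration of how such a key might be assembled, here is a minimal standalone sketch. It does not use web2py; the function name build_cache_key and the plain dict standing in for request.vars are assumptions made for this example, and sorting the arguments by name is simply one way to get a stable key like the ones shown above.

```python
from urllib.parse import urlencode


def build_cache_key(path, request_vars):
    """Return a cache key made of the method path plus a query string.

    `path` is the API method path (e.g. "treemachine/v1/getSyntheticTree").
    `request_vars` is a plain dict standing in for web2py's request.vars
    (hypothetical stand-in; GET and POST arguments are already merged there).
    """
    if not request_vars:
        # Methods called without arguments are keyed by path alone.
        return path
    # Sort by argument name so the same arguments always yield the same key,
    # regardless of the order in which the client sent them.
    query = urlencode(sorted(request_vars.items()))
    return "{}?{}".format(path, query)
```

Two calls with the same arguments in a different order produce the same key, while a different subtreeNodeID produces a different key and therefore a separate cache entry.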

When a cached controller action is called, web2py will check the cache to see if there's a response under the proposed key. If found, this is returned immediately; if not found, the controller action is called normally, and its result stored in the cache before being returned to the caller.
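The lookup described above follows a common "return if cached, otherwise compute and store" pattern. The sketch below shows that pattern in plain Python; the class SimpleRamCache is a hypothetical stand-in written for this example, not web2py's actual cache implementation.

```python
class SimpleRamCache:
    """A tiny in-memory cache with no expiry (illustrative only)."""

    def __init__(self):
        self._store = {}

    def __call__(self, key, compute):
        # If a response is already stored under this key, return it at once.
        if key in self._store:
            return self._store[key]
        # Otherwise run the real action, store its result, and return it.
        value = compute()
        self._store[key] = value
        return value
```

Because nothing ever expires in this sketch, the only way to drop its entries is to discard the object, which mirrors the "restart to clear" approach used for the RAM cache.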

How to call APIs for cached results

Since phylesystem-api (a web2py app) is the default recipient for calls to api.opentreeoflife.org, we've added the caching hooks there. This also lets us use a single, generic controller action (called cached) in the default controller. Any Open Tree API method can be called via this proxy, simply by adding cached/ immediately after the domain in the method's URL.

For example, the tree-view app loads arguson views of a target clade (and its nearby descendants) using this method:

https://api.opentreeoflife.org/treemachine/v1/getSyntheticTree

To cache the results for next time (or retrieve the cached results quickly):

https://api.opentreeoflife.org/cached/treemachine/v1/getSyntheticTree

That's it! To use caching for common API calls in the tree-view app, we've simply modified its config file to include CACHED_ versions of some API base URLs, and updated cache-worthy method URLs to use them.
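For clients that build URLs in code rather than in a config file, the rewrite amounts to inserting cached/ at the start of the URL path. Here is a minimal sketch; the helper name to_cached_url is an assumption for this example and is not part of any Open Tree codebase.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cached_url(url):
    """Return the cached-proxy form of an API method URL.

    Inserts "cached/" immediately after the domain. URLs that already
    point at the cached proxy are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.path.startswith("/cached/"):
        return url
    return urlunsplit(
        (parts.scheme, parts.netloc, "/cached" + parts.path,
         parts.query, parts.fragment)
    )
```

Any query string on the original URL is preserved, so the arguments still reach the cached action and can take part in the cache key.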

Clearing stale cache entries

For tree-view (arguson) responses, this is done by simply restarting Apache, which resets the RAM cache.

We're also using the @cache decorator in the main webapp to cache "local comments" (tied to a particular node, taxon, or URL) in RAM. These are subject to more frequent changes, from the curation UI or users working directly in the 'feedback' issue tracker on GitHub.

This is handled (imperfectly) using a combination of methods:

  • A GitHub webhook provides notification when an issue (or issue comment) is created. This calls the /plugin_localcomments/clear_local_comments action, which analyses the JSON payload and tries to clear only the related cache items.

  • The "delete comment" and "close issue" buttons in curation UI will also clear the cache. Currently this is a brute-force clearing of all cached comments, since we lack the context to be more discriminating.

  • Since GitHub's webhook can't be triggered by modifying or deleting comments on GitHub itself, we only store these results for 5 minutes. This should provide a reasonable balance between performance under load and adequate freshness.
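The combination above (a short expiry, plus targeted or brute-force clearing) can be sketched in plain Python. Everything here is illustrative: ExpiringCache, its method names, and the key formats in the usage example are assumptions for this sketch, not the webapp's actual code or web2py's cache classes.

```python
import re
import time


class ExpiringCache:
    """In-memory cache whose entries expire after a time limit."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so expiry can be tested
        self._store = {}             # key -> (time stored, value)

    def get(self, key, compute, time_expire=300):
        """Return the cached value if younger than time_expire seconds.

        The default of 300 seconds matches the 5-minute limit above.
        """
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < time_expire:
            return entry[1]
        value = compute()
        self._store[key] = (now, value)
        return value

    def clear(self, regex=None):
        """Clear every entry, or only those whose key matches regex."""
        if regex is None:
            self._store.clear()      # brute-force clearing
            return
        pattern = re.compile(regex)
        for key in [k for k in self._store if pattern.match(k)]:
            del self._store[key]     # targeted clearing
```

In this sketch, a webhook handler would call clear with a pattern derived from the payload, while the curation UI buttons would call clear with no arguments.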

Special considerations for disk-based caching

If caching these variables in RAM creates problems, we can change a single line of code to switch to a filesystem-based "disk cache". These values would survive a web2py restart, so the cache would need to be cleared explicitly using either of these web2py cache methods:

cache.disk(key, None)          # clear a single value using its unique key
cache.disk.clear(regex='...')  # clear all values with keys matching this regex

(The RAM cache offers the same two methods as cache.ram.)