
Performance Guide FAQ


CloudVolume has a variety of subsystems and use cases, each of which may have its own optimizations when operating at scale. This article attempts to describe how to use CloudVolume optimally for each use case.

Downloading a Set of Single Voxels

Done naïvely, downloading a single voxel from any data type will be fairly slow. The containing image chunk must be downloaded, decoded, and only then the point extracted. That's a lot of extra work, all performed serially.

# Naive, slow pseudocode for downloading a set of points
from cloudvolume import CloudVolume

cv = CloudVolume(...)
pts = [ [100, 2012, 1113], [291, 3838, 120], ... ]

for x, y, z in pts:
   # each access downloads and decodes an entire chunk just to read one voxel
   label = cv[x, y, z][0, 0, 0, 0] # slow

Here are four optimizations that will make this process much faster.

  1. Chunk Size Optimization: Use an appropriate chunk size and compression codec when writing the image.
  2. Concurrency: Download the required chunks in parallel.
  3. Caching: Retain frequently used chunks locally either in-memory or on-disk.
  4. Efficient Decoding: Exploit the structure of the compressed file to avoid unnecessary decoding work.

Chunk size optimization is not available to most users unless you are willing to re-write the volume, so I'll stick to a discussion of the other tactics.
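
That said, if you do control the volume and are willing to re-write it, a minimal re-chunking sketch might look like the following. The bucket paths, encoding, and 64³ chunk size are illustrative placeholders, not recommendations; for large volumes a chunked transfer tool such as Igneous is more appropriate than a single slice assignment.

# Hypothetical sketch of re-writing a precomputed volume with smaller chunks.
from cloudvolume import CloudVolume

src = CloudVolume("gs://bucket/source", mip=0, progress=True)

info = CloudVolume.create_new_info(
    num_channels=src.num_channels,
    layer_type=src.layer_type,          # e.g. "segmentation"
    data_type=str(src.dtype),
    encoding="compressed_segmentation", # pick a codec suited to your data
    resolution=src.resolution,
    voxel_offset=src.voxel_offset,
    volume_size=src.volume_size,
    chunk_size=(64, 64, 64),            # smaller chunks mean less wasted work per point
)
dest = CloudVolume("gs://bucket/rechunked", info=info, progress=True)
dest.commit_info()

# Fine for small volumes only; use a chunked transfer for anything large.
dest[:, :, :] = src[:, :, :]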

cv.scattered_points implements concurrency, caching, and efficient decoding when available. It uses multiple threads to fetch image chunks. If you configure a cache, it is used to avoid re-downloading a chunk that contains more than one requested point, and if the underlying format supports it, efficient decoding is used.

cv = CloudVolume(..., lru_bytes=int(200e6)) # in-memory Least Recently Used cache
pts = [ ... ]
results = cv.scattered_points(pts)

If you don't specify lru_bytes up-front, the LRU will not be used. The LRU stores image chunks in their encoded form but with the bitstream compression codec (e.g. gzip, brotli) stripped. This avoids some repeated decompression work and allows the structure of some encodings to be exploited. However, for raw encoding, each chunk will consume a large amount of memory.

Efficient decoding is supported for raw (trivial), compressed_segmentation (high efficiency), crackle (z-efficient), and compresso (z-efficient). "High efficiency" means a single voxel can be extracted without additional work. "Z-efficient" means only a single z-slice needs to be decoded (so, to a first approximation, a 128x128x64 chunk would decode 64x faster). With additional work, efficient decoding could be extended to fpzip, zfpc, jpeg, and jxl (at least under some settings).
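
If you also want chunks to persist across processes or script runs, the in-memory LRU can be combined with CloudVolume's on-disk cache. A minimal sketch, where the cache path is just a placeholder (cache=True uses the default location):

# On-disk cache: chunks persist between runs, unlike the in-memory LRU.
cv = CloudVolume(..., cache="/tmp/cloudvolume/cache", lru_bytes=int(200e6))
pts = [ ... ]
results = cv.scattered_points(pts)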

For graphene volumes, scattered_points is even more strongly recommended because the additional decoding requests to the PyChunkGraph server will be batched.
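
As a rough sketch, the call looks the same as for a precomputed volume; the graphene cloudpath below is a placeholder for your own server and table.

# Hypothetical graphene cloudpath; substitute your own server and table.
cv = CloudVolume(
    "graphene://https://example.com/segmentation/table/my_table",
    lru_bytes=int(200e6),
)
results = cv.scattered_points(pts) # PyChunkGraph requests are batched within the call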