Chunked Object Store

Storing large objects in Cassandra must be done carefully, since it can cause excessive heap pressure and hot spots. Astyanax provides utility classes that address these issues by splitting a large object across multiple keys and fetching the chunks in random order to reduce hot spots.
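
For context, the examples below assume an initialized Keyspace plus a column family for the chunk data. A minimal sketch of that setup; the column family name "data" and the variable names are illustrative, not part of the chunked storage API:

import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

// Hypothetical column family holding the object chunks
ColumnFamily<String, String> CF_DATA = new ColumnFamily<String, String>(
        "data",                    // column family name (illustrative)
        StringSerializer.get(),    // row key serializer
        StringSerializer.get());   // column name serializer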

Storing an object
ChunkedStorageProvider provider = new CassandraChunkedStorageProvider(keyspace, CF_DATA.getName());
         
ObjectMetadata meta = ChunkedStorage.newWriter(provider, objName, someInputStream)
    .withChunkSize(0x1000)    // Optional chunk size to override the default for this provider
    .withConcurrencyLevel(8)  // Upload chunks in 8 threads
    .withTtl(60)              // Optional TTL for the entire object
    .call();
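
The someInputStream argument can be any InputStream. A sketch that stores an in-memory byte array; objName and the payload here are illustrative:

import java.io.ByteArrayInputStream;
import java.io.InputStream;

byte[] data = new byte[16 * 1024];                             // payload to store (illustrative)
String objName = "example-object";                             // object key (illustrative)
InputStream someInputStream = new ByteArrayInputStream(data);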

Reading an object
ObjectMetadata meta = ChunkedStorage.newReader(provider, objName, someOutputStream)
    .withBatchSize(11)        // Randomize fetching of chunks within a batch.  The batch size should be a small multiple of the number of nodes
    .withRetryPolicy(new ExponentialBackoffWithRetry(250, 20))  // Retry policy for when a chunk isn't available.  This helps implement retries in a cross-region setup where replication may be slow
    .withConcurrencyLevel(2)  // Download chunks in 2 threads.  Be careful here: too many clients with too many threads will overload Cassandra
    .call();
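
The reader streams the chunks into the OutputStream you supply. For example, a sketch that downloads the object straight to a file; the file name is illustrative:

import java.io.FileOutputStream;
import java.io.OutputStream;

OutputStream someOutputStream = new FileOutputStream("example-object.bin");
try {
    ChunkedStorage.newReader(provider, objName, someOutputStream).call();
} finally {
    someOutputStream.close();
}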

Deleting an object
ChunkedStorage.newDeleter(provider, objName).call();

Getting object info
Use this to determine the object size, for example when pre-sizing a ByteArrayOutputStream to read the object into.

ObjectMetadata meta = ChunkedStorage.newInfoReader(provider, objName).call();
int objectSize = meta.getObjectSize().intValue();
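
Putting it together, a sketch that pre-sizes a ByteArrayOutputStream from the metadata, downloads the object into it, and re-exposes the bytes as an InputStream:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

ObjectMetadata meta = ChunkedStorage.newInfoReader(provider, objName).call();
ByteArrayOutputStream os = new ByteArrayOutputStream(meta.getObjectSize().intValue());

ChunkedStorage.newReader(provider, objName, os)
    .call();

ByteArrayInputStream in = new ByteArrayInputStream(os.toByteArray());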
