Paginate through DynamoDB items with ease and enable client pagination using encrypted tokens.

emdgroup/dynamodb-paginator

Apache License Sponsored by EMD Group Downloads

DynamoDB Paginator

Features:

  • Supports Binary and String key types
  • Generates AES256 encrypted and authenticated pagination tokens
  • Works with TypeScript type guards natively
  • Ensures a minimum number of items when using a FilterExpression
  • Compatible with AWS SDK v2 and v3
  • Supports pagination over segmented parallel scans

Pagination in NoSQL stores such as DynamoDB can be challenging. This library provides a developer-friendly interface around the DynamoDB Query and Scan APIs. It generates an encrypted and authenticated pagination token that can be shared with an untrustworthy client (like the browser or a mobile app) without disclosing potentially sensitive data, while protecting the integrity of the token.

Why is the pagination token encrypted?

When researching pagination with DynamoDB, you will come across blog posts and libraries that recommend JSON-encoding the LastEvaluatedKey attribute (or even the whole query command). This is dangerous!

The token is sent to a client which can happily decode the token, look at the values for the partition and sort key and even modify the token, making the application vulnerable to NoSQL injections.
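To make the risk concrete, here is a small sketch of the naive approach described above. The key values (`U#ABC`, `ORDER#0042`, `U#XYZ`) are made up for illustration, and the snippet does not use this library at all.

```typescript
// A naive token: the LastEvaluatedKey, JSON-encoded and Base64 URL encoded.
// The key values below are hypothetical.
const lastEvaluatedKey = { PK: 'U#ABC', SK: 'ORDER#0042' };
const naiveToken = Buffer.from(JSON.stringify(lastEvaluatedKey)).toString('base64url');

// Any client can decode the token and read the partition and sort key values.
const decoded = JSON.parse(Buffer.from(naiveToken, 'base64url').toString('utf8'));

// The client can also forge a token that points at another partition.
const forgedToken = Buffer.from(
  JSON.stringify({ ...decoded, PK: 'U#XYZ' }),
).toString('base64url');
```

A server that trusts `forgedToken` as an ExclusiveStartKey would continue the query from a position the client chose, which is exactly what the encrypted and authenticated token is meant to prevent.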

How is the pagination token encrypted?

The encryption key passed to the paginator is used to derive an encryption and a signing key using an HMAC.

The LastEvaluatedKey attribute is first flattened by length-encoding its datatypes and values. The encoded key is then encrypted with the encryption key using AES-256 in CBC mode with a randomly generated IV.
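As an illustration of what length-encoding a key can look like, the sketch below flattens a key with String and Binary values into a single buffer and reads it back. The exact byte layout used by the library is not described here, so the layout in this sketch (a one-byte type tag followed by 16-bit length-prefixed name and value) and the names `encodeKey` and `decodeKey` are assumptions.

```typescript
// Hypothetical layout per attribute: [type tag][name length][name][value length][value].
// This is NOT necessarily the layout the library uses.
type Key = Record<string, string | Buffer>;

function lengthPrefixed(data: Buffer): Buffer {
  const length = Buffer.alloc(2);
  length.writeUInt16BE(data.length); // DynamoDB key values are at most 2048 bytes
  return Buffer.concat([length, data]);
}

function encodeKey(key: Key): Buffer {
  const parts: Buffer[] = [];
  for (const [name, value] of Object.entries(key)) {
    const tag = typeof value === 'string' ? 'S' : 'B';
    const data = typeof value === 'string' ? Buffer.from(value, 'utf8') : value;
    parts.push(Buffer.from(tag), lengthPrefixed(Buffer.from(name, 'utf8')), lengthPrefixed(data));
  }
  return Buffer.concat(parts);
}

function decodeKey(encoded: Buffer): Key {
  const key: Key = {};
  let offset = 0;
  // Reads one length-prefixed field and advances the offset.
  const read = (): Buffer => {
    const length = encoded.readUInt16BE(offset);
    const data = encoded.subarray(offset + 2, offset + 2 + length);
    offset += 2 + length;
    return data;
  };
  while (offset < encoded.length) {
    const tag = String.fromCharCode(encoded.readUInt8(offset));
    offset += 1;
    const name = read().toString('utf8');
    const value = read();
    key[name] = tag === 'S' ? value.toString('utf8') : value;
  }
  return key;
}
```

A flat, length-prefixed encoding avoids the ambiguity of delimiter-based formats: a key value can contain any byte without being confused with a field boundary.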

The additional authenticated data (AAD), the IV, the ciphertext and an int64 of the length of the AAD are concatenated to form the message to be signed.

The encrypted and signed pagination token is then returned by concatenating the IV, ciphertext and the first 16 bytes of the HMAC-SHA256 of the message using the signing key.
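The steps above can be sketched with Node's built-in crypto module. This is a minimal illustration of the encrypt-then-MAC construction, not the library's source code. The function names `seal` and `open`, the derivation labels `'encryption'` and `'signing'`, and the big-endian encoding of the AAD length are assumptions made for this sketch.

```typescript
import { createCipheriv, createDecipheriv, createHmac, randomBytes, timingSafeEqual } from 'crypto';

// Derive two independent 32-byte keys from the master key using HMAC-SHA256.
// The labels are hypothetical.
function deriveKeys(master: Buffer): { encKey: Buffer; sigKey: Buffer } {
  return {
    encKey: createHmac('sha256', master).update('encryption').digest(),
    sigKey: createHmac('sha256', master).update('signing').digest(),
  };
}

// Message to sign: AAD, IV, ciphertext and a 64-bit length of the AAD.
// Returns the first 16 bytes of the HMAC-SHA256.
function mac(sigKey: Buffer, aad: Buffer, iv: Buffer, ciphertext: Buffer): Buffer {
  const aadLength = Buffer.alloc(8);
  aadLength.writeUInt32BE(aad.length, 4); // lower 32 bits of a big-endian 64-bit integer
  return createHmac('sha256', sigKey)
    .update(Buffer.concat([aad, iv, ciphertext, aadLength]))
    .digest()
    .subarray(0, 16);
}

function seal(master: Buffer, plaintext: Buffer, aad: Buffer = Buffer.alloc(0)): string {
  const { encKey, sigKey } = deriveKeys(master);
  const iv = randomBytes(16);
  const cipher = createCipheriv('aes-256-cbc', encKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  // Token: IV, ciphertext and the truncated MAC, Base64 URL encoded.
  return Buffer.concat([iv, ciphertext, mac(sigKey, aad, iv, ciphertext)]).toString('base64url');
}

function open(master: Buffer, token: string, aad: Buffer = Buffer.alloc(0)): Buffer {
  const { encKey, sigKey } = deriveKeys(master);
  const raw = Buffer.from(token, 'base64url');
  if (raw.length < 48) throw new Error('invalid token'); // IV + one block + MAC in this sketch
  const iv = raw.subarray(0, 16);
  const ciphertext = raw.subarray(16, raw.length - 16);
  const tag = raw.subarray(raw.length - 16);
  // Verify the MAC before decrypting anything.
  if (!timingSafeEqual(tag, mac(sigKey, aad, iv, ciphertext))) throw new Error('invalid token');
  const decipher = createDecipheriv('aes-256-cbc', encKey, iv);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

Because the AAD is part of the signed message but not part of the token, a token sealed with one AAD fails verification when opened with another, and the AAD itself cannot be recovered from the token.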

"Dance like nobody is watching. Encrypt like everyone is." -- Werner Vogels

Usage

import { Paginator } from '@emdgroup/dynamodb-paginator';
import { DynamoDB } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocument } from '@aws-sdk/lib-dynamodb';

import * as crypto from 'crypto';

const client = DynamoDBDocument.from(new DynamoDB({}));
// persist the key in the SSM parameter store or similar
const key = crypto.randomBytes(32);

const paginateQuery = Paginator.createQuery({
  key,
  client,
});

const paginator = paginateQuery({
  TableName: 'MyTable',
  KeyConditionExpression: 'PK = :pk',
  ExpressionAttributeValues: {
    ':pk': 'U#ABC',
  },
});

Use for await...of syntax:

for await (const item of paginator) {
  // do something with item

  // only work on the first 50 items,
  // then generate a pagination token and break.
  if (paginator.count === 50) {
    console.log(paginator.nextToken);
    break;
  }
}

paginator.requestCount; // number of requests to DynamoDB

Use await all() syntax:

const items = await paginator.limit(50).all();
paginator.nextToken;

const nextPaginator = paginator.from(paginator.nextToken);
await nextPaginator.all(); // up to 50 more items
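The same pattern can be wrapped into a helper that serves one page per client request. The sketch below is written against a small local interface, `PageSource`, which only mirrors the `limit`, `from`, `all` and `nextToken` members documented in this README. `PageSource` and `listPage` are hypothetical names introduced for this example, and chaining `limit` before `from` is an assumption based on the usage shown above.

```typescript
// Local, structural view of the PaginationResponse members used here.
// This interface is an assumption for the sketch, not a type exported by the library.
interface PageSource<T> {
  limit(n: number): PageSource<T>;
  from(token: string | undefined): PageSource<T>;
  all(): Promise<T[]>;
  readonly nextToken: string | undefined;
}

interface PageResult<T> {
  items: T[];
  nextToken: string | undefined;
}

// Returns one page of items plus the token the client sends back to get the next page.
async function listPage<T>(
  paginator: PageSource<T>,
  token?: string,
  pageSize = 50,
): Promise<PageResult<T>> {
  const page = paginator.limit(pageSize).from(token);
  const items = await page.all();
  // nextToken is undefined once the query has been exhausted.
  return { items, nextToken: page.nextToken };
}
```

A request handler would pass the token received from the client (or `undefined` for the first page) and return both `items` and `nextToken` in the response body.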

Use TypeScript guards to filter for items:

interface User {
  PK: string;
  SK: string;
}

function isUser(arg: Record<string, unknown>): arg is User {
  return typeof arg.PK === 'string' &&
    typeof arg.SK === 'string' &&
    arg.PK.startsWith('U#');
}

for await (const user of paginator.filter(isUser)) {
  // user is of type User
}

Paginator

The Paginator class is a factory for the PaginationResponse object. This class is instantiated with a 32-byte key and the DynamoDB document client (versions 2 and 3 of the AWS SDK are supported).

const paginateQuery = Paginator.createQuery({
  key: () => Promise.resolve(crypto.randomBytes(32)),
  client: documentClient,
});

To create a paginator over a scan operation, use createScan.

const paginateScan = Paginator.createScan({
  key: () => Promise.resolve(crypto.randomBytes(32)),
  client: documentClient,
});

Parallel Scans

This library also supports pagination over segmented parallel scans. This is useful when you have a large table and want to parallelize the scan operation to reduce the time it takes to scan the whole table.

To create a paginator over a segmented scan operation, use createParallelScan.

const paginateParallelScan = Paginator.createParallelScan({
  key: () => Promise.resolve(crypto.randomBytes(32)),
  client: documentClient,
});

Then, create a paginator and pass the segments parameter.

const paginator = paginateParallelScan({
  TableName: 'MyTable',
  Limit: 250,
}, { segments: 10 });

await paginator.all();

The scan is executed in parallel over 10 segments. The paginator returns items in the order in which DynamoDB delivers them, so items from different segments may be interleaved. Refer to the following waterfall diagram for an example. The parallel scan was executed over a high-latency connection to better illustrate the variability in the requests and responses. Even though the Limit is set to 250, DynamoDB occasionally returns fewer than 250 items per segment. The paginator continues to request items until all segments have been exhausted.

[Figure: waterfall diagram of the parallel scan requests]
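To show why items from different segments can arrive interleaved, here is a sketch that merges several async iterators and yields each item as soon as any source produces one. It is one possible way to implement such a merge and is not taken from the library. The name `mergeSegments` is hypothetical.

```typescript
// Merge several async iterators, yielding items in arrival order.
// Each source stands in for one scan segment.
async function* mergeSegments<T>(segments: AsyncIterator<T>[]): AsyncGenerator<T, void, void> {
  type Settled = { segment: AsyncIterator<T>; result: IteratorResult<T> };
  const pending = new Map<AsyncIterator<T>, Promise<Settled>>();

  // Request the next item from a segment and remember the in-flight promise.
  const advance = (segment: AsyncIterator<T>): void => {
    pending.set(segment, segment.next().then((result) => ({ segment, result })));
  };
  segments.forEach((segment) => advance(segment));

  while (pending.size > 0) {
    // Whichever segment responds first wins this round.
    const { segment, result } = await Promise.race(pending.values());
    if (result.done) {
      pending.delete(segment); // this segment is exhausted
    } else {
      yield result.value;
      advance(segment);
    }
  }
}
```

With this approach the output order depends on response latency, which matches the behavior described above: fast segments contribute items earlier, and iteration ends only once every segment is exhausted.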

Constructors

constructor

new Paginator(args)

Use the static factory functions createQuery(), createScan() or createParallelScan() instead of the constructor.

Parameters

Name Type
args PaginatorOptions

Methods

createParallelScan

Static createParallelScan(args): <T>(scan: ScanCommandInput, opts: PaginateQueryOptions<T> & { segments: number }) => ParallelPaginationResponse<T>

Returns a function that accepts a DynamoDB Scan command and returns an instance of ParallelPaginationResponse.

Parameters

Name Type
args PaginatorOptions

Returns

fn

▸ <T>(scan, opts): ParallelPaginationResponse<T>

Returns a function that accepts a DynamoDB Scan command and returns an instance of ParallelPaginationResponse.

Type parameters
Name Type
T extends AttributeMap
Parameters
Name Type
scan ScanCommandInput
opts PaginateQueryOptions<T> & { segments: number }
Returns

ParallelPaginationResponse<T>


createQuery

Static createQuery(args): <T>(query: QueryCommandInput, opts?: PaginateQueryOptions<T>) => PaginationResponse<T>

Returns a function that accepts a DynamoDB Query command and returns an instance of PaginationResponse.

Parameters

Name Type
args PaginatorOptions

Returns

fn

▸ <T>(query, opts?): PaginationResponse<T>

Returns a function that accepts a DynamoDB Query command and returns an instance of PaginationResponse.

Type parameters
Name Type
T extends AttributeMap
Parameters
Name Type
query QueryCommandInput
opts? PaginateQueryOptions<T>
Returns

PaginationResponse<T>


createScan

Static createScan(args): <T>(scan: ScanCommandInput, opts?: PaginateQueryOptions<T>) => PaginationResponse<T>

Returns a function that accepts a DynamoDB Scan command and returns an instance of PaginationResponse.

Parameters

Name Type
args PaginatorOptions

Returns

fn

▸ <T>(scan, opts?): PaginationResponse<T>

Returns a function that accepts a DynamoDB Scan command and returns an instance of PaginationResponse.

Type parameters
Name Type
T extends AttributeMap
Parameters
Name Type
scan ScanCommandInput
opts? PaginateQueryOptions<T>
Returns

PaginationResponse<T>

PaginatorOptions

Properties

client

client: DynamoDBDocumentClientV2 | DynamoDBDocumentClient

AWS SDK v2 or v3 DynamoDB Document Client.


indexes

Optional indexes: Record<string, [partitionKey: string, sortKey?: string]> | (index: string) => [partitionKey: string, sortKey?: string]

Object that resolves an index name to the partition and sort key for that index. Also accepts a function that builds the names based on the index name.

Defaults to (index) => [`${index}PK`, `${index}SK`].
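The two accepted forms of the indexes option can be illustrated with a small resolver. The type names and the `resolveIndex` helper are hypothetical and only mirror the signature documented above. Throwing on an unknown index name is an assumption of this sketch, not documented library behavior.

```typescript
type KeyNames = [partitionKey: string, sortKey?: string];
type Indexes = Record<string, KeyNames> | ((index: string) => KeyNames);

// Mirrors the documented default: index "GSI1" maps to ["GSI1PK", "GSI1SK"].
const defaultIndexes = (index: string): KeyNames => [`${index}PK`, `${index}SK`];

function resolveIndex(index: string, indexes: Indexes = defaultIndexes): KeyNames {
  if (typeof indexes === 'function') return indexes(index);
  const names = indexes[index];
  // Assumption: an unknown index name is treated as an error in this sketch.
  if (!names) throw new Error(`unknown index: ${index}`);
  return names;
}

// Object form: explicit key names per index (index and attribute names are made up).
const byObject = resolveIndex('ByEmail', { ByEmail: ['Email'] });

// Function form: derive the key names from the index name.
const byFunction = resolveIndex('GSI1', (index) => [`${index}_pk`, `${index}_sk`]);
```

The object form suits tables with a handful of irregularly named indexes, while the function form fits single-table designs where index key names follow a naming convention.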


key

key: CipherKey | Promise<CipherKey> | () => CipherKey | Promise<CipherKey>

A 32-byte encryption key (e.g. crypto.randomBytes(32)). The key parameter also accepts a Promise that resolves to a key, or a function that returns a key or a Promise of a key.

If a function is passed, that function is lazily called only once. The function is called concurrently with the first query request to DynamoDB to reduce the overall latency for the first query. The key is cached and the function is not called again.
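The lazy, call-once behavior described above can be sketched as a small memoizing wrapper. The `lazyKey` helper and the `KeyInput` type are hypothetical, and `Buffer` is used in place of `CipherKey` to keep the example short. In practice the loader function would typically fetch the key from a secret store.

```typescript
type KeyInput = Buffer | Promise<Buffer> | (() => Buffer | Promise<Buffer>);

// Wraps any accepted key form into a function that resolves the key at most once.
function lazyKey(input: KeyInput): () => Promise<Buffer> {
  let cached: Promise<Buffer> | undefined;
  return () => {
    if (!cached) {
      // The loader runs on first use only; the promise itself is cached, so
      // concurrent callers share one in-flight request.
      cached = Promise.resolve(typeof input === 'function' ? input() : input);
    }
    return cached;
  };
}
```

Caching the promise rather than the resolved value means that two callers arriving before the key has loaded still trigger only a single call to the loader.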


schema

Optional schema: [partitionKey: string, sortKey?: string]

Names for the partition and sort key of the table. Defaults to ['PK', 'SK'].

PaginationResponse

The PaginationResponse class implements the query result iterator. It has a number of utility functions such as peek() and all() to simplify common usage patterns.

The iterator can be interrupted and resumed at any time. It stops producing items once the end of the query or the provided limit is reached.

Type parameters

Name Type
T extends AttributeMap = AttributeMap

Properties

count

count: number

Number of items yielded

Accessors

consumedCapacity

get consumedCapacity(): number

Total consumed capacity for query

Returns

number


finished

get finished(): boolean

Returns true if all items for this query have been returned from DynamoDB.

Returns

boolean


nextToken

get nextToken(): undefined | string

Token to resume query operation from the current position. The token is generated from the LastEvaluatedKey attribute provided by DynamoDB and then AES256 encrypted such that it can safely be provided to an untrustworthy client (such as a user browser or mobile app). The token is Base64 URL encoded which means that it only contains URL safe characters and does not require further encoding.

The encryption is necessary to prevent leaking sensitive information that can be included in the LastEvaluatedKey provided by DynamoDB. It also prevents a client from modifying the token and therefore manipulating the query execution (NoSQL injection).

The length of the token depends on the length of the values for the partition and sort key of the table or index that you are querying. The token length is at least 42 characters.

Returns

undefined | string


requestCount

get requestCount(): number

Number of requests made to DynamoDB

Returns

number


scannedCount

get scannedCount(): number

Number of items scanned by DynamoDB

Returns

number

Methods

[asyncIterator]

[asyncIterator](): AsyncGenerator<T, void, void>

for await (const item of items) {
  // work with item
}

Returns

AsyncGenerator<T, void, void>


all

all(): Promise<T[]>

Return all items from the query (up to limit items). This is potentially dangerous and expensive, as the query will keep making requests to DynamoDB until there are no more items. It is recommended to pair all() with limit() to prevent runaway query execution.

Returns

Promise<T[]>


filter

filter<K>(predicate): PaginationResponse<K>

Filter results by a predicate function

Type parameters

Name Type
K extends AttributeMap

Parameters

Name Type
predicate (arg: AttributeMap) => arg is K

Returns

PaginationResponse<K>


from

from<L>(nextToken): L

Start returning results starting from nextToken

Type parameters

Name Type
L extends PaginationResponse<T, L>

Parameters

Name Type
nextToken undefined | string

Returns

L


limit

limit<L>(limit): L

Limit the number of results to limit. Will return at least limit results even when using FilterExpressions.

Type parameters

Name Type
L extends PaginationResponse<T, L>

Parameters

Name Type
limit number

Returns

L
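The reason a paginator has to do extra work here is that DynamoDB applies its own Limit parameter before a FilterExpression, so a single request can return fewer matching items than requested, or none at all. The sketch below shows the general idea of requesting further pages until enough items have been collected. `collect` and `FetchPage` are hypothetical names, and the page source is a stand-in for the Query or Scan call.

```typescript
interface FetchedPage<T> {
  items: T[]; // items that passed the filter on this page (may be empty)
  lastKey?: string; // stand-in for LastEvaluatedKey; undefined on the last page
}
type FetchPage<T> = (startKey?: string) => Promise<FetchedPage<T>>;

// Keep requesting pages until `limit` items are collected or the source is exhausted.
async function collect<T>(
  fetchPage: FetchPage<T>,
  limit: number,
): Promise<{ items: T[]; requests: number }> {
  const items: T[] = [];
  let startKey: string | undefined;
  let requests = 0;
  do {
    const page = await fetchPage(startKey);
    requests += 1;
    items.push(...page.items);
    startKey = page.lastKey;
  } while (items.length < limit && startKey !== undefined);
  // Simplification: surplus items are dropped here. A real paginator would
  // instead remember the position of the last yielded item for the next token.
  return { items: items.slice(0, limit), requests };
}
```

This also explains why requestCount can be higher than expected for selective filters: sparse matches require more round trips to fill a page.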


peek

peek(): Promise<undefined | T>

Returns the first item in the query without advancing the iterator. peek() can also be used to "prime" the iterator. It will immediately make a request to DynamoDB and fill the iterator's cache with the first page of results. This can be useful if you have other concurrent asynchronous requests:

const items = paginateQuery(...);

await Promise.all([
  items.peek(),
  doSomethingElse(),
]);

for await (const item of items) {
  // the first page of items has already been pre-fetched so they are available immediately
}

peek can be invoked inside a for await loop. peek returns undefined if there are no more items returned or if the limit has been reached.

for await (const item of items) {
  const next = await items.peek();
  if (!next) {
    // we've reached the last item
  }
}

peek() does not increment the count attribute.

Returns

Promise<undefined | T>

PaginateQueryOptions

Type parameters

Name Type
T extends AttributeMap

Properties

context

Optional context: string | Buffer

The context defines the additional authenticated data (AAD) that is used to generate the signature for the pagination token. It is optional but recommended because it adds an additional layer of authentication to the pagination token. Pagination tokens are tied to their context, and replaying a token in a different context will fail. Good examples for the context are a user ID or a session ID concatenated with the purpose of the query, such as ListPets. The context cannot be extracted from the pagination token and can therefore contain sensitive data.
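As a sketch of how a context could be built, the helper below combines a user ID with the purpose of the query. The `QueryOptions` interface only mirrors the context, from and limit properties documented in this section. The helper name, the `:` separator and the page size of 25 are assumptions for illustration.

```typescript
// Local mirror of the documented option properties used in this sketch.
interface QueryOptions {
  context?: string | Buffer;
  from?: string;
  limit?: number;
}

// Builds options for a hypothetical "ListPets" endpoint.
function listPetsOptions(userId: string, token?: string): QueryOptions {
  return {
    // Ties the token to this user and this purpose.
    context: `${userId}:ListPets`,
    // Token received from the client, if any.
    from: token,
    limit: 25,
  };
}
```

The resulting object would be passed as the second argument to the function returned by createQuery or createScan, so that a token issued to one user for ListPets is rejected when presented by another user or to another endpoint.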


filter

Optional filter: (arg: AttributeMap) => arg is T

Type declaration

▸ (arg): arg is T

Filter results by a predicate function

Parameters
Name Type
arg AttributeMap
Returns

arg is T


from

Optional from: string

Start returning results starting from nextToken


limit

Optional limit: number

Limit the number of results to limit. Will return at least limit results even when using FilterExpressions.
