🔥 Firestore Backfire

Ultimate control over importing and exporting data from Firestore and the Firestore Emulator, on your CLI and in your code.

This documentation is for 2.x. Find documentation for 1.x here.

✨ Features ✨

  • Import and export your Firestore data with ease
  • Specify which documents or collections are imported or exported using paths or by matching regex patterns
  • Control the depth of subcollections to import or export
  • Limit the number of documents to export
  • Import and export data as NDJSON to a variety of storage sources:
    • local files
    • Google Cloud Storage
    • AWS S3
    • Or implement your own data source

Installation

Install firestore-backfire and @google-cloud/firestore using your favourite package manager.

yarn add firestore-backfire @google-cloud/firestore
pnpm add firestore-backfire @google-cloud/firestore
npm install firestore-backfire @google-cloud/firestore

Peer dependencies for Google Cloud Storage

If you plan to import and export data from Google Cloud Storage, you should install:

  • @google-cloud/storage

Peer dependencies for AWS S3

If you plan to import and export data from S3, you should install:

  • @aws-sdk/client-s3

Additionally, if you want to use a credential profile to run this program, you should also install:

  • @aws-sdk/credential-provider-ini

Usage and examples

CLI

Firestore Backfire can be called on the CLI using backfire. The aliases bf and firestore are also provided for convenience.

If installed in your project, run it using your package manager:

yarn backfire import path-to-my-data ...

If installed globally, you can call it directly:

backfire import path-to-my-data ...

You can also use it in your package.json scripts.

// package.json
{
  "scripts": {
    "import-my-data": "backfire import path-to-my-data ..."
  }
}

CLI options

All options listed in the documentation have a CLI flag equivalent unless otherwise specified. The flag is always -- followed by the option name. For example, the option limit can be passed on the CLI using --limit. Many options also have a shorthand. Run backfire [command] --help to see the available options and their respective flags.

CLI examples

Export documents...

  • to a file called emails.ndjson in an S3 bucket called bucket, using the AWS credentials profile named default
  • from a Firestore project called demo using the credentials found at key.json
  • from the emails and messages collections
backfire export s3://bucket/emails --awsProfile default -P demo -K key.json --paths emails messages --awsRegion us-east-1

Export documents...

  • to a local file called emails.ndjson in the export folder
  • from a Firestore project called demo using the credentials found at key.json
  • from the emails collection
  • where the document id starts with "abc" or "123"
  • where the document id cannot end with "xyz"
backfire export ./export/emails -P demo -K key.json --paths emails --match ^emails/abc ^emails/123 --ignore xyz$

Import documents...

  • from a file called emails.ndjson in a Google Cloud Storage bucket called bucket, belonging to a project with the ID gcp-demo, using a service account key file called gcp-demo.json
  • to the demo project in the Firestore Emulator running on port 8080
  • where the document belongs to a root level collection (depth of 0)
  • only import the first 10 documents
  • overwrite any existing data
backfire import gs://bucket/emails --gcpProject gcp-demo --gcpKeyFile gcp-demo.json -P demo -E localhost:8080 --depth 0 --limit 10 --mode overwrite

Node

Firestore Backfire exposes functions in Node that you can use to import and export data using a data source.

import {
  importFirestoreData,
  exportFirestoreData,
  // ...
} from "firestore-backfire";

await importFirestoreData(connection, reader, options);
await exportFirestoreData(connection, writer, options);

Options for specifying the Firestore instance to connect to can be provided through the connection parameter. The reader and writer parameters are data sources (see the section on data sources for more information on how to create one). The options parameter allows you to configure the import/export behaviour.

Exporting data

To export data from Firestore, use the export command on the CLI, or use the exportFirestoreData function in Node. Each document is exported as one line of NDJSON that follows the SerializedFirestoreDocument interface.

backfire export <path> [options]
import { exportFirestoreData } from "firestore-backfire";

await exportFirestoreData(connection, writer, options);

When using the CLI, path should point to the location where you want the data to be exported to. This can be a path to a local file, a Google Cloud Storage path (prefixed with gs://), or an S3 path (prefixed with s3://).

When using the exportFirestoreData function, the connection parameter can be an instance of Firestore, or it can be an object that specifies options for creating a connection to Firestore. The writer parameter must be an implementation of IDataSourceWriter. See the section on data sources for more information.
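
NDJSON stores one JSON value per line, which is what allows documents to be streamed one at a time. The sketch below shows how such a file can be produced and parsed. The ExampleDocument shape (path and data fields) is a hypothetical stand-in for SerializedFirestoreDocument, whose actual fields are not described in this section, and toNdjson and fromNdjson are illustrative helpers rather than functions exported by this package.

```typescript
// Hypothetical stand-in for SerializedFirestoreDocument; the real
// interface may use different field names.
interface ExampleDocument {
  path: string;
  data: Record<string, unknown>;
}

// Serialize documents as NDJSON: one JSON object per line.
function toNdjson(docs: ExampleDocument[]): string {
  return docs.map((doc) => JSON.stringify(doc)).join("\n") + "\n";
}

// Parse NDJSON back into documents, skipping blank lines.
function fromNdjson(text: string): ExampleDocument[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as ExampleDocument);
}

const ndjson = toNdjson([
  { path: "emails/1", data: { subject: "Hello" } },
  { path: "emails/2", data: { subject: "World" } },
]);
const roundTripped = fromNdjson(ndjson);
```

Because every line is a complete JSON value, a reader can process a large export line by line without loading the whole file into memory.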

Options

All options have a CLI flag equivalent unless otherwise specified. Follows the ExportFirestoreDataOptions interface.

| Option | Type | Description |
| --- | --- | --- |
| paths | string[] | Provide a list of paths where you want to export data from. This can be a collection path (e.g. emails), or a path to a document (e.g. emails/1). If not specified, all paths will be exported, starting from the root collections. |
| match | RegExp[] | Provide a list of regex patterns that a document path must match to be exported. |
| ignore | RegExp[] | Provide a list of regex patterns that prevent a document from being exported when its path matches any of the patterns. Takes precedence over match. |
| depth | number | Limit the subcollection depth to export documents from. Documents in the root collection have a depth of 0. If not specified, no limit is applied. |
| limit | number | Limit the number of documents to export. If not specified, no limit is applied. |
| overwrite | boolean | Overwrite any existing data at the output path. Defaults to false. |
| update | number | The interval (in seconds) at which update logs are printed. Update logs are at the debug level. Defaults to 5. |
| exploreInterval* | number | The interval (in milliseconds) at which chunks of paths are dequeued for exploration using Firestore SDK's listDocuments() or listCollections() methods. Defaults to 10. |
| exploreChunkSize* | number | The chunk size to use when dequeuing paths for exploration. Defaults to 5000. |
| downloadInterval* | number | The interval (in milliseconds) at which chunks of document paths are dequeued to be filtered and downloaded from Firestore. Defaults to 1000. |
| downloadChunkSize* | number | The chunk size to use when dequeuing paths for download. Defaults to limit if supplied, otherwise it dequeues all available paths. |

* Advanced configuration - default values should be suitable for most use cases. Considered internal, so may change as implementation changes.
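
To make the interaction between match, ignore and depth concrete, here is a minimal sketch of the filtering rules as described in the table above. It is not the package's implementation: shouldExport, documentDepth and ExampleFilterOptions are hypothetical names, and it assumes a document passes match when any one pattern matches, which is how the CLI examples describe it.

```typescript
// Hypothetical option shape mirroring the match, ignore and depth options.
interface ExampleFilterOptions {
  match?: RegExp[];
  ignore?: RegExp[];
  depth?: number;
}

// Depth of a document path: "emails/1" is 0, "emails/1/attachments/2" is 1.
function documentDepth(path: string): number {
  return path.split("/").length / 2 - 1;
}

// Decide whether a document path passes the filters.
// Patterns are assumed not to use the global (g) flag.
function shouldExport(path: string, options: ExampleFilterOptions): boolean {
  const { match = [], ignore = [], depth } = options;
  // Documents nested deeper than the requested depth are excluded.
  if (depth !== undefined && documentDepth(path) > depth) return false;
  // ignore takes precedence over match.
  if (ignore.some((pattern) => pattern.test(path))) return false;
  // When match patterns are given, at least one must match (assumption).
  if (match.length > 0 && !match.some((pattern) => pattern.test(path))) {
    return false;
  }
  return true;
}

const exampleFilter: ExampleFilterOptions = {
  match: [/^emails\/abc/, /^emails\/123/],
  ignore: [/xyz$/],
  depth: 0,
};
const keepsAbc = shouldExport("emails/abc1", exampleFilter); // true
const dropsIgnored = shouldExport("emails/abcxyz", exampleFilter); // false
```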

Logging options

By default, only log messages at the info level and above are printed. Follows the LoggingOptions interface.

| Option | Type | Description |
| --- | --- | --- |
| debug | boolean | Print debug level logs and higher. |
| verbose | boolean | Print verbose level logs and higher. Overrides debug. |
| quiet | boolean | Silence all logs. Overrides debug and verbose. |

Importing data

To import data into Firestore, use the import command on the CLI, or use the importFirestoreData function in Node. The data being imported is expected to be in NDJSON format, where each line follows the SerializedFirestoreDocument interface.

backfire import <path> [options]
import { importFirestoreData } from "firestore-backfire";

await importFirestoreData(connection, reader, options);

When using the CLI, path should point to the location where you want the data to be imported from. This can be a path to a local file, a Google Cloud Storage path (prefixed with gs://), or an S3 path (prefixed with s3://).

When using the importFirestoreData function, the connection parameter can be an instance of Firestore, or it can be an object that specifies options for creating a connection to Firestore. The reader parameter must be an implementation of IDataSourceReader. See the section on data sources for more information.

⚠️ NOTE: When using the Firestore Emulator, importing a large amount of data can result in errors as the emulator is not designed to scale.

Options

All options have a CLI flag equivalent unless otherwise specified. Follows the ImportFirestoreDataOptions interface.

| Option | Type | Description |
| --- | --- | --- |
| paths | string[] | Provide a list of paths where you want to import data from. This can be a collection path (e.g. emails), or a path to a document (e.g. emails/1). If not specified, all paths will be imported. |
| match | RegExp[] | Provide a list of regex patterns that a document path must match to be imported. |
| ignore | RegExp[] | Provide a list of regex patterns that prevent a document from being imported if its path matches any of the patterns. Takes precedence over match. |
| depth | number | Limit the subcollection depth to import documents from. Documents in the root collection have a depth of 0. If not specified, no limit is applied. |
| limit | number | Limit the number of documents to import. If not specified, no limit is applied. |
| mode | "create", "insert", "overwrite" or "merge" | Specify how to handle importing documents that would overwrite existing data. See the import mode section for more information. Defaults to create. |
| update | number | The interval (in seconds) at which update logs are printed. Update logs are at the debug level. Defaults to 5. |
| flush* | number | The interval (in seconds) at which documents are flushed to Firestore. Defaults to 1. |
| processInterval* | number | The interval (in milliseconds) at which documents are processed as they stream in from the data source. Defaults to 10. |
| processLimit* | number | The maximum number of pending writes to Firestore. Defaults to 200. |

* Advanced configuration - default values should be suitable for most use cases. Considered internal, so may change as implementation changes.

Import mode

The mode option specifies how to handle importing documents that would overwrite existing data in Firestore. The default import mode is create.

  • create mode will log an error when importing documents that already exist in Firestore, and existing documents will not be modified.
  • insert mode will only import documents that do not exist, and existing documents will not be modified.
  • overwrite mode will import documents that do not exist, and completely overwrite any existing documents.
  • merge mode will import documents that do not exist, and merge existing documents.
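
The four modes can be summarised as a small decision function. This is only a sketch of the behaviour described above, not the package's code; resolveImportAction and the ImportAction labels are hypothetical names chosen for illustration.

```typescript
type ImportMode = "create" | "insert" | "overwrite" | "merge";

// Hypothetical labels for what happens to a single incoming document.
type ImportAction = "write" | "error" | "skip" | "overwrite" | "merge";

// Decide what to do with an incoming document, given the import mode
// and whether a document already exists at the same path.
function resolveImportAction(mode: ImportMode, exists: boolean): ImportAction {
  // Every mode imports documents that do not exist yet.
  if (!exists) return "write";
  switch (mode) {
    case "create":
      return "error"; // logged as an error, existing document untouched
    case "insert":
      return "skip"; // silently left as-is
    case "overwrite":
      return "overwrite"; // existing document fully replaced
    case "merge":
      return "merge"; // incoming fields merged into the existing document
  }
}

const onConflictInCreate = resolveImportAction("create", true); // "error"
const onNewInCreate = resolveImportAction("create", false); // "write"
```

In this sketch, create and insert differ only in whether a conflict is reported, since neither modifies the existing document.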

Logging options

By default, only log messages at the info level and above are printed. Follows the LoggingOptions interface.

| Option | Type | Description |
| --- | --- | --- |
| debug | boolean | Print debug level logs and higher. |
| verbose | boolean | Print verbose level logs and higher. Overrides debug. |
| quiet | boolean | Silence all logs. Overrides debug and verbose. |

Get document

Have you ever wanted to quickly inspect or export a document as JSON from Firestore? This CLI command can help you do just that. path should be a valid Firestore document path. Prints the document as pretty JSON.

Also ensure you provide appropriate options for connecting to Firestore.

backfire get <path> [options]

Options

All options have a CLI flag equivalent unless otherwise specified. Follows the GetFirestoreDataOptions interface.

| Option | Type | Description |
| --- | --- | --- |
| stringify | boolean or number | JSON.stringify() the output. Pass true to use the default indent of 2, or pass a number to specify the indent amount. |

List documents and collections

List the document IDs or collection IDs at the specified path.

Also ensure you provide appropriate options for connecting to Firestore.

backfire list documents <path> [options]
backfire list collections [path] [options]

When listing collections, you may leave path empty to list root collections, or pass a valid Firestore document path to list its subcollections.

Options

All options have a CLI flag equivalent unless otherwise specified. Follows the ListFirestoreDataOptions interface.

| Option | Type | Description |
| --- | --- | --- |
| limit | number | Limit the number of documents/collections to return. Note that this does not "truly" limit the API call, it only truncates the output after the data is received from Firebase. |

Count documents and collections

Count the number of documents in a collection, or the number of collections at the specified path.

Also ensure you provide appropriate options for connecting to Firestore.

backfire count documents <path> [options]
backfire count collections [path] [options]

When counting collections, you may leave path empty to count root collections, or pass a valid Firestore document path to count its subcollections.

Connecting to Firestore

In order to read and write data to Firestore, you will need to specify some options for the connection. Follows the FirestoreConnectionOptions interface.

| Option | Type | Description |
| --- | --- | --- |
| project | string | The ID of the Firestore project to connect to. |
| adc | boolean | Use Application Default Credentials. |
| keyFile | string | The path to a service account's private key JSON file. Takes precedence over adc. |
| emulator | string or boolean | Connect to a local Firestore emulator. Defaults to localhost:8080. Pass a string value to specify a different host. Takes precedence over adc and keyFile. |
| credentials* | object | Service account credentials. Fields client_email and private_key are expected. Takes precedence over adc, keyFile and emulator. |

* Not available in the CLI.

  • The project option is always required
  • To connect to a real Firestore instance, you must specify adc or keyFile, or pass a credentials object (Node only)
  • If you are connecting to a local Firestore emulator, you can use the emulator option

As an alternative, these options can also be provided through a configuration file or as environment variables. Note that CLI options will always take precedence over environment variables.

  • GOOGLE_CLOUD_PROJECT can be used to provide project
  • GOOGLE_APPLICATION_CREDENTIALS can be used to provide keyFile
  • FIRESTORE_EMULATOR_HOST can be used to provide emulator

In Node, you can also pass an existing instance of Firestore instead of providing connection options.
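
The precedence rules in the table above can be read as a simple cascade. The sketch below illustrates that order and is not the package's implementation: ExampleConnectionOptions mirrors the documented option names, while pickAuthMethod and emulatorHost are hypothetical helpers.

```typescript
// Mirrors the documented connection options; not the exported interface.
interface ExampleConnectionOptions {
  project: string;
  adc?: boolean;
  keyFile?: string;
  emulator?: string | boolean;
  credentials?: { client_email: string; private_key: string };
}

type AuthMethod = "credentials" | "emulator" | "keyFile" | "adc";

// Pick the authentication method, highest precedence first:
// credentials > emulator > keyFile > adc.
function pickAuthMethod(options: ExampleConnectionOptions): AuthMethod {
  if (options.credentials) return "credentials";
  if (options.emulator) return "emulator";
  if (options.keyFile) return "keyFile";
  if (options.adc) return "adc";
  throw new Error("Specify adc, keyFile, emulator or credentials");
}

// Passing true for emulator selects the default host.
function emulatorHost(emulator: string | boolean): string {
  return typeof emulator === "string" ? emulator : "localhost:8080";
}

const chosenMethod = pickAuthMethod({
  project: "demo",
  adc: true,
  keyFile: "key.json",
}); // "keyFile", because keyFile takes precedence over adc
```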

Data sources

A data source provides a way to read and write data to an external location. This package comes with a few implementations, and exports interfaces so that you can implement your own in Node if the provided implementations do not suit your needs.

Local

This data source reads and writes data as local files to your machine. To use this data source on the CLI, specify a path that points to a valid file path (note that this is different from v1). If the path is in a directory that does not exist, it will be created for you.

No other configuration options are required.

Google Cloud Storage

This data source reads and writes data from a Google Cloud Storage bucket. To use this data source on the CLI, specify a path beginning with gs://.

Credentials for reading and writing to the Google Cloud Storage bucket must also be provided as CLI options or through a configuration file.

| Option | Type | Description |
| --- | --- | --- |
| gcpProject | string | The Google Cloud project the bucket belongs to. |
| gcpAdc | boolean | Use Application Default Credentials. |
| gcpKeyFile | string | Path to the service account credentials file to use. Takes precedence over gcpAdc. |
| gcpCredentials* | object | Service account credentials. Fields client_email and private_key are expected. Takes precedence over gcpAdc and gcpKeyFile. |

* Not available in the CLI.

  • The gcpProject option is always required
  • You must specify gcpAdc or gcpKeyFile, or pass a gcpCredentials object (Node only)

Alternatively, these values can also be provided through the corresponding environment variables:

  • GOOGLE_CLOUD_PROJECT can be used to provide gcpProject
  • GOOGLE_APPLICATION_CREDENTIALS can be used to provide gcpKeyFile

IMPORTANT: These environment variables are also used by Firestore connection options. If you need to use different credentials for connecting to Firestore and accessing Google Cloud Storage, you can override the environment variables by passing them as CLI options or through a configuration file.

AWS S3

This data source reads and writes data from an S3 bucket. To use this data source on the CLI, specify a path beginning with s3://.

Credentials for reading and writing to the S3 bucket must also be provided as CLI options or through a configuration file.

| Option | Type | Description |
| --- | --- | --- |
| awsRegion | string | The AWS region to use. |
| awsProfile | string | The name of the profile to use from your local AWS credentials. Requires @aws-sdk/credential-provider-ini to be installed. |
| awsAccessKeyId | string | The access key id to use. This takes precedence over the awsProfile option, which means that if you provide awsProfile as well as access keys, the access keys will be used. |
| awsSecretAccessKey | string | The secret access key to use. This takes precedence over the awsProfile option, which means that if you provide awsProfile as well as access keys, the access keys will be used. |

  • The awsRegion option is always required
  • You can choose to use either awsProfile, or awsAccessKeyId and awsSecretAccessKey

Alternatively, these values can also be provided through the corresponding environment variables:

  • AWS_REGION can be used to provide awsRegion
  • AWS_PROFILE can be used to provide awsProfile
  • AWS_ACCESS_KEY_ID can be used to provide awsAccessKeyId
  • AWS_SECRET_ACCESS_KEY can be used to provide awsSecretAccessKey
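
As a rough illustration of the rules above (region always required, access keys preferred over a profile), the selection could look like the following. This is a sketch under those stated rules only; ExampleS3Options, S3Auth and pickS3Auth are hypothetical names and not part of this package or the AWS SDK.

```typescript
// Mirrors the documented S3 option names; not the exported interface.
interface ExampleS3Options {
  awsRegion?: string;
  awsProfile?: string;
  awsAccessKeyId?: string;
  awsSecretAccessKey?: string;
}

type S3Auth =
  | { kind: "keys"; accessKeyId: string; secretAccessKey: string }
  | { kind: "profile"; profile: string };

// Choose how to authenticate: access keys win over a profile.
function pickS3Auth(options: ExampleS3Options): S3Auth {
  if (!options.awsRegion) throw new Error("awsRegion is required");
  if (options.awsAccessKeyId && options.awsSecretAccessKey) {
    return {
      kind: "keys",
      accessKeyId: options.awsAccessKeyId,
      secretAccessKey: options.awsSecretAccessKey,
    };
  }
  if (options.awsProfile) {
    return { kind: "profile", profile: options.awsProfile };
  }
  throw new Error("Provide awsProfile, or awsAccessKeyId and awsSecretAccessKey");
}

const s3Auth = pickS3Auth({
  awsRegion: "us-east-1",
  awsProfile: "default",
  awsAccessKeyId: "example-id",
  awsSecretAccessKey: "example-secret",
}); // kind is "keys": access keys take precedence over the profile
```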

Creating a data source in Node

All provided data source implementations are registered in a default instance of DataSourceFactory, which is exposed to you in Node. You can create a reader or writer implementation directly from the factory by calling the createReader() or createWriter() method.

The factory will automatically select the data source to create based on the path it was given. The default implementation will fall back to using the local data source if the path does not match any other data sources.

// Import the default factory instance
import {
  dataSourceFactory,
  importFirestoreData,
  exportFirestoreData,
} from "firestore-backfire";

// Create the reader and writer.
// The `options` object should provide the credentials to
// connect to data sources if required (such as GCS or S3).
const path = "s3://my-bucket/exported-data.ndjson";
const reader = await dataSourceFactory.createReader(path, options);
const writer = await dataSourceFactory.createWriter(path, options);

// Use the reader and writer
await importFirestoreData(connection, reader, options);
await exportFirestoreData(connection, writer, options);
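
The path-based selection with a local fallback described above can be pictured with a small stand-alone registry. This is a simplified sketch and not the real DataSourceFactory: ExampleSource and ExampleFactory are hypothetical, and it assumes the first registered source whose match function accepts the path is chosen.

```typescript
// Hypothetical, simplified data source descriptor.
interface ExampleSource {
  id: string;
  match: (path: string) => boolean;
}

// A tiny registry that selects a source by path and falls back to a default.
class ExampleFactory {
  private sources: ExampleSource[] = [];
  private fallback: ExampleSource;

  constructor(fallback: ExampleSource) {
    this.fallback = fallback;
  }

  register(source: ExampleSource): void {
    this.sources.push(source);
  }

  // Return the first registered source that matches, else the fallback.
  select(path: string): ExampleSource {
    for (const source of this.sources) {
      if (source.match(path)) return source;
    }
    return this.fallback;
  }
}

const exampleFactory = new ExampleFactory({ id: "local", match: () => true });
exampleFactory.register({ id: "gcs", match: (path) => /^gs:\/\//.test(path) });
exampleFactory.register({ id: "s3", match: (path) => /^s3:\/\//.test(path) });

const s3Source = exampleFactory.select("s3://my-bucket/exported-data.ndjson");
const localSource = exampleFactory.select("./export/emails.ndjson");
```

Here s3Source.id is "s3" and localSource.id is "local", because no registered source claims the second path.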

Custom data sources

There are two types of data sources: readers and writers. A reader reads text data from a stream, whilst a writer writes lines of text data to a stream.

A data source does not need to provide both a reader and a writer. However, without a reader you cannot import data, and without a writer you cannot export data.

To create a data source and make it useable with Firestore Backfire, follow these steps:

  1. Create at least one of the following:
    • A class that implements IDataSourceReader
    • A class that implements IDataSourceWriter
  2. Construct an IDataSource object, in which you should define:
    • A unique id for the data source
    • A match function, which takes a path parameter and returns true if the path can be used with this data source
    • A reader property, which can be your IDataSourceReader class directly, or a function that creates an instance of it (this can be left empty if you do not want to import data)
    • A writer property, which can be your IDataSourceWriter class directly, or a function that creates an instance of it (this can be left empty if you do not want to export data)
  3. Register the data source using the register() method on the default DataSourceFactory instance (exposed as dataSourceFactory)

Once your data source has been registered, you can use the createReader() or createWriter() methods on the default DataSourceFactory instance to construct your data source.

Alternatively, you can instantiate your custom data source yourself and pass it directly to importFirestoreData or exportFirestoreData if you do not need to support different path types or use the default implementations.

Implementation example

You can always take a look at how the provided implementations are written by looking at the source code, and seeing how they are registered. Below is a basic example as a reference.

import {
  IDataSourceReader,
  IDataSourceWriter,
  dataSourceFactory,
} from "firestore-backfire";

// First define your custom implementations

class MyDataReader implements IDataSourceReader {
  // ...
}

class MyDataWriter implements IDataSourceWriter {
  // ...
}

// You might want to define some custom options to use
// with your data source, such as credentials
interface MyCustomOptions {
  username?: string;
  password?: string;
}

// Then register them with the data source factory

dataSourceFactory.register<MyCustomOptions>({
  id: "custom",
  // Use this data source with any paths starting with "custom://"
  match: (path) => path.startsWith("custom://"),
  // You can tell the factory to use the class directly, which will
  // pass the `path` and `options` object to the constructor
  reader: { useClass: MyDataReader },
  // You can also tell the factory to call a function to create
  // the class, which is useful for processing options that are passed
  writer: {
    useFactory: async (path, options) => {
      // E.g. check that the required options are present
      if (!options.username) throw new Error("username is required");
      if (!options.password) throw new Error("password is required");
      // If everything is good, return an instance of your class
      return new MyDataWriter(path, options.username, options.password);
    },
  },
});

// Then create the data source using the factory
const path = "custom://my-data";
const reader = await dataSourceFactory.createReader<MyCustomOptions>(path, {
  username: "...",
  password: "...",
});

Configuration file

Instead of providing options on the CLI, you can also set defaults through a configuration file. You can use the flag --config <path> to point to a specific file to use as configuration. Note that CLI options will always override options provided through a configuration file.

IMPORTANT: Do not commit any secrets in your config file to version control.

The configuration file is loaded using cosmiconfig, which supports a wide range of configuration file formats. Some examples of supported formats:

  • .backfirerc.json
  • .backfirerc.yaml
  • .backfirerc.js
  • backfire.config.js

Sample YAML config:

project: demo-project
keyFile: ./service-account.json
emulator: localhost:8080
paths:
  - emails
match:
  - ^emails/123
ignore:
  - xyz$
depth: 2

Sample JSON config:

{
  "project": "demo-project",
  "keyFile": "./service-account.json",
  "emulator": "localhost:8080",
  "paths": ["emails"],
  "match": ["^emails/123"],
  "ignore": ["xyz$"],
  "depth": 2
}
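
Taken together, this README states that CLI options override both the configuration file and environment variables, and that a configuration file can be used to override environment variables. A minimal sketch of that layering follows. It is an illustration of the stated order only, not how the package resolves options internally; resolveOptions and dropUndefined are hypothetical helpers.

```typescript
type ExampleOptions = Record<string, unknown>;

// Remove keys whose value is undefined so they do not mask lower layers.
function dropUndefined(options: ExampleOptions): ExampleOptions {
  const result: ExampleOptions = {};
  for (const key of Object.keys(options)) {
    if (options[key] !== undefined) result[key] = options[key];
  }
  return result;
}

// Later layers win: environment < configuration file < CLI.
function resolveOptions(
  env: ExampleOptions,
  config: ExampleOptions,
  cli: ExampleOptions
): ExampleOptions {
  return {
    ...dropUndefined(env),
    ...dropUndefined(config),
    ...dropUndefined(cli),
  };
}

const resolved = resolveOptions(
  { project: "env-project", keyFile: "./env-key.json" },
  { project: "demo-project", depth: 2 },
  { project: "cli-project", depth: undefined }
);
// project comes from the CLI, depth from the config file,
// keyFile from the environment.
```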

Migration

1.x to 2.x

Firestore Backfire v2 is a rewrite of v1 to provide a more up to date and extensible design. It provides new and improved functionality, uses NDJSON as the data format, and no longer uses worker threads.

Breaking changes

  • -p has been renamed to -P
  • -k has been renamed to -K
  • -e has been renamed to -E
  • --patterns has been renamed to --match
  • --workers has been removed as worker threads are no longer used
  • --logLevel has been removed, use --verbose, --debug or --quiet instead
  • --prettify has been renamed to --stringify
  • --force has been renamed to --overwrite
  • --mode values have changed to "create", "insert", "overwrite", "merge"
  • Import and export file format changed to NDJSON (not backward compatible)

New features

  • New options:
    • ignore (--ignore, -i) to ignore paths
    • limit (--limit, -l) to limit number of documents imported/exported
    • update (--update) to specify the frequency of update messages
    • A few more advanced configuration options
  • New commands:
    • backfire get <path> to get a document from Firestore
    • backfire list documents <path> to list documents in a collection
    • backfire list collections [path] to list root collections or subcollections
  • Support for passing some options as environment variables
    • GOOGLE_CLOUD_PROJECT
    • GOOGLE_APPLICATION_CREDENTIALS
    • AWS_PROFILE
    • AWS_ACCESS_KEY_ID
    • AWS_SECRET_ACCESS_KEY
    • AWS_REGION
  • Ability to create custom data sources in Node
  • Ability to use an existing Firestore instance in Node

Contributing

Thanks goes to these wonderful people (emoji key):

  • Ben Yap: 💻 ⚠️ 📖
  • Anderson José de França: 🤔

This project follows the all-contributors specification. Contributions of any kind welcome! Please follow the contributing guidelines.

Changelog

Please see CHANGELOG.md.

License

Please see LICENSE.