Storing Files

Workbench stores lots of blob-ish data: user-uploaded files; fetched tables; cached tables; and module code.

Architecture

Workbench uses Minio to store files. Minio emulates S3 in every environment: on users' computers it stores files on the local filesystem; on our production servers it stores them in cloud storage.

Cloud storage is ~3x cheaper than a filesystem; it doesn't use locks; and it scales automatically with the data we store.

For now, Workbench still stores some files on the filesystem. On production, that means an NFS server emulating a local filesystem; NFS is expensive and difficult to administer.

Specifics

User-uploaded files

Here's how the pieces hook together:

             ┏━━━━━━━━━━ Workbench ━━━━━━━━━━┓
             ┃  ┌─────┐   ┌───────┐          ┃
Get-data ├╌╌╌╂╌╌┤ App ├╌╌╌┤ Minio │          ┃
Request  │   ┃  └─────┘   └─┬───┬─┘          ┃
             ┃              ┊   ┊            ┃
Upload  ├╌╌╌╌╂╌╌╌╌╌╌╌╌╌╌╌╌╌╌┘   ┊ ┌────────┐ ┃
Parts   │    ┃                  └╌┤ Worker │ ┃
             ┃                    └────────┘ ┃
             ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

The workers and app servers read and write Minio data using the S3 protocol. (They know Minio's access key and secret key, so they can access all its files.)
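
Here's a minimal sketch of that access path, assuming boto3; the endpoint, the user-files bucket and the environment-variable names are made up for illustration:

```python
import os

import boto3

# The workers and app servers hold Minio's access key and secret key, so this
# client can read and write every bucket.
minio = boto3.client(
    "s3",
    endpoint_url=os.environ["MINIO_URL"],  # e.g. http://minio:9000
    aws_access_key_id=os.environ["MINIO_ACCESS_KEY"],
    aws_secret_access_key=os.environ["MINIO_SECRET_KEY"],
)

# Ordinary S3 calls work against Minio.
minio.put_object(Bucket="user-files", Key="example.csv", Body=b"a,b\n1,2\n")
body = minio.get_object(Bucket="user-files", Key="example.csv")["Body"].read()
```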

When the user uploads a file, the browser divides the file into chunks and asks the app to sign each one; then it sends each signed chunk directly to the Minio server. Our Minio server is publicly accessible (just like S3's servers).
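
A hedged sketch of the signing step, again using boto3 with illustrative bucket, key and credential names (not Workbench's actual upload code):

```python
import os

import boto3

# Same hypothetical client setup as the sketch above.
minio = boto3.client(
    "s3",
    endpoint_url=os.environ["MINIO_URL"],
    aws_access_key_id=os.environ["MINIO_ACCESS_KEY"],
    aws_secret_access_key=os.environ["MINIO_SECRET_KEY"],
)

# The app starts a multipart upload and remembers the UploadId.
upload = minio.create_multipart_upload(Bucket="user-files", Key="big.csv")

# For each chunk, the app hands the browser a presigned URL. The browser PUTs
# the chunk bytes straight to Minio; they never pass through the app server.
url = minio.generate_presigned_url(
    "upload_part",
    Params={
        "Bucket": "user-files",
        "Key": "big.csv",
        "UploadId": upload["UploadId"],
        "PartNumber": 1,
    },
    ExpiresIn=600,  # the signature expires after ten minutes
)

# Once the browser has uploaded every part (collecting the ETag from each PUT
# response), the app would stitch them together:
# minio.complete_multipart_upload(
#     Bucket="user-files", Key="big.csv", UploadId=upload["UploadId"],
#     MultipartUpload={"Parts": [{"PartNumber": 1, "ETag": etag}]},
# )
```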

In docker-compose, Minio runs standalone and stores all its buckets in a single minio_data volume. On production, Minio is a gateway that proxies Google Cloud Storage: we created GCS buckets and IAM roles during our initial setup, and the app's and worker's roles let them read and write those buckets.
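
For reference, the docker-compose wiring looks roughly like this; it's a sketch, not our exact file, and the image tag, key values and port are illustrative:

```yaml
services:
  minio:
    image: minio/minio
    command: server /data              # standalone mode: all buckets live under /data
    environment:
      MINIO_ACCESS_KEY: minio_access   # the key pair the app and worker use
      MINIO_SECRET_KEY: minio_secret
    ports:
      - "9000:9000"                    # publicly reachable, so browsers can PUT chunks
    volumes:
      - minio_data:/data               # the single minio_data volume mentioned above

volumes:
  minio_data: {}
```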

Fetched tables

TODO: move these to Minio

Cached data

TODO: move this to Minio

Module code

TODO: move this out of NFS and delete the NFS server