Storing Files
Workbench stores lots of blob-ish data: user-uploaded files; fetched tables; cached tables; and module code.
Workbench uses Minio to store files. Minio emulates S3: on users' computers it stores files on the local filesystem, and on our production servers it stores files in cloud storage.
Cloud storage is ~3x cheaper than a filesystem; it doesn't use locks; and it grows automatically to match its data.
For now, Workbench still stores some files directly on a filesystem. On production, those files live on an NFS server that emulates a local filesystem; NFS is expensive and difficult to administer.
Here's how it hooks together:
              ┏━━━━━━━━━━━ Workbench ━━━━━━━━━━━━┓
              ┃  ┌─────┐      ┌───────┐          ┃
Get-data  ├╌╌╌╂╌╌┤ App ├╌╌╌╌╌╌┤ Minio │          ┃
Request   │   ┃  └─────┘      └─┬───┬─┘          ┃
              ┃                 ┊   ┊            ┃
Upload    ├╌╌╌╂╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┘   ┊ ┌────────┐ ┃
Parts     │   ┃                     └╌┤ Worker │ ┃
              ┃                       └────────┘ ┃
              ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
The workers and app servers read and write Minio data using the S3 protocol. (They know Minio's access key and secret key, so they can access all of its files.)
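For illustration, here is a minimal sketch of how an app server or worker could read and write a blob in Minio over the S3 protocol with boto3. The endpoint URL, credentials, bucket, and key below are placeholders, not Workbench's actual settings:

```python
# Sketch only: endpoint, credentials, bucket and key are illustrative.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio:9000",          # assumed in-cluster Minio address
    aws_access_key_id="minio_access_key",      # Minio's access key
    aws_secret_access_key="minio_secret_key",  # Minio's secret key
)

# Write a blob...
s3.put_object(Bucket="user-files", Key="wf-123/table.parquet", Body=b"...")

# ...and read it back.
blob = s3.get_object(Bucket="user-files", Key="wf-123/table.parquet")["Body"].read()
```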
When the user uploads a file, the browser divides the file into chunks and asks the app to sign each chunk; then it sends each signed chunk directly to the Minio server. Our Minio server is publicly accessible (just like S3's servers).
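As a sketch of the signing step (assuming boto3 on the app side; the bucket, key, and endpoint are placeholders, not Workbench's real code), the app can start a multipart upload and presign a URL for each part, which the browser then PUTs straight to Minio:

```python
# Illustrative sketch of app-side signing for a browser multipart upload.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.org",  # assumed publicly accessible Minio URL
    aws_access_key_id="minio_access_key",
    aws_secret_access_key="minio_secret_key",
)

upload = s3.create_multipart_upload(Bucket="user-files", Key="wf-123/upload.csv")

def sign_part(part_number: int) -> str:
    """Return a presigned URL the browser can PUT one chunk to."""
    return s3.generate_presigned_url(
        "upload_part",
        Params={
            "Bucket": "user-files",
            "Key": "wf-123/upload.csv",
            "UploadId": upload["UploadId"],
            "PartNumber": part_number,
        },
        ExpiresIn=3600,
    )

# The browser PUTs each chunk to its presigned URL and collects the returned
# ETags; the app then calls s3.complete_multipart_upload(...) to finish.
```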
In docker-compose, Minio runs standalone and stores all its buckets in a single minio_data volume. On production, Minio is a gateway that proxies Google Cloud Storage: we created GCS buckets and IAM roles during our initial setup, and the app's and worker's roles give them access to read and write the buckets.
TODO: move these to minio
TODO: move this to minio
TODO: move this out of NFS and delete the NFS server