---
title: External Storage
---

<Alert title="Note" level="info">
  After changing configuration files, re-run the <code>./install.sh</code> script to rebuild and restart the containers. See the <Link to="/self-hosted/#configuration">configuration section</Link> for more information.
</Alert>

## Filestore

Filestore handles storing attachments, sourcemaps, and replays. The filestore backend for Sentry is configured in the `sentry/config.yml` file.

### S3 backend

The S3-compatible backend is implemented by `sentry.filestore.s3.S3Boto3Storage`.

```yaml
filestore.backend: 's3'
filestore.options:
  bucket_acl: 'private'
  default_acl: 'private'
  access_key: '<REDACTED>'
  secret_key: '<REDACTED>'
  bucket_name: 'my-bucket'
  region_name: 'auto'
  endpoint_url: 'https://<REDACTED>'
  addressing_style: 'path' # For regular AWS S3, use "auto" or "virtual". For other S3-compatible APIs like MinIO or Ceph, use "path".
  signature_version: 's3v4'
```

Refer to [botocore configuration](https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html) for valid configuration values.
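
To make the `addressing_style` option concrete, here is a small illustration (not Sentry code; hostnames and bucket names are placeholders) of the request URLs each style produces:

```python
# Illustrates the difference between path-style and virtual-hosted-style
# S3 addressing. Hostnames and bucket names are placeholders.

def object_url(endpoint: str, bucket: str, key: str, style: str) -> str:
    """Build the request URL an S3 client would use for an object."""
    host = endpoint.removeprefix("https://")
    if style == "virtual":
        # Virtual-hosted style: the bucket becomes part of the hostname
        # (what regular AWS S3 uses).
        return f"https://{bucket}.{host}/{key}"
    # Path style: the bucket is the first path segment (MinIO, Ceph, etc.).
    return f"https://{host}/{bucket}/{key}"

print(object_url("https://s3.us-west-1.amazonaws.com", "my-bucket", "a.txt", "virtual"))
# -> https://my-bucket.s3.us-west-1.amazonaws.com/a.txt
print(object_url("https://minio.yourcompany.com", "my-bucket", "a.txt", "path"))
# -> https://minio.yourcompany.com/my-bucket/a.txt
```

Self-hosted object stores such as MinIO and Ceph usually only serve the path-style form, which is why the sample configuration uses `'path'`.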

<!-- ### Google Cloud Storage backend

I don't know how this works. The source code that points to this configuration:
- https://github.com/getsentry/sentry/blob/751ef4a029dda5802311fc424a5f63d72b7efd3d/src/sentry/conf/server.py#L2149
- https://github.com/getsentry/sentry/blob/751ef4a029dda5802311fc424a5f63d72b7efd3d/src/sentry/filestore/gcs.py#L226-L245 -->

## Vroom

Vroom is the service that handles profiling. By default, profiling data is stored on the local filesystem. On a self-hosted deployment, an external backend is configured by overriding the `SENTRY_BUCKET_PROFILES` environment variable. Depending on the backend of choice, additional environment variables may also need to be set.

### S3 backend

```bash
# For regular AWS S3
s3://my-bucket?awssdk=v1&region=us-west-1&endpoint=amazonaws.com

# For other S3-compatible APIs
s3://my-bucket?awssdk=v1&region=any-region&endpoint=minio.yourcompany.com&s3ForcePathStyle=true&disableSSL=false
```

Additional environment variables should be provided:
- `AWS_ACCESS_KEY=foobar`
- `AWS_SECRET_KEY=foobar`
- `AWS_SESSION_TOKEN=foobar` (optional)
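
As an illustration of where these settings might live, the bucket URL and credentials could be supplied to the vroom container through a compose override. This is a hedged sketch only: the file name, service name, and every value are assumptions about your deployment, not something this guide prescribes.

```yaml
# Hypothetical docker-compose.override.yml sketch -- the service name
# and all values are placeholders; adjust them to your deployment.
services:
  vroom:
    environment:
      SENTRY_BUCKET_PROFILES: "s3://my-bucket?awssdk=v1&region=any-region&endpoint=minio.yourcompany.com&s3ForcePathStyle=true&disableSSL=false"
      AWS_ACCESS_KEY: "foobar"
      AWS_SECRET_KEY: "foobar"
```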

Further explanation on the query string options:
- `region`: The AWS region for requests.
- `endpoint`: The endpoint URL (hostname only or fully qualified URI).
- `disableSSL`: A value of "true" disables SSL when sending requests.
- `s3ForcePathStyle`: A value of "true" forces the request to use path-style addressing.
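
Since the bucket URL is an ordinary URL with query parameters, it can also be assembled programmatically to avoid escaping mistakes. A minimal Python sketch (all values are placeholders):

```python
from urllib.parse import urlencode

# Placeholder values -- substitute your own bucket, region, and endpoint.
options = {
    "awssdk": "v1",
    "region": "any-region",
    "endpoint": "minio.yourcompany.com",
    "s3ForcePathStyle": "true",  # path-style addressing for MinIO/Ceph
    "disableSSL": "false",
}
bucket_url = "s3://my-bucket?" + urlencode(options)
print(bucket_url)
# s3://my-bucket?awssdk=v1&region=any-region&endpoint=minio.yourcompany.com&s3ForcePathStyle=true&disableSSL=false
```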

### Azure Blob Storage backend

```bash
azblob://my-container?protocol=https&domain=yourcompany.blob.core.windows.net&localemu=false&cdn=false
```

Additional environment variables that should be provided (pick what's compatible with your configuration):
- `AZURE_STORAGE_ACCOUNT=foobar` - The storage account name. Required when used along with `AZURE_STORAGE_KEY`, because together they set the authentication mechanism to [azblob.NewSharedKeyCredential](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/storage/azblob#NewSharedKeyCredential), which creates immutable shared key credentials. Alternatively, "storage_account" can be passed as a URL query string parameter.
- `AZURE_STORAGE_KEY=foobar` - To use a shared key credential along with `AZURE_STORAGE_ACCOUNT`.
- `AZURE_STORAGE_SAS_TOKEN=foobar` - To use a SAS token.

Other authentication options and details can be found in the [gocloud.dev/blob/azureblob documentation](https://pkg.go.dev/[email protected]/blob/azureblob#hdr-URLs).

Further explanation on the query string options:
- `domain`: Your storage domain.
- `protocol`: Network protocol (`http` or `https`).
- `cdn`: A value of "true" specifies that the blob server is a CDN.
- `localemu`: A value of "true" specifies that the blob server is a local emulator.
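
As with the S3 backend, these settings could be wired into the vroom container via a compose override. Again a hedged sketch only: the service name and every value are assumptions about your deployment.

```yaml
# Hypothetical docker-compose.override.yml sketch -- placeholders only;
# adjust the service name and values to your deployment.
services:
  vroom:
    environment:
      SENTRY_BUCKET_PROFILES: "azblob://my-container?protocol=https&domain=yourcompany.blob.core.windows.net&localemu=false&cdn=false"
      AZURE_STORAGE_ACCOUNT: "foobar"
      AZURE_STORAGE_KEY: "foobar"
```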