This repository has been archived by the owner on Aug 14, 2024. It is now read-only.

docs(self-hosted): external storage configurations #1269

Closed
wants to merge 5 commits into from

Conversation

@aldy505 (Contributor) commented May 12, 2024

Sorry for the ping, but I need your feedback on this. @hubertdeng123 @azaslavsky @stayallive

vercel bot commented May 12, 2024

@aldy505 is attempting to deploy a commit to the Sentry Team on Vercel.

A member of the Team first needs to authorize it.

@@ -0,0 +1,89 @@
---
title: External Storage

@stayallive stayallive May 16, 2024

Maybe we should generalize to "Data Storage" or something.

This way this document can explain where the data is stored by default and can list alternatives if there are any.

Could have the following sections:

  • Sentry (with a general explanation about postgres, clickhouse and kafka maybe)
    • Filestore (Uploads, Replays)
      • Database
      • Object Storage
    • Nodestore (Event data)
      • Database
  • Vroom (Profiles)
    • Docker volume
    • Object Storage

We should probably either rename those sections to what they specifically store or explain that in the intro, because "Vroom" is not very descriptive, but if it's explained that the component is responsible for (ingesting and) storing profiling data, it makes a lot more sense.

Maybe wait until someone else also chimes in before rewriting the whole thing, in case I'm off base with this outline, but this sounds like a document I would love to have had when I started my self-hosted adventures 👍

For the Object Storage thing we might want to link to the relevant documentation instead of adding examples for every option under the sun because otherwise there is no bound to the size of this document.

After changing configuration files, re-run the <code>./install.sh</code> script, to rebuild and restart the containers. See the <Link to="/self-hosted/#configuration">configuration section</Link> for more information.
</Alert>

<!-- Should we add a description about what "external storage" is? -->
Member

I think we can safely assume that people can find that out for themselves if they are looking at this page


<!-- Hello! If you're reading this, you're in luck because I can't decide whether to make.. wait let me copy the text from Discord.

I got some time before Monday to write up some docs about setting up an S3 storage for selfhosted instance, but I can't decide whether I should put it under a big "External Services" page, in which people can include external postgres, external redis, and that kind of things; or should I put it under a page called "External Storage"?
Member

External storage sounds fine. I don't think we should recommend people to use external postgres, redis, etc as that can introduce a lot of issues for people trying to set that up unless they really know what they're doing.

Contributor Author

I agree with "don't think we should recommend people to use external postgres, redis, etc", but if there are some people who wish to do that... I don't know if there's a better way to tell them that "they're on their own".


Filestore handles storing attachments, source maps, and replays. Filestore configuration for Sentry should be set in the `sentry/config.yml` file.

### S3 backend
Member

Just to make sure, have you tried the Azure/s3 compatible backend without issues? We're using GCS so wanted to make sure

Contributor Author

title: External Storage
---

<!-- Hello! If you're reading this, you're in luck because I can't decide whether to make.. wait let me copy the text from Discord.
Contributor

nit: much easier to review if questions/comments like this are added in the GH comment system, rather than in-line in the PR.


I got some time before Monday to write up some docs about setting up an S3 storage for selfhosted instance, but I can't decide whether I should put it under a big "External Services" page, in which people can include external postgres, external redis, and that kind of things; or should I put it under a page called "External Storage"?

There. Please help me decide this. I'll delete this comment afterwards -->
Contributor

I think we should have a separate page that, rather than "External Services", says something like "Unsupported Workflows".

For example, S3 storage support technically exists, but is (a) untested and (b) unused by Sentry internally, so there's no real pressure to ensure that it stays functional over time. Ultimately, what we have at the moment is a (very possibly bit-rotted) thin wrapper around Django's FileStore capabilities. We do not want to indicate to users that it is something we'll offer support for, because realistically we can't offer very good support for it, and folks will be left disappointed.

For this specific doc, we need to be very clear that this is provided as a rough best effort template, and that we offer very limited support for it.

Contributor Author

> because realistically we can't offer very good support for it, and folks will be left disappointed.

I know this, but I think the current S3 support is good enough for self-hosted. Users can always bring their own Django plugins though, like @stayallive's S3 Nodestore plugin: https://github.com/stayallive/sentry-nodestore-s3

> we need to be very clear that this is provided as a rough best effort template

I agree. Need to take some time to come up with good enough copywriting for this lol.

After changing configuration files, re-run the <code>./install.sh</code> script, to rebuild and restart the containers. See the <Link to="/self-hosted/#configuration">configuration section</Link> for more information.
</Alert>

<!-- Should we add a description about what "external storage" is? -->
Contributor

What exactly do you mean by "external storage"? Does that essentially mean "storage supplied by a cloud services provider like AWS/GCP/Azure"?

Contributor Author

I think it's storage that's not strictly on the same filesystem as the Sentry self-hosted instance. That definition also excludes things like a NAS or an external bind-mount used to store Sentry data. For files, it's the blob storage provided by each cloud provider. For databases, it's an external database, managed or unmanaged, that is separate from the Sentry self-hosted instance. Do you have any suggestions on how to phrase this better?

src/docs/self-hosted/external-storage.mdx (outdated, resolved)

<!-- Should we add a description about what "external storage" is? -->

## Filestore
Contributor

nit:

Suggested change
- ## Filestore
+ ## Django Filestore

Sentry (confusingly) maintains a separate service called Filestore which acts as an intermediate layer in front of, e.g., GCP, though we don't really recommend this for self-hosted use.


<!-- Hello! If you're reading this, you're in luck because I can't decide whether to make.. wait let me copy the text from Discord.

I got some time before Monday to write up some docs about setting up an S3 storage for selfhosted instance, but I can't decide whether I should put it under a big "External Services" page, in which people can include external postgres, external redis, and that kind of things; or should I put it under a page called "External Storage"?
Contributor

I would call the proposed page "Integrating with Major Cloud Providers" or something similar, just to make it clear that we are specifically referring to GCS/AWS/Azure.


Additional environment variables should be provided:
- `AWS_ACCESS_KEY=foobar`
- `AWS_SECRET_KEY=foobar`
Contributor

nit (here and elsewhere): change the value from foobar to something else, to make clear that the two keys will be different in practice. I would suggest something like your_secret_key or similar.

src/docs/self-hosted/external-storage.mdx (outdated, resolved)
- `disableSSL`: A value of "true" disables SSL when sending requests.
- `s3ForcePathStyle`: A value of "true" forces the request to use path-style addressing.
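
As a rough illustration of how the two options above might be combined, here is a minimal Python sketch. It assumes the options are passed as query parameters on an `s3://` bucket URL, in the same style as the `gs://my-bucket` value shown elsewhere in this diff; the helper name `build_bucket_url` and the bucket name are hypothetical and not part of the PR.

```python
from urllib.parse import urlencode


def build_bucket_url(bucket: str, **params: str) -> str:
    """Build an s3:// bucket URL with optional query parameters.

    Assumption: options such as disableSSL and s3ForcePathStyle are
    supplied as URL query parameters. This helper is illustrative only.
    """
    query = urlencode(params)  # keeps the order the parameters were given in
    return f"s3://{bucket}?{query}" if query else f"s3://{bucket}"


if __name__ == "__main__":
    # "my-bucket" is a placeholder bucket name.
    print(build_bucket_url("my-bucket", disableSSL="true", s3ForcePathStyle="true"))
    # s3://my-bucket?disableSSL=true&s3ForcePathStyle=true
```

Path-style addressing and disabled SSL are typically only relevant for S3-compatible services running on a private endpoint, so both values are best left unset unless the backend requires them.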

### Azure Blob Storage backend
Contributor

My inclination is to remove Azure for now, and indicate somewhere in this document that there are no shims for it atm. Since we don't have a Filestore shim for Azure, in practice it will be very hard to run vroom on Azure, and users will likely work themselves into a corner if they try.

title: External Storage
---

In some cases, storing Sentry data on-disk is not really something people can do. Sometimes, it's better if they can offload it into some bucket storage (like AWS S3 or Google Cloud Storage).
Contributor Author

This seems a bit confusing, but I think we can add another page after this one about "Unsupported Workflows", where we can say more about the kinds of things we can't offer support for (external Redis, external Postgres, installing third-party plugins to extend things).

See @azaslavsky's comment here #1269 (comment)

Comment on lines +53 to +59
### Google Cloud Storage backend

You will need to set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. For more information, refer to the [Google Cloud documentation for setting up authentication](https://cloud.google.com/storage/docs/reference/libraries#setting_up_authentication).

```bash
gs://my-bucket
```
Contributor Author

I haven't tested this. I only know how to configure this. Can you guys test this out on your dogfood instance?
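
For anyone trying the Google Cloud Storage setup from the diff above, a small pre-flight check may save a debugging round. The sketch below only verifies that `GOOGLE_APPLICATION_CREDENTIALS` is set and points at an existing file; it does not contact Google Cloud or validate the key. The function name `credentials_problem` is hypothetical and not part of the PR.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credentials_problem(env: Mapping[str, str]) -> Optional[str]:
    """Return a description of the problem, or None if the variable looks usable.

    Only checks that the variable is set and names an existing file;
    the file contents are not inspected.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not Path(path).is_file():
        return f"credentials file not found: {path}"
    return None


if __name__ == "__main__":
    problem = credentials_problem(os.environ)
    print(problem or "credentials file found")
```

In a containerized setup, the check would need to run inside the container that reads the variable, since the file path is resolved against that container's filesystem.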

4 participants