Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the
prometheus-longterm-metrics
andthanos
optional components #461base: master
Are you sure you want to change the base?
Add the
prometheus-longterm-metrics
andthanos
optional components #461Changes from 7 commits
d981ef3
06dc997
0d7178e
42f687d
a921527
0307996
2eab8b7
76e60f7
3f75d49
79a531d
59f6c68
2c6d5f5
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should those be configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which bits would be useful to configure? The endpoints, paths, images, other?
I agree that we could always add more configuration options, I'm just wondering which are a priority for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The endpoints would be the priority, to allow serving them from some other location, though still a low priority relative to the feature as a whole. Worst case, redirects can be defined in the nginx configuration, so don't block the PR just for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indefinitely ! Do we actually want this? Can we set an expiry after like 10 years?
Grafana will be able to display data from Thanos go to back to 10 years? With this kind of extreme long term stats, what is the UI to visualize it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can choose to change this if you wish. Thanos suggests keeping data indefinitely by default. If you do not need to keep data forever, I suggest just using the
prometheus-longterm-monitoring
component without thanos and setting thePROMETHEUS_LONGTERM_RETENTION_TIME
to whatever you'd like.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, is there a switch to disable Thanos?
Same question, in the case we want to use Thanos, how to visualize the data stored on Thanos? I assume if Thanos is enabled, the retention duration on the Prometheus side will be very short to avoid doubling the storage so without data being stored in Prometheus, how to visualize that data stored on Thanos.
Just a question. If another component is required, we can do it in a follow up Pr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To answer my own question, Thanos is actually a separate component so it does not have to be enabled together with the Prometheus-long-term component? The Prometheus-long-term component can function standalone of Thanos?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's right.
prometheus-longterm-metrics
collects and stores specific metrics that we want to keep for longer fromprometheus
. If you want to also enablethanos
, thenthanos
will store those same metrics in a much more compact/efficient way so that you can store more data over a longer time period.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with forever storage for long-term-metrics. The point is to keep an archive of key metrics. If those are daily or hourly, archiving a few dozen metrics won't be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's see how much space that will take in practice and we will adjust. And eventually we need a way to visualize those older metrics. Otherwise, what's the point to keep them forever if we can not visualize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious how that would work out?
It seems to depend on
prometheus:9090
and extends the volumes ofprometheus
service.I think it is more confusing to omit the dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually a completely separate service with no shared volumes. It does depend on
prometheus:9090
by default but that default can be updated by setting thePROMETHEUS_LONGTERM_TARGETS
variable.PROMETHEUS_LONGTERM_TARGETS
can point to any prometheus endpoints at all.See the discussion about this here: #461 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is that the rules are mounted under
prometheus
service here:https://github.com/bird-house/birdhouse-deploy/blob/longterm-monitoring/birdhouse/optional-components/prometheus-longterm-rules/config/monitoring/docker-compose-extra.yml
So, whether
PROMETHEUS_LONGTERM_TARGETS
refers to the same instance or a custom remote one, the local instance needs to have the long-term rules defined. Therefore, doesn't that mean thatprometheus
service (i.e.: usingcomponents/monitoring
) becomes mandatory regardless, since it is needed to mount the rules?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, prometheus must be running on the same host as the rest of the birdhouse stack.
That first prometheus instance must have at least one rule that has the
longterm-metrics
group label in order for a second prometheus instance (either running on the same machine or elsewhere) to know which metrics it should monitor and store from the first prometheus instance.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a second look at the directory structure, and starting to see why
components/monitoring
is not a dependency. Is it only becausecomponents/monitoring
combines cardvisor, grafana, alertmanager, prometheus, etc. all together?Therefore, if someone wants "only" longterm-metrics, they need to define their own
prometheus
service with a definition similar to this?birdhouse-deploy/birdhouse/components/monitoring/docker-compose-extra.yml
Lines 32 to 49 in 3f75d49
I think part of what makes all of this confusing is that there are 2 different, though related, component definitions (and I still do not understand why):
From a user's perspective, this all seems really convoluted.
Isn't there are way to simplify this hierarchy? For example, what if there was a distinct
optional-components/prometheus
definition, andcomponents/monitoring
depends on it. Then, optional-components/prometheus-longterm-metrics could also depend only onoptional-components/prometheus
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmigneault
If someone only wants longterm metrics then they can have a single prometheus service that sets
--storage.tsdb.retention.time
to something longer than 90 days.However, if you want some metrics to be stored for longer than others, you need a second prometheus instance. This is because the
--storage.tsdb.retention.time
value is set for an entire prometheus instance (see discussion here: #277 (comment) and here: #461 (comment)).These are separate in order to accommodate @tlvu's request to be able to deploy the
prometheus-longterm-metrics
component on a different server. prometheus-longterm-rules are just suggested default rules to be added to theprometheus
server. As described in the README file:These rules are added to the
prometheus
component on the same machine that is running the rest of the stack because that is where all the relevant metrics are collected by the rest of the monitoring components (cadvisor, nodeexporter). These rules are not necessary, they are recommended defaults but you can customize them however you like. It is easier to customize and add your own if the rules are separate. Users can choose to include the defaults, or not.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Misha's explanations. I just want to add that each organization needs are different so we should not force a certain "configuration".
Specifically to this PR, we should not force the 2 Prometheus to be on the same machine. So
optional-components/prometheus-longterm-metrics
should not depend oncomponents/monitoring
. We should also not be forced to useoptional-components/prometheus-longterm-rules
because each org will probably have different metrics they deemed useful to keep for longterm.During Misha's leave, a sysadmin from PCIC actually had a question about how to pull existing metrics from our PAVICS Prometheus into his own centralized Prometheus for a centralized view of all his servers in one place. I point him to this PR for inspiration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. I was not clear in my explanation. What I intended to say was:
"if someone wants only
prometheus
monitoring with both long-term and short term metrics, they need to define their ownprometheus
service" [...], because they cannot activateprometheus
on its own (i.e.: they must use the full packagecomponents/monitoring
that includes all other services).Having short/long-term separate, and having them configurable on different servers is perfectly fine by me. I'm all for that flexibility, and never mentioned otherwise.
It just seems from the configuration files that, in order for long-term metrics to be sent to the remote server, the rules must be mounted into the short-term local
prometheus
(as shown below), and since thatprometheus
must exist, then it should be a dependency. I understandlong-term-prometheus
NOT being a dependency to toggle it independently of the short-term one. This is good. Here I am referring to https://github.com/bird-house/birdhouse-deploy/tree/3f75d496b932ab4d90857206003d0225cd20c435/birdhouse/optional-components/prometheus-longterm-rules havingcomponent/monitoring
dependency, NOT https://github.com/bird-house/birdhouse-deploy/tree/3f75d496b932ab4d90857206003d0225cd20c435/birdhouse/optional-components/prometheus-longterm-metrics. In other words, if one addsoptional-components/prometheus-longterm-rules
ONLY without also thinking to addcomponents/monitoring
, docker-compose will complain thatservice: prometheus
is not defined.birdhouse-deploy/birdhouse/optional-components/prometheus-longterm-rules/config/monitoring/docker-compose-extra.yml
Lines 3 to 6 in 3f75d49
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I understand your concern now.
Docker compose won't actually complain in this case. Additional settings defined under the
config/
folder in components are only applied if the component that it references is also enabled. So in this case, if you addprometheus-longterm-rules
as a component but notmonitoring
, then nothing will happen (since the additional docker-compose settings are inprometheus-longterm-rules/config/monitoring
.If we do make the
prometheus-longterm-rules
component dependent on themonitoring
components then if someone adds theprometheus-longterm-rules
but not themonitoring
rule to theBIRDHOUSE_EXTRA_CONF_DIRS
configuration variable, thenmonitoring
will be added in automatically. This is also a surprising behaviour that users may not expect.My opinion is that we should leave it as is, so that users must specify both components if they want both components. I'm open to other opinions on this subject though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for validating how this resolves.
I'm fine with either approach if the README indicates clearly what is supposed to happen and what to do if the one or the other situation is desired.