-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ACM-14962: init MCOA metrics #1659
base: main
Are you sure you want to change the base?
ACM-14962: init MCOA metrics #1659
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: thibaultmg The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
553efbb
to
413717f
Compare
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
/retest |
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some thoughts
{ | ||
ConfigGroupResource: addonapiv1alpha1.ConfigGroupResource{ | ||
Group: prometheusalpha1.SchemeGroupVersion.Group, | ||
Resource: prometheusalpha1.ScrapeConfigName, | ||
}, | ||
ConfigReferent: addonapiv1alpha1.ConfigReferent{ | ||
Name: "platform-metrics-default", | ||
Namespace: mcoconfig.GetDefaultNamespace(), | ||
}, | ||
}, | ||
{ | ||
ConfigGroupResource: addonapiv1alpha1.ConfigGroupResource{ | ||
Group: prometheusv1.SchemeGroupVersion.Group, | ||
Resource: prometheusv1.PrometheusRuleName, | ||
}, | ||
ConfigReferent: addonapiv1alpha1.ConfigReferent{ | ||
Name: "platform-rules-default", | ||
Namespace: mcoconfig.GetDefaultNamespace(), | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to reference these ones out of the get go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My aim is to have it working out of the box for the base case. It lists here the metrics we need to collect for having the platform dashboards working. What are you thinking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of not listing them here since they are defaults, here I feel it would only make sense to list extra metrics/rules the user would like to collect/have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that we must let the user the ability to tune this list himself. As of now, he must be able to remove metrics from this list. However, the need for most or our users is to collect more metrics.
This is something I must discuss again with our PM. Maybe this will change later on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could probably remove the rules more easily however. I'll think about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there I a need I'm not opposed to having them here I was just double checking 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaving it like this for now. I might remove some of them after reviewing the list of needed metrics, in another PR.
operators/multiclusterobservability/pkg/rendering/renderer_mcoa.go
Outdated
Show resolved
Hide resolved
// PlatformMetricsCollectionSpec defines the spec for the addon to collect and forward metrics | ||
// from fleet managed clusters. | ||
type PlatformMetricsCollectionSpec struct { | ||
// Enabled defines a flag to enable/disable the platform metrics collection. | ||
// | ||
// +optional | ||
// +kubebuilder:validation:Optional | ||
Enabled bool `json:"enabled,omitempty"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this initial version since metrics will be taking a managed approach to metric collection I wonder if these values are the best ones to use since the equivalent for logs & traces currently correlates to an unmanaged approach where the user fully controls the root resources. I would be in favor of deliberating a bit how we want these different scenarios to be exposed to users. This correlates with stolostron/multicluster-observability-addon#88 (comment) maybe a discussion for the nexus syncs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean that you expect different fields in the MCO CRD to have either managed or unmanaged deployments?
It would be very interesting to discuss those concepts in the next nexus sync.
On the metrics side, I am not sure there is demand for unmanaged type of deployments 🤷 . The expressed need is to replicate what we have now. We need something in between with some managed fields and some that we can leave it to the user for fine tuning.
Let's discuss all that in the nexus sync 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% let's discuss on nexus sync 👍 just writing these down so we can go back to them:
- if a signal is providing a turnkey solution then it will most likely be managed since that's the only way to reasonably offer support the storage side
- maybe some use-cases only make sense to be managed (storage for instance, if you enable storage you are opting into the managed use case)
- otherwise if we want to provide the unmanaged use-case for storage then we have to think how will it look like on the MCO CR
- whatever we decide we should make sure that the CR feels similar across signals
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disregarding the discussion how the capabilities section will look like, I still have to manually test it but apart from that I would put it in lgtm territory
operators/multiclusterobservability/pkg/rendering/renderer_mcoa_test.go
Outdated
Show resolved
Hide resolved
baseCMA := &addonv1alpha1.ClusterManagementAddOn{ | ||
ObjectMeta: metav1.ObjectMeta{ | ||
Labels: map[string]string{ | ||
"k1": "v1", | ||
}, | ||
}, | ||
Spec: addonv1alpha1.ClusterManagementAddOnSpec{ | ||
InstallStrategy: addonv1alpha1.InstallStrategy{ | ||
Placements: []addonv1alpha1.PlacementStrategy{ | ||
{ | ||
Configs: []addonv1alpha1.AddOnConfig{}, | ||
}, | ||
}, | ||
}, | ||
}, | ||
} | ||
// to kustomize resource | ||
baseCMAUnstructured, err := runtime.DefaultUnstructuredConverter.ToUnstructured(baseCMA) | ||
assert.NoError(t, err) | ||
kres := &kustomizeres.Resource{} | ||
err = runtime.DefaultUnstructuredConverter.FromUnstructured(baseCMAUnstructured, kres) | ||
assert.NoError(t, err) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do it this way VS the way we did it for TestRenderAddonDeploymentConfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? I'm testing the ClusterManagementAddOn, not the AddOnDeploymentConfig.
dups := make(map[string]struct{}) | ||
for _, cfg := range cma.Spec.SupportedConfigs { | ||
key := cfg.ConfigGroupResource.Group + "/" + cfg.ConfigGroupResource.Resource | ||
if _, ok := dups[key]; ok { | ||
t.Errorf("duplicated supportedConfigs %s", key) | ||
} | ||
dups[key] = struct{}{} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we checking for duplicates? This is actually something that is supported by the addon-framework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addon framework will fail at runtime, I'm just checking it earlier at unit-test time.
Signed-off-by: Thibault Mange <[email protected]>
Quality Gate passedIssues Measures |
This PR initialises metrics collection for MCOA. More precisely, it:
Relates to the MCOA PR: stolostron/multicluster-observability-addon#77
Notes: