Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ACM-14962: init MCOA metrics #1659

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

thibaultmg
Copy link
Contributor

@thibaultmg thibaultmg commented Nov 7, 2024

This PR initialises metrics collection for MCOA. More precisely, it:

  • Updates MCO CRD to support the metrics capability for both platform and user workload.
  • Adds customised variables to the addon configuration.
  • Adds permissions to MCOA for handling metrics resources.
  • Generates only the configs for the enabled signals to avoid having to update it for getting a valid addon status

Relates to the MCOA PR: stolostron/multicluster-observability-addon#77

Notes:

Copy link

openshift-ci bot commented Nov 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: thibaultmg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Nov 7, 2024
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
@thibaultmg thibaultmg force-pushed the ACM_14962_init_mocoa_metrics branch from 553efbb to 413717f Compare November 22, 2024 13:09
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
@thibaultmg
Copy link
Contributor Author

/retest

Signed-off-by: Thibault Mange <[email protected]>
Signed-off-by: Thibault Mange <[email protected]>
Copy link
Contributor

@JoaoBraveCoding JoaoBraveCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts

Comment on lines +195 to +214
{
ConfigGroupResource: addonapiv1alpha1.ConfigGroupResource{
Group: prometheusalpha1.SchemeGroupVersion.Group,
Resource: prometheusalpha1.ScrapeConfigName,
},
ConfigReferent: addonapiv1alpha1.ConfigReferent{
Name: "platform-metrics-default",
Namespace: mcoconfig.GetDefaultNamespace(),
},
},
{
ConfigGroupResource: addonapiv1alpha1.ConfigGroupResource{
Group: prometheusv1.SchemeGroupVersion.Group,
Resource: prometheusv1.PrometheusRuleName,
},
ConfigReferent: addonapiv1alpha1.ConfigReferent{
Name: "platform-rules-default",
Namespace: mcoconfig.GetDefaultNamespace(),
},
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to reference these ones out of the get go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My aim is to have it working out of the box for the base case. It lists here the metrics we need to collect for having the platform dashboards working. What are you thinking about?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of not listing them here since they are defaults, here I feel it would only make sense to list extra metrics/rules the user would like to collect/have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that we must let the user the ability to tune this list himself. As of now, he must be able to remove metrics from this list. However, the need for most or our users is to collect more metrics.
This is something I must discuss again with our PM. Maybe this will change later on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could probably remove the rules more easily however. I'll think about it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there I a need I'm not opposed to having them here I was just double checking 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving it like this for now. I might remove some of them after reviewing the list of needed metrics, in another PR.

Comment on lines +71 to +79
// PlatformMetricsCollectionSpec defines the spec for the addon to collect and forward metrics
// from fleet managed clusters.
type PlatformMetricsCollectionSpec struct {
// Enabled defines a flag to enable/disable the platform metrics collection.
//
// +optional
// +kubebuilder:validation:Optional
Enabled bool `json:"enabled,omitempty"`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this initial version since metrics will be taking a managed approach to metric collection I wonder if these values are the best ones to use since the equivalent for logs & traces currently correlates to an unmanaged approach where the user fully controls the root resources. I would be in favor of deliberating a bit how we want these different scenarios to be exposed to users. This correlates with stolostron/multicluster-observability-addon#88 (comment) maybe a discussion for the nexus syncs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean that you expect different fields in the MCO CRD to have either managed or unmanaged deployments?
It would be very interesting to discuss those concepts in the next nexus sync.
On the metrics side, I am not sure there is demand for unmanaged type of deployments 🤷 . The expressed need is to replicate what we have now. We need something in between with some managed fields and some that we can leave it to the user for fine tuning.
Let's discuss all that in the nexus sync 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% let's discuss on nexus sync 👍 just writing these down so we can go back to them:

  • if a signal is providing a turnkey solution then it will most likely be managed since that's the only way to reasonably offer support the storage side
  • maybe some use-cases only make sense to be managed (storage for instance, if you enable storage you are opting into the managed use case)
  • otherwise if we want to provide the unmanaged use-case for storage then we have to think how will it look like on the MCO CR
  • whatever we decide we should make sure that the CR feels similar across signals

Signed-off-by: Thibault Mange <[email protected]>
Copy link
Contributor

@JoaoBraveCoding JoaoBraveCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disregarding the discussion how the capabilities section will look like, I still have to manually test it but apart from that I would put it in lgtm territory

Comment on lines +358 to +380
baseCMA := &addonv1alpha1.ClusterManagementAddOn{
ObjectMeta: metav1.ObjectMeta{
Labels: map[string]string{
"k1": "v1",
},
},
Spec: addonv1alpha1.ClusterManagementAddOnSpec{
InstallStrategy: addonv1alpha1.InstallStrategy{
Placements: []addonv1alpha1.PlacementStrategy{
{
Configs: []addonv1alpha1.AddOnConfig{},
},
},
},
},
}
// to kustomize resource
baseCMAUnstructured, err := runtime.DefaultUnstructuredConverter.ToUnstructured(baseCMA)
assert.NoError(t, err)
kres := &kustomizeres.Resource{}
err = runtime.DefaultUnstructuredConverter.FromUnstructured(baseCMAUnstructured, kres)
assert.NoError(t, err)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do it this way VS the way we did it for TestRenderAddonDeploymentConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? I'm testing the ClusterManagementAddOn, not the AddOnDeploymentConfig.

Comment on lines +396 to +403
dups := make(map[string]struct{})
for _, cfg := range cma.Spec.SupportedConfigs {
key := cfg.ConfigGroupResource.Group + "/" + cfg.ConfigGroupResource.Resource
if _, ok := dups[key]; ok {
t.Errorf("duplicated supportedConfigs %s", key)
}
dups[key] = struct{}{}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we checking for duplicates? This is actually something that is supported by the addon-framework

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addon framework will fail at runtime, I'm just checking it earlier at unit-test time.

Signed-off-by: Thibault Mange <[email protected]>
Copy link

sonarcloud bot commented Dec 16, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants