Adding felix service metric port #3534
Conversation
There is a PrometheusMetricsPort in FelixConfig that should be used instead of always using a built-in default value.
There is also a migration in migration/core.go that sets installation.spec.NodeMetricsPort. Should we perhaps:
Was felixConfiguration.Spec.PrometheusMetricsPort a new field that has been added? I'm wondering if that field was needed if we already had NodeMetricsPort in the operator. If both NodeMetricsPort and PrometheusMetricsPort are set and are different, should the operator report that as a problem?
PrometheusMetricsPort is not a new field; see operator/api/v1/installation_types.go, line 119 at ea0f4fb.
So I guess one thing I'm unclear on: should installation.NodeMetricsPort and felixConfig.PrometheusMetricsPort be configuring the same thing, or are they different?
@tmjd The changes I have made are specific to enabling felix metrics for the "Bring your own Prometheus" use case (https://docs.tigera.io/calico-enterprise/latest/operations/monitor/prometheus/byo-prometheus#scrape-metrics-from-specific-components).
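For readers following the thread, here is a minimal sketch of the precedence question being debated. The helper name, the stand-in types, and the conflict check are illustrative assumptions, not the PR's actual code; 9091 is felix's documented default metrics port.

package sketch

import "fmt"

// Minimal stand-ins for the relevant spec fields; the real types live in the
// Installation and FelixConfiguration CRDs.
type InstallationSpec struct {
	NodeMetricsPort *int32 // operator-level calico/node metrics port
}

type FelixConfigurationSpec struct {
	PrometheusMetricsPort *int // felix's own prometheus metrics port
}

const defaultFelixMetricsPort = 9091 // felix's documented default

// felixMetricsPort resolves which port to expose for felix metrics. The
// conflict check mirrors the open question above about reporting a problem
// when both fields are set and disagree; it is not settled behavior.
func felixMetricsPort(inst InstallationSpec, felix FelixConfigurationSpec) (int, error) {
	if inst.NodeMetricsPort != nil && felix.PrometheusMetricsPort != nil &&
		int(*inst.NodeMetricsPort) != *felix.PrometheusMetricsPort {
		return 0, fmt.Errorf("NodeMetricsPort (%d) and PrometheusMetricsPort (%d) conflict",
			*inst.NodeMetricsPort, *felix.PrometheusMetricsPort)
	}
	if felix.PrometheusMetricsPort != nil {
		return *felix.PrometheusMetricsPort, nil
	}
	return defaultFelixMetricsPort, nil
}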
pkg/render/node.go (outdated)
if c.cfg.Installation.NodeMetricsPort != nil {
	return *c.cfg.Installation.NodeMetricsPort
}
Since the two metrics ports are unrelated, other than both providing some metrics information, I don't think there is any reason we should use NodeMetricsPort here.
If this is needed, then I'd suggest this logic be moved to the core_controller.
Updated to only use PrometheusMetricsPort.
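In other words, the resolution looks roughly like the sketch below: NodeMetricsPort is no longer consulted at all. The function name and the 9091 fallback are illustrative assumptions, reusing the stand-in types from the sketch above.

// Post-review shape: only FelixConfiguration's PrometheusMetricsPort matters
// here; Installation.NodeMetricsPort is ignored for the felix port.
func felixPrometheusMetricsPort(felix FelixConfigurationSpec) int {
	if felix.PrometheusMetricsPort != nil {
		return *felix.PrometheusMetricsPort
	}
	return defaultFelixMetricsPort // 9091 when unset
}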
LGTM as far as the operator API and operator code quality. Was there a review of whether this is appropriate for Calico/Enterprise? For example, should the felix metrics port be exposed on the same service as the node metrics, or should there be a separate service?
These changes add the felix metrics port to the calico-node service, removing the manual step the client currently needs to perform to enable felix metrics for BYO Prometheus.
Rene and I discussed adding this to the same service, since we are using "calico-node" in the selector. We also discussed that the ServiceMonitor needs updating, and I have updated the code for that.
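A sketch of the shape of that change, using the upstream Kubernetes core types. Appending a port to the existing calico-node Service follows the discussion above, but the exact port name "felix-metrics-port" is an assumption.

package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// felixMetricsServicePort builds the extra port appended to the existing
// calico-node Service (which already selects the calico-node pods), rather
// than creating a second Service.
func felixMetricsServicePort(port int32) corev1.ServicePort {
	return corev1.ServicePort{
		Name:       "felix-metrics-port", // the ServiceMonitor endpoint must reference this name
		Port:       port,
		TargetPort: intstr.FromInt(int(port)),
		Protocol:   corev1.ProtocolTCP,
	}
}

The ServiceMonitor side of the change then only needs an additional endpoint whose port field matches this name, which is what the felix-metrics-service-monitor.yaml update mentioned in the description covers.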
@rene-dekker did you want to review this then, to make sure it follows what you discussed with @vikastigera?
@@ -373,18 +379,24 @@ func (r *ReconcileMonitor) Reconcile(ctx context.Context, request reconcile.Requ
		return reconcile.Result{}, err
	}

	felixConfiguration, err := utils.GetFelixConfiguration(ctx, r.client)
	if err != nil {
		log.Error(err, "Error retrieving Felix configuration")
Is there a reason that we log and move on as opposed to degrading and returning? Or would that create some sort of deadlock with the core controller?
Good question; I wouldn't expect it to be problematic.
Updated
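For completeness, here is a self-contained sketch of the "degrade and return" shape being discussed, with stand-in types so it compiles on its own. The real code uses the operator's status manager and controller-runtime's reconcile package, whose exact signatures may differ.

package sketch

import (
	"context"
	"fmt"
)

// Stand-ins for the real dependencies (assumptions, not the operator's API).
type Result struct{ Requeue bool }

type statusManager interface {
	// Loosely modeled on the operator's status manager; the real
	// SetDegraded signature may differ.
	SetDegraded(reason, msg string, err error)
}

type felixConfig struct{ PrometheusMetricsPort *int }

type reconciler struct {
	status   statusManager
	getFelix func(ctx context.Context) (*felixConfig, error)
}

func (r *reconciler) reconcileFelixPort(ctx context.Context) (Result, error) {
	felix, err := r.getFelix(ctx)
	if err != nil {
		// Rather than logging and moving on, surface the failure via the
		// status manager and return, so the reconcile is retried.
		r.status.SetDegraded("ResourceReadError", "Error retrieving Felix configuration", err)
		return Result{}, fmt.Errorf("retrieving FelixConfiguration: %w", err)
	}
	_ = felix // port resolution would continue here as in the sketches above
	return Result{}, nil
}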
I think the vision we have around scraping is that we prefer users to get metrics by scraping our Prometheus, and have our Prometheus scrape node/felix. Users are struggling with the mTLS configuration, and leveraging the ExternalPrometheus option could simplify that. With that in mind, I don't see any drawback to this, unless you can think of a reason.
I wasn't trying to suggest a problem with it; I just wanted to make sure someone had reviewed this change from the perspective of whether it is the right solution for Enterprise and addresses the problem it was targeting.
lgtm
Description
These changes add the felix service metrics port to the calico-node service. This removes the manual step required by the client to enable felix metrics for BYO Prometheus.
https://tigera.atlassian.net/browse/EV-5305
*** Additional changes were required to update felix-metrics-service-monitor.yaml to this:
Testing
For PR author
make gen-files
make gen-versions
For PR reviewers
A note for code reviewers - all pull requests must have the following:
kind/bug if this is a bugfix.
kind/enhancement if this is a new feature.
enterprise if this PR applies to Calico Enterprise only.