
feat(config-cache): initial stub for cluster config cache service #3803

Conversation

karol-kokoszka (Collaborator)

Fixes #3802

This is just an initial stub for the new cluster-config-cache service.

A general description of the goal is in #3767 (comment).


Please make sure that:

  • Code is split to commits that address a single change
  • Commit messages are informative
  • Commit titles have module prefix
  • Commit titles have issue nr. suffix

@karol-kokoszka karol-kokoszka changed the base branch from master to feature-branch_config_cache_service April 18, 2024 08:58
@karol-kokoszka karol-kokoszka force-pushed the kk/3767-cache-nodeinfo-outside-healthcheck branch from 41e3c5b to e14498f Compare April 18, 2024 08:59
@karol-kokoszka karol-kokoszka force-pushed the kk/3767-cache-nodeinfo-outside-healthcheck branch 2 times, most recently from 2154611 to 3247350 Compare April 18, 2024 12:52
@Michal-Leszczynski Michal-Leszczynski (Collaborator) left a comment

Looks good, just some minor comments.

Comment on lines 24 to 31
const (
	// CQL defines cql connection type.
	CQL ConnectionType = iota
	// Alternator defines alternator connection type.
	Alternator

	updateFrequency = 30 * time.Minute
)
Collaborator

I think it would be nicer to separate enums and other consts by placing them in separate blocks.
What do you think?
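For illustration only, a minimal sketch of the split this comment suggests, mirroring the snippet above (not the PR's final code):

// Enum-like connection types kept in their own block.
const (
	// CQL defines cql connection type.
	CQL ConnectionType = iota
	// Alternator defines alternator connection type.
	Alternator
)

// Unrelated service constants kept separately.
const (
	updateFrequency = 30 * time.Minute
)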

Collaborator Author

And what is the benefit of doing so over keeping them in one block split with an empty line?
There is actually no enum in Go; ConnectionType is just a workaround for that lack of enums.

The compiler would still accept:

	config.TLSConfig[5] = &tlsConfigWithAddress{}

Comment on lines +171 to +198
for _, p := range []ConnectionType{CQL, Alternator} {
	var tlsEnabled, clientCertAuth bool
	var address string
	if p == CQL {
		address = nodeInfoResp.CQLAddr(host)
		tlsEnabled, clientCertAuth = nodeInfoResp.CQLTLSEnabled()
		tlsEnabled = tlsEnabled && !c.ForceTLSDisabled
		if tlsEnabled && !c.ForceNonSSLSessionPort {
			address = nodeInfoResp.CQLSSLAddr(host)
		}
	} else if p == Alternator {
		tlsEnabled, clientCertAuth = nodeInfoResp.AlternatorTLSEnabled()
		address = nodeInfoResp.AlternatorAddr(host)
	}
	if tlsEnabled {
		tlsConfig, err := svc.tlsConfig(c.ID, clientCertAuth)
		if err != nil && !errors.Is(err, service.ErrNotFound) {
			return config, errors.Wrap(err, "fetch TLS config")
		}
		if clientCertAuth && errors.Is(err, service.ErrNotFound) {
			return config, errors.Wrap(err, "client encryption is enabled, but certificate is missing")
		}
		config.TLSConfig[p] = &tlsConfigWithAddress{
			Config:  tlsConfig,
			Address: address,
		}
	}
}
Collaborator

I know that's how it was done before, but I don't see a reason for keeping a map with only two entries. The for loop which fills this map is also cloudy, as CQL and Alternator are still handled with ifs in the first half of the loop. And getting configs from the map is less clear (are the configs always there? should we validate that every time we want to get them?).

Wouldn't it be better to store the 2 TLS configs directly in the node config?
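For illustration, the alternative described here could look roughly like this (the struct and field names are hypothetical, not taken from the PR):

// Hypothetical shape: explicit per-protocol fields instead of
// a map keyed by ConnectionType.
type NodeConfig struct {
	// ... other cached node info ...

	// CQLTLS and AlternatorTLS are nil when TLS is disabled
	// for the given protocol.
	CQLTLS        *tlsConfigWithAddress
	AlternatorTLS *tlsConfigWithAddress
}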

Collaborator Author

You are right.
I created an issue so we don't add to the chaos across the PRs.
It will be addressed as part of the epic.

	// Alternator defines alternator connection type.
	Alternator

	updateFrequency = 30 * time.Minute
Collaborator

Why do we update so rarely? Shouldn't it be something like 5min?

Collaborator Author

It actually can, I have no strong opinion though.
If the cache is not updated but some configuration changed in the meantime, then the manager may misbehave for some time (if we spread the cache access to all services, not only the healthcheck).

But this is something we would hit with the current way of caching the nodeinfo as well.
Let me bring back the default from the previous approach, so the 5 min you mentioned.

Comment on lines +133 to +141
go func() {
	client, err := svc.scyllaClient(ctx, c.ID)
	if err != nil {
		fmt.Println(err)
	}

	// Hosts that are going to be asked about the configuration are exactly the same as
	// the ones used by the scylla client.
	hostsWg := sync.WaitGroup{}
Collaborator

I think it would be safer to wait for all updates before returning from this function. Otherwise it's possible that a single cluster will be updated in parallel by different calls to the update function.
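A minimal sketch of the pattern being asked for, reusing the hostsWg from the snippet above (updateHostConfig is a hypothetical helper, not the PR's API):

hostsWg := sync.WaitGroup{}
for _, host := range hosts {
	hostsWg.Add(1)
	go func(host string) {
		defer hostsWg.Done()
		// Fetch and cache the config for a single host (hypothetical helper).
		if err := updateHostConfig(ctx, client, host); err != nil {
			fmt.Println(err)
		}
	}(host)
}
// Block here instead of returning right away, so that two calls to the
// update function can never work on the same cluster concurrently.
hostsWg.Wait()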

Collaborator Author

Yes, you are right, and it's addressed in the PR fixing the next issue.
Shame on me for not doing it here, but I had to split the code and realized exactly what you pointed out.

Let me merge this PR to the feature branch and you can continue with the review of the next PR.

}

go func() {
	client, err := svc.scyllaClient(ctx, c.ID)
Collaborator

Just a note that this opens a client without closing it. We probably can't close it because of the cache, but then maybe it makes sense to create a client from scratch so that it can be closed right after?

Collaborator Author

It actually doesn't lead to a TCP connection leak. Here is why.
Keep-alive is enabled by default at the HTTP level in Go, which means the client will reuse a TCP connection for multiple requests to the same server. That is good.
Scylla Manager explicitly sets keep-alive at the TCP level too:

	KeepAlive: 30 * time.Second,

(it's also enabled by default in Go, but with a 15-second interval).
Setting keep-alive at the TCP level means that inactive connections are eventually detected and closed by the operating system.

The scylla client cache is OK as is; we don't need to explicitly close the client.
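As a reference only, this is roughly how TCP keep-alive is wired into an HTTP transport in Go (illustrative values, assumes the net, net/http, and time imports; not the manager's exact transport setup):

transport := &http.Transport{
	DialContext: (&net.Dialer{
		Timeout: 30 * time.Second,
		// TCP keep-alive probes; the OS eventually detects and closes dead connections.
		KeepAlive: 30 * time.Second,
	}).DialContext,
	// HTTP keep-alive (connection reuse) is on by default unless DisableKeepAlives is set.
}
httpClient := &http.Client{Transport: transport}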

@karol-kokoszka karol-kokoszka force-pushed the kk/3767-cache-nodeinfo-outside-healthcheck branch from 3247350 to b0bc2d1 Compare April 19, 2024 13:40
@karol-kokoszka (Collaborator Author)

@Michal-Leszczynski I'm merging it to the feature branch.

@karol-kokoszka karol-kokoszka merged commit afa59b0 into feature-branch_config_cache_service Apr 19, 2024
40 of 56 checks passed
@karol-kokoszka karol-kokoszka deleted the kk/3767-cache-nodeinfo-outside-healthcheck branch April 19, 2024 15:51
@karol-kokoszka karol-kokoszka mentioned this pull request Apr 22, 2024