
feat(config-cache): initial stub for cluster config cache service #3803

Conversation

karol-kokoszka (Collaborator)

Fixes #3802

This is just an initial stub for the new cluster-config-cache service.

A general description of the goal is in #3767 (comment).


Please make sure that:

  • Code is split to commits that address a single change
  • Commit messages are informative
  • Commit titles have module prefix
  • Commit titles have issue nr. suffix

@karol-kokoszka karol-kokoszka changed the base branch from master to feature-branch_config_cache_service April 18, 2024 08:58
@karol-kokoszka karol-kokoszka force-pushed the kk/3767-cache-nodeinfo-outside-healthcheck branch from 41e3c5b to e14498f Compare April 18, 2024 08:59
@karol-kokoszka karol-kokoszka force-pushed the kk/3767-cache-nodeinfo-outside-healthcheck branch 2 times, most recently from 2154611 to 3247350 Compare April 18, 2024 12:52
@Michal-Leszczynski Michal-Leszczynski (Collaborator) left a comment

Looks good, just some minor comments.

Comment on lines 24 to 31
const (
	// CQL defines cql connection type.
	CQL ConnectionType = iota
	// Alternator defines alternator connection type.
	Alternator

	updateFrequency = 30 * time.Minute
)
Collaborator

I think it would be nicer to separate enums and other consts by placing them in separate blocks.
What do you think?
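For illustration only, a minimal sketch of the split this comment suggests, mirroring the snippet above (not the PR's final code):

// Enum-like connection types kept in their own block.
const (
	// CQL defines cql connection type.
	CQL ConnectionType = iota
	// Alternator defines alternator connection type.
	Alternator
)

// Unrelated service constants kept separately.
const (
	updateFrequency = 30 * time.Minute
)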

Collaborator Author

And what is the benefit of doing so over keeping them in one block split with an empty line?
There is actually no enum in Go; ConnectionType is just a workaround for that lack of enums.

The compiler would still accept:

	config.TLSConfig[5] = &tlsConfigWithAddress{}

Comment on lines +171 to +198
for _, p := range []ConnectionType{CQL, Alternator} {
	var tlsEnabled, clientCertAuth bool
	var address string
	if p == CQL {
		address = nodeInfoResp.CQLAddr(host)
		tlsEnabled, clientCertAuth = nodeInfoResp.CQLTLSEnabled()
		tlsEnabled = tlsEnabled && !c.ForceTLSDisabled
		if tlsEnabled && !c.ForceNonSSLSessionPort {
			address = nodeInfoResp.CQLSSLAddr(host)
		}
	} else if p == Alternator {
		tlsEnabled, clientCertAuth = nodeInfoResp.AlternatorTLSEnabled()
		address = nodeInfoResp.AlternatorAddr(host)
	}
	if tlsEnabled {
		tlsConfig, err := svc.tlsConfig(c.ID, clientCertAuth)
		if err != nil && !errors.Is(err, service.ErrNotFound) {
			return config, errors.Wrap(err, "fetch TLS config")
		}
		if clientCertAuth && errors.Is(err, service.ErrNotFound) {
			return config, errors.Wrap(err, "client encryption is enabled, but certificate is missing")
		}
		config.TLSConfig[p] = &tlsConfigWithAddress{
			Config:  tlsConfig,
			Address: address,
		}
	}
}
Collaborator

I know that's how it was done before, but I don't see a reason for keeping a map with only two entries. The for loop which fills this map is also cloudy, as CQL and Alternator are still handled with ifs in the first half of the loop. And getting configs from the map is less clear (are the configs always there? should we validate that every time we want to get them?).

Wouldn't it be better to store the 2 TLS configs directly in the node config?
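For illustration, the alternative described here could look roughly like this (the struct and field names are hypothetical, not taken from the PR):

// Hypothetical shape: explicit per-protocol fields instead of
// a map keyed by ConnectionType.
type NodeConfig struct {
	// ... other cached node info ...

	// CQLTLS and AlternatorTLS are nil when TLS is disabled
	// for the given protocol.
	CQLTLS        *tlsConfigWithAddress
	AlternatorTLS *tlsConfigWithAddress
}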

Collaborator Author

You are right.
I created an issue so we don't add to the chaos across the PRs.
It will be addressed as part of the epic.

	// Alternator defines alternator connection type.
	Alternator

	updateFrequency = 30 * time.Minute
Collaborator

Why do we update so rarely? Shouldn't it be something like 5min?

Collaborator Author

It actually can, I have no strong opinion though.
If the cache is not updated but some configuration changed in the meantime, then the manager may misbehave for some time (if we spread the cache access to all services, not only the healthcheck).

But this is something we would hit with the current way of caching the nodeinfo as well.
Let me bring back the default from the previous approach, so the 5 min you mentioned.

Comment on lines +133 to +141
go func() {
	client, err := svc.scyllaClient(ctx, c.ID)
	if err != nil {
		fmt.Println(err)
	}

	// Hosts that are going to be asked about the configuration are exactly the same as
	// the ones used by the scylla client.
	hostsWg := sync.WaitGroup{}
Collaborator

I think it would be safer to wait for all updates before returning from this function. Otherwise it's possible that a single cluster will be updated in parallel by different calls to the update function.
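A minimal sketch of the pattern being asked for, reusing the hostsWg from the snippet above (updateHostConfig is a hypothetical helper, not the PR's API):

hostsWg := sync.WaitGroup{}
for _, host := range hosts {
	hostsWg.Add(1)
	go func(host string) {
		defer hostsWg.Done()
		// Fetch and cache the config for a single host (hypothetical helper).
		if err := updateHostConfig(ctx, client, host); err != nil {
			fmt.Println(err)
		}
	}(host)
}
// Block here instead of returning right away, so that two calls to the
// update function can never work on the same cluster concurrently.
hostsWg.Wait()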

Collaborator Author

Yes, you are right, and it's addressed in the PR fixing the next issue.
Shame on me for not doing it here, but I had to split the code and realized exactly what you pointed out.

Let me merge this PR to the feature branch and you can continue with the review of the next PR.

}

go func() {
	client, err := svc.scyllaClient(ctx, c.ID)
Collaborator

Just a note that this opens a client without closing it. We probably can't close it because of the cache, but then maybe it makes sense to create a client from scratch so that it can be closed right after?

Collaborator Author

It actually doesn't lead to a TCP connection leak. Here is why.
Keep-alive is enabled by default at the HTTP level in Go, which means the client will reuse a TCP connection for multiple requests to the same server. That is good.
Scylla Manager explicitly sets keep-alive at the TCP level too:

	KeepAlive: 30 * time.Second,

(it's also enabled by default in Go, but with a 15-second interval).
Setting keep-alive at the TCP level means that inactive connections are eventually detected and closed by the operating system.

The scylla client cache is OK as is; we don't need to explicitly close the client.
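As a reference only, this is roughly how TCP keep-alive is wired into an HTTP transport in Go (illustrative values, assumes the net, net/http, and time imports; not the manager's exact transport setup):

transport := &http.Transport{
	DialContext: (&net.Dialer{
		Timeout: 30 * time.Second,
		// TCP keep-alive probes; the OS eventually detects and closes dead connections.
		KeepAlive: 30 * time.Second,
	}).DialContext,
	// HTTP keep-alive (connection reuse) is on by default unless DisableKeepAlives is set.
}
httpClient := &http.Client{Transport: transport}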

@karol-kokoszka karol-kokoszka force-pushed the kk/3767-cache-nodeinfo-outside-healthcheck branch from 3247350 to b0bc2d1 Compare April 19, 2024 13:40
@karol-kokoszka (Collaborator Author)

@Michal-Leszczynski I'm merging it to the feature branch.

@karol-kokoszka karol-kokoszka merged commit afa59b0 into feature-branch_config_cache_service Apr 19, 2024
40 of 56 checks passed
@karol-kokoszka karol-kokoszka deleted the kk/3767-cache-nodeinfo-outside-healthcheck branch April 19, 2024 15:51
@karol-kokoszka karol-kokoszka mentioned this pull request Apr 22, 2024