
Release 3.2.6 #3727

Merged
52 commits merged Feb 26, 2024

Conversation

@Michal-Leszczynski
Collaborator

@karol-kokoszka this branch is still missing some commits, e.g. f1aef7d, which results in an incorrect test environment setup in GitHub Actions:

Warning: Unexpected input(s) 'raft-enabled', valid inputs are ['scylla-version', 'ip-family', 'start-dev-env']

@karol-kokoszka
Collaborator Author

Looks like I have to cherry-pick the commits again; it will be easier. The last non-docs commit included is 55fbc26.

dependabot bot and others added 26 commits February 26, 2024 09:27
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.14.0 to 0.17.0.
- [Commits](golang/crypto@v0.14.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
This test used to fail with Scylla 5.4 and IPv6 because of different IPv6 string representations.
There are some differences in schema propagation in Scylla 5.4.0 with consistent_cluster_management enabled: scylladb/scylladb#16349

With this setup, the following scenario could fail:
- backup dc1 to loc1 and dc2 to loc2
- restore schema only from loc1 (so restore schema only to dc1)

The result could be that nodes from dc1 have new, correct schema, while nodes from dc2 have the same schema as before the restore.

For this reason, it's safer to download the schema from every location to every node that has access to that location. This does not solve the whole issue, but it makes schema restore more likely to succeed.
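A minimal sketch of the new download rule, assuming each node's agent can report the backup locations it is able to reach: for every location, all nodes with access to it download the schema from it, instead of only the nodes of the DC that was backed up there. `schemaDownloadPlan`, the node-to-locations map, and the node and location names are hypothetical, not Scylla Manager's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// schemaDownloadPlan inverts a hypothetical "node -> accessible locations"
// map into "location -> nodes that download the schema from it".
// Every node with access to a location is included, regardless of its DC.
func schemaDownloadPlan(access map[string][]string) map[string][]string {
	plan := make(map[string][]string)
	for node, locations := range access {
		for _, loc := range locations {
			plan[loc] = append(plan[loc], node)
		}
	}
	// Map iteration order is random, so sort for deterministic output.
	for loc := range plan {
		sort.Strings(plan[loc])
	}
	return plan
}

func main() {
	access := map[string][]string{
		"dc1-n1": {"s3:loc1", "s3:loc2"},
		"dc2-n1": {"s3:loc1", "s3:loc2"},
		"dc2-n2": {"s3:loc2"},
	}
	plan := schemaDownloadPlan(access)
	fmt.Println(plan["s3:loc1"]) // [dc1-n1 dc2-n1]
	fmt.Println(plan["s3:loc2"]) // [dc1-n1 dc2-n1 dc2-n2]
}
```

In the dc1/dc2 scenario above, this means nodes from dc2 also receive the schema backed up to loc1, as long as their agents can reach it.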
Since Scylla 5.4 is out, and we support only the 2 most recent minor releases, we don't need to test SM against Scylla 5.1 anymore.
Since we support clusters both with and without raft schema changes, we should test both cases (especially for restore schema tests).
Fixes #3644
With this commit, it's possible to run start-dev-env with SKIP_GOSSIP=true, which adds 'skip_wait_for_gossip_to_settle: 0' to scylla.yaml. The recently improved test cluster setup seems to work fine (and faster) with it. However, not waiting for gossip breaks restore schema tests, so this option is useful only when working on something other than restore schema, where it makes the test cluster setup much faster.
Since StorageServiceRepairStatus (without timeout param) returns only when the repair job has finished, we shouldn't time out on our end (even if backoff retry could handle that).

This resulted in many backoff errors in SM logs even on successful repair:
{"L":"INFO","T":"2023-12-01T01:31:54.398Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceRepairStatus","wait":"28.607063257s","error":"after 16m0s: context deadline exceeded","_trace_id":"uqSjtDSfRoOl1WhetiLPgA"}
Since StorageServiceSstablesByKeyspacePost returns only when load&stream has finished, we shouldn't time out on our end (even if backoff retry could handle that).

This resulted in many backoff errors in SM logs even on successful restore:
{"L":"INFO","T":"2023-11-30T23:03:07.117Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceSstablesByKeyspacePost","wait":"999.175032ms","error":"after 30s: context deadline exceeded","_trace_id":"uqSjtDSfRoOl1WhetiLPgA"}
There are still some bugs related to float intensity (one of them is described in the issue referenced below). To get rid of them, we should tolerate float intensity only at the entrypoint and remove it completely from SM repair internals (both the service and the progress display).

We keep float intensity in task properties and in swagger endpoint parameters, but we convert it to int intensity as the very first step in the internals, which happens in:
- GetTarget (from task properties)
- sctool repair control endpoint (from query param)

Fixes #3665
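A sketch of converting once at the entrypoint so that internals only ever see an integer. The `Intensity` type and `intensityFromFloat` are hypothetical, and the rounding rule (positive values below 1 become 1, everything else is rounded to the nearest integer) is an assumption, since the commit does not state how the conversion is done.

```go
package main

import (
	"fmt"
	"math"
)

// Intensity is a hypothetical integer type used everywhere inside repair
// internals once the float from task properties or a query param is parsed.
type Intensity int

// intensityFromFloat converts the float intensity accepted at the entrypoint.
// Assumed rule: positive values below 1 map to 1; other values are rounded
// to the nearest integer (halves away from zero, as math.Round does).
func intensityFromFloat(f float64) Intensity {
	if f > 0 && f < 1 {
		return 1
	}
	return Intensity(math.Round(f))
}

func main() {
	for _, f := range []float64{0, 0.5, 1, 2.4, 2.5} {
		fmt.Println(f, "->", intensityFromFloat(f))
	}
}
```

Calling this in both places listed above (GetTarget and the repair control endpoint) keeps a single conversion rule for task properties and query params.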
Also, add a timeout to the first node setup, as a misconfiguration could make this step hang.
Docker images of newer Scylla versions (e.g. 2024) don't run an SSH server on their own, but we require it for some SM tests.
Because of problems with restoring backups into Scylla 5.4 with raft schema enabled (#3662), we want to test the following workaround:

- use fresh cluster without raft schema
- restore as usual
- enable raft schema in the cluster

To do that, we keep raft schema enabled on the src cluster and test how restore works with raft schema both enabled and disabled on the dst cluster.
Filtering was removed so that our tests could pass with Scylla 5.4 and raft enabled, but it didn't improve real-life situations where agents don't have cross-region access to remote locations.
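The resulting test matrix can be sketched as a table of cases. `restoreCase`, its fields, and the step strings are hypothetical; they only encode the order of operations in the workaround listed above.

```go
package main

import "fmt"

// restoreCase is one hypothetical test case: the src cluster always has
// raft schema enabled, the dst cluster may or may not.
type restoreCase struct {
	name      string
	dstRaftOn bool
}

// steps returns the ordered actions for a case. When dst starts without raft
// schema, raft is enabled only after the restore (the tested workaround).
func (c restoreCase) steps() []string {
	s := []string{"backup src (raft schema enabled)"}
	if c.dstRaftOn {
		return append(s, "restore into dst (raft schema enabled)")
	}
	return append(s,
		"restore into fresh dst (raft schema disabled)",
		"enable raft schema on dst",
	)
}

func main() {
	for _, c := range []restoreCase{
		{name: "dst raft enabled", dstRaftOn: true},
		{name: "dst raft disabled, enabled after restore", dstRaftOn: false},
	} {
		fmt.Println(c.name, c.steps())
	}
}
```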
…ing CQL SSL

Previously, SSL was preferred when client_encryption_options.enabled
coming from the ScyllaDB configuration was true and the SSL port was open,
even when Scylla Manager did not have any client certificate registered
for the particular cluster.

This caused issues when a ScyllaDB cluster exposed both CQL and CQL
SSL with mTLS: even when Manager had no certificates registered, it
still insisted on establishing sessions over the SSL port.
CQL health checks were also affected.

Fixes #3698
"/storage_service/describe_ring/" returns the token range of a
random keyspace. This API is never used in production. What is used
is its cousin, which accepts a mandatory keyspace path parameter, like
"/storage_service/describe_ring/{keyspace}", so let's drop it in
scylla-manager, we will drop this API in scylla as well.

Signed-off-by: Kefu Chai <[email protected]>
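With the keyspace-less variant gone, callers always have to build the keyspace-scoped path. A sketch, where `describeRingPath` is a hypothetical helper rather than a Scylla Manager or Scylla function:

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// describeRingPath builds the path of the keyspace-scoped describe_ring API.
// The keyspace is mandatory: there is no fallback to the dropped
// "/storage_service/describe_ring/" variant that picked a random keyspace.
func describeRingPath(keyspace string) (string, error) {
	if keyspace == "" {
		return "", errors.New("describe_ring: keyspace is required")
	}
	return "/storage_service/describe_ring/" + url.PathEscape(keyspace), nil
}

func main() {
	p, err := describeRingPath("system_auth")
	fmt.Println(p, err) // /storage_service/describe_ring/system_auth <nil>
	_, err = describeRingPath("")
	fmt.Println(err) // describe_ring: keyspace is required
}
```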
karol-kokoszka and others added 25 commits February 26, 2024 09:33
…use non SSL port on cluster

Manager 3.2.6 makes it possible to explicitly disable TLS on the session even when the
certificate and key are available (`force_tls_disabled`).
Besides that, there is an option to force the session to always use the non-TLS port from the Scylla config (`force_non_ssl_session_port`).
It adds information about the `force-tls-disabled` and `force-non-ssl-session-port` flags.
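How the SSL fix and the two flags could combine into one session decision, as a sketch under stated assumptions: TLS is used only when `client_encryption_options.enabled` is true and a client certificate is registered, unless `force_tls_disabled` is set; the SSL port is dialled only when TLS is used and `force_non_ssl_session_port` is not set. The types, field names, and `chooseSession` are hypothetical; 9042 and 9142 are ScyllaDB's usual default CQL and CQL SSL ports.

```go
package main

import "fmt"

// nodeCQLConfig holds the relevant parts of a node's Scylla config
// (hypothetical shape).
type nodeCQLConfig struct {
	ClientEncryptionEnabled bool // client_encryption_options.enabled
	NativePort              int  // native_transport_port
	NativeSSLPort           int  // native_transport_port_ssl, 0 if not set
}

// sessionOptions mirrors the two flags described above.
type sessionOptions struct {
	ForceTLSDisabled       bool // force_tls_disabled
	ForceNonSSLSessionPort bool // force_non_ssl_session_port
}

// chooseSession decides whether to use TLS and which port to dial, under
// the assumptions stated in the lead-in.
func chooseSession(cfg nodeCQLConfig, certRegistered bool, opts sessionOptions) (useTLS bool, port int) {
	useTLS = cfg.ClientEncryptionEnabled && certRegistered && !opts.ForceTLSDisabled
	port = cfg.NativePort
	if useTLS && !opts.ForceNonSSLSessionPort && cfg.NativeSSLPort != 0 {
		port = cfg.NativeSSLPort
	}
	return useTLS, port
}

func main() {
	cfg := nodeCQLConfig{ClientEncryptionEnabled: true, NativePort: 9042, NativeSSLPort: 9142}
	fmt.Println(chooseSession(cfg, true, sessionOptions{}))                             // true 9142
	fmt.Println(chooseSession(cfg, false, sessionOptions{}))                            // false 9042
	fmt.Println(chooseSession(cfg, true, sessionOptions{ForceNonSSLSessionPort: true})) // true 9042
	fmt.Println(chooseSession(cfg, true, sessionOptions{ForceTLSDisabled: true}))       // false 9042
}
```

The second case corresponds to #3698: with no certificate registered, the session no longer insists on the SSL port.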
The fix for #3707 includes extending the cluster kept in the DB with the 'host'
property, which represents the initial host used when adding a cluster to Scylla Manager. Scylla-operator passes the DNS name here,
making it immune to the ephemeral IPs in the Kubernetes environment. This commit adds the initial host as the first host to query
in order to discover node IPs.
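A sketch of the resulting discovery order: the stored initial host is tried first, followed by the already known node IPs, with duplicates dropped. `discoveryOrder` and the example DNS name are hypothetical.

```go
package main

import "fmt"

// discoveryOrder returns the hosts to query for node IP discovery: the
// initial host stored with the cluster (e.g. a DNS name passed by
// scylla-operator) first, then the known node IPs, skipping empty values
// and duplicates.
func discoveryOrder(initialHost string, knownHosts []string) []string {
	seen := make(map[string]bool, len(knownHosts)+1)
	out := make([]string, 0, len(knownHosts)+1)
	for _, h := range append([]string{initialHost}, knownHosts...) {
		if h == "" || seen[h] {
			continue
		}
		seen[h] = true
		out = append(out, h)
	}
	return out
}

func main() {
	fmt.Println(discoveryOrder("scylla-client.scylla.svc", []string{"10.0.0.5", "10.0.0.6"}))
	// [scylla-client.scylla.svc 10.0.0.5 10.0.0.6]
}
```

Putting the DNS name first means discovery still works after all pod IPs have changed, since the stale IPs are only tried afterwards.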
…ndpoint

This endpoint returns 'consistent_cluster_management' option from scylla.yaml.
@karol-kokoszka karol-kokoszka merged commit 97e10f9 into branch-3.2 Feb 26, 2024
19 of 21 checks passed
@karol-kokoszka karol-kokoszka deleted the release-3.2.6 branch February 26, 2024 09:29