Release 3.2.6 #3727

karol-kokoszka · 2024-02-22T18:06:27Z

Includes the following:

Please make sure that:

Code is split to commits that address a single change
Commit messages are informative
Commit titles have module prefix
Commit titles have issue nr. suffix

Michal-Leszczynski · 2024-02-23T11:20:07Z

@karol-kokoszka this branch is still missing some commits, e.g. f1aef7d - results in incorrect test env setup in gh actions:

Warning: Unexpected input(s) 'raft-enabled', valid inputs are ['scylla-version', 'ip-family', 'start-dev-env']

karol-kokoszka · 2024-02-26T08:25:29Z

Looks like I have to cherry-pick the commits again; it will be easier. The last non-docs-related commit included is 55fbc26

Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.14.0 to 0.17.0. - [Commits](golang/crypto@v0.14.0...v0.17.0) --- updated-dependencies: - dependency-name: golang.org/x/crypto dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]>

This test used to fail with Scylla 5.4 and IPv6 because of different IPv6 string representations.
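A minimal sketch of how a test can compare hosts without depending on the IPv6 spelling, assuming addresses arrive as plain strings. The helper name `sameHost` is hypothetical and is not taken from the scylla-manager code base; it only illustrates comparing parsed addresses instead of raw strings.

```go
package main

import (
	"fmt"
	"net/netip"
)

// sameHost reports whether two host strings denote the same IP address,
// regardless of spelling (e.g. compressed vs. expanded IPv6).
// Hypothetical helper for illustration only.
func sameHost(a, b string) bool {
	ipA, errA := netip.ParseAddr(a)
	ipB, errB := netip.ParseAddr(b)
	if errA != nil || errB != nil {
		// Not IP literals (e.g. DNS names): fall back to string comparison.
		return a == b
	}
	return ipA == ipB
}

func main() {
	fmt.Println(sameHost("2001:db8::1", "2001:0db8:0000:0000:0000:0000:0000:0001"))
	fmt.Println(sameHost("2001:db8::1", "2001:db8::2"))
}
```

Comparing parsed `netip.Addr` values sidesteps differences such as zero compression or leading zeros in the textual form.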

There are some differences in schema propagation in Scylla 5.4.0 with consistent_cluster_management enabled: scylladb/scylladb#16349

With this setup, the following scenario could fail:
- backup dc1 to loc1 and dc2 to loc2
- restore schema only from loc1 (so restore schema only to dc1)

The result could be that nodes from dc1 have the new, correct schema, while nodes from dc2 have the same schema as before the restore. For this reason, it's safer to download schema from every location to every node with location access. This does not solve the whole issue, but makes restore schema more likely to succeed.

Since Scylla 5.4 is out, and we support only 2 recent minor releases, we don't need to test SM against Scylla 5.1 anymore.

Fixes #3654

Since we support cluster with and without raft schema changes, we should test both of those cases (especially for restore schema tests). Fixes #3644

With this commit it's possible to run start-dev-env SKIP_GOSSIP=true which results in adding 'skip_wait_for_gossip_to_settle: 0' to scylla.yaml. Recently improved testing cluster setup seems to work fine (and faster) with that. However, not waiting for gossip breaks restore schema tests. This means that if someone wants to work on anything except restore schema, they could use this option to make testing cluster setup way faster.

Fixes #3656

Since StorageServiceRepairStatus (without timeout param) returns only when the repair job has finished, we shouldn't time out on our end (even if backoff retry could handle that). This resulted in many backoff errors in SM logs even on successful repair:

{"L":"INFO","T":"2023-12-01T01:31:54.398Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceRepairStatus","wait":"28.607063257s","error":"after 16m0s: context deadline exceeded","_trace_id":"uqSjtDSfRoOl1WhetiLPgA"}

Since StorageServiceSstablesByKeyspacePost returns only when load&stream has finished, we shouldn't time out on our end (even if backoff retry could handle that). This resulted in many backoff errors in SM logs even on successful restore:

{"L":"INFO","T":"2023-11-30T23:03:07.117Z","N":"cluster.client","M":"HTTP retry backoff","operation":"StorageServiceSstablesByKeyspacePost","wait":"999.175032ms","error":"after 30s: context deadline exceeded","_trace_id":"uqSjtDSfRoOl1WhetiLPgA"}

There are still some bugs (one described in the issue below) regarding using float intensity. To get rid of them, we should tolerate float intensity only at the entrypoint and completely remove it from SM repair internals (both in the service and in progress display). We keep float intensity in task properties and in swagger endpoint parameters, but we convert it to int intensity as the first thing in internals, which happens in:
- GetTarget (from task properties)
- sctool repair control endpoint (from query param)

Fixes #3665
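A sketch of what "convert at the entrypoint" could look like. The function name and the conversion rule are assumptions made for illustration (here, fractional values between 0 and 1 become 1 and everything else is truncated); the actual rule used by Scylla Manager is not stated in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// intensityFromDeprecatedFloat converts a float intensity accepted at the
// entrypoint into the int intensity used internally.
// ASSUMPTION: values in (0, 1) map to 1 and other values are truncated;
// this rule is illustrative, not confirmed by the source.
func intensityFromDeprecatedFloat(f float64) (int, error) {
	if math.IsNaN(f) || f < 0 || f > math.MaxInt32 {
		return 0, errors.New("intensity must be a non-negative number")
	}
	if f > 0 && f < 1 {
		return 1, nil
	}
	return int(f), nil
}

func main() {
	for _, f := range []float64{0, 0.5, 1, 2.7} {
		i, err := intensityFromDeprecatedFloat(f)
		fmt.Println(f, i, err)
	}
}
```

Doing the conversion once, at the boundary, means the rest of the repair code only ever sees an int.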

Also, add timeout on first node setup, as misconfiguration could lead to hanging at this step.

Newer Scylla versions (e.g. 2024) docker images don't run ssh server on their own, but we require it for some of SM tests.

Because of problems with restoring backups into Scylla 5.4 with raft schema enabled (#3662), we want to test the following workaround:
- use fresh cluster without raft schema
- restore as usual
- enable raft schema in the cluster

In order to do that, we leave raft schema on src cluster and test how it works with raft schema enabled/disabled on dst cluster.

Removing filtering was done so that our tests can pass with Scylla 5.4 and raft enabled, but it didn't improve the real life situations where agents don't have cross region remote location access.

…ing CQL SSL

Previously, SSL was preferred when client_encryption_options.enabled from the ScyllaDB configuration was true and the SSL port was open, even when Scylla Manager did not have any client certificate registered for the particular cluster. This caused issues when a ScyllaDB cluster exposed both CQL and CQL SSL with mTLS: even when Manager was not registered with certificates, it still insisted on establishing sessions over the SSL port. CQL healthchecks were also affected. Fixes #3698

"/storage_service/describe_ring/" returns the token range of a random keyspace. This API is never used in production. What is used is its cousin, which accepts a mandatory keyspace path parameter, like "/storage_service/describe_ring/{keyspace}". So let's drop it in scylla-manager; we will drop this API in Scylla as well. Signed-off-by: Kefu Chai <[email protected]>

…use non SSL port on cluster

Manager 3.2.6 makes it possible to explicitly disable TLS on a session even when the certificate and key are available (`force_tls_disabled`). In addition, there is an option to force the session to always use the non-TLS port from the Scylla config (`force_non_ssl_session_port`).
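One possible reading of how these settings interact when choosing a CQL session, sketched below. All type, field, and function names are hypothetical, and the decision rule is an assumption pieced together from the commit messages in this PR; it is not the actual Scylla Manager implementation.

```go
package main

import "fmt"

// nodeCQLConfig holds the values read from the node's Scylla configuration.
// Hypothetical struct for illustration.
type nodeCQLConfig struct {
	NativeTransportPort     int
	NativeTransportPortSSL  int // 0 when not configured
	ClientEncryptionEnabled bool
	RequireClientAuth       bool // mTLS
}

// clusterSettings holds what Scylla Manager stores for the cluster.
// Hypothetical struct for illustration.
type clusterSettings struct {
	HasClientCert          bool
	ForceTLSDisabled       bool
	ForceNonSSLSessionPort bool
}

// chooseSession decides whether to use TLS and which port to connect to.
// ASSUMPTION: TLS is skipped when explicitly disabled, or when the node
// requires client certificates and none is registered.
func chooseSession(n nodeCQLConfig, c clusterSettings) (useTLS bool, port int) {
	useTLS = n.ClientEncryptionEnabled && !c.ForceTLSDisabled
	if useTLS && n.RequireClientAuth && !c.HasClientCert {
		useTLS = false
	}
	port = n.NativeTransportPort
	if useTLS && n.NativeTransportPortSSL != 0 && !c.ForceNonSSLSessionPort {
		port = n.NativeTransportPortSSL
	}
	return useTLS, port
}

func main() {
	node := nodeCQLConfig{
		NativeTransportPort:     9042,
		NativeTransportPortSSL:  9142,
		ClientEncryptionEnabled: true,
		RequireClientAuth:       true,
	}
	fmt.Println(chooseSession(node, clusterSettings{}))
	fmt.Println(chooseSession(node, clusterSettings{HasClientCert: true}))
	fmt.Println(chooseSession(node, clusterSettings{HasClientCert: true, ForceNonSSLSessionPort: true}))
}
```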

…on from DB This addresses #3679.

It adds information about `force-tls-disabled` and `force-non-ssl-session-port` flags.

The fix for #3707 includes extending the cluster kept in the DB with the 'host' property, which represents the initial host used when adding a cluster to Scylla Manager. Scylla-operator passes the DNS name here, making it immune to the ephemeral IPs in the Kubernetes environment. This commit adds the initial host as the first host to query in order to discover node IPs.
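The ordering described above can be sketched like this. The function name `discoveryOrder` and its signature are hypothetical; the sketch only shows the initial host being tried first, followed by the previously known hosts without duplicates.

```go
package main

import "fmt"

// discoveryOrder returns the hosts to query when discovering node IPs:
// the initial host (e.g. a stable DNS name) first, then the known hosts,
// skipping empty entries and duplicates.
// Hypothetical helper for illustration only.
func discoveryOrder(initialHost string, knownHosts []string) []string {
	seen := make(map[string]struct{}, len(knownHosts)+1)
	out := make([]string, 0, len(knownHosts)+1)
	for _, h := range append([]string{initialHost}, knownHosts...) {
		if h == "" {
			continue
		}
		if _, ok := seen[h]; ok {
			continue
		}
		seen[h] = struct{}{}
		out = append(out, h)
	}
	return out
}

func main() {
	hosts := discoveryOrder("scylla-client.example.svc", []string{"10.0.0.1", "10.0.0.2", "10.0.0.1"})
	fmt.Println(hosts)
}
```

Trying the DNS name first means discovery can still succeed after every previously stored IP has been replaced.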

…ndpoint This endpoint returns 'consistent_cluster_management' option from scylla.yaml.

karol-kokoszka requested a review from Michal-Leszczynski as a code owner February 22, 2024 18:06

karol-kokoszka force-pushed the release-3.2.6 branch from 22a7140 to 2481b41 Compare February 22, 2024 19:39

dependabot bot and others added 26 commits February 26, 2024 09:27

fix(scyllaclient_test): make TestClientStatusIntegration more robust

b382685

This test used to fail with Scylla 5.4 and IPv6 because of different IPv6 string representations.

fix(restore_test): fix TestRestoreSchemaVersionedIntegration flakiness

20575b7

add(workflows): run github actions against Scylla 5.4.0

2fda380

Since Scylla 5.4 is out, and we support only 2 recent minor releases, we don't need to test SM against Scylla 5.1 anymore.

fix(testing): improve make start-dev-env cluster setup

896da3a

Fixes #3654

add(workflows): include raft enabled in test matrix

033eeec

Since we support cluster with and without raft schema changes, we should test both of those cases (especially for restore schema tests). Fixes #3644

chore(go.mod): bump gocql to v1.12.0

478e84a

Fixes #3656

refactor(repair): add job_id to finished job log

dc07eff

test(repair): add tests with deprecated float intensity

9abddb0

add(testing): test against enterprise Scylla

9ef68aa

Also, add timeout on first node setup, as misconfiguration could lead to hanging at this step.

fix(testing): ensure that ssh server is started on nodes

b522ea5

Newer Scylla versions (e.g. 2024) docker images don't run ssh server on their own, but we require it for some of SM tests.

fix(repair): include system_replicated_keys in repair order

5173329

chore(backup_test): adjust tests to enterprise features

28ae563

chore(restore_test): adjust tests to enterprise features

a439d75

fix(restore): restore-schema, don't hide issues with raft schema

fd06cfc

Removing filtering was done so that our tests can pass with Scylla 5.4 and raft enabled, but it didn't improve the real life situations where agents don't have cross region remote location access.

fix(docs): update docker setup example

300be31

docs: remove 404 redirect

550964f

feat(db): add force_tls_disabled and force_non_ssl_session_port colum…

e42e88f

…ns to cluster

karol-kokoszka and others added 25 commits February 26, 2024 09:33

feat(cli): cluster add/update extended with explicit --force-tls-disa…

b402c47

…bled and --force-non-ssl-session-port flags

feat(cql): drive the TLS enablement basing on the cluster configurati…

c0a7d59

…on from DB This addresses #3679.

feat(testing): additional type of integration tests to validate CLI /…

9a0ae98

… HTTP API

feat(testing): API tests for cluster add/update CLI

376d0af

fix(docs): update cluster add and cluster update docs

a9c6043

It adds information about `force-tls-disabled` and `force-non-ssl-session-port` flags.

fix(ci): split integration tests to separate workflows

a6950b8

fix(README): include CI badges

063a90d

fix(healtcheck): respect force-tls-disabled and force-non-ssl-session…

6587257

…-port cluster properties

fix(swagger): add deprecate comment to interval property

b26e599

fix(chore): deprecate Interval param on CLI and scheduler

7836158

fix(scheduler): mix cron with start date

39720d7

fix(doc): start_date is not deprecated

af7db29

fix(scheduler): don't error if start_date is before now for cron

0fa7dc2

fix(client): validate all known_hosts on ScyllaAPI client creation

f3a9bc7

fix(db): update cluster table with coordinator_host column

d55d214

fix(cache): make the validity timeout configurable

b8afed8

fix(deps): bump cenkalti/backoff to latest

82463a7

feat(swagger): scylla_v2, add /config/consistent_cluster_management e…

e3c8d07

…ndpoint This endpoint returns 'consistent_cluster_management' option from scylla.yaml.

feat(swagger): agent, extend NodeInfo with consistent_cluster_management

88ac4ea

feat(agent): fill consistent_cluster_management in NodeInfo

4b21e69

feat(restore): validate if restore schema is safe

1880875

feat(docs): add workaround for restoring raft schema

d5863fa

feat(docs): add Scylla 5.4/2024.1 to restore compatibility matrix

63f3c8a

karol-kokoszka force-pushed the release-3.2.6 branch from 78dee7d to 63f3c8a Compare February 26, 2024 08:39

fix(CI): point to branch-3.2

22e0a1f

karol-kokoszka merged commit 97e10f9 into branch-3.2 Feb 26, 2024
19 of 21 checks passed

karol-kokoszka deleted the release-3.2.6 branch February 26, 2024 09:29
