Add arm to docker #1

Open
wants to merge 149 commits into
base: master
ca186f0
docs: update theme 1.5
dgarcia360 Jun 2, 2023
a4cda46
Update setup-amazon-s3.rst
mykaul Jun 7, 2023
97ee657
docs: restore, update restore tables prerequisites
Michal-Leszczynski Jun 4, 2023
d58f207
cluster: CQL session created out of rpc_addresses
karol-kokoszka Jun 13, 2023
8f80daa
scyllaclient: close host pool on closing client (leaking goroutines)
karol-kokoszka Jun 15, 2023
b0b1f63
testing: update compose to support dual network + updated int-tests
karol-kokoszka Jun 15, 2023
7a54c05
chore(deps): bumps project dependencies (#3417)
charconstpointer Jun 19, 2023
a464a63
testing: change system_auth replication strategy
Michal-Leszczynski Jun 4, 2023
0a52f8f
restore: use separate cql user for restore tests
Michal-Leszczynski Jun 22, 2023
e9b6a18
testutils: extend hrt with respNotifier
Michal-Leszczynski Jun 4, 2023
4d6203d
repair_test: test single repair on single host
Michal-Leszczynski Jun 4, 2023
e256acb
testing: regenerate scylla-second-cluster.yaml
Michal-Leszczynski Jun 22, 2023
568a467
documented procedure of fixing broken SM 3.1.0-rc0 release
Michal-Leszczynski Jun 22, 2023
59a303e
issues: new place to keep solutions for known issues
karol-kokoszka Jun 26, 2023
1819fae
Prepare tests for ipv6 environment (#3436)
dkropachev Jun 26, 2023
3a5da38
readme: mention IPv6 environment (#3439)
dkropachev Jun 28, 2023
ff87051
docs: README.md service/scheduler
karol-kokoszka Jun 28, 2023
e7f9d3c
docs: README.md service/repair
karol-kokoszka Jun 28, 2023
872dd31
docs: README.md service/backup
karol-kokoszka Jun 28, 2023
9c027cf
docs: README.md service/restore
karol-kokoszka Jun 28, 2023
1544665
docs: backup, document sstable versioning
Michal-Leszczynski Jun 28, 2023
b59f7ed
restore: init stage code refactor
Michal-Leszczynski Jun 8, 2023
4cba2ea
main: add repairSvc to backupSvc
Michal-Leszczynski Jun 8, 2023
a88fdc2
schema: restore_run, add repair_task_id
Michal-Leszczynski Jun 9, 2023
4de47d4
restore: add automated repair after restore
Michal-Leszczynski Jun 9, 2023
8de6117
swagger: scylla-manager, add repair_progress to RestoreProgress
Michal-Leszczynski Jun 29, 2023
9b42b1f
managerclient: include repair info in repair progress
Michal-Leszczynski Jun 29, 2023
112d9ee
schema: add tombstone_gc to restore_table
Michal-Leszczynski Jun 9, 2023
0470cdd
restore: reset tombstone_gc after post-restore repair
Michal-Leszczynski Jun 29, 2023
8497729
restore: ensure that all nodes all available and UN before restoring …
Michal-Leszczynski Jun 14, 2023
9caa48d
docs: restore, all nodes should be in UN state before restore
Michal-Leszczynski Jun 14, 2023
6e38cf8
backup: proceed with backup when schema cannot be dumped
Michal-Leszczynski Jun 28, 2023
5420c4d
restore: close session in listAllViews
Michal-Leszczynski Jun 28, 2023
ea5ab02
restore: validate that all units are present after resume
Michal-Leszczynski Jul 3, 2023
7e8ae98
managerclient: fix restore progress display with nil progress
Michal-Leszczynski Jul 3, 2023
c2860a8
restore_test: add test for restoring into non-replicated keyspace
Michal-Leszczynski Jul 3, 2023
e15510d
restore: don't fail when there is nothing to repair
Michal-Leszczynski Jul 3, 2023
a21f675
docs: update restore-schema prerequisite
karol-kokoszka Jun 30, 2023
0c742f5
feat(restore): add progress gauge metric
charconstpointer Jun 5, 2023
e9829c2
dist/Makefile: VERSION changed to X.Y.0-dev
Annamikhlin Jul 9, 2023
640f706
docs: update list of supported platforms
karol-kokoszka Jul 10, 2023
471c3cd
fix(tests): make rest of the test parallel (#3464)
dkropachev Jul 12, 2023
98c7a79
docs: req for service restart after config change
karol-kokoszka Jul 12, 2023
5208897
chore(deps): bump google.golang.org/grpc from 1.48.0 to 1.53.0 in /mod
dependabot[bot] Jul 5, 2023
fd8417d
backup: default age_max to 24h
karol-kokoszka Jul 11, 2023
2823f5d
chore: bumb github.com/scylladb/gocql to v1.8.0
charconstpointer Jun 21, 2023
9036913
Revert "chore: bumb github.com/scylladb/gocql to v1.8.0"
karol-kokoszka Jul 14, 2023
0ea041c
deps: remove mod/go.mod and update install-dependencies.sh
karol-kokoszka Jul 14, 2023
ccf4c97
generate: link to schemagen binary
karol-kokoszka Jul 14, 2023
fd55328
fix(testing): make scylla load faster (#3483)
dkropachev Jul 17, 2023
4006fd8
Revert "fix(testing): make scylla load faster (#3483)" (#3491)
dkropachev Jul 18, 2023
6d8abb5
ci: run tests on different Scylla versions (#3481)
Michal-Leszczynski Jul 19, 2023
22d7e33
Fix restore tests (#3494)
Michal-Leszczynski Jul 21, 2023
88fd445
fix(scyllaclient): report correct password protection status with ful…
dkropachev Jul 24, 2023
6fe838c
testing: increase memory to 1GB per shard
Michal-Leszczynski Jul 18, 2023
8f2ccca
doc: clarify the sctool syntax
annastuchlik Jul 18, 2023
6ed9c24
doc: replace Scylla with ScyllaDB
annastuchlik Jul 18, 2023
4c1317c
fix(backup_test): chown only necessary directories (#3502)
Michal-Leszczynski Jul 25, 2023
ddd41ac
schema: add Views to RestoreRun
Michal-Leszczynski Jul 6, 2023
cfed9e0
restore: save restored views definition in run
Michal-Leszczynski Jul 11, 2023
efeb1f8
restore: introduced retry wrappers
Michal-Leszczynski Jul 13, 2023
fa5962e
scyllaclient: added ViewBuildStatus
Michal-Leszczynski Jul 13, 2023
96e468b
restore: added functions to interact with views
Michal-Leszczynski Jul 13, 2023
a0f38f4
restore: drop and recreate restored views
Michal-Leszczynski Jul 11, 2023
c6d6e49
swagger: scylla-manager, add views and tombstone_gc to restore progress
Michal-Leszczynski Jul 10, 2023
ee894bf
managerclient: display tombstone_gc in restore progress
Michal-Leszczynski Jul 10, 2023
c71e413
restore: include views in restore progress
Michal-Leszczynski Jul 10, 2023
906507d
managerclient: display views in restore progress
Michal-Leszczynski Jul 10, 2023
1ec9e02
swagger: scylla-manager, added views to RestoreTarget
Michal-Leszczynski Jul 11, 2023
fb929e5
restore: include views in restore dry run
Michal-Leszczynski Jul 11, 2023
38ba4c8
managerclient: display restore views in dry run
Michal-Leszczynski Jul 13, 2023
bbfcfdb
metrics: restore, added view_build_status metric
Michal-Leszczynski Jul 13, 2023
c16fc79
restore_test: added tests for restoring views
Michal-Leszczynski Jul 12, 2023
2076539
utils/slice: introduce Index
Michal-Leszczynski Jul 18, 2023
389b408
Update golangci and address complaints (#3500)
dkropachev Jul 26, 2023
459d3c5
fix(backup): update to match new sstable naming format (#3461)
dkropachev Jul 26, 2023
5315662
fix(backup): make gocritic happy
Jul 26, 2023
332524d
doc: fix the description of repair parallel
karol-kokoszka Jul 27, 2023
49dbd17
repo: add CODEOWNERS
karol-kokoszka Jul 28, 2023
78cdb27
docs: restore, updated restore tables documentation
Michal-Leszczynski Jul 18, 2023
9b3527f
doc: fix the upgrade instructions
annastuchlik Jul 17, 2023
ced7999
feat(repair): don't repair system_traces by default
Michal-Leszczynski Jul 28, 2023
5ad23bb
feat(repair): describeRing, aggregate token ranges by replicas
Michal-Leszczynski Jul 28, 2023
6f739b8
feat(repair): add cluster session to repair service
Michal-Leszczynski Aug 11, 2023
c536655
feat(schema): prepare schema for repair refactor
Michal-Leszczynski Aug 11, 2023
a14fdba
feat(restore): extract fetching views to utils
Michal-Leszczynski Aug 11, 2023
40c6e69
fix(repair): make repair follow docs
Michal-Leszczynski Aug 11, 2023
4605cca
feat(repair): sort repair plan
Michal-Leszczynski Aug 11, 2023
f25ff52
feat(repair): choose repair master by smallest shard cnt
Michal-Leszczynski Aug 11, 2023
a32c2b4
feat(repair): deprecate (0,1) intensity
Michal-Leszczynski Aug 11, 2023
7033b3d
feat(repair): additional integration-tests
karol-kokoszka Aug 11, 2023
c40041e
fix(repair): update params of disabled tasks using its names only
karol-kokoszka Aug 14, 2023
967ae6d
fix(db): improve test helper functions
Michal-Leszczynski Aug 21, 2023
4052191
feat(repair): upgraded TestServiceRepairOrderIntegration
Michal-Leszczynski Aug 21, 2023
8afaf06
feat(repair): added TestServiceRepairResumeAllRangesIntegration
Michal-Leszczynski Aug 18, 2023
8bccc9b
fix(docs): deprecate (0, 1) intensity
Michal-Leszczynski Aug 16, 2023
fea8478
fix(docs): describe repair order and bring back 1 job per 1 host stat…
Michal-Leszczynski Aug 17, 2023
4cde7e9
docs: update docs to 3.2
Michal-Leszczynski Aug 17, 2023
8c33b74
feat(docs): repair, improve task/job terminology
Michal-Leszczynski Aug 18, 2023
c7c9ba0
feat(repair): push system_traces back in repair order
Michal-Leszczynski Aug 30, 2023
c3f82d6
feat(repair): isolate TablePreference by interface
Michal-Leszczynski Aug 30, 2023
21ce3c8
feat(repair): make repair system keyspace first more robust
Michal-Leszczynski Aug 30, 2023
639d330
feat(repair): add TablePreference tests
Michal-Leszczynski Aug 30, 2023
7670703
feat(repair_test): make all test use properly generated target
Michal-Leszczynski Aug 24, 2023
e095cfd
feat(repair): move plan generation to GetTarget
Michal-Leszczynski Aug 24, 2023
6c70754
fix(repair): improve calculating max parallel
Michal-Leszczynski Aug 24, 2023
1990a69
feat(repair): move max parallel/intensity into intensityHandler
Michal-Leszczynski Aug 30, 2023
bf485c4
feat(swagger): scylla-manager, extend repair progress with max parall…
Michal-Leszczynski Aug 30, 2023
72a5f99
feat(repair): display max parallel/intensity in repair progress
Michal-Leszczynski Aug 30, 2023
94f654e
feat(store): add Check method
Michal-Leszczynski Aug 21, 2023
388ed6a
feat(cluster): add CheckCQLCredentials
Michal-Leszczynski Aug 21, 2023
e51531d
feat(cluster): extend 'sctool cluster list' with CQL credentials
Michal-Leszczynski Aug 21, 2023
395f0e7
fix(repair): return both repair and ctx errors
Michal-Leszczynski Aug 24, 2023
b297741
fix(repair): set end time only for successful runs
Michal-Leszczynski Aug 24, 2023
f8d0522
fix(repair): fix looking for prev run ID
Michal-Leszczynski Aug 31, 2023
4578b95
fix(repair): check for table deletion on repair master
Michal-Leszczynski Aug 31, 2023
2fd28a5
fix(healtcheck): make native cql to retry if connection is failed (#3…
dkropachev Sep 2, 2023
421536e
fix(backup): generate a proper UUIDv1 for sstable identifier
tchaikov Sep 7, 2023
1aa0837
feat(agent): reduce request log spam
Michal-Leszczynski Sep 7, 2023
a5a8879
fix(backup_test): increase timeout in case of schema disagreement
Michal-Leszczynski Sep 8, 2023
01428a9
feat(scylladb): increase memory limit for ScyllaDB to 500M
charconstpointer Sep 12, 2023
59bd74f
ci: bump golangci-lint to current latest
karol-kokoszka Sep 12, 2023
93a31fd
chore: bump golang to 1.21
karol-kokoszka Sep 12, 2023
01fe4a1
chore: deprecate rand.Seed and rand.Read
karol-kokoszka Sep 12, 2023
6ad7334
fix(systemd-files): make it restart on non-zero exit
Sep 13, 2023
78ad309
feat(repair): better explanation on host down
karol-kokoszka Sep 13, 2023
86731b7
feat(healthcheck): new status value -2 for unavailable agent (#3555)
Michal-Leszczynski Sep 12, 2023
2afe643
feat(docs): make it easier to learn about repair intensity/parallel p…
Michal-Leszczynski Aug 30, 2023
6657018
feat(docs): improve 'sctool tasks --show-properties' docs
Michal-Leszczynski Sep 8, 2023
c1f42df
feat(docs): example with disabling, listing and enabling task
Michal-Leszczynski Sep 8, 2023
46f66c0
fix(repair_test): extend duration cmp limit (#3573)
Michal-Leszczynski Sep 13, 2023
c2ee93c
feat(scheduler): add suspended metric (#3567)
Michal-Leszczynski Sep 12, 2023
b29b8fa
feat(repair): add total progress metric
Michal-Leszczynski Sep 11, 2023
0f47112
feat(schema): extend repair_run_progress with table size
Michal-Leszczynski Sep 11, 2023
a90a524
feat(repair): fill RunProgress size
Michal-Leszczynski Sep 11, 2023
67f317e
feat(swagger): scylla-manager, extend RepairProgress with success/err…
Michal-Leszczynski Sep 11, 2023
7071fb2
feat(repair): calculate and display weighted total repair progress
Michal-Leszczynski Sep 11, 2023
04d5b78
feat(repair_test): update aggregate progress tests
Michal-Leszczynski Sep 12, 2023
f39914b
feat(docs): repair, explain when repair control changes are applied
Michal-Leszczynski Sep 24, 2023
060ca8b
feat(docs): progress, explain %/% progress display
Michal-Leszczynski Sep 24, 2023
bcd9b3f
fix(managerclient): progress, round success % down and error % up
Michal-Leszczynski Sep 24, 2023
298144f
add(docs): repair, expand example on calculating max intensity/parallel
Michal-Leszczynski Sep 24, 2023
dc04c66
fix(repair): compare real table size in small table optimization
Michal-Leszczynski Sep 27, 2023
0ee9808
add(repair_test): test for checking if big table are not optimized
Michal-Leszczynski Sep 27, 2023
01c4ad6
feat(repair): log table size
Michal-Leszczynski Sep 27, 2023
ec95eb4
fix(repair): optimize tables strictly smaller than threshold
Michal-Leszczynski Sep 27, 2023
c820881
Makefile: bump dev version to 3.3.0
yaronkaikov Oct 3, 2023
60cfa16
Remove warning during build
yaronkaikov Oct 3, 2023
3b79a40
.goreleaser: add support for arm based docker image
yaronkaikov Jan 2, 2023
feat(healthcheck): new status value -2 for unavailable agent (scylladb#3555)

Until now, the ping status metric had two possible values: 1 for success and -1 for failure. The problem was that -1 could mean that either Scylla or the Agent was unavailable. To make the metric more descriptive, a new value -2 now denotes an unavailable Agent, while -1 denotes an unavailable Scylla.
Michal-Leszczynski committed Sep 20, 2023
commit 86731b7954f1319184f892ab265c5af28eda7eec
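For anyone consuming these metrics, the three values described in the commit message can be read as in the sketch below. The constant names and the `describeStatus` helper are hypothetical and exist only for illustration; they are not part of Scylla Manager.

```go
package main

import "fmt"

// Status values as described in the commit message.
// The constant names are hypothetical; only the numeric values come from the commit.
const (
	statusUp                = 1
	statusScyllaUnavailable = -1
	statusAgentUnavailable  = -2
)

// describeStatus is a hypothetical helper that maps a gauge value
// to a human-readable description.
func describeStatus(v float64) string {
	switch v {
	case statusUp:
		return "up"
	case statusScyllaUnavailable:
		return "scylla unavailable"
	case statusAgentUnavailable:
		return "agent unavailable"
	default:
		return "unknown"
	}
}

func main() {
	for _, v := range []float64{1, -1, -2, 0} {
		fmt.Printf("%v => %s\n", v, describeStatus(v))
	}
}
```

Any alert that previously matched only -1 would presumably need to match -2 as well, since an unreachable Agent is now reported separately.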
33 changes: 33 additions & 0 deletions pkg/scyllaclient/client_ping.go
@@ -3,10 +3,13 @@
package scyllaclient

import (
"bytes"
"context"
"fmt"
"math"
"math/rand"
"net"
"net/http"
"net/url"
"runtime"
"sort"
@@ -265,3 +268,33 @@ func min(a, b time.Duration) time.Duration {
}
return a
}

// PingAgent is a simple heartbeat ping to agent.
func (c *Client) PingAgent(ctx context.Context, host string, timeout time.Duration) (time.Duration, error) {
if timeout == 0 {
timeout = c.config.Timeout
}
if ctxTimeout, hasCustomTimeout := hasCustomTimeout(ctx); hasCustomTimeout {
timeout = min(ctxTimeout, timeout)
}
ctx = customTimeout(ctx, timeout)
ctx = noRetry(ctx)

u := c.newURL(host, "/ping")
req, err := http.NewRequestWithContext(forceHost(ctx, host), http.MethodGet, u.String(), bytes.NewReader(nil))
if err != nil {
return 0, err
}

t := timeutc.Now()
resp, err := c.client.Do("PingAgent", req)
if err != nil {
return 0, err
}
defer resp.Body.Close()

if resp.StatusCode != http.StatusNoContent {
return 0, fmt.Errorf("expected %d status code from ping response, got %d", http.StatusNoContent, resp.StatusCode)
}
return timeutc.Since(t), nil
}
12 changes: 6 additions & 6 deletions pkg/service/healthcheck/metrics.go
@@ -21,42 +21,42 @@ var (
Namespace: "scylla_manager",
Subsystem: "healthcheck",
Name: "cql_status",
-Help: "Host native port status",
+Help: "Host native port status. -2 stands for unavailable agent, -1 for unavailable Scylla and 1 for everything is fine.",
}, []string{clusterKey, hostKey})

cqlRTT = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: "scylla_manager",
Subsystem: "healthcheck",
Name: "cql_rtt_ms",
-Help: "Host native port RTT",
+Help: "Host native port RTT.",
}, []string{clusterKey, hostKey})

restStatus = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: "scylla_manager",
Subsystem: "healthcheck",
Name: "rest_status",
-Help: "Host REST status",
+Help: "Host REST status. -2 stands for unavailable agent, -1 for unavailable Scylla and 1 for everything is fine.",
}, []string{clusterKey, hostKey})

restRTT = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: "scylla_manager",
Subsystem: "healthcheck",
Name: "rest_rtt_ms",
-Help: "Host REST RTT",
+Help: "Host REST RTT.",
}, []string{clusterKey, hostKey})

alternatorStatus = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: "scylla_manager",
Subsystem: "healthcheck",
Name: "alternator_status",
-Help: "Host Alternator status",
+Help: "Host Alternator status. -2 stands for unavailable agent, -1 for unavailable Scylla and 1 for everything is fine.",
}, []string{clusterKey, hostKey})

alternatorRTT = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: "scylla_manager",
Subsystem: "healthcheck",
Name: "alternator_rtt_ms",
-Help: "Host Alternator RTT",
+Help: "Host Alternator RTT.",
}, []string{clusterKey, hostKey})
)

9 changes: 8 additions & 1 deletion pkg/service/healthcheck/runner.go
@@ -47,6 +47,7 @@ type runner struct {
timeout time.Duration
metrics *runnerMetrics
ping func(ctx context.Context, clusterID uuid.UUID, host string, timeout time.Duration) (rtt time.Duration, err error)
+pingAgent func(ctx context.Context, clusterID uuid.UUID, host string, timeout time.Duration) (rtt time.Duration, err error)
}

type runnerMetrics struct {
@@ -90,7 +91,13 @@ func (r runner) checkHosts(ctx context.Context, clusterID uuid.UUID, status []sc

rtt, err := r.ping(ctx, clusterID, status[i].Addr, r.timeout)
if err != nil {
-r.metrics.status.With(hl).Set(-1)
+// Set -2 for unavailable agent and -1 for unavailable Scylla
+_, err := r.pingAgent(ctx, clusterID, status[i].Addr, r.timeout)
+if err != nil {
+r.metrics.status.With(hl).Set(-2)
+} else {
+r.metrics.status.With(hl).Set(-1)
+}
} else {
r.metrics.status.With(hl).Set(1)
}
18 changes: 15 additions & 3 deletions pkg/service/healthcheck/service.go
@@ -93,7 +93,8 @@ func (s *Service) Runner() Runner {
status: cqlStatus,
rtt: cqlRTT,
},
-ping: s.pingCQL,
+ping: s.pingCQL,
+pingAgent: s.pingAgent,
},
rest: runner{
scyllaClient: s.scyllaClient,
@@ -102,7 +103,8 @@ func (s *Service) Runner() Runner {
status: restStatus,
rtt: restRTT,
},
-ping: s.pingREST,
+ping: s.pingREST,
+pingAgent: s.pingAgent,
},
alternator: runner{
scyllaClient: s.scyllaClient,
@@ -111,7 +113,8 @@ func (s *Service) Runner() Runner {
status: alternatorStatus,
rtt: alternatorRTT,
},
-ping: s.pingAlternator,
+ping: s.pingAlternator,
+pingAgent: s.pingAgent,
},
}
}
@@ -376,6 +379,15 @@ func (s *Service) pingREST(ctx context.Context, clusterID uuid.UUID, host string
return client.Ping(ctx, host, timeout)
}

func (s *Service) pingAgent(ctx context.Context, clusterID uuid.UUID, host string, timeout time.Duration) (time.Duration, error) {
client, err := s.scyllaClient(ctx, clusterID)
if err != nil {
return 0, errors.Wrapf(err, "get client for cluster with id %s", clusterID)
}

return client.PingAgent(ctx, host, timeout)
}

func (s *Service) nodeInfo(ctx context.Context, clusterID uuid.UUID, host string) (nodeInfo, error) {
s.cacheMu.Lock()
defer s.cacheMu.Unlock()