Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
125276: kvserver: constrain store gossip chatter r=nvanbenschoten a=kvoli

Previously, it was possible that store's would gossip their store
descriptors rapidly on capacity changes, without limit. This patch
introduces a hard frequency limit of 2s on capacity change triggered
gossip. In other words, irrespective of any capacity changes that may
have taken place, a store will gossip at most every 2s if satisfying the
capacity delta trigger.

Resolves: cockroachdb#125210
Release note: None


---


Increase the relative requirement for triggering store gossip on
capacity changes for lease and ranges. Previously, the requirement was a
+-5% delta from the previously gossiped value, in addition to a minimum
(5). Increase the relative delta from 5 to 10%.

Informs: cockroachdb#125227
Release note: None

---

Introduce `kv.store_gossip.capacity_delta_threshold` and
`kv.store_gossip.max_frequency`, which control the capacity delta
required to trigger eager gossip for lease/range counts and the maximum
frequency of eager store gossip.

The defaults for these settings remained unchanged from the previously
hard-coded defaults:

`kv.store_gossip.capacity_delta_threshold`: 0.10
`kv.store_gossip.max_frequency`: 2s

Resolves: cockroachdb#125227
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
  • Loading branch information
craig[bot] and kvoli committed Jun 8, 2024
2 parents f94be0c + 7ac13e4 commit c7f37e8
Show file tree
Hide file tree
Showing 15 changed files with 206 additions and 107 deletions.
1 change: 1 addition & 0 deletions pkg/kv/kvserver/asim/gossip/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ go_library(
"//pkg/settings/cluster",
"//pkg/util/hlc",
"//pkg/util/protoutil",
"//pkg/util/timeutil",
],
)

Expand Down
9 changes: 6 additions & 3 deletions pkg/kv/kvserver/asim/gossip/gossip.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
"github.com/cockroachdb/cockroach/pkg/util/protoutil"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

// Gossip collects and updates the storepools of the cluster upon capacity
Expand Down Expand Up @@ -53,15 +54,17 @@ type storeGossiper struct {
addingStore bool
}

func newStoreGossiper(descriptorGetter func(cached bool) roachpb.StoreDescriptor) *storeGossiper {
func newStoreGossiper(
descriptorGetter func(cached bool) roachpb.StoreDescriptor, clock timeutil.TimeSource,
) *storeGossiper {
sg := &storeGossiper{
lastIntervalGossip: time.Time{},
descriptorGetter: descriptorGetter,
}

desc := sg.descriptorGetter(false /* cached */)
knobs := kvserver.StoreGossipTestingKnobs{AsyncDisabled: true}
sg.local = kvserver.NewStoreGossip(sg, sg, knobs, &cluster.MakeTestingClusterSettings().SV)
sg.local = kvserver.NewStoreGossip(sg, sg, knobs, &cluster.MakeTestingClusterSettings().SV, clock)
sg.local.Ident = roachpb.StoreIdent{StoreID: desc.StoreID, NodeID: desc.Node.NodeID}

return sg
Expand Down Expand Up @@ -124,7 +127,7 @@ func (g *gossip) addStoreToGossip(s state.State, storeID state.StoreID) {
g.storeGossip[storeID] = &storeGossiper{addingStore: true}
g.storeGossip[storeID] = newStoreGossiper(func(cached bool) roachpb.StoreDescriptor {
return s.StoreDescriptors(cached, storeID)[0]
})
}, s.Clock())
}

// Tick checks for completed gossip updates and triggers new gossip
Expand Down
13 changes: 12 additions & 1 deletion pkg/kv/kvserver/asim/gossip/gossip_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ package gossip
import (
"context"
"testing"
"time"

"github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool"
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/asim/config"
Expand Down Expand Up @@ -87,11 +88,17 @@ func TestGossip(t *testing.T) {
require.Len(t, gossip.exchange.pending, 0)
assertStorePool(assertSameFn)

// Tick state by a large duration to ensure the below capacity changes don't
// run into the max gossip frequency limit.
storeTick := tick

// Update the usage info leases for s1 and s2, so that it exceeds the delta
// required to trigger a gossip update. We do this by transferring every
// lease to s2.
for _, rng := range s.Ranges() {
s.TransferLease(rng.RangeID(), 2)
storeTick = storeTick.Add(3 * time.Second)
s.TickClock(storeTick)
}
gossip.Tick(ctx, tick, s)
// There should be just store 1 and 2 pending gossip updates in the exchanger.
Expand All @@ -108,6 +115,10 @@ func TestGossip(t *testing.T) {
// Assert that the lease counts are as expected after transferring all of
// the leases to s2.
require.Equal(t, int32(0), (*details[1])[1].Desc.Capacity.LeaseCount)
require.Equal(t, int32(100), (*details[1])[2].Desc.Capacity.LeaseCount)
// Depending on the capacity delta threshold, s2 may not have gossiped
// exactly when it reached 100 leases, as it earlier gossiped at 90+ leases,
// so 100 may be < lastGossip * capacityDeltaThreshold, not triggering
// gossip. Assert that the lease count gossiped is at least 90.
require.Greater(t, (*details[1])[2].Desc.Capacity.LeaseCount, int32(90))
require.Equal(t, int32(0), (*details[1])[3].Desc.Capacity.LeaseCount)
}
5 changes: 5 additions & 0 deletions pkg/kv/kvserver/asim/state/impl.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ import (
"github.com/cockroachdb/cockroach/pkg/spanconfig"
"github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreporter"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/timeutil"
"github.com/google/btree"
)

Expand Down Expand Up @@ -1066,6 +1067,10 @@ func (s *state) TickClock(tick time.Time) {
s.clock.Set(tick.UnixNano())
}

func (s *state) Clock() timeutil.TimeSource {
return s.clock
}

// UpdateStorePool modifies the state of the StorePool for the Store with
// ID StoreID.
func (s *state) UpdateStorePool(
Expand Down
16 changes: 16 additions & 0 deletions pkg/kv/kvserver/asim/state/state.go
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,8 @@ type State interface {
ClusterUsageInfo() *ClusterUsageInfo
// TickClock modifies the state Clock time to Tick.
TickClock(time.Time)
// Clock returns the state Clock.
Clock() timeutil.TimeSource
// UpdateStorePool modifies the state of the StorePool for the Store with
// ID StoreID.
UpdateStorePool(StoreID, map[roachpb.StoreID]*storepool.StoreDetail)
Expand Down Expand Up @@ -278,6 +280,8 @@ type ManualSimClock struct {
nanos int64
}

var _ timeutil.TimeSource = &ManualSimClock{}

// Now returns the current time.
func (m *ManualSimClock) Now() time.Time {
return timeutil.Unix(0, m.nanos)
Expand All @@ -288,6 +292,18 @@ func (m *ManualSimClock) Set(tsNanos int64) {
m.nanos = tsNanos
}

func (m *ManualSimClock) Since(t time.Time) time.Duration {
return m.Now().Sub(t)
}

func (m *ManualSimClock) NewTimer() timeutil.TimerI {
panic("unimplemented")
}

func (m *ManualSimClock) NewTicker(duration time.Duration) timeutil.TickerI {
panic("unimplemented")
}

// Keys in the simulator are 64 bit integers. They are mapped to Keys in
// cockroach as the decimal representation, with 0 padding such that they are
// lexicographically ordered as strings. The simplification to limit keys to
Expand Down
14 changes: 7 additions & 7 deletions pkg/kv/kvserver/asim/tests/testdata/non_rand/example_add_node
Original file line number Diff line number Diff line change
Expand Up @@ -37,19 +37,19 @@ plot stat=replicas sample=1
----
----

301 ┼──────────────────────────────────────╭────────────────────────────────────────
281 ┤ ╭─╯
261 ┤ ╭╭─
301 ┼─────────────────────────────────────╭─╭───────────────────────────────────────
281 ┤ ╭───╯
261 ┤ ╭╭─╯
241 ┤ ╭──╯
221 ┤ ╭──╯
201 ┤ ╭─╯
181 ┤ ╭─
201 ┤ ╭──
181 ┤ ╭─
161 ┤ ╭──╯
140 ┤ ╭──╯╯
120 ┤ ╭─╯╯
100 ┤ ╭──╯
80 ┤ ─╯╯
60 ┤ ╭─
80 ┤ ╭──╯╯
60 ┤ ╭─
40 ┤ ╭─╯
20 ┤ ╭──╯
0 ┼─╯
Expand Down
40 changes: 20 additions & 20 deletions pkg/kv/kvserver/asim/tests/testdata/non_rand/example_fulldisk
Original file line number Diff line number Diff line change
Expand Up @@ -21,22 +21,22 @@ plot stat=replicas
----
----

345╭╮ ╭─╮ ╭╮ ╭╮╭╭
333 ╭─╮╭─────────╯─╰─────────────────────────╮╭╯╰─╯
320 ╭─╭───╯╰╰╯╯ ╰╯ ╰╯ ╰╯ ╰╯ ╰╯
308 ┤╭─────────────────────╭────╯╰
296 ┼───────────────────╮──
284
271╰───
259 ┤ ╰╮
247
235╰────╮
222 ┤ ╰╮
210
198───╮
186╰─╮ ╭─╮╭╮ ╭╮╭
173╰───╮╭─╮╭───╮│ ╰╯╰───────╮╭──────╯╰╯╰──
161╰╯ ╰╯ ╰╯ ╰╯ ╰─
342 ╭╮ ╭╮ ╭╮ ╭╮╭─╭─╮
329╭╮╭╭╮╭╮─╭╮╭──╮╭╭─────────────────────────╯╰──╯╯╰─
317╭╮╭──╮─╭─╯╰╯╰─╯╰╯──╰─╯╰╯╰╯
304 ┼╮──────────────────────╯╰╯╰╯╰─
291 ┤╰───────────────────╮
278 ╰──
266
253 ┤ ╰╮
240╰──╮
227─╮
215 ┤ ╰────
202 ╰───╮╭───
189 ╰╯ ╰────╮
176 ╰───────╮ ╭─
164 ╰─╯ ╰──╮╭─────╮╭
151 ╰╯╯╰────
replicas
----
----
Expand All @@ -47,12 +47,12 @@ plot stat=disk_fraction_used
----
----

0.98 ┤ ╭─╮ ╮ ╭╮ ╭╮ ╭╮ ╭──╮ ╭╮ ╭╮ ╭╮ ╭╮ ╭╮ ╭─
0.92 ┤ ╭───────────────╯ ╰──╯╰─╯╰─╯╰──╯╰─╯ ╰───╯╰─╯╰──╯╰───╯╰──╯╰───────╯ ╰──
0.98 ┤ ╭╮ ╭╮ ╭─╮ ╭───╮ ╭╮╭╮ ╭╮ ╭╮╭──╮╭╮╭╮ ╭─╮ ╭╮ ╭╮ ╭╮ ╭
0.91 ┤ ╭────────╯╰─╯╰─╯ ╰─╰─╯╰╯╰─╯╰───╯╰╯ ╰╯╰╯╰───╯ ╰─╯╰────╯╰──────╰──╯╰
0.85 ┼───────╯
0.79
0.78
0.72 ┤
0.66
0.65
0.59 ┤
0.52 ┤
0.46 ┤
Expand Down
12 changes: 6 additions & 6 deletions pkg/kv/kvserver/asim/tests/testdata/non_rand/example_io_overload
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,8 @@ plot stat=replicas
500 ┼───╮──╭─╮
467 ┤ ╰──╯─╰──────────╮╮ ╭─╮
433 ┤ ╰──╰────────────╮╮╮
400 ┤ ╰╰───────────╮─╮ ╭╮ ╭╮╭──╮╭──╮
367 ┤ ╰─╭─────╯╰──╯╰╯──╰╯──╰─────────────
400 ┤ ╰╰───────────╮─╮ ╭╮ ╭╮╭──╮╭───╮─
367 ┤ ╰─╭─────╯╰──╯╰╯──╰╯───╰────────────
333 ┤ ╭────╯
300 ┤ ╭───╯
267 ┤ ╭───╯
Expand All @@ -51,14 +51,14 @@ plot stat=leases
367 ┤ ╰───────────╮
333 ┤ ╰─────╮
300 ┤ ╰──────╮
267 ┤ ╰──────╮
233 ┤ ╰───────────
267 ┤ ╰─────────
233 ┤ ────────
200 ┤
167 ┤
133 ┤ ╭╮ ╭──────────────────────╮
100 ┤ ╭─────────────╯╰─╯ ╰─────────────
67 ┤ ╭─────────╯ ╭──╭───────────────
33 ┤ ╭─────────╯ ╭╭───────────╯
67 ┤ ╭─────────╯ ╭───╭──────────────
33 ┤ ╭─────────╯ ╭╭───────────
0 ┼───────────────────────────────────────────────────────────────────────────────
leases
----
Expand Down
12 changes: 6 additions & 6 deletions pkg/kv/kvserver/asim/tests/testdata/non_rand/example_multi_store
Original file line number Diff line number Diff line change
Expand Up @@ -35,12 +35,12 @@ plot stat=leases
7.47 ┤ │
6.53 ┤ │
5.60 ┤ ╰╮
4.67 ┤ ╰───────────╮
3.73 ┤
2.80 ┤ ╭────────────╮
1.87 ┤ ──────────────────────────╮╭──╮────────────╮
0.93 ┤ ╭─╯──────────────────────────────────────────────────────────────────────────
0.00 ┼──────────────────────────────────────────────╯
4.67 ┤
3.73 ┤ ╰╮
2.80 ┤ ╭────────────╭───────────────
1.87 ┤ ──────────────────────────╮│───────────────╮
0.93 ┤ ╭─╯──────────────────────────────────────────────────────────────────────────
0.00 ┼──────────────────────────────────────────────╯
leases
----
----
Loading

0 comments on commit c7f37e8

Please sign in to comment.