feat: add support for slashing parameters and missed blocks per signing window (#85)

This pull request introduces a new set of features for exporting network
slashing parameters to Prometheus, along with additional metrics to
monitor validator performance and slashing conditions.

**Key Changes**
Prometheus Metrics for Slashing Parameters:

Exports key network slashing parameters, such as the Signing Window size
and Slashing Penalties, to Prometheus.
Validator “Missed Blocks per Signing Window” Tracking:

Adds the ability to query the current count of "Missed Blocks per
Signing Window" for each validator. This metric aligns with the
network’s threshold for determining whether a validator should be
jailed.
Optional Slashing Metrics Collection:

Users can disable the collection of slashing-related metrics by using
the --no-slashing argument.
Consensus Address Matching for Validator Metrics:

The validator's public key (PubKey) and the network's detected
human-readable prefix are used to derive the consensus address. This
enables proper mapping of metrics returned from the SigningInfo module,
which provides metrics in the Bech32 Consensus Address format.

**Testing and Validation**
This change has been tested across several networks, ensuring
compatibility and accurate metric collection:

Networks Tested:
- Axelar
- Cosmos
- Band
- Osmosis
- Injective
- Zeta
- Agoric

Additionally, unit tests have been added and updated.

**Additional Notes**

These additions will enhance monitoring and alerting capabilities for
validator operators by providing real-time insights into slashing
parameters and validator performance. This feature set will improve
operational awareness, helping operators avoid slashing events.

---
New Prometheus metrics added:
```
.
.
.
# HELP cosmos_validator_watcher_signed_blocks_window Number of blocks per signing window
# TYPE cosmos_validator_watcher_signed_blocks_window gauge
cosmos_validator_watcher_signed_blocks_window{chain_id="cosmoshub-4"} 10000
# HELP cosmos_validator_watcher_min_signed_blocks_per_window Minimum number of blocks required to be signed per signing window
# TYPE cosmos_validator_watcher_min_signed_blocks_per_window gauge
cosmos_validator_watcher_min_signed_blocks_per_window{chain_id="cosmoshub-4"} 0.05
# HELP cosmos_validator_watcher_downtime_jail_duration Duration of the jail period for a validator in seconds
# TYPE cosmos_validator_watcher_downtime_jail_duration gauge
cosmos_validator_watcher_downtime_jail_duration{chain_id="cosmoshub-4"} 600
# HELP cosmos_validator_watcher_slash_fraction_double_sign Slash penalty for double-signing
# TYPE cosmos_validator_watcher_slash_fraction_double_sign gauge
cosmos_validator_watcher_slash_fraction_double_sign{chain_id="cosmoshub-4"} 0.05
# HELP cosmos_validator_watcher_slash_fraction_downtime Slash penalty for downtime
# TYPE cosmos_validator_watcher_slash_fraction_downtime gauge
cosmos_validator_watcher_slash_fraction_downtime{chain_id="cosmoshub-4"} 0.0001
.
.
.

```
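
To relate these parameters to the per-validator missed blocks count, the sketch below computes how many blocks may be missed within one signing window. It assumes the usual Cosmos SDK liveness rule, where the minimum number of signed blocks is `min_signed_per_window * signed_blocks_window` (rounded) and the remainder of the window is the tolerated number of misses. The function name `maxMissedBlocks` is hypothetical, and the inputs are the cosmoshub-4 values from the sample output and log in this description.

```go
package main

import (
	"fmt"
	"math"
)

// maxMissedBlocks returns the number of blocks that can be missed in one
// signing window before the minimum-signed requirement is violated.
// Assumption: min signed = round(minSignedPerWindow * signedBlocksWindow).
func maxMissedBlocks(signedBlocksWindow int64, minSignedPerWindow float64) int64 {
	minSigned := int64(math.Round(minSignedPerWindow * float64(signedBlocksWindow)))
	return signedBlocksWindow - minSigned
}

func main() {
	limit := maxMissedBlocks(10000, 0.05) // values exported for cosmoshub-4
	missed := int64(20)                   // "Tracked validator missed blocks: 20"
	fmt.Printf("limit=%d headroom=%d\n", limit, limit-missed)
	// prints "limit=9500 headroom=9480"
}
```

An alert on the ratio between the missed blocks gauge and this limit gives operators a chain-independent signal, since both inputs are exported per `chain_id`.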

Example Log for cosmoshub-4
```
12:30AM INF connected to https://cosmos-rpc.publicnode.com:443 chainID=cosmoshub-4 height=22945025
12:30AM INF connected to https://cosmos-rpc.polkachu.com:443 chainID=cosmoshub-4 height=22945026
12:30AM INF tracking validator D9F8A41B782AA6A66ADC81F953923C7DCE7B6001 alias=figment moniker=Figment
12:30AM INF starting HTTP server on :8080
12:30AM INF Updating slashing metrics for chain cosmoshub-4 Downtime jail duration:=10m0s Min signed per window:=0.050000000000000000 Signed blocks window:=10000 Slash fraction double sign:=0.050000000000000000 Slash fraction downtime:=0.000100000000000000 Slashing parameters for chain:=cosmoshub-4
12:30AM INF Tracked validator missed blocks: 20
12:30AM INF fetched staking validators and signing infos
#22945025 180/180 validators ✅ figment
#22945026 180/180 validators ✅ figment
#22945027 180/180 validators ✅ figment
#22945028 180/180 validators ✅ figment
#22945029 180/180 validators ✅ figment
#22945030 180/180 validators ✅ figment
```
---

Please let me know if I should add some additional code changes.

---------

Signed-off-by: Simon Lichtenauer <[email protected]>
Co-authored-by: Matt Ketmo <[email protected]>
qwertzlbert and MattKetmo authored Nov 15, 2024
1 parent 62b7fbe commit 541890a
Showing 9 changed files with 338 additions and 34 deletions.
4 changes: 4 additions & 0 deletions pkg/app/flags.go
@@ -43,6 +43,10 @@ var Flags = []cli.Flag{
Name: "no-staking",
Usage: "disable calls to staking module (useful for consumer chains)",
},
&cli.BoolFlag{
Name: "no-slashing",
Usage: "disable calls to slashing module",
},
&cli.BoolFlag{
Name: "no-commission",
Usage: "disable calls to get validator commission (useful for chains without distribution module)",
15 changes: 15 additions & 0 deletions pkg/app/run.go
@@ -42,6 +42,7 @@ func RunFunc(cCtx *cli.Context) error {
noStaking = cCtx.Bool("no-staking")
noUpgrade = cCtx.Bool("no-upgrade")
noCommission = cCtx.Bool("no-commission")
noSlashing = cCtx.Bool("no-slashing")
denom = cCtx.String("denom")
denomExpon = cCtx.Uint("denom-exponent")
startTimeout = cCtx.Duration("start-timeout")
@@ -128,13 +129,24 @@ func RunFunc(cCtx *cli.Context) error {
})
}

//
// Slashing watchers
//
if !noSlashing {
slashingWatcher := watcher.NewSlashingWatcher(metrics, pool)
errg.Go(func() error {
return slashingWatcher.Start(ctx)
})
}

//
// Pool watchers
//
if !noStaking {
validatorsWatcher := watcher.NewValidatorsWatcher(trackedValidators, metrics, pool, watcher.ValidatorsWatcherOptions{
Denom: denom,
DenomExponent: denomExpon,
NoSlashing: noSlashing,
})
errg.Go(func() error {
return validatorsWatcher.Start(ctx)
@@ -320,8 +332,10 @@ func createTrackedValidators(ctx context.Context, pool *rpc.Pool, validators []s
for _, stakingVal := range stakingValidators {
address := crypto.PubKeyAddress(stakingVal.ConsensusPubkey)
if address == val.Address {
hrp := crypto.GetHrpPrefix(stakingVal.OperatorAddress) + "valcons"
val.Moniker = stakingVal.Description.Moniker
val.OperatorAddress = stakingVal.OperatorAddress
val.ConsensusAddress = crypto.PubKeyBech32Address(stakingVal.ConsensusPubkey, hrp)
}
}

@@ -336,6 +350,7 @@ func createTrackedValidators(ctx context.Context, pool *rpc.Pool, validators []s
Str("alias", val.Name).
Str("moniker", val.Moniker).
Str("operator", val.OperatorAddress).
Str("consensus", val.ConsensusAddress).
Msgf("validator info")

return val
38 changes: 34 additions & 4 deletions pkg/crypto/utils.go
@@ -1,21 +1,51 @@
package crypto

import (
"strings"

"github.com/cometbft/cometbft/libs/bytes"
types1 "github.com/cosmos/cosmos-sdk/codec/types"
"github.com/cosmos/cosmos-sdk/crypto/keys/ed25519"
"github.com/cosmos/cosmos-sdk/crypto/keys/secp256k1"
"github.com/cosmos/cosmos-sdk/types/bech32"
)

-func PubKeyAddress(consensusPubkey *types1.Any) string {
+func PubKeyAddressHelper(consensusPubkey *types1.Any) bytes.HexBytes {
switch consensusPubkey.TypeUrl {
case "/cosmos.crypto.ed25519.PubKey":
key := ed25519.PubKey{Key: consensusPubkey.Value[2:]}
-return key.Address().String()
+return key.Address()

case "/cosmos.crypto.secp256k1.PubKey":
key := secp256k1.PubKey{Key: consensusPubkey.Value[2:]}
-return key.Address().String()
+return key.Address()
}

panic("unknown pubkey type: " + consensusPubkey.TypeUrl)
}

func PubKeyAddress(consensusPubkey *types1.Any) string {
key := PubKeyAddressHelper(consensusPubkey)
return key.String()
}

func PubKeyBech32Address(consensusPubkey *types1.Any, prefix string) string {
key := PubKeyAddressHelper(consensusPubkey)
address, _ := bech32.ConvertAndEncode(prefix, key)
return address
}

// GetHrpPrefix returns the human-readable prefix for a given address.
// Examples of valid address HRPs are "cosmosvalcons", "cosmosvaloper".
// So this will return "cosmos" as the prefix
func GetHrpPrefix(a string) string {

hrp, _, err := bech32.DecodeAndConvert(a)
if err != nil {
return err.Error()
}

for _, v := range []string{"valoper", "cncl", "valcons"} {
hrp = strings.TrimSuffix(hrp, v)
}
return hrp
}
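
The prefix handling in `GetHrpPrefix` can be illustrated with a small standard-library sketch. It is a simplification under stated assumptions: it locates the Bech32 separator (the last `1`, which cannot occur in the data part) instead of fully decoding and checksum-validating the address as `bech32.DecodeAndConvert` does, and it returns an explicit error, whereas `GetHrpPrefix` above returns the error text as the prefix. The function name `chainPrefix` and the sample addresses are hypothetical and not valid on any network.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// chainPrefix extracts the chain part of a Bech32 human-readable prefix,
// e.g. "cosmos" from "cosmosvaloper1...". The checksum is NOT verified here.
func chainPrefix(address string) (string, error) {
	sep := strings.LastIndex(address, "1")
	if sep < 1 {
		return "", errors.New("no bech32 separator found in " + address)
	}
	hrp := address[:sep]
	// Same suffix list, in the same order, as GetHrpPrefix in this change.
	for _, suffix := range []string{"valoper", "cncl", "valcons"} {
		hrp = strings.TrimSuffix(hrp, suffix)
	}
	return hrp, nil
}

func main() {
	for _, addr := range []string{"cosmosvaloper1qqqq", "injvalcons1qqqq", "notbech32"} {
		prefix, err := chainPrefix(addr)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(prefix + "valcons")
	}
}
```

Surfacing the failure as an error lets the caller skip consensus address matching for that validator instead of building an address from a meaningless prefix.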
96 changes: 78 additions & 18 deletions pkg/metrics/metrics.go
@@ -9,27 +9,33 @@ type Metrics struct {
Registry *prometheus.Registry

// Global metrics
ActiveSet *prometheus.GaugeVec
BlockHeight *prometheus.GaugeVec
ProposalEndTime *prometheus.GaugeVec
SeatPrice *prometheus.GaugeVec
SkippedBlocks *prometheus.CounterVec
TrackedBlocks *prometheus.CounterVec
Transactions *prometheus.CounterVec
UpgradePlan *prometheus.GaugeVec
SignedBlocksWindow *prometheus.GaugeVec
MinSignedBlocksPerWindow *prometheus.GaugeVec
DowntimeJailDuration *prometheus.GaugeVec
SlashFractionDoubleSign *prometheus.GaugeVec
SlashFractionDowntime *prometheus.GaugeVec

// Validator metrics
Rank *prometheus.GaugeVec
ProposedBlocks *prometheus.CounterVec
ValidatedBlocks *prometheus.CounterVec
MissedBlocks *prometheus.CounterVec
SoloMissedBlocks *prometheus.CounterVec
ConsecutiveMissedBlocks *prometheus.GaugeVec
MissedBlocksWindow *prometheus.GaugeVec
Tokens *prometheus.GaugeVec
IsBonded *prometheus.GaugeVec
IsJailed *prometheus.GaugeVec
Commission *prometheus.GaugeVec
Vote *prometheus.GaugeVec

// Node metrics
NodeBlockHeight *prometheus.GaugeVec
@@ -111,6 +117,14 @@ func New(namespace string) *Metrics {
},
[]string{"chain_id", "address", "name"},
),
MissedBlocksWindow: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: namespace,
Name: "missed_blocks_window",
Help: "Number of missed blocks per validator for the current signing window (for a bonded validator)",
},
[]string{"chain_id", "address", "name"},
),
TrackedBlocks: prometheus.NewCounterVec(
prometheus.CounterOpts{
Namespace: namespace,
@@ -207,6 +221,46 @@ func New(namespace string) *Metrics {
},
[]string{"chain_id", "proposal_id"},
),
SignedBlocksWindow: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: namespace,
Name: "signed_blocks_window",
Help: "Number of blocks per signing window",
},
[]string{"chain_id"},
),
MinSignedBlocksPerWindow: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: namespace,
Name: "min_signed_blocks_per_window",
Help: "Minimum number of blocks required to be signed per signing window",
},
[]string{"chain_id"},
),
DowntimeJailDuration: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: namespace,
Name: "downtime_jail_duration",
Help: "Duration of the jail period for a validator in seconds",
},
[]string{"chain_id"},
),
SlashFractionDoubleSign: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: namespace,
Name: "slash_fraction_double_sign",
Help: "Slash penalty for double-signing",
},
[]string{"chain_id"},
),
SlashFractionDowntime: prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: namespace,
Name: "slash_fraction_downtime",
Help: "Slash penalty for downtime",
},
[]string{"chain_id"},
),
}

return metrics
@@ -225,6 +279,7 @@ func (m *Metrics) Register() {
m.Registry.MustRegister(m.MissedBlocks)
m.Registry.MustRegister(m.SoloMissedBlocks)
m.Registry.MustRegister(m.ConsecutiveMissedBlocks)
m.Registry.MustRegister(m.MissedBlocksWindow)
m.Registry.MustRegister(m.TrackedBlocks)
m.Registry.MustRegister(m.Transactions)
m.Registry.MustRegister(m.SkippedBlocks)
@@ -237,4 +292,9 @@ func (m *Metrics) Register() {
m.Registry.MustRegister(m.NodeSynced)
m.Registry.MustRegister(m.UpgradePlan)
m.Registry.MustRegister(m.ProposalEndTime)
m.Registry.MustRegister(m.SignedBlocksWindow)
m.Registry.MustRegister(m.MinSignedBlocksPerWindow)
m.Registry.MustRegister(m.DowntimeJailDuration)
m.Registry.MustRegister(m.SlashFractionDoubleSign)
m.Registry.MustRegister(m.SlashFractionDowntime)
}
90 changes: 90 additions & 0 deletions pkg/watcher/slashing.go
@@ -0,0 +1,90 @@
package watcher

import (
"context"
"fmt"
"time"

"github.com/cosmos/cosmos-sdk/client"
slashing "github.com/cosmos/cosmos-sdk/x/slashing/types"
"github.com/kilnfi/cosmos-validator-watcher/pkg/metrics"
"github.com/kilnfi/cosmos-validator-watcher/pkg/rpc"
"github.com/rs/zerolog/log"
)

type SlashingWatcher struct {
metrics *metrics.Metrics
pool *rpc.Pool

signedBlocksWindow int64
minSignedPerWindow float64
downtimeJailDuration float64
slashFractionDoubleSign float64
slashFractionDowntime float64
}

func NewSlashingWatcher(metrics *metrics.Metrics, pool *rpc.Pool) *SlashingWatcher {
return &SlashingWatcher{
metrics: metrics,
pool: pool,
}
}

func (w *SlashingWatcher) Start(ctx context.Context) error {
// update metrics every 30 minutes
ticker := time.NewTicker(30 * time.Minute)

for {
node := w.pool.GetSyncedNode()
if node == nil {
log.Warn().Msg("no node available to fetch slashing parameters")
} else if err := w.fetchSlashingParameters(ctx, node); err != nil {
log.Error().Err(err).
Str("node", node.Redacted()).
Msg("failed to fetch slashing parameters")
}

select {
case <-ctx.Done():
return nil
case <-ticker.C:
}
}
}

func (w *SlashingWatcher) fetchSlashingParameters(ctx context.Context, node *rpc.Node) error {
clientCtx := (client.Context{}).WithClient(node.Client)
queryClient := slashing.NewQueryClient(clientCtx)
signingParams, err := queryClient.Params(ctx, &slashing.QueryParamsRequest{})
if err != nil {
return fmt.Errorf("failed to get slashing parameters: %w", err)
}

w.handleSlashingParams(node.ChainID(), signingParams.Params)

return nil

}

func (w *SlashingWatcher) handleSlashingParams(chainID string, params slashing.Params) {
log.Debug().
Str("chainID", chainID).
Str("downtimeJailDuration", params.DowntimeJailDuration.String()).
Str("minSignedPerWindow", fmt.Sprintf("%.2f", params.MinSignedPerWindow.MustFloat64())).
Str("signedBlocksWindow", fmt.Sprint(params.SignedBlocksWindow)).
Str("slashFractionDoubleSign", fmt.Sprintf("%.2f", params.SlashFractionDoubleSign.MustFloat64())).
Str("slashFractionDowntime", fmt.Sprintf("%.2f", params.SlashFractionDowntime.MustFloat64())).
Msgf("updating slashing metrics")

w.signedBlocksWindow = params.SignedBlocksWindow
w.minSignedPerWindow, _ = params.MinSignedPerWindow.Float64()
w.downtimeJailDuration = params.DowntimeJailDuration.Seconds()
w.slashFractionDoubleSign, _ = params.SlashFractionDoubleSign.Float64()
w.slashFractionDowntime, _ = params.SlashFractionDowntime.Float64()

w.metrics.SignedBlocksWindow.WithLabelValues(chainID).Set(float64(w.signedBlocksWindow))
w.metrics.MinSignedBlocksPerWindow.WithLabelValues(chainID).Set(w.minSignedPerWindow)
w.metrics.DowntimeJailDuration.WithLabelValues(chainID).Set(w.downtimeJailDuration)
w.metrics.SlashFractionDoubleSign.WithLabelValues(chainID).Set(w.slashFractionDoubleSign)
w.metrics.SlashFractionDowntime.WithLabelValues(chainID).Set(w.slashFractionDowntime)
}
45 changes: 45 additions & 0 deletions pkg/watcher/slashing_test.go
@@ -0,0 +1,45 @@
package watcher

import (
"testing"
"time"

cosmossdk_io_math "cosmossdk.io/math"
slashing "github.com/cosmos/cosmos-sdk/x/slashing/types"
"github.com/kilnfi/cosmos-validator-watcher/pkg/metrics"
"github.com/prometheus/client_golang/prometheus/testutil"
"gotest.tools/assert"
)

func TestSlashingWatcher(t *testing.T) {
var chainID = "test-chain"

watcher := NewSlashingWatcher(
metrics.New("cosmos_validator_watcher"),
nil,
)

t.Run("Handle Slashing Parameters", func(t *testing.T) {

minSignedPerWindow := cosmossdk_io_math.LegacyMustNewDecFromStr("0.1")
slashFractionDoubleSign := cosmossdk_io_math.LegacyMustNewDecFromStr("0.01")
slashFractionDowntime := cosmossdk_io_math.LegacyMustNewDecFromStr("0.001")

params := slashing.Params{
SignedBlocksWindow: int64(1000),
MinSignedPerWindow: minSignedPerWindow,
DowntimeJailDuration: time.Duration(10) * time.Second,
SlashFractionDoubleSign: slashFractionDoubleSign,
SlashFractionDowntime: slashFractionDowntime,
}

watcher.handleSlashingParams(chainID, params)

assert.Equal(t, float64(1000), testutil.ToFloat64(watcher.metrics.SignedBlocksWindow.WithLabelValues(chainID)))
assert.Equal(t, float64(0.1), testutil.ToFloat64(watcher.metrics.MinSignedBlocksPerWindow.WithLabelValues(chainID)))
assert.Equal(t, float64(10), testutil.ToFloat64(watcher.metrics.DowntimeJailDuration.WithLabelValues(chainID)))
assert.Equal(t, float64(0.01), testutil.ToFloat64(watcher.metrics.SlashFractionDoubleSign.WithLabelValues(chainID)))
assert.Equal(t, float64(0.001), testutil.ToFloat64(watcher.metrics.SlashFractionDowntime.WithLabelValues(chainID)))
})

}