Observability of validators for relayers #3057

tkporter · 2023-12-14T12:15:10Z

Description

The goal of this is to gain insight into whether the validators in important sets are "up".

Introduces a new metric used by relayers: hyperlane_observed_validator_latest_index, e.g.:

hyperlane_observed_validator_latest_index{agent="relayer",app_context="default_ism",destination="test1",hyperlane_baselib_version="0.1.0",origin="test2",validator="0x9965507d1a55bcc2695c58ba16fb37d819b0a4dc"} 664
hyperlane_observed_validator_latest_index{agent="relayer",app_context="default_ism",destination="test1",hyperlane_baselib_version="0.1.0",origin="test3",validator="0x976ea74026e726554db657fa54763abd0c3a0aa9"} 641
hyperlane_observed_validator_latest_index{agent="relayer",app_context="default_ism",destination="test2",hyperlane_baselib_version="0.1.0",origin="test1",validator="0x15d34aaf54267db7d7c367839aaf71a00a2c6a65"} 670
hyperlane_observed_validator_latest_index{agent="relayer",app_context="default_ism",destination="test2",hyperlane_baselib_version="0.1.0",origin="test3",validator="0x976ea74026e726554db657fa54763abd0c3a0aa9"} 665
hyperlane_observed_validator_latest_index{agent="relayer",app_context="default_ism",destination="test3",hyperlane_baselib_version="0.1.0",origin="test1",validator="0x15d34aaf54267db7d7c367839aaf71a00a2c6a65"} 652
hyperlane_observed_validator_latest_index{agent="relayer",app_context="default_ism",destination="test3",hyperlane_baselib_version="0.1.0",origin="test2",validator="0x9965507d1a55bcc2695c58ba16fb37d819b0a4dc"} 664
hyperlane_observed_validator_latest_index{agent="relayer",app_context="testapp",destination="test1",hyperlane_baselib_version="0.1.0",origin="test2",validator="0x9965507d1a55bcc2695c58ba16fb37d819b0a4dc"} 658
hyperlane_observed_validator_latest_index{agent="relayer",app_context="testapp",destination="test1",hyperlane_baselib_version="0.1.0",origin="test3",validator="0x976ea74026e726554db657fa54763abd0c3a0aa9"} 641

By tapping into metadata building for multisig ISMs, the relayer updates the metric with the latest indices for the validators in a set. To keep the metric's cardinality manageable, only certain validator sets are tracked. This is done by introducing an app_context label (I'm very open to other names here; for some reason, whenever I don't know how to name some kind of identifier, I end up calling it a context 😆).

The app context is assigned in one of two ways:

- If a new setting, --metricAppContexts, is specified, a message is classified according to the first matching list it matches, e.g. --metricAppContexts '[{"name": "testapp", "matchingList": [{"recipient_address": "0xd84379ceae14aa33c123af12424a37803f885889", "destination_domain": 13371 }] }]'. This is nice for e.g. warp route deployments, where the ISM may not be the default ISM and can be changed.
- If a message doesn't get classified this way, it can instead be classified with the "default_ism" app context, which covers any message that happens to use the default ISM as its "root" ISM.

This way we have insight into the default ISM and any application-specific ISMs.
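The two-step classification described above could be sketched roughly as follows. This is a minimal, std-only illustration and not the relayer's actual code: `MatchItem`, `AppContext`, `Message`, and `classify` are hypothetical names, only the two fields shown in the example flag are modeled, and treating an empty matching list as matching nothing is a choice made for this sketch.

```rust
/// One entry of a matching list; `None` means "match any value".
/// Field names mirror the JSON in the example flag; the type itself is hypothetical.
#[derive(Debug, Clone)]
pub struct MatchItem {
    pub recipient_address: Option<String>,
    pub destination_domain: Option<u32>,
}

/// A named app context with its matching list (hypothetical type).
#[derive(Debug, Clone)]
pub struct AppContext {
    pub name: String,
    pub matching_list: Vec<MatchItem>,
}

/// The only message fields this sketch inspects (hypothetical type).
pub struct Message {
    pub recipient_address: String,
    pub destination_domain: u32,
}

impl MatchItem {
    /// An item matches when every field it specifies agrees with the message.
    fn matches(&self, msg: &Message) -> bool {
        let recipient_ok = self
            .recipient_address
            .as_ref()
            .map_or(true, |r| r.eq_ignore_ascii_case(&msg.recipient_address));
        let domain_ok = self
            .destination_domain
            .map_or(true, |d| d == msg.destination_domain);
        recipient_ok && domain_ok
    }
}

/// Returns the app context label for a message, if any.
pub fn classify(
    contexts: &[AppContext],
    msg: &Message,
    root_ism: &str,
    default_ism: &str,
) -> Option<String> {
    // Step 1: the first configured context with a matching entry wins.
    if let Some(ctx) = contexts
        .iter()
        .find(|ctx| ctx.matching_list.iter().any(|item| item.matches(msg)))
    {
        return Some(ctx.name.clone());
    }
    // Step 2: fall back to "default_ism" when the message's root ISM is the default ISM.
    if root_ism.eq_ignore_ascii_case(default_ism) {
        return Some("default_ism".to_string());
    }
    // Otherwise the message is unclassified and its validators are not tracked.
    None
}
```

Because the configured contexts are checked in order, a message that satisfies two matching lists gets the label of whichever context appears first in the setting.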

Some things to note:

- It's possible for a message to have more than one validator set, e.g. if it's using an aggregation ISM. In this case, we'll have metrics on the union of all validator sets for that app context.
- Some effort is required to make sure that metrics don't stick around for a validator that has been removed from the set. To handle this, we cache the validator set for an app context and clear the metrics for the entire cached set each time we set new values.
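The two notes above (taking the union of validator sets, and clearing cached series before writing new ones) could look something like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: a plain `HashMap` stands in for the Prometheus gauge vector, `ValidatorMetrics`, `SeriesKey`, and `union_sets` are hypothetical names, and keying the cache on (origin, destination, app_context) is assumed from the labels in the sample output.

```rust
use std::collections::{BTreeSet, HashMap};

/// Labels identifying one time series (hypothetical; mirrors the sample output).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SeriesKey {
    pub origin: String,
    pub destination: String,
    pub app_context: String,
    pub validator: String,
}

/// Cache key: (origin, destination, app_context). This keying is an assumption.
type ContextKey = (String, String, String);

#[derive(Default)]
pub struct ValidatorMetrics {
    /// Stand-in for a gauge vector: one value per label combination.
    pub gauges: HashMap<SeriesKey, i64>,
    /// Validators whose series were written last time, per context.
    cached_sets: HashMap<ContextKey, BTreeSet<String>>,
}

impl ValidatorMetrics {
    /// Replaces all series for one context with the given latest indices.
    pub fn set_latest_indices(
        &mut self,
        origin: &str,
        destination: &str,
        app_context: &str,
        latest: &[(String, i64)],
    ) {
        let ctx: ContextKey = (
            origin.to_string(),
            destination.to_string(),
            app_context.to_string(),
        );
        // Clear every series written for this context last time, so a validator
        // that has left the set does not linger with a stale value.
        if let Some(previous) = self.cached_sets.remove(&ctx) {
            for validator in previous {
                self.gauges.remove(&Self::key(&ctx, &validator));
            }
        }
        // Write the current values and remember which validators were written.
        let mut current = BTreeSet::new();
        for (validator, index) in latest {
            self.gauges.insert(Self::key(&ctx, validator), *index);
            current.insert(validator.clone());
        }
        self.cached_sets.insert(ctx, current);
    }

    fn key(ctx: &ContextKey, validator: &str) -> SeriesKey {
        SeriesKey {
            origin: ctx.0.clone(),
            destination: ctx.1.clone(),
            app_context: ctx.2.clone(),
            validator: validator.to_string(),
        }
    }
}

/// Union of several validator sets, e.g. the sets under an aggregation ISM.
pub fn union_sets(sets: &[Vec<String>]) -> BTreeSet<String> {
    sets.iter().flatten().cloned().collect()
}
```

Clearing the whole cached set and rewriting it is simpler than diffing old and new sets, at the cost of briefly removing series that are about to be written again.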

Drive-by changes

Zod's nonempty function for strings is deprecated, so this moves to .min(1) instead.

Related issues

Fixes Observe 3rd Party validators are signing at the tip in the relayer #1762

Backward compatibility

yes

Testing

Ran locally. I think I'll probably add something in the e2e tests, but opening now.

changeset-bot · 2023-12-14T12:15:14Z

⚠️ No Changeset found

Latest commit: 4bb3afd

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

rust/agents/relayer/src/msg/metadata/base.rs

typescript/sdk/src/metadata/agentConfig.ts

rust/agents/relayer/src/settings/mod.rs

rust/hyperlane-base/src/types/multisig.rs

rust/hyperlane-base/src/metrics/core.rs

### Description

Includes the `app_context` classification in `PendingMessage`, and adds trait methods on `PendingOperation` to require always having such a label on `OpQueue` operations. This is done by reusing the matching list logic from the validator checkpoint labels (#3057).

The nice thing is that this enables later support for retrying a group of `OpQueue` operations just by specifying the `app_context` label, without adding any new logic, since these labels are essentially matching list results.

One downside to using `app_context` for retries is that the endpoint caller is constrained to only the matching lists defined by the relayer operator. However, imo only the relayer operator should be able to trigger retries.

### Drive-by changes

The `OpQueue` type alias is converted to an actual struct that stores the queue label (for metrics purposes) and the `IntGaugeVec` metric: the generic group of metrics associated with that queue (basically only `submitter_queue_length` currently).

### Related issues

- Fixes #3240

### Backward compatibility

Yes

### Testing

Manual, by spinning up a relayer for injective and inevm.

Sample metrics, from `--metricAppContexts '[{"name": "injectivelabel", "matchingList": [{"destination_domain": 6909546 }] }, {"name": "inevmlabel", "matchingList": [{"destination_domain": 2525 }] }]'`:

```
hyperlane_submitter_queue_length{agent="relayer",app_context="inevmlabel",hyperlane_baselib_version="0.1.0",queue_name="confirm_queue",remote="inevm"} 11
hyperlane_submitter_queue_length{agent="relayer",app_context="inevmlabel",hyperlane_baselib_version="0.1.0",queue_name="prepare_queue",remote="inevm"} 0
hyperlane_submitter_queue_length{agent="relayer",app_context="injectivelabel",hyperlane_baselib_version="0.1.0",queue_name="confirm_queue",remote="injective"} 63
hyperlane_submitter_queue_length{agent="relayer",app_context="injectivelabel",hyperlane_baselib_version="0.1.0",queue_name="prepare_queue",remote="injective"} 13281
```
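As a rough illustration of the idea in this description, the sketch below shows a queue struct that carries its own label and whose operations each expose an `app_context`. It is a simplified, std-only stand-in rather than the relayer's real types: the trait and struct reuse the names from the description but their methods and fields are assumptions, the per-label counts stand in for what an `IntGaugeVec` would report, and `ids_for_app_context` is a hypothetical helper showing how a later retry feature might select operations by label.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Simplified stand-in for the trait named in the description.
/// Both method signatures are assumptions made for this sketch.
pub trait PendingOperation {
    fn id(&self) -> u64;
    /// Label produced by the matching-list classifier, if any.
    fn app_context(&self) -> Option<String>;
}

/// A queue that stores its label alongside its operations.
pub struct OpQueue<T: PendingOperation> {
    pub queue_label: String,
    ops: VecDeque<T>,
}

impl<T: PendingOperation> OpQueue<T> {
    pub fn new(queue_label: &str) -> Self {
        Self {
            queue_label: queue_label.to_string(),
            ops: VecDeque::new(),
        }
    }

    pub fn push(&mut self, op: T) {
        self.ops.push_back(op);
    }

    /// Queue length broken down by app context: the values a
    /// `submitter_queue_length`-style gauge could report per label.
    pub fn lengths_by_app_context(&self) -> BTreeMap<Option<String>, usize> {
        let mut counts = BTreeMap::new();
        for op in &self.ops {
            *counts.entry(op.app_context()).or_insert(0) += 1;
        }
        counts
    }

    /// Hypothetical helper: ids of the operations a retry request for
    /// `app_context` would target, in queue order.
    pub fn ids_for_app_context(&self, app_context: &str) -> Vec<u64> {
        self.ops
            .iter()
            .filter(|op| op.app_context().as_deref() == Some(app_context))
            .map(|op| op.id())
            .collect()
    }
}

/// Minimal operation type for the sketch.
pub struct PendingMessage {
    pub id: u64,
    pub app_context: Option<String>,
}

impl PendingOperation for PendingMessage {
    fn id(&self) -> u64 {
        self.id
    }

    fn app_context(&self) -> Option<String> {
        self.app_context.clone()
    }
}
```

Selecting by label needs no extra matching logic at retry time, since the label was already computed when the message was classified.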

tkporter added 28 commits November 26, 2023 12:40

wip

abfa375

Merge branch 'main' of github.com:abacus-network/abacus-monorepo into…

c0cebe4

… trevor/relayer-validator-metrics

wip

2535b9b

oop

b9f66d7

wip

b988181

Keep getting pulled into other things

2885fbf

Merge branch 'main' of github.com:abacus-network/abacus-monorepo into…

f391b71

… trevor/relayer-validator-metrics

compiles

d158749

Wip

bf6bf6d

Super ugly WIP introduction of the MessageBaseMetadataBuilder

aa96b7f

moved over to MessageBaseMetadataBuilder

7dae3c5

Starting the cleanup process

b158483

more clean

7c67917

Clean up

dc7ea06

Cleanin up

501f717

use deref

8e0fd63

more cleaning

c211744

clean

c26fb3f

Move to base: Arc<BaseMetadataBuilder>

09170d7

cleaning up metrics

a0d75c5

tidying

1ca15ee

refactor default ism cache

a002391

fix

8180f23

comments

767e220

Add ts definition

ca14153

Merge branch 'main' of github.com:abacus-network/abacus-monorepo into…

b7ba63a

… trevor/relayer-validator-metrics

No need for the arc around the rwlock

7448197

reset run-locally changes

cad38f8

tkporter requested review from nambrot and yorhodes as code owners December 14, 2023 12:15

tkporter requested review from jmrossy and daniel-savu as code owners December 14, 2023 12:15

tkporter added 4 commits December 14, 2023 12:17

nit

d892b9c

nit

1e5ff5d

nit

58c24f8

undo hardhat change

6eda91c

tkporter commented Dec 14, 2023

rust/agents/relayer/src/msg/metadata/base.rs

jmrossy reviewed Dec 14, 2023

typescript/sdk/src/metadata/agentConfig.ts

tkporter added 2 commits December 15, 2023 13:33

Merge branch 'main' of github.com:abacus-network/abacus-monorepo into…

af5e47a

… trevor/relayer-validator-metrics

min(1) instead of nonempty

36801c8

daniel-savu approved these changes Dec 18, 2023

daniel-savu reviewed Dec 18, 2023

rust/agents/relayer/src/msg/metadata/base.rs

tkporter added 6 commits December 20, 2023 14:11

Merge branch 'main' of github.com:abacus-network/abacus-monorepo into…

88b61e3

… trevor/relayer-validator-metrics

PR comments

5934886

comment about making generic

51d0515

fix default ISM cosmwasm

ff8692f

Merge branch 'main' into trevor/relayer-validator-metrics

aaa98e5

Merge branch 'main' into trevor/relayer-validator-metrics

b92095e

tkporter enabled auto-merge (squash) December 20, 2023 16:53

Merge branch 'main' into trevor/relayer-validator-metrics

f05ef89

tkporter mentioned this pull request Jan 2, 2024

Infra support & grafana alerts for validator observability #3109

Closed

tkporter disabled auto-merge January 2, 2024 13:31

Merge branch 'main' into trevor/relayer-validator-metrics

4bb3afd

tkporter enabled auto-merge (squash) January 2, 2024 16:13

tkporter merged commit 3f88aa6 into main Jan 2, 2024
13 of 19 checks passed

tkporter deleted the trevor/relayer-validator-metrics branch January 2, 2024 16:36

daniel-savu mentioned this pull request Mar 11, 2024

Use app context classifier in relayer submitter queues #3385

Merged
