-
Notifications
You must be signed in to change notification settings - Fork 389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Observability of validators for relayers #3057
Conversation
… trevor/relayer-validator-metrics
… trevor/relayer-validator-metrics
… trevor/relayer-validator-metrics
|
… trevor/relayer-validator-metrics
… trevor/relayer-validator-metrics
### Description Includes the `app_context` classification in `PendingMessage`, and adds trait methods on `PendingOperation` to require always having such a label on `OpQueue` operations. This is done by reusing the matching list logic from the validator checkpoint labels (#3057). The nice thing is that this enables later support for retrying a group of `OpQueue` operations just by specifying the `app_context` label, without adding any new logic, since these labels are essentially matching list results. One downside to using `app_context` for retries is that the endpoint caller is constrained to only the matching lists defined by the relayer operator - however imo only the relayer operator that should be able to trigger retries. ### Drive-by changes The `OpQueue` type alias is converted to an actual struct, that stores the queue label (for metrics purposes), and also the `IntGaugeVec` metric: the generic group of metrics associated with that queue (basically only `submitter_queue_length` currently). ### Related issues - Fixes #3240 ### Backward compatibility Yes ### Testing Manual, by spinning up a relayer for injective and inevm. Sample metrics, from `--metricAppContexts '[{"name": "injectivelabel", "matchingList": [{"destination_domain": 6909546 }] }, {"name": "inevmlabel", "matchingList": [{"destination_domain": 2525 }] }]'` ``` hyperlane_submitter_queue_length{agent="relayer",app_context="inevmlabel",hyperlane_baselib_version="0.1.0",queue_name="confirm_queue",remote="inevm"} 11 hyperlane_submitter_queue_length{agent="relayer",app_context="inevmlabel",hyperlane_baselib_version="0.1.0",queue_name="prepare_queue",remote="inevm"} 0 hyperlane_submitter_queue_length{agent="relayer",app_context="injectivelabel",hyperlane_baselib_version="0.1.0",queue_name="confirm_queue",remote="injective"} 63 hyperlane_submitter_queue_length{agent="relayer",app_context="injectivelabel",hyperlane_baselib_version="0.1.0",queue_name="prepare_queue",remote="injective"} 13281 ```
### Description Includes the `app_context` classification in `PendingMessage`, and adds trait methods on `PendingOperation` to require always having such a label on `OpQueue` operations. This is done by reusing the matching list logic from the validator checkpoint labels (#3057). The nice thing is that this enables later support for retrying a group of `OpQueue` operations just by specifying the `app_context` label, without adding any new logic, since these labels are essentially matching list results. One downside to using `app_context` for retries is that the endpoint caller is constrained to only the matching lists defined by the relayer operator - however imo only the relayer operator that should be able to trigger retries. ### Drive-by changes The `OpQueue` type alias is converted to an actual struct, that stores the queue label (for metrics purposes), and also the `IntGaugeVec` metric: the generic group of metrics associated with that queue (basically only `submitter_queue_length` currently). ### Related issues - Fixes #3240 ### Backward compatibility Yes ### Testing Manual, by spinning up a relayer for injective and inevm. Sample metrics, from `--metricAppContexts '[{"name": "injectivelabel", "matchingList": [{"destination_domain": 6909546 }] }, {"name": "inevmlabel", "matchingList": [{"destination_domain": 2525 }] }]'` ``` hyperlane_submitter_queue_length{agent="relayer",app_context="inevmlabel",hyperlane_baselib_version="0.1.0",queue_name="confirm_queue",remote="inevm"} 11 hyperlane_submitter_queue_length{agent="relayer",app_context="inevmlabel",hyperlane_baselib_version="0.1.0",queue_name="prepare_queue",remote="inevm"} 0 hyperlane_submitter_queue_length{agent="relayer",app_context="injectivelabel",hyperlane_baselib_version="0.1.0",queue_name="confirm_queue",remote="injective"} 63 hyperlane_submitter_queue_length{agent="relayer",app_context="injectivelabel",hyperlane_baselib_version="0.1.0",queue_name="prepare_queue",remote="injective"} 13281 ```
Description
Goal of this was to have insight into validators of important sets being "up"
Introduces a new metric used by relayers:
hyperlane_observed_validator_latest_index
, e.g.:Tapping into metadata building for multisig ISMs, the relayer will update the metric with the latest indices for the validators in a set. In order to prevent the cardinality being ridiculously high, only certain validator sets are tracked. This is done by introducing an
app_context
label (I'm very open to other names here, for some reason whenever idk how to name some kind of identifier I end up calling it a context 😆)The app context can either be:
--metricAppContexts '[{"name": "testapp", "matchingList": [{"recipient_address": "0xd84379ceae14aa33c123af12424a37803f885889", "destination_domain": 13371 }] }]'
. This is nice for e.g. warp route deployments, where the ISM is maybe not a default ISM, and can be changedThis way we have insight in to the default ISM and any application-specific ISMs.
Some things to note:
Drive-by changes
.min(1)
insteadRelated issues
Backward compatibility
yes
Testing
Ran locally - I think i'll probably add something in e2e tests, but opening now