[Observability] Foundation for load testing telemetry #832

Merged
merged 42 commits into from
Oct 31, 2024
Changes from 34 commits
Commits
ff762ea
measure begin/end blockers
okdas Sep 24, 2024
1aad6ca
--wip-- [skip ci]
okdas Sep 24, 2024
ee84b91
Merge remote-tracking branch 'origin/main' into add-metrics
okdas Sep 25, 2024
025698a
metrixxx
okdas Sep 26, 2024
b7c5bd3
--wip-- [skip ci]
okdas Sep 26, 2024
549710e
bring back original
okdas Sep 26, 2024
208f824
TODO: figure out why prometheus doesn't scrape anymore
okdas Sep 27, 2024
90624df
log level and reduce verbosity of some logs
okdas Sep 27, 2024
8fa8e04
increase the stakes to run the load-test
okdas Sep 27, 2024
8bf2212
Merge branch 'main' into add-metrics
okdas Oct 2, 2024
0262062
--wip-- [skip ci]
okdas Oct 1, 2024
d39fb8f
--wip-- [skip ci]
okdas Oct 2, 2024
69f3df5
--wip-- [skip ci]
okdas Oct 3, 2024
fbfe6d1
--wip-- [skip ci]
okdas Oct 3, 2024
cfa7dc7
--wip-- [skip ci]
okdas Oct 4, 2024
050687a
add histogram
okdas Oct 4, 2024
4786097
add custom metrics config
okdas Oct 4, 2024
83f1b7d
break proofs by app and supplier
okdas Oct 4, 2024
10eb2be
--wip-- [skip ci]
okdas Oct 5, 2024
d28f514
Merge remote-tracking branch 'origin/main' into add-metrics
okdas Oct 7, 2024
cdf62fc
self-review
okdas Oct 7, 2024
6628c07
self-review
okdas Oct 7, 2024
7ab27ac
Merge branch 'main' into add-metrics
okdas Oct 21, 2024
72b3086
fix after merge
okdas Oct 22, 2024
aa61ace
self-review pass
okdas Oct 22, 2024
eb96ae6
Merge branch 'main' into add-metrics
okdas Oct 24, 2024
2cd91de
change retention time on localnet
okdas Oct 24, 2024
d966241
Merge branch 'main' into add-metrics
Olshansk Oct 24, 2024
ce77cc7
add psql datasource to grafana
okdas Oct 25, 2024
e5cc4f6
Merge remote-tracking branch 'origin/main' into add-metrics
okdas Oct 28, 2024
4a0b12b
localnet_up after merge
okdas Oct 28, 2024
497e0e3
Merge remote-tracking branch 'origin/add-metrics' into use-pocketdex-…
okdas Oct 28, 2024
814d0a2
more dashboards
okdas Oct 30, 2024
cd438dc
Merge remote-tracking branch 'origin/main' into add-metrics
okdas Oct 30, 2024
54d6c0f
Update Tiltfile
okdas Oct 30, 2024
dd5276d
address the feedback
okdas Oct 30, 2024
fc53e96
address feedback
okdas Oct 30, 2024
a247a8e
clarify comments
okdas Oct 30, 2024
5f693e5
clarify comments
okdas Oct 30, 2024
6cac26c
Merge remote-tracking branch 'origin/main' into add-metrics
okdas Oct 30, 2024
a13c2ec
address feedback
okdas Oct 30, 2024
6cd43f0
fix the cycle
okdas Oct 31, 2024
28 changes: 24 additions & 4 deletions Tiltfile
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ load("ext://deployment", "deployment_create")
load("ext://execute_in_pod", "execute_in_pod")

# A list of directories where changes trigger a hot-reload of the validator
hot_reload_dirs = ["app", "cmd", "tools", "x", "pkg"]
hot_reload_dirs = ["app", "cmd", "tools", "x", "pkg", "telemetry"]


def merge_dicts(base, updates):
@@ -38,14 +38,26 @@ localnet_config_defaults = {
"enabled": True,
"grafana": {"defaultDashboardsEnabled": False},
},
"relayminers": {"count": 1, "delve": {"enabled": False}},
"relayminers": {
"count": 1,
"delve": {"enabled": False},
"logs": {
"level": "debug",
},
},
"gateways": {
"count": 1,
"delve": {"enabled": False},
"logs": {
"level": "debug",
},
},
"appgateservers": {
"count": 1,
"delve": {"enabled": False},
"logs": {
"level": "debug",
},
},
"ollama": {
"enabled": False,
@@ -100,8 +112,9 @@ if localnet_config["observability"]["enabled"]:
helm_repo("prometheus-community", "https://prometheus-community.github.io/helm-charts")
helm_repo("grafana-helm-repo", "https://grafana.github.io/helm-charts")

# Increase timeout for building the image
update_settings(k8s_upsert_timeout_secs=60)
# Increase timeout for building the image; default is 30, which can be too low for slow internet connections to pull
# container images.
update_settings(k8s_upsert_timeout_secs=120)

helm_resource(
"observability",
@@ -226,6 +239,7 @@ helm_resource(
"--set=logs.format=" + str(localnet_config["validator"]["logs"]["format"]),
"--set=serviceMonitor.enabled=" + str(localnet_config["observability"]["enabled"]),
"--set=development.delve.enabled=" + str(localnet_config["validator"]["delve"]["enabled"]),
"--set=image.repository=poktrolld",
],
image_deps=["poktrolld"],
image_keys=[("image.repository", "image.tag")],
@@ -244,6 +258,8 @@ for x in range(localnet_config["relayminers"]["count"]):
"--values=./localnet/kubernetes/values-relayminer-" + str(actor_number) + ".yaml",
"--set=metrics.serviceMonitor.enabled=" + str(localnet_config["observability"]["enabled"]),
"--set=development.delve.enabled=" + str(localnet_config["relayminers"]["delve"]["enabled"]),
"--set=logLevel=" + str(localnet_config["relayminers"]["logs"]["level"]),
"--set=image.repository=poktrolld",
],
image_deps=["poktrolld"],
image_keys=[("image.repository", "image.tag")],
@@ -284,6 +300,8 @@ for x in range(localnet_config["appgateservers"]["count"]):
"--set=config.signing_key=app" + str(actor_number),
"--set=metrics.serviceMonitor.enabled=" + str(localnet_config["observability"]["enabled"]),
"--set=development.delve.enabled=" + str(localnet_config["appgateservers"]["delve"]["enabled"]),
"--set=logLevel=" + str(localnet_config["appgateservers"]["logs"]["level"]),
"--set=image.repository=poktrolld",
],
image_deps=["poktrolld"],
image_keys=[("image.repository", "image.tag")],
@@ -325,6 +343,8 @@ for x in range(localnet_config["gateways"]["count"]):
"--set=config.signing_key=gateway" + str(actor_number),
"--set=metrics.serviceMonitor.enabled=" + str(localnet_config["observability"]["enabled"]),
"--set=development.delve.enabled=" + str(localnet_config["gateways"]["delve"]["enabled"]),
"--set=logLevel=" + str(localnet_config["gateways"]["logs"]["level"]),
"--set=image.repository=poktrolld",
],
image_deps=["poktrolld"],
image_keys=[("image.repository", "image.tag")],
14 changes: 6 additions & 8 deletions api/poktroll/application/types.pulsar.go

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions app/app.go
@@ -302,6 +302,11 @@ func New(
return nil, err
}

// Set up poktroll telemetry using `app.toml` configuration options (in addition to cosmos-sdk telemetry config).
if err := telemetry.New(appOpts); err != nil {
return nil, err
}

return app, nil
}
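
The hunk above passes `appOpts` to `telemetry.New`, but the body of that function is not part of this excerpt. The following is only a minimal sketch of how a poktroll-specific setting could be read and validated from an options object. The `AppOptions` interface is redeclared locally with the same single `Get` method as the cosmos-sdk one, and `mapAppOptions`, `readCardinalityLevel`, and the key path `poktroll.telemetry.cardinality-level` are assumptions made for illustration, not code from this PR.

```go
package main

import "fmt"

// AppOptions is redeclared here so the sketch has no external dependencies.
// It has the same single-method shape as the cosmos-sdk AppOptions interface.
type AppOptions interface {
	Get(key string) interface{}
}

// mapAppOptions is a hypothetical in-memory stand-in for real app options.
type mapAppOptions map[string]interface{}

func (m mapAppOptions) Get(key string) interface{} { return m[key] }

// cardinalityLevelKey is an ASSUMED key path, derived from the
// [poktroll.telemetry] section and its cardinality-level field.
const cardinalityLevelKey = "poktroll.telemetry.cardinality-level"

// readCardinalityLevel returns the configured level, or an error if the
// value is missing, not a string, or not one of the documented options.
func readCardinalityLevel(opts AppOptions) (string, error) {
	raw := opts.Get(cardinalityLevelKey)
	level, ok := raw.(string)
	if !ok {
		return "", fmt.Errorf("%s: expected a string, got %T", cardinalityLevelKey, raw)
	}
	switch level {
	case "low", "medium", "high":
		return level, nil
	default:
		return "", fmt.Errorf("%s: unknown level %q", cardinalityLevelKey, level)
	}
}

func main() {
	opts := mapAppOptions{cardinalityLevelKey: "high"}
	level, err := readCardinalityLevel(opts)
	if err != nil {
		panic(err)
	}
	fmt.Println(level)
}
```

Failing fast on an unknown level mirrors the `return nil, err` in the hunk: a misconfigured `app.toml` would stop the node at startup rather than silently falling back.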

64 changes: 50 additions & 14 deletions cmd/poktrolld/cmd/config.go
@@ -9,10 +9,26 @@ import (
sdk "github.com/cosmos/cosmos-sdk/types"

"github.com/pokt-network/poktroll/app"
"github.com/pokt-network/poktroll/telemetry"
)

var once sync.Once

// PoktrollAdditionalConfig represents a poktroll-specific part of `app.toml` file.
// See the `customAppConfigTemplate()` for additional information about each setting.
type PoktrollAdditionalConfig struct {
Member:

I looked in Cosmos & Comet and the names for these are usually RPCConfig, CustomAppConfig, serverconfig.Config, etc.

What do you think of PoktrollAppConfig?

"Additional" feels weird.

Member Author:

CN(aming)O at work

Telemetry telemetry.PoktrollTelemetryConfig `mapstructure:"telemetry"`
}

// poktrollAdditionalConfigDefaults sets default values to render in `app.toml`.
func poktrollAdditionalConfigDefaults() PoktrollAdditionalConfig {
return PoktrollAdditionalConfig{
Telemetry: telemetry.PoktrollTelemetryConfig{
CardinalityLevel: "medium",
},
}
}
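
The default above is `"medium"`, and the template comments in this PR describe the level as controlling how many unique labels a metric carries. As a rough illustration of that idea, here is a sketch that appends extra labels only when the configured level is high enough. The `Label` type, `levelRank`, `appendIfAllowed`, and the choice of `service_id` as a medium-level label are hypothetical; `application_id` is taken from the PR's own description of high-cardinality labels. The PR's actual telemetry package may gate labels differently.

```go
package main

import "fmt"

// Label is a simplified key/value metric label (hypothetical type).
type Label struct {
	Name  string
	Value string
}

// levelRank orders the three documented cardinality levels.
var levelRank = map[string]int{"low": 0, "medium": 1, "high": 2}

// appendIfAllowed appends extra labels only when the configured level is at
// least minLevel. Unknown levels are treated conservatively: nothing is added.
func appendIfAllowed(configured, minLevel string, labels []Label, extra ...Label) []Label {
	c, okConfigured := levelRank[configured]
	m, okMin := levelRank[minLevel]
	if !okConfigured || !okMin || c < m {
		return labels
	}
	return append(labels, extra...)
}

func main() {
	configured := "medium"
	labels := []Label{{Name: "stage", Value: "proof"}}

	// Added at "medium" and above (assumed example label).
	labels = appendIfAllowed(configured, "medium", labels, Label{Name: "service_id", Value: "anvil"})
	// Added only at "high", because it has many unique values.
	labels = appendIfAllowed(configured, "high", labels, Label{Name: "application_id", Value: "app1"})

	for _, l := range labels {
		fmt.Printf("%s=%s\n", l.Name, l.Value)
	}
}
```

With `configured` set to `"medium"`, the `application_id` label is dropped, so the number of distinct time series does not grow with the number of applications.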

func InitSDKConfig() {
once.Do(func() {
checkOrInitSDKConfig()
@@ -90,6 +106,7 @@ func initAppConfig() (string, interface{}) {
// The following code snippet is just for reference.
type CustomAppConfig struct {
serverconfig.Config `mapstructure:",squash"`
Poktroll PoktrollAdditionalConfig `mapstructure:"poktroll"`
}

// Optionally allow the chain developer to overwrite the SDK's default
@@ -113,27 +130,46 @@ func initAppConfig() (string, interface{}) {
srvCfg.MinGasPrices = "0.000000001upokt" // Also adjust ignite's `config.yml`.
srvCfg.Mempool.MaxTxs = 10000
srvCfg.Telemetry.Enabled = true
srvCfg.Telemetry.PrometheusRetentionTime = 60 // in seconds. This turns on Prometheus support.
// Positive value turns on prometheus support. Prometheus metrics are removed from the exporter when retention time is reached.
srvCfg.Telemetry.PrometheusRetentionTime = 60 * 60 * 24 // in seconds.
srvCfg.Telemetry.MetricsSink = "mem"
srvCfg.Pruning = "nothing" // archiving node by default
srvCfg.API.Enable = true
srvCfg.GRPC.Enable = true
srvCfg.GRPCWeb.Enable = true

customAppConfig := CustomAppConfig{
Config: *srvCfg,
Config: *srvCfg,
Poktroll: poktrollAdditionalConfigDefaults(),
}

customAppTemplate := serverconfig.DefaultConfigTemplate
// Edit the default template file
//
// customAppTemplate := serverconfig.DefaultConfigTemplate + `
// [wasm]
// # This is the maximum sdk gas (wasm and storage) that we allow for any x/wasm "smart" queries
// query_gas_limit = 300000
// # This is the number of wasm vm instances we keep cached in memory for speed-up
// # Warning: this is currently unstable and may lead to crashes, best to keep for 0 unless testing locally
// lru_size = 0`

return customAppTemplate, customAppConfig
return customAppConfigTemplate(), customAppConfig
}

// customAppConfigTemplate extends the default configuration `app.toml` file with our own configs. They are going to be
// used on validators and full-nodes, and they render using default values from `poktrollAdditionalConfigDefaults()`.
func customAppConfigTemplate() string {
return serverconfig.DefaultConfigTemplate + `
###############################################################################
### Poktroll ###
###############################################################################

# Poktroll-specific configuration for Full Nodes and Validators.
[poktroll]

# Telemetry configuration in addition to the [telemetry] settings.
[poktroll.telemetry]

# Cardinality level for telemetry metrics collection
# This controls the level of detail (number of unique labels) in metrics.
# Options:
# - "low": Collects basic metrics with low cardinality.
# Suitable for production environments with tight performance constraints.
# - "medium": Collects a moderate number of labels, balancing detail and performance.
# Suitable for moderate workloads or staging environments.
# - "high": WARNING: WILL CAUSE STRESS TO YOUR MONITORING ENVIRONMENT! Collects detailed metrics with high
# cardinality, including labels with many unique values (e.g., application_id, session_id).
# Recommended for debugging or testing environments.
cardinality-level = "{{ .Poktroll.Telemetry.CardinalityLevel }}"
`
}
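
To see how the `{{ .Poktroll.Telemetry.CardinalityLevel }}` placeholder resolves against the nested config struct, here is a small standalone sketch using Go's `text/template`. It assumes the `app.toml` template is rendered with that package; the struct names `AppConfig`, `PoktrollConfig`, and `TelemetryConfig` and the `render` helper are simplified stand-ins for illustration, not the PR's types.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Simplified stand-ins for the nested config structs (hypothetical names).
type TelemetryConfig struct{ CardinalityLevel string }
type PoktrollConfig struct{ Telemetry TelemetryConfig }
type AppConfig struct{ Poktroll PoktrollConfig }

// fragment is a shortened version of the template section added in this PR.
const fragment = `[poktroll.telemetry]
cardinality-level = "{{ .Poktroll.Telemetry.CardinalityLevel }}"
`

// render executes the template fragment against cfg and returns the TOML text.
func render(cfg AppConfig) (string, error) {
	tmpl, err := template.New("app.toml").Parse(fragment)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, cfg); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	cfg := AppConfig{Poktroll: PoktrollConfig{Telemetry: TelemetryConfig{CardinalityLevel: "medium"}}}
	out, err := render(cfg)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

The field path in the placeholder has to match the exported Go field names, not the `mapstructure` tags; the tags only matter when the rendered file is read back in.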
19 changes: 10 additions & 9 deletions config.yml
@@ -91,7 +91,10 @@ validators:
# minimum-gas-prices: 0.000000001upokt
telemetry:
enabled: true
prometheus-retention-time: "600" # seconds
poktroll:
telemetry:
# "high" produces a lot of timeseries. Only suitable for small networks such as LocalNet.
cardinality-level: high
config:
moniker: "validator1"
consensus:
@@ -139,13 +142,13 @@ genesis:
denom: upokt
bank:
supply:
- amount: "1003000204"
- amount: "1102000204"
denom: upokt
balances:
# Application module
- address: pokt1rl3gjgzexmplmds3tq3r3yk84zlwdl6djzgsvm
coins:
- amount: "1000068" # Equals to the total of all app stakes below
- amount: "100000068" # Equals to the total of all app stakes below
denom: upokt
# Supplier module
- address: pokt1j40dzzmn6cn9kxku7a5tjnud6hv37vesr5ccaa
@@ -171,9 +174,8 @@ genesis:
denom: upokt
applicationList:
- address: pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4
delegatee_gateway_addresses: [
pokt15vzxjqklzjtlz7lahe8z2dfe9nm5vxwwmscne4
]
delegatee_gateway_addresses:
[pokt15vzxjqklzjtlz7lahe8z2dfe9nm5vxwwmscne4]
service_configs:
- service_id: anvil
stake:
@@ -182,9 +184,8 @@ genesis:
amount: "100000068" # ~100 POKT
denom: upokt
- address: pokt184zvylazwu4queyzpl0gyz9yf5yxm2kdhh9hpm
delegatee_gateway_addresses: [
pokt15vzxjqklzjtlz7lahe8z2dfe9nm5vxwwmscne4
]
delegatee_gateway_addresses:
[pokt15vzxjqklzjtlz7lahe8z2dfe9nm5vxwwmscne4]
service_configs:
- service_id: rest
stake:
Expand Down
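
The genesis hunks above change two upokt amounts: the total bank supply and the application module balance. Assuming that the first value of each pair is the removed line and the second is the added one, a quick arithmetic check shows that both move by the same amount, which is what keeps the total supply consistent with the balances. The constant and function names below are illustrative only.

```go
package main

import "fmt"

// Amounts in upokt, copied from the config.yml diff.
// ASSUMPTION: in each pair the first value is the old one, the second the new one.
const (
	oldSupply        int64 = 1003000204
	newSupply        int64 = 1102000204
	oldAppModuleBal  int64 = 1000068
	newAppModuleBal  int64 = 100000068
)

// delta returns how much an amount grew between two revisions.
func delta(before, after int64) int64 { return after - before }

func main() {
	supplyDelta := delta(oldSupply, newSupply)
	balanceDelta := delta(oldAppModuleBal, newAppModuleBal)

	fmt.Println("supply delta:            ", supplyDelta)
	fmt.Println("app module balance delta:", balanceDelta)
	fmt.Println("deltas match:            ", supplyDelta == balanceDelta)
}
```

Both deltas come out to 99000000 upokt, so the supply increase is fully accounted for by the larger application module balance.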
2 changes: 1 addition & 1 deletion go.mod
@@ -82,6 +82,7 @@ require (
require (
cosmossdk.io/x/tx v0.13.4
github.com/jhump/protoreflect v1.16.0
github.com/mitchellh/mapstructure v1.5.0
)

require (
@@ -224,7 +225,6 @@ require (
github.com/minio/highwayhash v1.0.2 // indirect
github.com/mitchellh/go-homedir v1.1.0 // indirect
github.com/mitchellh/go-testing-interface v1.14.1 // indirect
github.com/mitchellh/mapstructure v1.5.0 // indirect
github.com/moby/docker-image-spec v1.3.1 // indirect
github.com/moby/term v0.5.0 // indirect
github.com/morikuni/aec v1.0.0 // indirect