Merge pull request #5975 from IntersectMBO/baldurb/ekg-restart
cardano-tracer: Allow switching EKG service between different nodes.
mgmeier authored Sep 25, 2024
2 parents c13177f + 7ba25c5 commit ea98b81
Showing 19 changed files with 431 additions and 410 deletions.
4 changes: 2 additions & 2 deletions cabal.project
Expand Up @@ -48,7 +48,7 @@ package cryptonite
flags: -support_rdrand

package snap-server
flags: +openssl
flags: -openssl

package bitvec
flags: -simd
Expand All @@ -62,8 +62,8 @@ constraints:

allow-newer:
, katip:Win32
, ekg-wai:time

-- IMPORTANT
-- Do NOT add more source-repository-package stanzas here unless they are strictly
-- temporary! Please read the section in CONTRIBUTING about updating dependencies.

11 changes: 11 additions & 0 deletions cardano-tracer/CHANGELOG.md
@@ -1,5 +1,16 @@
# ChangeLog

## 0.3 (September 20, 2024)

* Abandon `snap` webserver in favour of `wai`/`warp` for Prometheus and EKG Monitoring.
* Add dynamic routing to EKG stores of all connected nodes.
* Derive URL compliant routes from connected node names (instead of plain node names).
* Remove the requirement of two distinct ports for the EKG backend (changing `hasEKG` config type).
* For optional RTView component only: Disable SSL/https connections. Force `snap-server`
dependency to build with `-flag -openssl`.
* Add JSON responses when listing connected nodes for both Prometheus and EKG Monitoring.
* Add consistency check for redundant port values in the config.

## 0.2.4 (August 13, 2024)

* `systemd` is enabled by default. To disable it use the cabal
Expand Down
19 changes: 10 additions & 9 deletions cardano-tracer/cardano-tracer.cabal
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
cabal-version: 3.0

name: cardano-tracer
version: 0.2.4
version: 0.3
synopsis: A service for logging and monitoring over Cardano nodes
description: A service for logging and monitoring over Cardano nodes.
category: Cardano,
Expand Down Expand Up @@ -155,11 +155,12 @@ library
cardano-git-rev ^>=0.2.2
, cassava
, threepenny-gui
, utf8-string
, vector

build-depends: aeson
, async
, async-extras
, auto-update
, bimap
, blaze-html
, bytestring
Expand All @@ -168,21 +169,20 @@ library
, containers
, contra-tracer
, directory
, ekg
, ekg-core
, ekg-forward ^>= 0.5
, ekg-forward >= 0.5
, ekg-wai
, extra
, filepath
, http-types
, mime-mail
, optparse-applicative
, ouroboros-network ^>= 0.17
, ouroboros-network-api
, ouroboros-network-framework
, signal
, slugify
, smtp-mail ^>= 0.5
, snap-blaze
, snap-core
, snap-server
, stm
, string-qq
, text
Expand All @@ -191,6 +191,8 @@ library
, trace-forward
, trace-resources
, unordered-containers
, wai ^>= 3.2
, warp ^>= 3.4
, yaml

if flag(systemd) && os(linux)
Expand Down Expand Up @@ -281,8 +283,7 @@ library demo-acceptor-lib

exposed-modules: Cardano.Tracer.Test.Acceptor

build-depends: async-extras
, bytestring
build-depends: bytestring
, cardano-tracer
, containers
, extra
Expand Down
14 changes: 4 additions & 10 deletions cardano-tracer/configuration/complete-example.json
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,10 @@
},
"loRequestNum": 100,
"ekgRequestFreq": 2,
"hasEKG": [
{
"epHost": "127.0.0.1",
"epPort": 3100
},
{
"epHost": "127.0.0.1",
"epPort": 3101
}
],
"hasEKG": {
"epHost": "127.0.0.1",
"epPort": 3100
},
"hasPrometheus": {
"epHost": "127.0.0.1",
"epPort": 3000
Expand Down
4 changes: 1 addition & 3 deletions cardano-tracer/configuration/complete-example.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,8 @@ network:
loRequestNum: 100
ekgRequestFreq: 2
hasEKG:
- epHost: 127.0.0.1
epHost: 127.0.0.1
epPort: 3100
- epHost: 127.0.0.1
epPort: 3101
hasPrometheus:
epHost: 127.0.0.1
epPort: 3000
Expand Down
10 changes: 2 additions & 8 deletions cardano-tracer/demo/multi/active-tracer-config.json
Original file line number Diff line number Diff line change
Expand Up @@ -8,16 +8,10 @@
"/run/user/1000/cardano-tracer-demo-3.sock"
]
},
"hasEKG": [
{
"hasEKG": {
"epHost": "127.0.0.1",
"epPort": 3100
},
{
"epHost": "127.0.0.1",
"epPort": 3101
}
],
},
"hasPrometheus": {
"epHost": "127.0.0.1",
"epPort": 3000
Expand Down
14 changes: 4 additions & 10 deletions cardano-tracer/demo/multi/passive-tracer-config.json
Original file line number Diff line number Diff line change
Expand Up @@ -4,16 +4,10 @@
"tag": "AcceptAt",
"contents": "/run/user/1000/cardano-tracer-demo-1.sock"
},
"hasEKG": [
{
"epHost": "127.0.0.1",
"epPort": 3100
},
{
"epHost": "127.0.0.1",
"epPort": 3101
}
],
"hasEKG": {
"epHost": "127.0.0.1",
"epPort": 3100
},
"hasPrometheus": {
"epHost": "127.0.0.1",
"epPort": 3000
Expand Down
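The configuration changes above reduce `hasEKG` from a list of two endpoints to a single endpoint, and the changelog mentions a consistency check for redundant port values. The following is a minimal sketch of what such a check could look like; it is not taken from the commit. The `Endpoint` record mirrors the `epHost`/`epPort` fields, while `duplicatePorts` and the way fields are passed in are hypothetical, and hosts are ignored for simplicity.

```haskell
import Data.List (group, sort)

-- Mirrors the epHost/epPort fields used in the configuration files.
data Endpoint = Endpoint
  { epHost :: String
  , epPort :: Int
  } deriving (Eq, Show)

-- Hypothetical helper (not from the commit): ports that are configured
-- more than once across the optional endpoint fields. Hosts are ignored,
-- which is a simplification.
duplicatePorts :: [(String, Maybe Endpoint)] -> [Int]
duplicatePorts fields =
  [ p | (p : _ : _) <- group (sort ports) ]
  where
    ports = [ epPort ep | (_, Just ep) <- fields ]

main :: IO ()
main = do
  let ekg  = Endpoint "127.0.0.1" 3100
      prom = Endpoint "127.0.0.1" 3000
  -- Distinct ports: nothing to report.
  print (duplicatePorts [("hasEKG", Just ekg), ("hasPrometheus", Just prom)])
  -- Same port configured twice: the port is reported.
  print (duplicatePorts [("hasEKG", Just ekg), ("hasPrometheus", Just ekg)])
```

A real check would probably also name the offending fields in its error message; the sketch only returns the clashing port numbers.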
137 changes: 100 additions & 37 deletions cardano-tracer/docs/cardano-tracer.md
Original file line number Diff line number Diff line change
Expand Up @@ -337,72 +337,135 @@ The fields `rpMaxAgeMinutes`, `rpMaxAgeHours` specify the lifetime of the log fi

## Prometheus

The optional field `hasPrometheus` specifies the host and port of the web page with metrics. For example:
At the top-level route `/`, Prometheus gives a list of connected nodes.

The responses are either human-readable names (HTML) with clickable
links, or JSON mapping from connected node names to relative URLs,
depending on desired content type (`Accept:` header of the request).

The routes depend dynamically on the connected nodes; the node names
are [slugified](https://hackage.haskell.org/package/slugify).

The optional field `hasPrometheus` specifies the host and port of the
web page with Prometheus metrics. For example:

```
"hasPrometheus": {
"epHost": "127.0.0.1",
"epPort": 3000
"epPort": 3200
}
```

Here the web page is available at `http://127.0.0.1:3000`. Please note that if you skip this field, the web page will not be available.
With this example, the list of clickable identifiers of connected
nodes will be available at `http://127.0.0.1:3200`, such as:

```
* 127.0.0.1:30004
* 127.0.0.1:30001
* 127.0.0.1:30005
* 127.0.0.1:30000
* 127.0.0.1:30003
* 127.0.0.1:30002
* TxGenerator
```

Clicking an identifier will take you to its monitoring page. For
example clicking on `127.0.0.1:30004` displays the monitoring metrics
at `http://localhost:3200/12700130004`.

After you open `http://127.0.0.1:3000` in your browser, you will see the list of identifiers of connected nodes (or the warning message, if there are no connected nodes yet), for example:
Sending an HTTP GET request with a JSON `Accept` header returns the
response of the top-level route, or of an identifier, as JSON.
`jq '.'` pretty-prints the JSON object.

```
* tmp-forwarder.sock@0
* tmp-forwarder.sock@1
* tmp-forwarder.sock@2
$ curl --silent -H "Accept: application/json" '127.0.0.1:3200' | jq '.'
{
"127.0.0.1:30000": "/12700130000",
"127.0.0.1:30001": "/12700130001",
"127.0.0.1:30002": "/12700130002",
"127.0.0.1:30003": "/12700130003",
"127.0.0.1:30004": "/12700130004",
"127.0.0.1:30005": "/12700130005",
"TxGenerator": "/txgenerator"
}
```

Each identifier is a hyperlink to the page where you will see the **current** list of metrics received from the corresponding node, in such a format:
The Prometheus output is a map from Prometheus metric to value:

```
$ curl '127.0.0.1:3200/12700130004'
blockNum_int 35
rts_gc_init_cpu_ms 5
rts_gc_par_tot_bytes_copied 0
rts_gc_num_gcs 2
rts_gc_max_bytes_slop 15880
rts_gc_num_bytes_usage_samples 1
rts_gc_wall_ms 4005
...
rts_gc_par_max_bytes_copied 0
rts_gc_mutator_cpu_ms 57
rts_gc_mutator_wall_ms 4004
rts_gc_gc_cpu_ms 1
rts_gc_cumulative_bytes_used 184824
served_block_counter 31
submissions_accepted_counter 2771
density_real 5.7692307692307696e-2
blocksForged_int 6
```

## EKG Monitoring

The optional field `hasEKG` specifies the hosts and ports of two web pages:
At the top-level route `/`, EKG gives a list of connected nodes.

The responses are either human-readable names (HTML) with clickable
links, or JSON mapping from connected node names to relative URLs,
depending on desired content type (`Accept:` header of the request).

1. the list of identifiers of connected nodes,
2. EKG monitoring page.
The routes depend dynamically on the connected nodes; the node names
are [slugified](https://hackage.haskell.org/package/slugify).

For example, if you use JSON configuration file:
The optional field `hasEKG` specifies the host and port of the web
page with EKG metrics. For example:

```
"hasEKG": [
{
"epHost": "127.0.0.1",
"epPort": 3100
},
{
"epHost": "127.0.0.1",
"epPort": 3101
}
]
"hasEKG": {
"epHost": "127.0.0.1",
"epPort": 3100
}
```

The page with the list of identifiers of connected nodes will be available at `http://127.0.0.1:3100`, for example:
With this example, the list of clickable identifiers of connected
nodes will be available at `http://127.0.0.1:3100`, such as:

```
* tmp-forwarder.sock@0
* tmp-forwarder.sock@1
* tmp-forwarder.sock@2
* 127.0.0.1:30004
* 127.0.0.1:30001
* 127.0.0.1:30005
* 127.0.0.1:30000
* 127.0.0.1:30003
* 127.0.0.1:30002
* TxGenerator
```

Each identifier is a hyperlink, after clicking to it you will be redirected to `http://127.0.0.1:3101` where you will see EKG monitoring page for corresponding node.
Clicking an identifier will take you to its monitoring page. For
example clicking on `127.0.0.1:30004` displays the monitoring metrics
at `http://localhost:3100/12700130004`.

Sending an HTTP GET request with a JSON `Accept` header gives the metrics
of an identifier as JSON. `jq '.'` pretty-prints the JSON object.

```
$ curl --silent -H 'Accept: application/json' '127.0.0.1:3100/12700130004' | jq '.'
{
"ChainSync": {
"HeadersServed_counter": {
"type": "c",
"val": 24
}
},
"Mem": {
"resident_int": {
"type": "g",
"val": 91877376
}
},
"RTS": {
"alloc_int": {
"type": "g",
"val": 1014189896
},
```

## Verbosity

Expand Down
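The documentation change above describes two behaviours of the top-level route: node names are turned into URL-compliant routes, and the listing is returned as HTML or JSON depending on the `Accept` header. The sketch below illustrates both under stated assumptions and is not taken from the commit. `toRoute` is a simplified stand-in that merely reproduces the examples shown (`127.0.0.1:30004` becomes `/12700130004`, `TxGenerator` becomes `/txgenerator`); the actual `slugify` package may treat other inputs differently. `wantsJson`, `renderJson` and `renderHtml` are hypothetical names, and no JSON or HTML escaping is attempted.

```haskell
import Data.Char (isAlphaNum, isAscii, toLower)
import Data.List (intercalate, isInfixOf)

-- Simplified stand-in for the slugify step (assumption): keep ASCII
-- letters and digits, lowercase them, and prefix a slash.
toRoute :: String -> String
toRoute name = '/' : [ toLower c | c <- name, isAscii c, isAlphaNum c ]

-- True when the request's Accept header (if any) mentions JSON.
wantsJson :: Maybe String -> Bool
wantsJson = maybe False ("application/json" `isInfixOf`)

-- JSON object mapping node names to relative URLs. Using 'show' for
-- quoting is only adequate for plain ASCII names like the ones below.
renderJson :: [String] -> String
renderJson names =
  "{" ++ intercalate "," [ show n ++ ":" ++ show (toRoute n) | n <- names ] ++ "}"

-- HTML list of clickable node names (no escaping, illustration only).
renderHtml :: [String] -> String
renderHtml names =
  "<ul>"
    ++ concat [ "<li><a href=\"" ++ toRoute n ++ "\">" ++ n ++ "</a></li>" | n <- names ]
    ++ "</ul>"

-- Pick the representation based on the Accept header.
listNodes :: Maybe String -> [String] -> String
listNodes accept names
  | wantsJson accept = renderJson names
  | otherwise        = renderHtml names

main :: IO ()
main = do
  let nodes = ["127.0.0.1:30004", "TxGenerator"]
  putStrLn (listNodes (Just "application/json") nodes)
  putStrLn (listNodes Nothing nodes)
```

In a `wai` application the `Accept` value would come from the request headers; here it is passed in as a plain `Maybe String` to keep the sketch dependency-free.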
26 changes: 20 additions & 6 deletions cardano-tracer/src/Cardano/Tracer/Acceptors/Utils.hs
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
{-# LANGUAGE NamedFieldPuns #-}
#if RTVIEW
{-# LANGUAGE OverloadedStrings #-}
#endif
{-# LANGUAGE TupleSections #-}

module Cardano.Tracer.Acceptors.Utils
( prepareDataPointRequestor
Expand All @@ -26,6 +25,7 @@ import Control.Concurrent.STM.TVar (TVar, modifyTVar', newTVarIO)
import qualified Data.Bimap as BM
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Time.Clock.POSIX (getPOSIXTime)
#if RTVIEW
import Data.Time.Clock.System (getSystemTime, systemToUTCTime)
#endif
Expand All @@ -51,12 +51,26 @@ prepareMetricsStores
-> IO (EKG.Store, TVar MetricsLocalStore)
prepareMetricsStores TracerEnv{teConnectedNodes, teAcceptedMetrics} connId = do
addConnectedNode teConnectedNodes connId
storesForNewNode <- (,) <$> EKG.newStore
<*> newTVarIO emptyMetricsLocalStore
atomically $
modifyTVar' teAcceptedMetrics $ M.insert (connIdToNodeId connId) storesForNewNode
store <- EKG.newStore

EKG.registerCounter "ekg.server_timestamp_ms" getTimeMs store
storesForNewNode <- (store ,) <$> newTVarIO emptyMetricsLocalStore

atomically do
modifyTVar' teAcceptedMetrics do
M.insert (connIdToNodeId connId) storesForNewNode

return storesForNewNode

where
-- forkServer definition of `getTimeMs'. The ekg frontend relies
-- on the "ekg.server_timestamp_ms" metric being in every
-- store. While forkServer adds that automatically, we must
-- manually add it.
-- url
-- + https://github.com/tvh/ekg-wai/blob/master/System/Remote/Monitoring/Wai.hs#L237-L238
getTimeMs = (round . (* 1000)) `fmap` getPOSIXTime

addConnectedNode
:: ConnectedNodes
-> ConnectionId LocalAddress
Expand Down
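The `getTimeMs` helper in the hunk above converts the current POSIX time to whole milliseconds before it is registered as the `ekg.server_timestamp_ms` counter. As a small standalone sketch of just that conversion (the pure `toMillis` function and the `main` driver are illustrative names, not part of the commit, and the `Int64` result type is assumed to match what the counter expects):

```haskell
import Data.Int (Int64)
import Data.Time.Clock.POSIX (POSIXTime, getPOSIXTime)

-- Convert a POSIX timestamp (seconds, possibly fractional) to whole
-- milliseconds, mirroring `round . (* 1000)` from the hunk above.
toMillis :: POSIXTime -> Int64
toMillis = round . (* 1000)

main :: IO ()
main = do
  -- A fixed input makes the conversion easy to check by hand.
  print (toMillis 1.5)
  -- The current time in milliseconds, as the counter would report it.
  now <- getPOSIXTime
  print (toMillis now)
```

Splitting the pure conversion from the `IO` action is only a convenience for testing; the commit keeps both in a single expression.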