watch over our db to ensure up-to-date data

Clever/analytics-monitor

analytics-monitor

diagnoses latency-related metrics issues

Overview

analytics-monitor (AM) runs as a worker that connects to a Postgres db and performs the following functions:

  • Surfaces data latency by querying all tables for the latest data timestamp. A recent timestamp means SOME data is fresh; an old timestamp means NO data is fresh.
  • Delivers actionable alerts by posting a message in Slack whenever a table's data latency exceeds its configurable threshold (e.g. 2h).
  • Makes coverage easy to extend: latency checks and thresholds are declared in a config file, so no application code is required to add checks or alerts for new tables.
  • Supports per-schema defaults for the latency threshold and timestamp column. If new tables are added to a schema that AM already checks, AM latency-checks those tables using the default values without requiring any config changes. If a new table differs from the schema defaults, the defaults can be overridden by making a config change (shown below).

Declaring New Latency Checks

To define a check in analytics-monitor, add a new entry to config/example_config.json in the following format:

  "postgres-checks": [
    {
      "schema": "mongo",
      "default_threshold": "24h",
      "default_timestamp_column": "_data_timestamp",
      "omit_tables": ["billing_03_31_snapshot"],
      "checks": [
        {
          "table": "districts",
          "latency": {
            "timestamp_column": "_data_timestamp",
            "threshold": "2h"
          }
        }
      ]
    }, ...
  ]

analytics-monitor then reads this config to perform latency checks. schema and table together identify the table, and latency.timestamp_column names the column that records when a row entered Redshift. latency.threshold sets the maximum acceptable latency for the table's data, written as a Go duration string (for example "2h" or "24h"). If the threshold is exceeded, analytics-monitor fires an alert in SignalFx.

For tables that are not explicitly declared in the config, default_threshold and default_timestamp_column are used in place of latency.threshold and latency.timestamp_column. omit_tables lists tables to be excluded from latency checks.
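
The fallback and omission rules can be illustrated with a short Go sketch. The types and the resolve function are hypothetical, and the per-field fallback for a partially specified check is an assumption; the README only states that undeclared tables use the schema defaults. The table name "schools" is invented for the example.

```go
package main

import "fmt"

// tableCheck holds the effective settings for one table (hypothetical type).
type tableCheck struct {
	Table           string
	TimestampColumn string
	Threshold       string
}

// schemaChecks mirrors one entry of "postgres-checks" (hypothetical type).
type schemaChecks struct {
	Schema                 string
	DefaultThreshold       string
	DefaultTimestampColumn string
	OmitTables             []string
	Checks                 []tableCheck
}

// resolve returns the settings to use for a table. The boolean is false
// when the table is listed in OmitTables and should not be checked.
func resolve(s schemaChecks, table string) (tableCheck, bool) {
	for _, omitted := range s.OmitTables {
		if omitted == table {
			return tableCheck{}, false
		}
	}
	// Start from the schema defaults, then apply any explicit overrides.
	eff := tableCheck{
		Table:           table,
		TimestampColumn: s.DefaultTimestampColumn,
		Threshold:       s.DefaultThreshold,
	}
	for _, c := range s.Checks {
		if c.Table != table {
			continue
		}
		if c.TimestampColumn != "" {
			eff.TimestampColumn = c.TimestampColumn
		}
		if c.Threshold != "" {
			eff.Threshold = c.Threshold
		}
	}
	return eff, true
}

func main() {
	mongo := schemaChecks{
		Schema:                 "mongo",
		DefaultThreshold:       "24h",
		DefaultTimestampColumn: "_data_timestamp",
		OmitTables:             []string{"billing_03_31_snapshot"},
		Checks: []tableCheck{
			{Table: "districts", TimestampColumn: "_data_timestamp", Threshold: "2h"},
		},
	}
	for _, table := range []string{"districts", "schools", "billing_03_31_snapshot"} {
		eff, ok := resolve(mongo, table)
		if !ok {
			fmt.Printf("%s: omitted\n", table)
			continue
		}
		fmt.Printf("%s: column=%s threshold=%s\n", table, eff.TimestampColumn, eff.Threshold)
	}
}
```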