Troubleshooting

Bob Vawter edited this page Aug 23, 2023 · 19 revisions

Questions

  • Are there errors in the cdc-sink logs?
    • Using --logFormat fluent and --logDestination path/to/cdc-sink.log is a good choice for setting up log aggregation, especially if cdc-sink is being run as a replicated network service.
  • Is the source changefeed able to deliver data to cdc-sink?
    • Check the output of SHOW CHANGEFEED JOBS
    • Are there (retryable) error messages reported by the changefeed senders?
    • Is the resolved timestamp advancing?
  • Is cdc-sink actively staging data?
    • Look at the row counts for the staging tables in the _cdc_sink database.
  • Is cdc-sink receiving resolved timestamps?
    • Look at the row count of _cdc_sink.resolved_timestamps
    • Performing a transactionally-consistent backfill is not recommended, since this would require the initial state of the database to be applied in a single transaction. Use --immediate or --backfillWindow modes, or bootstrap the destination from a BACKUP or EXPORT.
  • Are resolved timestamps falling behind?
    • Run SELECT now() - MAX(target_applied_at) FROM _cdc_sink.resolved_timestamps to show how long ago a resolved window was last processed.
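
The lag query above can also be polled from a script. The following is a minimal sketch, assuming a PostgreSQL-compatible DB-API cursor connected to the cluster that hosts the _cdc_sink database. The function name resolved_lag is hypothetical, and the choice of driver (psycopg2 is one option) is an assumption rather than anything cdc-sink requires.

```python
from datetime import timedelta
from typing import Optional

# Same query as in the checklist above.
LAG_QUERY = "SELECT now() - MAX(target_applied_at) FROM _cdc_sink.resolved_timestamps"


def resolved_lag(cursor) -> Optional[timedelta]:
    """Return how long ago a resolved window was last applied.

    `cursor` is any DB-API 2.0 cursor. Assumption: the driver maps SQL
    INTERVAL values to datetime.timedelta, as psycopg2 does.
    Returns None when no resolved window has been applied yet.
    """
    cursor.execute(LAG_QUERY)
    row = cursor.fetchone()
    # MAX() over an empty table yields a single row holding NULL (None).
    return row[0] if row else None
```

A steadily growing value suggests that resolved timestamps are falling behind; what counts as too large depends on the changefeed's resolved-timestamp interval.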

Actions

Internal diagnostic endpoint

(This section is contingent on PR #440.)

cdc-sink provides introspection of its internal data structures through a diagnostic endpoint at /_/diag. This endpoint returns a JSON blob describing many of cdc-sink's internal data structures. The payload may contain sensitive information; if cdc-sink authentication is enabled, the requester must have access permissions to a schema named _.diag in order to make the request.

This same information can be sent to cdc-sink's logger by sending a SIGUSR1 to the cdc-sink process.

This data is for support purposes only and does not constitute a stable API.
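
As an illustration of querying the diagnostic endpoint, here is a minimal sketch using only the Python standard library. The base URL, the function names, and the use of an Authorization: Bearer header are assumptions; check how authentication is configured in your deployment before relying on it.

```python
import json
import urllib.request
from typing import Any, Optional


def build_diag_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the /_/diag endpoint.

    `token` is an assumption: a bearer token whose permissions include the
    _.diag schema, sent only when cdc-sink authentication is enabled.
    """
    req = urllib.request.Request(base_url.rstrip("/") + "/_/diag", method="GET")
    if token:
        req.add_header("Authorization", "Bearer " + token)
    return req


def fetch_diag(base_url: str, token: Optional[str] = None) -> Any:
    """Fetch and parse the diagnostic JSON blob.

    The layout of the payload is not a stable API, so treat the result
    as opaque data for support purposes.
    """
    with urllib.request.urlopen(build_diag_request(base_url, token), timeout=10) as resp:
        return json.load(resp)
```

Because the payload may contain sensitive information, avoid writing the result to shared logs or tickets without reviewing it first.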

Reset cdc-sink

  • Cancel all source changefeeds
  • DROP and re-CREATE the _cdc_sink database.

View resolver loop status

(This section will be obviated by PR #440.)

The logical-loop resolver is responsible for processing resolved timestamps (i.e. moving data from staging to destination tables). Multiple instances of cdc-sink will use the _cdc_sink.leases table to ensure that only a single instance is actively processing timestamps.

The _cdc_sink.memo table will contain a snapshot of the resolver's internal state, used for resuming in the event of a crash or reschedule. To inspect it, run SELECT * FROM _cdc_sink.memo WHERE key LIKE 'changefeed-%', copy the hex-encoded value to the clipboard, and decode it using pbpaste | xxd -r -p - (pbpaste is the macOS clipboard command).
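
On systems without pbpaste or xxd, the same hex decoding can be done in a few lines of Python. This is a minimal sketch; the function name decode_memo_value is hypothetical, and the handling of a leading \x prefix assumes the SQL client printed the value in that style.

```python
def decode_memo_value(hex_text: str) -> bytes:
    """Decode a hex-encoded memo value, similar to `xxd -r -p`.

    Assumption: the SQL client printed the column as hex, optionally
    with a leading \\x prefix. Whitespace and line breaks are ignored.
    """
    # Drop spaces and newlines that may come from copy and paste.
    cleaned = "".join(hex_text.split())
    if cleaned.startswith("\\x"):
        cleaned = cleaned[2:]
    return bytes.fromhex(cleaned)
```

The function returns raw bytes and does not interpret them; the encoding of the resolver state is internal to cdc-sink.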
