[INLONG-11201][Sort] Enhanced sink metric instrumentation for Flink StarRocks Connector #11206

Merged on Oct 9, 2024 (2 commits)

Conversation

PeterZh6
Contributor

Fixes [Feature][Sort] Enhanced sink metric instrumentation for InLong Sort Flink Connector #11201

Motivation

This PR aims to enhance sink metric instrumentation for the InLong Sort Flink Connector, particularly the StarRocks Connector. The enhancements are designed to improve observability by adding metrics that track serialization, snapshot states, and checkpoint completion.

Modifications

This feature covers sink metrics only.

Serialization Metrics:

  • Added counters for successful and failed serialization attempts:
    • numSerializeSuccess
    • numSerializeError
  • Introduced a latency gauge to measure the time taken for serialization:
    • serializeTimeLag
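
As an illustration of how the three serialization metrics relate to one another, here is a minimal, Flink-independent sketch. Only the metric names come from this PR; the class `SerializeMetricsSketch`, its `serialize` method, and the use of plain `AtomicLong` fields instead of Flink metric objects are assumptions made for the example and are not the connector's actual code.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical holder class; only the three metric names are taken from the PR.
class SerializeMetricsSketch {
    private final AtomicLong numSerializeSuccess = new AtomicLong();
    private final AtomicLong numSerializeError = new AtomicLong();
    // Duration of the most recent serialization attempt, in milliseconds.
    private volatile long serializeTimeLag;

    // Wraps one serialization call and updates the counters and the latency value.
    <T> byte[] serialize(T record, Function<T, byte[]> serializer) {
        long start = System.currentTimeMillis();
        try {
            byte[] out = serializer.apply(record);
            numSerializeSuccess.incrementAndGet();
            return out;
        } catch (RuntimeException e) {
            numSerializeError.incrementAndGet();
            throw e; // the failure is counted, not swallowed
        } finally {
            // Recorded for both successful and failed attempts.
            serializeTimeLag = System.currentTimeMillis() - start;
        }
    }

    long getNumSerializeSuccess() { return numSerializeSuccess.get(); }
    long getNumSerializeError() { return numSerializeError.get(); }
    long getSerializeTimeLag() { return serializeTimeLag; }
}
```

In a real sink these values would presumably be registered with the operator's metric group as counters and a gauge; that wiring is left out here.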

Snapshot State Metrics:

  • Added counters for:
    • Number of snapshots created: numSnapshotCreate
    • Errors encountered during snapshot operations: numSnapshotError

NotifyComplete Metrics:

  • Introduced a counter for completed snapshots:
    • numSnapshotComplete
  • Added a latency gauge to measure the time from snapshot creation attempt to completion:
    • snapshotToCheckpointTimeLag
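
The snapshot and completion metrics above can be pictured with the following sketch. It assumes the lag is computed by remembering when each snapshot attempt started, keyed by checkpoint ID, and subtracting that timestamp when the checkpoint completes. The class `SnapshotMetricsSketch`, its method names, and the explicit `nowMs` parameters are hypothetical; the PR itself only states the metric names and what they measure.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, Flink-independent sketch; only the metric names come from the PR.
class SnapshotMetricsSketch {
    private final AtomicLong numSnapshotCreate = new AtomicLong();
    private final AtomicLong numSnapshotError = new AtomicLong();
    private final AtomicLong numSnapshotComplete = new AtomicLong();
    // Start time (ms) of each snapshot attempt, keyed by checkpoint ID.
    private final Map<Long, Long> snapshotStartMs = new ConcurrentHashMap<>();
    // Time from the snapshot attempt to checkpoint completion, in milliseconds.
    private volatile long snapshotToCheckpointTimeLag;

    // Called where the sink snapshots its state; 'flush' stands in for the real work.
    void onSnapshotState(long checkpointId, long nowMs, Runnable flush) {
        snapshotStartMs.put(checkpointId, nowMs);
        try {
            flush.run();
            numSnapshotCreate.incrementAndGet();
        } catch (RuntimeException e) {
            numSnapshotError.incrementAndGet();
            snapshotStartMs.remove(checkpointId); // a failed attempt will not complete
            throw e;
        }
    }

    // Called where the sink is notified that a checkpoint completed.
    void onCheckpointComplete(long checkpointId, long nowMs) {
        numSnapshotComplete.incrementAndGet();
        Long start = snapshotStartMs.remove(checkpointId);
        if (start != null) {
            snapshotToCheckpointTimeLag = nowMs - start;
        }
    }

    long getNumSnapshotCreate() { return numSnapshotCreate.get(); }
    long getNumSnapshotError() { return numSnapshotError.get(); }
    long getNumSnapshotComplete() { return numSnapshotComplete.get(); }
    long getSnapshotToCheckpointTimeLag() { return snapshotToCheckpointTimeLag; }
}
```

Passing the timestamps in as arguments keeps the sketch deterministic; production code would more likely read the clock directly.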

Verifying this change

(Please pick either of the following options)

  • This change is a trivial rework/code cleanup without any test coverage.

  • This change is already covered by existing tests, such as:
    (please describe tests)

  • This change added tests and can be verified as follows:
    sort-end-to-end-tests-v1.15

The result is shown in the screenshot (starrocks_metric_demo), with the new metrics marked in red.

Preparation:
To enable the self-defined metrics, add inlong.metric.labels in inlong-sort/sort-end-to-end-tests/sort-end-to-end-tests-v1.15/src/test/resources/flinkSql.
Method 1:
Add a while(true){} loop to the end of any test under sort-end-to-end-tests-v1.15 that involves the StarRocks connector. For example, in Postgres2StarRocksTest, remove the result-checking code and add the loop at the end. Then visit localhost:8081, the Flink Web Dashboard, and find the Metrics column.

@Test
public void testPostgresUpdateAndDelete() throws Exception {
    // test logic omitted...

    // The result-checking part is unnecessary here.
    // Infinite loop to prevent container teardown
    while (true) {}
}
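
An empty while loop keeps a CPU core busy for as long as the test runs. As a possible alternative, the sketch below blocks the test thread for a bounded time instead. The helper `KeepAliveSketch.keepAlive` is hypothetical and not part of the PR or of the InLong test utilities.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for manual inspection runs.
class KeepAliveSketch {
    // Blocks the calling thread for the given time without spinning,
    // so the containers stay up while the dashboard is inspected.
    static void keepAlive(long duration, TimeUnit unit) throws InterruptedException {
        unit.sleep(duration);
    }
}
```

At the end of the test, a call such as KeepAliveSketch.keepAlive(30, TimeUnit.MINUTES) would take the place of the loop, and the test would then finish on its own.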

Method 2:

  • Configure the Flink TaskManager to report metrics through the Slf4jReporter by adding the following to conf/flink-conf.yaml:
metrics.reporter.slf4j.class: org.apache.flink.metrics.slf4j.Slf4jReporter
metrics.reporter.slf4j.interval: 15 SECONDS

Documentation

  • Does this pull request introduce a new feature? Yes
  • If yes, how is the feature documented? JavaDocs

@PeterZh6 PeterZh6 changed the title [INLONG11201][Sort] Enhanced sink metric instrumentation for Flink StarRocks Connector [INLONG-11201][Sort] Enhanced sink metric instrumentation for Flink StarRocks Connector Sep 26, 2024
@github-actions github-actions bot added the service/ci Automatically confirm that the code is error-free label Sep 29, 2024
@aloyszhang aloyszhang merged commit 14463d1 into apache:master Oct 9, 2024
11 checks passed
Labels
component/sort service/ci Automatically confirm that the code is error-free
3 participants