Set replication factor for kafka stability #1606

Open
wants to merge 1 commit into
base: develop


fedeabih

Description

This change resolves the error "Failed to get watermark offsets: Local: Unknown partition", reported in sentry-kubernetes/charts#1458. The root cause is the Kafka replication configuration. Setting replicationFactor to 3, which matches the number of Kafka brokers/controllers, makes retrieval of the high watermark offsets consistent.

Technical Explanation

The issue arises because replicationFactor was previously set to 1, so each partition had only a single replica. In this configuration, the high watermark offset (the highest offset that has been successfully replicated to all in-sync replicas, or ISRs) becomes unreliable.

Without sufficient replication, the loss or temporary unavailability of a single broker can leave Kafka unable to compute or provide the high watermark for the affected partitions. This leads to the error:

    Failed to get watermark offsets: Local: Unknown partition

Increasing replicationFactor from 1 to 3 replicates each partition across all three brokers/controllers, so the high watermark offset stays available even if one broker becomes unavailable or is briefly unstable. The extra replicas also improve fault tolerance and the availability of partition data across the cluster.

For more details on how Kafka replication works and the role of the high watermark, refer to the official documentation: Replication in Apache Kafka.

@fedeabih mentioned this pull request on Nov 22, 2024
@patsevanton
Contributor

@Mokto Please review the changes.

Contributor

@Mokto Mokto left a comment

Is it backward compatible?

@fedeabih
Author

fedeabih commented Dec 2, 2024

> Is it backward compatible?

Yes, the sentry-kafka-provisioning job creates topics using the --if-not-exists flag. This ensures that existing topics remain unaffected when the job runs; the configuration only applies to newly created topics:

                "/opt/bitnami/kafka/bin/kafka-topics.sh \
                    --create \
                    --if-not-exists \
                    --bootstrap-server ${KAFKA_SERVICE} \
                    --replication-factor 3 \
                    --partitions 1 \
                    --command-config ${CLIENT_CONF} \
                    --topic event-replacements"
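
Because --if-not-exists skips topics that already exist, it can help to see which existing topics still have a lower replication factor. The following is a minimal sketch, not part of this PR: `topics_below_rf` is a hypothetical helper, and it assumes the summary lines printed by kafka-topics.sh --describe contain "Topic: <name>" and "ReplicationFactor: <n>" fields, which should be checked against the Kafka version in use.

```shell
# Hypothetical helper (not part of the chart): read the output of
# `kafka-topics.sh --describe` on stdin and print the name of every topic
# whose replication factor is below the target passed as $1.
# Assumption: each topic summary line contains "Topic: <name>" and
# "ReplicationFactor: <n>" separated by whitespace/tabs.
topics_below_rf() {
  target="$1"
  awk -v target="$target" '
    /ReplicationFactor:/ {
      name = ""; rf = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Topic:") name = $(i + 1)
        if ($i == "ReplicationFactor:") rf = $(i + 1)
      }
      if (name != "" && rf != "" && rf + 0 < target + 0) print name
    }'
}

# Example usage inside a Kafka pod (same variables as the provisioning job):
# /opt/bitnami/kafka/bin/kafka-topics.sh --describe \
#   --bootstrap-server "${KAFKA_SERVICE}" \
#   --command-config "${CLIENT_CONF}" | topics_below_rf 3
```

Any topic this prints was created before the change and is not touched by the provisioning job.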

There are two ways to apply the new configuration to an existing cluster:

  • Recreate the Kafka configuration (this loses unprocessed data; decide whether that is critical for you):

    1. Delete PVCs named data-sentry-kafka-controller-*.
    2. Delete pods named sentry-kafka-controller-*.
  • Alter each topic manually:
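
One common way to change the replication factor of an existing topic is a partition reassignment with kafka-reassign-partitions.sh. The sketch below is an illustration under stated assumptions, not necessarily the procedure the author had in mind: `reassignment_json` is a hypothetical helper, the broker IDs 0,1,2 are assumed and must be checked against the actual cluster, and the tool is assumed to live next to kafka-topics.sh in the Bitnami image.

```shell
# Hypothetical helper: print a reassignment plan that places every partition
# of one topic on the given brokers.
# Usage: reassignment_json <topic> <partition-count> <comma-separated-broker-ids>
reassignment_json() {
  topic="$1"
  partitions="$2"
  brokers="$3"
  printf '{"version":1,"partitions":['
  p=0
  while [ "$p" -lt "$partitions" ]; do
    # Separate partition entries with commas (none before the first one).
    if [ "$p" -gt 0 ]; then
      printf ','
    fi
    printf '{"topic":"%s","partition":%d,"replicas":[%s]}' "$topic" "$p" "$brokers"
    p=$((p + 1))
  done
  printf ']}\n'
}

# Example usage (broker IDs 0,1,2 and the file path are assumptions):
# reassignment_json event-replacements 1 0,1,2 > /tmp/reassign.json
# /opt/bitnami/kafka/bin/kafka-reassign-partitions.sh \
#   --bootstrap-server "${KAFKA_SERVICE}" \
#   --command-config "${CLIENT_CONF}" \
#   --reassignment-json-file /tmp/reassign.json \
#   --execute
```

The helper lists the brokers in the same order for every partition, which makes the first broker the preferred leader everywhere. That is acceptable for single-partition topics such as the one above; for topics with many partitions, rotating the replica order would spread leadership more evenly.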

3 participants