
Task threw an uncaught and unrecoverable exception/NullPointerException #30

Open
mbrettschneider opened this issue Mar 22, 2019 · 5 comments · May be fixed by #47

Comments


mbrettschneider commented Mar 22, 2019

I tried to start the connector with

./kafka_2.11-2.1.1/bin/connect-standalone.sh worker.properties kinesis-streams-kafka-connector.properties

Unfortunately, after these promising messages:

[logging.cc:170] Set AWS Log Level to WARN (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[main.cc:346] Setting CA path to /tmp/amazon-kinesis-producer-native-binaries/cacerts (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[main.cc:382] Starting up main producer (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[kinesis_producer.cc:101] Using Region: eu-central-1 (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[kinesis_producer.cc:120] Using per request threading model. (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[kinesis_producer.cc:52] Using default Kinesis endpoint (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[kinesis_producer.cc:101] Using Region: eu-central-1 (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[kinesis_producer.cc:120] Using per request threading model. (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[kinesis_producer.cc:52] Using default CloudWatch endpoint (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)
[main.cc:393] Entering join (com.amazonaws.services.kinesis.producer.LogInputStreamReader:59)

I get the following error:

[2019-03-12 16:54:55,833] ERROR WorkerSinkTask{id=KinesisKafkaPartnerRankConnector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
java.lang.NullPointerException
	at com.amazon.kinesis.kafka.AmazonKinesisSinkTask.close(AmazonKinesisSinkTask.java:283)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.commitOffsets(WorkerSinkTask.java:398)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.closePartitions(WorkerSinkTask.java:617)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:198)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[2019-03-12 16:54:55,837] ERROR WorkerSinkTask{id=KinesisKafkaPartnerRankConnector-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:178)

What can I do to make it work?


zhiyanliu commented Sep 29, 2020

I ran into the same issue.

After some digging, I can see that rebalancing can cause the topic-partitions parameter of close() [0] to contain more items than the opened-producer map [1] in the sink task.

I think the root cause is:

TL;DR: currentOffsets is used to call task.close() at [2], but it can be updated at [3], which happens before the partitions are opened and the producers are prepared for the sink task at [4]. This eventually leaves the sink task's producerMap out of sync with the worker's currentOffsets.

In detail: while the consumer polls messages from the broker at [5], a rebalanceException can occur at [6], and it can occur more than once; however, unlike the update at [3], openPartitions() will be called only once, the first time.

I can see an issue report about kafka-connect-storage-cloud caused by a similar out-of-sync case at [7].

The solution:

Instead of handling all the local producer/partition items and ignoring the partitions parameter (as topicPartitionWriters does at [8]), I prefer to do an easy check like [9]; I will verify it. (A sketch of that check follows the reference list below.)

[0] https://github.com/awslabs/kinesis-kafka-connector/blob/master/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java#L280
[1] https://github.com/awslabs/kinesis-kafka-connector/blob/master/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java#L275
[2] https://github.com/apache/kafka/blob/2.3.1/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L397
[3] https://github.com/apache/kafka/blob/2.3.1/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L630
[4] https://github.com/apache/kafka/blob/2.3.1/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L652
[5] https://github.com/apache/kafka/blob/2.3.1/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L447
[6] https://github.com/apache/kafka/blob/2.3.1/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L650
[7] https://github.com/confluentinc/kafka-connect-storage-cloud/pull/322/files
[8] https://github.com/confluentinc/kafka-connect-storage-cloud/pull/322/files#diff-16b6be2d931b0825d79f3b4c517327b4R225
[9] https://github.com/zhiyanliu/kinesis-kafka-connector/blob/master/src/main/java/com/amazon/kinesis/kafka/AmazonKinesisSinkTask.java#L284
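
For illustration, a minimal sketch of the kind of guard [9] applies in AmazonKinesisSinkTask.close(), assuming the task keeps its producers in a Map<TopicPartition, KinesisProducer> named producerMap and tears them down with the KPL's flushSync()/destroy(); the exact body of the real close() may differ:

// Hedged sketch of the guard proposed in [9]: only tear down producers for
// partitions this task actually opened. producerMap and the flushSync()/destroy()
// calls mirror the connector's code as referenced above; the rest is illustrative.
@Override
public void close(Collection<TopicPartition> partitions) {
    for (TopicPartition partition : partitions) {
        KinesisProducer producer = producerMap.get(partition);
        if (producer == null) {
            // After a rebalance, close() can receive partitions that open() never
            // saw, so there is nothing to flush or destroy for them.
            continue;
        }
        producer.flushSync();
        producer.destroy();
        producerMap.remove(partition);
    }
}

Both approaches avoid the NPE; the difference is whether close() trusts the partitions argument ([9]) or the task's own bookkeeping ([7]/[8]).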

bdesert (Contributor) commented Oct 1, 2020

@zhiyanliu what would be the fastest way to reproduce the issue? Just keep it rebalancing until the out-of-sync state happens? Topic repartitioning?

@zhiyanliu

@bdesert I think so, at the moment: trigger rebalancing continually until it happens. According to the three points listed in the doc [0], besides topic repartitioning you can also trigger it when (a sketch of the join/leave approach follows the link below):

  • a consumer joins the group
  • a consumer shuts down
  • a consumer is considered dead by the group coordinator

In our case, we hit this in a dedicated environment I can't touch, so I didn't try to reproduce it locally; I analyzed it from the static code logic.

[0] https://www.confluent.io/blog/kafka-rebalance-protocol-static-membership/
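
For reference, a minimal sketch of the "consumer joins / shuts down" trigger from the list above: a throwaway consumer that repeatedly joins and leaves the sink task's consumer group. Connect sink tasks consume with a group id of the form connect-<connector name>, so the group id below is an assumption based on the connector name in the original log; the bootstrap servers and topic are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hedged sketch: force repeated rebalances of the sink task's consumer group
// by joining and leaving it with a plain consumer. Group id, bootstrap servers
// and topic are assumptions/placeholders, not values from this issue.
public class RebalanceTrigger {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "connect-KinesisKafkaPartnerRankConnector"); // assumed from the log
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        for (int i = 0; i < 20; i++) {
            KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("your-sink-topic")); // placeholder topic
            consumer.poll(Duration.ofSeconds(2)); // joining the group triggers one rebalance
            consumer.close();                     // leaving it triggers another
            Thread.sleep(2000);
        }
    }
}

In principle, each join/leave cycle gives another chance to hit the window between the currentOffsets update at [3] and openPartitions() at [4] described above.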

zhiyanliu added a commit to zhiyanliu/kinesis-kafka-connector that referenced this issue Oct 10, 2020
bdesert (Contributor) commented Oct 12, 2022

@akhon, can you confirm from your side that the fix is working as expected?

@mbrettschneider
Author

Not relevant anymore.
