I recently reviewed an issue where the data in Kafka was encoded in ISO8859-1 and we could not decode it correctly using charset => "ISO8859-1" in the codec.
It appears that when using org.apache.kafka.common.serialization.StringDeserializer (the default), the Kafka library assumes UTF-8 data, so the kafka input receives strings that were already decoded incorrectly before the codec's charset conversion runs.
Per the Kafka docs for StringDeserializer: "String encoding defaults to UTF8 and can be customized by setting the property key.deserializer.encoding, value.deserializer.encoding or deserializer.encoding. The first two take precedence over the last."
I believe (not tested) that setting the property value.deserializer.encoding to ISO8859-1 would have worked.
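For a plain Java consumer, that untested suggestion would amount to something like the sketch below. It only builds the consumer properties with `java.util.Properties` and does not connect to Kafka. The class and method names (`DeserializerEncodingConfig`, `consumerProps`) are hypothetical, and whether the Logstash kafka input lets this property through is not confirmed here.

```java
import java.nio.charset.Charset;
import java.util.Properties;

public class DeserializerEncodingConfig {
    // Hypothetical helper: consumer properties that keep StringDeserializer
    // but tell it which charset to use for record values.
    static Properties consumerProps(String valueEncoding) {
        // Fail early if the JVM does not recognize this charset name.
        Charset.forName(valueEncoding);

        Properties props = new Properties();
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Property named in the Kafka docs; untested here, as in the issue.
        props.put("value.deserializer.encoding", valueEncoding);
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("ISO-8859-1");
        System.out.println(props.getProperty("value.deserializer.encoding"));
    }
}
```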
OTOH, using org.apache.kafka.common.serialization.ByteArrayDeserializer and setting charset => "ISO8859-1" worked correctly.
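The difference between the two approaches can be reproduced without Kafka at all. The following is a minimal, stdlib-only sketch: `decode` is a hypothetical stand-in for what a string deserializer does (bytes to String with one fixed charset), not the Kafka implementation itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Stand-in for a string deserializer: raw bytes -> String with a fixed charset.
    static String decode(byte[] data, String encoding) {
        return new String(data, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        // "cafe" with an acute e, as the bytes would sit in the topic
        // when the producer wrote ISO-8859-1 (the last byte is 0xE9).
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Decoding as UTF-8: a lone 0xE9 is not valid UTF-8, so it is replaced
        // with U+FFFD and the original byte cannot be recovered afterwards.
        String wrong = decode(raw, "UTF-8");

        // Keeping the raw bytes and decoding with the real charset preserves the text.
        String right = decode(raw, "ISO-8859-1");

        System.out.println("UTF-8 decode contains U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("ISO-8859-1 decode is intact:  " + right.equals("caf\u00e9"));
    }
}
```

This is why a charset setting in the codec cannot repair the string once the UTF-8 decode has happened: the replacement character carries no trace of the original byte.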
This leads me to think that we should probably use the ByteArrayDeserializer by default if we want the input to work out of the box with our codecs and their charset conversion.
In any case we should also have a note about this in the docs.
Kafka StringDeserializer docs: https://kafka.apache.org/10/javadoc/org/apache/kafka/common/serialization/StringDeserializer.html