Current buggy behaviours:
when using HTTP compression (with compression_level > 0), the event gets rejected if it contains invalid (non-UTF-8) byte sequences;
when not using HTTP compression (with compression_level = 0), the event gets accepted even though it contains invalid (non-UTF-8) byte sequences. The reason is that the manticore HTTP client replaces each invalid byte under the hood (1 byte becomes 3 bytes, so 2 extra bytes appear) when it uses the Apache StringEntity.
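The size change described above can be reproduced outside Logstash. The following Python sketch is only an illustration, not the plugin's code. It assumes the uncompressed path behaves like a lossy decode followed by a UTF-8 re-encode, which turns each invalid byte into the replacement character U+FFFD (3 bytes in UTF-8).

```python
# Illustration only: mimics a lossy decode/re-encode round trip,
# which is assumed to be what happens on the uncompressed path.
raw = b"\xac"  # a lone continuation byte, invalid as UTF-8

# Invalid bytes are replaced with U+FFFD (the Unicode replacement character).
text = raw.decode("utf-8", errors="replace")
reencoded = text.encode("utf-8")

print(len(raw), len(reencoded))  # 1 3
print(reencoded.hex())           # efbfbd
```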
Logstash information:
Please include the following information:
Logstash version (e.g. bin/logstash --version) - any, including main (v8.14) branch, es-output-v11.22.2
Logstash installation source (e.g. built from source, with a package manager: DEB/RPM, expanded from tar or zip archive, docker) - any, including main (v8.14) branch, es-output-v11.22.2
How is Logstash being run (e.g. as a service/service manager: systemd, upstart, etc. Via command line, docker/kubernetes)
How was the Logstash Plugin installed - default, current es-output-v11.22.2
JVM (e.g. java -version):
If the affected version of Logstash is 7.9 (or earlier), or if it is NOT using the bundled JDK or using the 'no-jdk' version in 7.10 (or higher), please provide the following information:
JVM version (java -version)
JVM installation source (e.g. from the Operating System's package manager, from source, etc).
Value of the JAVA_HOME environment variable if set.
OS version (uname -a if on a Unix-like system):
Description of the problem including expected versus actual behavior:
Steps to reproduce:
Please include a minimal but complete recreation of the problem,
including (e.g.) pipeline definition(s), settings, locale, etc. The easier
you make it for us to reproduce, the more likely it is that somebody will take the
time to look at it.
Use the following pipeline config, saved as encoding_test.conf in the config folder.
Run with HTTP compression enabled (HTTP_COMPRESSION=true bin/logstash -f config/encoding_test.conf) and observe that ES rejects the event because of the invalid UTF-8 payload.
Run with HTTP compression disabled (HTTP_COMPRESSION=false bin/logstash -f config/encoding_test.conf) and observe that ES indexes the event without issue.
Provide logs (if relevant):
# HTTP_COMPRESSION=true bin/logstash -f config/encoding_test.conf --enable-local-plugin-development
[2024-03-15T15:22:19,117][DEBUG][org.apache.http.impl.conn.PoolingHttpClientConnectionManager][main][999000c22ac1744372923039d3bee405a92df01b3dafcd64f0830a24ad60acc6] Connection released: [id: 0][route: {s}->https://host.elastic-cloud.com:443][total available: 1; route allocated: 1 of 100; total allocated: 1 of 1000]
[2024-03-15T15:22:19,119][ERROR][logstash.outputs.elasticsearch][main][999000c22ac1744372923039d3bee405a92df01b3dafcd64f0830a24ad60acc6] Encountered a retryable error (will retry with exponential backoff) {:code=>400, :url=>"https://host.elastic-cloud.com:443/_bulk?filter_path=errors,items.*.error,items.*.status", :content_length=>248, :body=>"{\"error\":{\"root_cause\":[{\"type\":\"parse_exception\",\"reason\":\"Failed to parse content to type\"}],\"type\":\"parse_exception\",\"reason\":\"Failed to parse content to type\",\"caused_by\":{\"type\":\"json_parse_exception\",\"reason\":\"Invalid UTF-8 start byte 0xac\\n at [Source: (byte[])\\\"{\\\"@version\\\":\\\"1\\\",\\\"host\\\":{\\\"name\\\":\\\"MacBook-Pro.local\\\"},\\\"@timestamp\\\":\\\"2024-03-15T22:22:18.892422Z\\\",\\\"message\\\":\\\"�\\\",\\\"event\\\":{\\\"original\\\":\\\"Hello world!\\\",\\\"sequence\\\":0},\\\"data_stream\\\":{\\\"type\\\":\\\"logs\\\",\\\"dataset\\\":\\\"generic\\\",\\\"namespace\\\":\\\"default\\\"}}\\\"; line: 1, column: 117]\"}},\"status\":400}"}
# HTTP_COMPRESSION=false bin/logstash -f config/encoding_test.conf
{
          "host" => {
        "name" => "MacBook-Pro.local"
    },
         "event" => {
        "original" => "Hello world!",
        "sequence" => 0
    },
      "@version" => "1",
       "message" => "\xAC",
    "@timestamp" => 2024-03-15T22:27:03.706976Z
}
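The two outcomes in the logs above can be sketched in a few lines of Python. This is a simplified model, not the real request path: the payload is a hypothetical fragment, the compressed path is assumed to gzip the raw bytes unchanged, and the uncompressed path is assumed to replace invalid bytes before sending.

```python
import gzip

# Hypothetical bulk payload fragment carrying the raw invalid byte 0xAC.
payload = b'{"message":"' + b"\xac" + b'"}'

# Compressed path (assumed): the bytes are gzipped as-is, so the receiver
# sees the invalid byte after decompression and strict decoding fails.
received = gzip.decompress(gzip.compress(payload))
try:
    received.decode("utf-8")  # strict decoding
    strict_ok = True
except UnicodeDecodeError as err:
    strict_ok = False
    print("rejected:", err.reason, "at offset", err.start)

# Uncompressed path (assumed): the invalid byte has been replaced
# before sending, so the same strict decode succeeds.
sanitized = payload.decode("utf-8", errors="replace").encode("utf-8")
print("accepted:", sanitized.decode("utf-8"))
```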
Acceptance Criteria
Regardless of HTTP compression mode, the behaviour should be the same: either reject or accept. Accepting the event is probably the better option, as it benefits users in many ways. However, filtering out invalid byte sequences would be somewhat dangerous.
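As a rough sketch of the "accept" option, the body could be normalized once, before the compression decision, so that both modes deliver the same bytes. The helper name `build_body` and its parameters are hypothetical, and replacing invalid bytes with U+FFFD is only an assumption chosen to match what the uncompressed path already does. It is not a proposal for the actual plugin API.

```python
import gzip


def build_body(payload: bytes, compression_level: int) -> bytes:
    """Hypothetical helper: normalize first, then optionally compress."""
    # Replace invalid sequences with U+FFFD in both modes, so the
    # receiver gets identical content either way.
    normalized = payload.decode("utf-8", errors="replace").encode("utf-8")
    if compression_level > 0:
        return gzip.compress(normalized, compresslevel=compression_level)
    return normalized


payload = b'{"message":"\xac"}'
plain = build_body(payload, 0)
compressed = build_body(payload, 6)

# Compare what the receiver would see in each mode.
print(gzip.decompress(compressed) == plain)
```

The point of the design is ordering: any replacement happens before the branch on `compression_level`, so the outcome no longer depends on the compression setting.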