
Logstash throws "Incompatible Encodings" error when querying NVARCHAR-Fields from MSSQL-Server #14679

Open
s137 opened this issue Oct 21, 2022 · 14 comments



s137 commented Oct 21, 2022

Logstash information:

  1. Logstash version: 8.4.3 (I verified that the issue was introduced between v8.3.3 and v8.4.0)
  2. Logstash installation source: expanded from tar or zip archive
  3. How is Logstash being run: from the command line or as a Windows service

Plugins installed: no extra plugins were installed

JVM (e.g. java -version): Bundled JDK:
openjdk 17.0.4 2022-07-19
OpenJDK Runtime Environment Temurin-17.0.4+8 (build 17.0.4+8)
OpenJDK 64-Bit Server VM Temurin-17.0.4+8 (build 17.0.4+8, mixed mode, sharing)
Also tested with:
openjdk 11.0.15 2022-04-19
OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10)
OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)

OS version: Windows 10

Description of the problem including expected versus actual behavior:

If I query NVARCHAR fields from a Microsoft SQL Server (where they are always encoded as UTF-16) via the logstash-jdbc-input plugin, without specifying any encoding or charset settings in either the input or the output plugin, Logstash fails to transfer the events to Elasticsearch and throws this error over and over again for every document:

[2022-10-20T17:01:57,061][ERROR][logstash.outputs.elasticsearch][index_name][9648a8b8c103d11863b72d1b6d9624b2c3b8d672ae4baf73a17af87e6cc0c3e7] 
An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError, 
:backtrace=>[
    "org/jruby/ext/stringio/StringIO.java:1162:in `write'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'", 
    "org/jruby/RubyArray.java:1865:in `each'", 
    "org/jruby/RubyEnumerable.java:1143:in `each_with_index'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'", 
    "C:/Program Files/ElasticSearch/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'", 
    "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'", "C:/Program Files/ElasticSearch/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}

This worked fine up until version 8.3.3 of Logstash; since version 8.4.0 it no longer works.
I also tried specifying the encoding as UTF-16 with the jdbc input plugin's columns_charset option, but this does not affect the behaviour of Logstash at all.
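
For reference, the exception class can be reproduced in plain Ruby outside Logstash. The following is a minimal sketch; the CP850-tagged string is a hypothetical stand-in for event data that picked up the Windows console code page, not the plugin's actual code path:

```ruby
# A UTF-8 string with a non-ASCII character, standing in for the
# _bulk request body that the elasticsearch output builds.
utf8 = "payload: é".encode("UTF-8")

# A string tagged as CP850 whose bytes are non-ASCII
# (0x82 is 'é' in code page 850).
cp850 = (+"\x82").force_encoding("CP850")

# Ruby refuses to concatenate two non-ASCII strings that carry
# different encoding tags; this raises the exception class seen above.
begin
  utf8 + cp850
rescue Encoding::CompatibilityError => e
  puts e.class  # Encoding::CompatibilityError
end
```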

Steps to reproduce:

  1. Create a pipeline with the following input.conf (you have to change the connection string to point at your Microsoft SQL Server, of course):
input {
  jdbc {
    jdbc_driver_library => "C:\\ProgramData\\ElasticSearch\\logstash\\drivers\\mssql-jdbc-10.2.0.jre8.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://server\instance;databasename=database;trustServerCertificate=true"
    jdbc_default_timezone => "Europe/Berlin"
    jdbc_user => "user"
    jdbc_password => "pw"
    schedule => "*/5 6-19 * * *"
    statement_filepath => "C:\\ProgramData\\ElasticSearch\\logstash\\pipelines\\index_name\\queries\\sqlQuery.sql"
    clean_run => false
    use_column_value => true
    tracking_column => "editdate"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "C:\\ProgramData\\ElasticSearch\\logstash\\pipelines\\index_name\\.logstash_jdbc_last_run"
  }
}
  2. In "sqlQuery.sql", write any SQL query that selects at least one NVARCHAR field from any table.
  3. Create the following output.conf for the pipeline (with a different user/password, of course):
output {
  elasticsearch {
    hosts => [ "http://localhost:9200" ]
    index => "index_name"
    document_id => "%{document_id}"
    action => "update"
    doc_as_upsert => true
    data_stream => "false"
    user => "elastic"
    password => "pw"
  }
}
  4. Run Logstash, either manually from the command line or as a Windows service, and you'll get the above error.

(Oddly enough, if you run logstash.bat from the command line and redirect stdout and/or stderr to a file, it works perfectly, without any errors, and indexes everything as it should. I have no idea how output redirection can affect Logstash's behaviour here; to be honest, it makes no sense to me.)
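
A plausible explanation, offered here as an assumption rather than a confirmed diagnosis: JRuby derives Ruby's default external encoding from the JVM's view of the console, which on Windows reports an OEM code page such as CP850 or IBM437 when a real console is attached, but not when output is redirected. What the runtime actually picked up can be inspected with a snippet like this, run under both conditions:

```ruby
# Print the encodings the Ruby runtime derived from the environment.
# Under JRuby on Windows these can differ between an attached console
# and a redirected stream, which would explain the behaviour above.
puts "default_external: #{Encoding.default_external}"
puts "default_internal: #{Encoding.default_internal.inspect}"
puts "locale encoding:  #{Encoding.find('locale')}"
```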


s137 commented Oct 21, 2022

I'm pretty sure this pull request introduced the breaking change: #13523

@stevedearl

Just tested and confirmed that this issue still exists in Logstash 8.5.0.


hmoratopcs commented Nov 24, 2022

Tested in Logstash 8.5.1 on Windows 10, with a Filebeat (filestream) --> Logstash --> Elasticsearch pipeline. The issue seems unrelated to SQL Server or JDBC.

Filebeat config:

filebeat.inputs:
- type: filestream
  id: my-filestream-id
  enabled: true
  paths:
    - ..\example.log
  encoding: utf-8

Logstash config:

input {
  beats {
    port => 5044
  }
}

output {
  elasticsearch {
    hosts => "https://localhost:9200"
    user => elastic
    password => "..."
    ssl_certificate_verification => false
  }

  file {
    path => "out.txt"
  }
}

The file example.log is UTF-8 encoded (confirmed with a hex editor).

If I run logstash.bat (from a PowerShell console inside VS Code), the output file out.txt is written with UTF-8 encoding, but the POST to the Elasticsearch _bulk endpoint is encoded as Windows-1252 (checked with Telerik Fiddler's HexView; Windows-1252 is my system's encoding, as reported by [System.Text.Encoding]::Default), and Elasticsearch returns an "Invalid UTF-8 start byte" error.

If I run logstash > log.txt (from the same console), the POST to Elasticsearch is encoded as UTF-8 and accepted by Elasticsearch.

@TonySoderbergRMT

Hi! We are experiencing the same bug with Logstash 8.5.2 and the logstash-jdbc-input plugin. Downgrading to 8.3.3 fixed the issue without any configuration changes.

@stevedearl

Any idea when this issue might be resolved? We're stuck on Logstash 8.3.x until it is, I think.


fuadbagus commented Feb 12, 2023

This is still occurring on Logstash 8.5.1. Tested on Windows Server 2019 with Logstash 8.5.1, the mssql-jdbc-12.2.0.jre11.jar driver, and the Java 17 that ships with Logstash 8.5.1. Any updates on this issue?

This is the error in the Logstash logs when running the jdbc input:

An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: CP850 and UTF-8", :exception=>Encoding::CompatibilityError


arunv84 commented Feb 22, 2023

I recently upgraded to 8.6.1 and started facing the very same issue described in this thread.

An unknown error occurred sending a bulk request to Elasticsearch (will retry indefinitely) {:message=>"incompatible encodings: IBM437 and UTF-8", :exception=>Encoding::CompatibilityError,
:backtrace=>[
    "org/jruby/ext/stringio/StringIO.java:1162:in `write'",
    "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:142:in `block in bulk'",
    "org/jruby/RubyArray.java:1865:in `each'",
    "org/jruby/RubyEnumerable.java:1143:in `each_with_index'",
    "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch/http_client.rb:125:in `bulk'",
    "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:296:in `safe_bulk'",
    "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:228:in `submit'",
    "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/plugin_mixins/elasticsearch/common.rb:177:in `retrying_submit'",
    "C:/logstash/vendor/bundle/jruby/2.6.0/gems/logstash-output-elasticsearch-11.6.0-java/lib/logstash/outputs/elasticsearch.rb:342:in `multi_receive'",
    "org/logstash/config/ir/compiler/AbstractOutputDelegatorExt.java:121:in `multi_receive'",
    "C:/logstash/logstash-core/lib/logstash/java_pipeline.rb:300:in `block in start_workers'"]}

@stevedearl

Hi - any update on when this bug is going to be patched? I see that Elastic 8.7 was released recently, but unless this bug is fixed I'll need to keep our servers on the older Logstash 8.3.x version.

thanks,
Steve


s137 commented Oct 23, 2023

Any updates on this? This bug is preventing me and other users from upgrading to newer versions of Logstash, and therefore Elasticsearch, until it is resolved. Pull request #13523 most likely introduced the regression. @andsel, maybe you can take a look at this one? I would really appreciate it. Thanks in advance.


s137 commented Jan 22, 2024

Any updates on this? I'd really appreciate it if you could take a quick look. @andsel
Thanks in advance! I'm also happy to provide more information if necessary.

@stevedearl

Still hoping for an update on this issue, which as far as I'm aware has never been resolved. I'm (and probably a bunch of others are) still stuck on Logstash 8.3.3, unable to upgrade to anything more recent.

Thanks in advance...


mashhurs commented Apr 4, 2024

The error comes from logstash-output-elasticsearch v11.6.0 when creating the body stream, stream_writer.write(as_json), to send to Elasticsearch. Elasticsearch always expects the payload(s) in UTF-8, and in this case the as_json payload is not UTF-8.
We have recently improved the handling of invalid UTF-8: any invalid UTF-8 bytes are now replaced with the replacement character \uFFFD.
The change is included in plugin version v11.22.3 and ships by default with Logstash 8.13+.
The plugin can be installed without upgrading Logstash core with the bin/logstash-plugin update logstash-output-elasticsearch command.
P.S.: I haven't tested it, but if this causes any Logstash core API compatibility issue, you may need to upgrade Logstash core as well.
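
The replacement behaviour described above can be illustrated with plain Ruby's String#scrub, which performs the same kind of substitution; this is a sketch of the technique, not the plugin's actual implementation:

```ruby
# A string tagged UTF-8 that contains an invalid byte: 0xFC is 'ü' in
# Windows-1252, but on its own it is not a valid UTF-8 sequence.
raw = "gr\xFCn"
raw.valid_encoding?          # => false

# Replace each invalid byte with U+FFFD, the Unicode replacement
# character, so the payload becomes valid UTF-8.
clean = raw.scrub("\uFFFD")
clean.valid_encoding?        # => true
puts clean                   # prints "gr�n"
```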

Please try and let us know.

@mashhurs mashhurs self-assigned this Apr 4, 2024
@stevedearl

Hi @mashhurs,

I can confirm that this issue appears to be resolved in Logstash 8.13.1. Thanks very much for the update, and apologies for the delay getting back to you.
