
Backport PR #16482 to 8.15: Bugfix for BufferedTokenizer to completely consume lines in case of lines bigger then sizeLimit #16579

Merged (1 commit) on Oct 18, 2024

Conversation

github-actions[bot]
Contributor

Backport PR #16482 to 8.15 branch, original message:


Release notes

[rn:skip]

What does this PR do?

Updates BufferedTokenizerExt so that it can accumulate token fragments coming from different data segments. When a "buffer full" condition is met, it records this state in a local field so that on the next data segment it can consume all the token fragments up to the next token delimiter.
Changes the accumulation variable from a RubyArray of strings to a StringBuilder that holds the head token, while the remaining token fragments are kept in the input array.
Ports the tests in `describe FileWatch::BufferedTokenizer do` to Java.
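The accumulation strategy described above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual BufferedTokenizerExt code; the class and method names are made up, and the oversized token is simply dropped here rather than raising an error:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a buffered tokenizer with a size limit.
// When a token grows past sizeLimit, the "buffer full" state is remembered
// in a field so that fragments keep being discarded until the next
// delimiter, even if that delimiter arrives in a later data segment.
class SimpleBufferedTokenizer {
    private final StringBuilder head = new StringBuilder(); // head-token accumulator
    private final int sizeLimit;
    private boolean bufferFull = false; // persists across extract() calls

    SimpleBufferedTokenizer(int sizeLimit) {
        this.sizeLimit = sizeLimit;
    }

    // Splits data on '\n' and returns the complete tokens found so far.
    List<String> extract(String data) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (true) {
            int nl = data.indexOf('\n', start);
            if (nl < 0) {
                // Trailing fragment: accumulate it unless we are discarding.
                if (!bufferFull) {
                    head.append(data, start, data.length());
                    if (head.length() > sizeLimit) {
                        head.setLength(0);
                        bufferFull = true; // discard until the next delimiter
                    }
                }
                return tokens;
            }
            if (bufferFull) {
                bufferFull = false; // delimiter reached: resume normal mode
            } else {
                head.append(data, start, nl);
                if (head.length() <= sizeLimit) {
                    tokens.add(head.toString());
                } // else: completed token exceeded the limit; drop it
                head.setLength(0);
            }
            start = nl + 1;
        }
    }
}
```

With a limit of 10, feeding `"aaaaaaaaaaaaaaa"` followed by `"bbb\nccc\n"` yields only `["ccc"]`: the oversized token, including the `bbb` fragment from the second segment, is consumed up to the delimiter instead of leaking into later tokens.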

Why is it important/What is the impact to the user?

Fixes the tokenizer's behaviour so that it works properly when buffer-full conditions are met.

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding changes to the default configuration files (and/or docker env variables)
  • [x] I have added tests that prove my fix is effective or that my feature works

Author's Checklist

How to test this PR locally

Follow the instructions in #16483

Related issues

Use cases

Screenshots

Logs

…ines bigger then sizeLimit (#16482)

Fixes the tokenizer's behaviour so that it works properly when buffer-full conditions are met.

Updates BufferedTokenizerExt so that it can accumulate token fragments coming from different data segments. When a "buffer full" condition is met, it records this state in a local field so that on the next data segment it can consume all the token fragments up to the next token delimiter.
Changes the accumulation variable from a RubyArray of strings to a StringBuilder that holds the head token, while the remaining token fragments are kept in the input array.
Furthermore, it translates the `buftok_spec` tests into JUnit tests.

(cherry picked from commit 85493ce)

@elasticmachine
Collaborator

💚 Build Succeeded

cc @andsel

@edmocosta edmocosta merged commit cef98e4 into 8.15 Oct 18, 2024
5 checks passed
yaauie added a commit to yaauie/logstash that referenced this pull request Nov 18, 2024
donoghuc pushed a commit that referenced this pull request Nov 19, 2024
…ase of lines bigger then sizeLimit (#16482) (#16579)" (#16687)

This reverts commit cef98e4.
@donoghuc donoghuc deleted the backport_16482_8.15 branch November 21, 2024 16:15