Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When datastream is enabled then no version related attributes fills into document actions #1161

Conversation

andsel
Copy link
Contributor

@andsel andsel commented Nov 20, 2023

Release notes

Avoid to populate version and version type attributes when processing integration metadata and datastream is enabled

What does this PR do?

During PR #1155 the resolution of version and version_type ES parameters was moved from the index only event action tuple creation to the common method. This changed was due to do the intent to collect all integration-aware metadata fields in one place, but the common method is used also by the datastream part this result in populating event and event_type request parameters not only for normal index operations.
During an index operation on a datastream, if one of those parameters is valued, generated an error on ES indexing, resulting in request fail.

This PR move the processing and creation of event and event_type parameters, back in its original position, splitting the capture of integration metadata in 2 parts.

Why is it important/What is the impact to the user?

It's a bugfix that let the plugin to work properly on datastreams.

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • check with a real ES and integrations

How to test this PR locally

Follow the same plan as described in PR #1155

Related issues

… populate version and versiont_type fields for datastreams
@andsel andsel self-assigned this Nov 20, 2023
@andsel andsel added the bug label Nov 20, 2023
Copy link
Contributor

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the issue of incompatible fields being sent when data streams are enabled. In that sense, I am okay merging as-is.

I've left one note about trimming some commented-out code in the specs.


But this fix also reveals a bit of a gap in testability, since the specs need to rely on outside knowledge to test either event_action_tuple or data_stream_event_action_tuple, depending on whether we expect data streams to be enabled.

These specs currently do not set up data streams, but test the result of a method that would be called if data streams were enabled. The problem is that this diverges from testing plugin behaviour as it would exist in real life.

I would like to at least file for follow-up work to clean this up.

This could be resolved by introducing a new method like action_tuple(event) that uses the @event_mapper (which presently wraps either event_action_tuple or data_stream_event_action_tuple depending on other configuration):

  def action_tuple(event)
    @event_mapper.call(event)
  end

And then changing the current only call-site of @event_mapper.call to go through our new unified boundary:

diff --git a/lib/logstash/outputs/elasticsearch.rb b/lib/logstash/outputs/elasticsearch.rb
index 598265d..55e9a87 100644
--- a/lib/logstash/outputs/elasticsearch.rb
+++ b/lib/logstash/outputs/elasticsearch.rb
@@ -424,7 +424,7 @@ class LogStash::Outputs::ElasticSearch < LogStash::Outputs::Base
     event_mapping_errors = [] # list of FailedEventMapping
     events.each do |event|
       begin
-        successful_events << @event_mapper.call(event)
+        successful_events << action_tuple(event)
       rescue EventMappingError => ie
         event_mapping_errors << FailedEventMapping.new(event, ie.message)
       end

We could then test action_tuple directly for a given plugin config without needing to know whether we should be testing event_action_tuple or data_stream_event_action_tuple.

Another possible way would be to pull the datastream-specific and legacy-specific behaviour out into a self-contained Modules so that setup_mapper_and_target wouldn't need to store procs-that-call-specific-methods.

Comment on lines 307 to 320
# let(:options) {
# {
# "hosts" => ["localhost","localhost:9202"],
# "data_stream" => "true",
# "data_stream_type" => "logs",
# "data_stream_dataset" => "generic",
# "data_stream_namespace" => "default"
# }
# }
#
# data_stream => "true"
# data_stream_type => "metrics"
# data_stream_dataset => "foo"
# data_stream_namespace => "bar"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented-out setup can be removed, but we should leave a note that the validation is done using datastream-specific method (or: find a way to keep the validations in this context group at the same level of abstraction as each other).

Suggested change
# let(:options) {
# {
# "hosts" => ["localhost","localhost:9202"],
# "data_stream" => "true",
# "data_stream_type" => "logs",
# "data_stream_dataset" => "generic",
# "data_stream_namespace" => "default"
# }
# }
#
# data_stream => "true"
# data_stream_type => "metrics"
# data_stream_dataset => "foo"
# data_stream_namespace => "bar"
# NOTE: we validate with datastream-specific `data_stream_event_action_tuple`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New management of version and version_type corrupt datastreams processing
3 participants