Use integration metadata to create ES actions #1155

Conversation

@andsel andsel commented Oct 25, 2023

Release notes

Use integration metadata to interact with Elasticsearch.

What does this PR do?

Changes the creation of the actions that are passed down to Elasticsearch so that they also use the metadata fields set by an integration.
The fields involved are id (document_id), index, and pipeline; their values are taken verbatim, without placeholder resolution.
The index, document_id, and pipeline configured in the plugin settings take precedence over the integration ones, because they express an explicit choice made by the user.
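A minimal sketch of that precedence rule, with hypothetical method names and metadata field paths (not the plugin's actual code):

# Hypothetical helper: an explicitly configured plugin setting wins over the value
# found in the event's integration metadata; when neither is present, the caller
# falls back to the plugin's default mechanism.
def resolve_action_field(explicit_setting, event, metadata_path)
  return explicit_setting unless explicit_setting.nil?  # explicit user choice wins
  event.get(metadata_path)                               # taken verbatim, no placeholder resolution
end

# e.g. index  = resolve_action_field(@index,       event, "[@metadata][<index path>]")
#      doc_id = resolve_action_field(@document_id, event, "[@metadata][<id path>]")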

Why is it important/What is the impact to the user?

This PR fixes an interoperability issue with Agent integrations, where metadata set by an integration has to be carried through to Elasticsearch.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • Use the plugin with an integration that configures the document id, and verify that when events produce the same id no new documents are indexed.

How to test this PR locally

  • Create a new deployment in Elastic Cloud; it's the easiest way to get Elasticsearch, Kibana, and Fleet. Take note of the credentials when you create the deployment, because they must be used later in the configuration of logstash-output-elasticsearch.
  • Install an Elastic Agent and enroll it in Fleet. It is only used to create a policy that installs an integration (m365_defender), so that all the necessary pipelines are installed in Elasticsearch.
  • In Logstash, configure a pipeline like the following:
input {
  file {
    path => "/tmp/defender_singleline.json"
    sincedb_path => "/tmp/defender_sincedb"
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => "/tmp/processed.log"
    codec => json
  }
}

filter {
  elastic_integration {
    cloud_id   => "<cloud_id of your deployment>"
    cloud_auth => "elastic:<credentials showed you when the deployment was created>"
    geoip_database_directory => "/<your Logstash home>/vendor/bundle/jruby/3.1.0/gems/logstash-filter-geoip-7.2.13-java/vendor/GeoLite2-City.mmdb"
  }
}

output {
  stdout {
    codec => rubydebug { metadata => true }
  }

  elasticsearch {
    cloud_id => "<cloud_id of your deployment>"
    api_key => "<retrieve from cloud deployment>"
    data_stream => true
    ssl => true
  }
}
  • Now use a sample data event (like this) and create a one-line JSON file (named /tmp/defender_singleline.json). To squash all lines into one, use:
cat <file_in>.json | awk '{for(i=1;i<=NF;i++) printf "%s",$i}' > <file_out>.json

or use the file defender_singleline.json
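Alternatively, a small Ruby snippet can re-serialize the sample event onto a single line without touching whitespace inside string values (file paths are placeholders, and the input is assumed to contain a single JSON object):

require 'json'

# Read the (possibly pretty-printed) sample event and re-emit it as one line of JSON.
event = JSON.parse(File.read('<file_in>.json'))
File.write('/tmp/defender_singleline.json', JSON.generate(event))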

  • Install this plugin, setting the path to this branch in the Gemfile.
  • Run Logstash with
bin/logstash -f "/path_to/pipeline.conf"
  • Stop Logstash and remove the sincedb file with rm /tmp/defender_sincedb
  • Start Logstash again with the same command line.
  • Verify in the data stream index related to Defender, something like .ds-logs-m365_defender.incident-ep-, that only one document is present.

This means that despite the two distinct runs, the Defender integration, which generates a unique id from the Incident fields, was correctly executed and its id was used.
The counter-proof can be done by executing the same flow above with the shipped ES output plugin and verifying that the document ends up duplicated, because no unique document_id generated by the integration is applied.
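To check the document count without opening Kibana, the _count API can be used; here is a rough sketch with Ruby's standard library (the ES_ENDPOINT and ES_API_KEY environment variables and the data stream pattern are assumptions based on the example above):

require 'net/http'
require 'json'
require 'uri'

# Count documents in the Defender incident data stream after the two runs.
uri = URI("#{ENV.fetch('ES_ENDPOINT')}/logs-m365_defender.incident-*/_count")
request = Net::HTTP::Get.new(uri)
request['Authorization'] = "ApiKey #{ENV.fetch('ES_API_KEY')}"

response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  http.request(request)
end

puts JSON.parse(response.body)['count']  # expected: 1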

Related issues

Use cases

Screenshots

Logs

@andsel andsel self-assigned this Oct 25, 2023
@andsel andsel changed the title Feature/use integration metadata to create es actions Use integration metadata to create ES actions Oct 27, 2023
@andsel andsel linked an issue Oct 27, 2023 that may be closed by this pull request
@andsel andsel marked this pull request as ready for review October 27, 2023 16:02
@roaksoax roaksoax requested review from yaauie and jsvd October 30, 2023 13:49
@yaauie yaauie left a comment


It looks like we have a mix of precedences.

The pipeline and id seem to work as specified, with explicit plugin configuration always overriding values implicitly discovered from the event, but index is reversed and allows implicitly-discovered values to override explicit plugin configuration.

There are also some gaps in the spec coverage, which should cover all of the following (see the condensed outline after the list):

  • index:
    • when plugin's index is specified
      • when event metadata contains index
        • plugin's index is used
      • when event metadata does not contain index
        • plugin's index is used
    • when plugin's index is NOT specified
      • when event metadata contains index
        • event metadata index is used
      • when event metadata does not contain index
        • plugin's default index mechanism is used (including data streams variants)
  • id:
    • when plugin's document_id is specified
      • when event metadata contains id
        • plugin's document_id is used
      • when event metadata does not contain id
        • plugin's document_id is used
    • when plugin's document_id is NOT specified
      • when event metadata contains id
        • event metadata id is used
      • when event metadata does not contain id
        • plugin's default id mechanism is used (action tuple excludes an id)
  • pipeline:
    • when plugin's pipeline is specified
      • when event metadata contains pipeline
        • plugin's pipeline is used
      • when event metadata does not contain pipeline
        • plugin's pipeline is used
    • when plugin's pipeline is NOT specified
      • when event metadata contains pipeline
        • event metadata pipeline is used
      • when event metadata does not contain pipeline
        • plugin's default pipeline mechanism is used (action tuple excludes a pipeline)
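
For one of the fields, that matrix condenses into an RSpec outline roughly like the following (the example and context names are illustrative, not the PR's actual spec file):

describe "index resolution" do
  context "when the plugin's `index` is specified" do
    it "uses the plugin's index, whether or not the event metadata contains an index"
  end

  context "when the plugin's `index` is NOT specified" do
    context "and the event metadata contains an index" do
      it "uses the event metadata index"
    end
    context "and the event metadata does not contain an index" do
      it "falls back to the plugin's default index mechanism (including data stream variants)"
    end
  end
end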

@@ -271,6 +271,45 @@
end
end

describe "with event integration metadata" do
context "when there isn't any index setting specified and the event contains an integration metadata index" do
From line 192, which appears to be the outer context group for this example context, the options include an index directive.

When an index is explicitly given in the plugin configuration, it must supersede the metadata.

As-specified, explicit configuration always takes precedence over implicit discovery.


andsel commented Nov 6, 2023

Hi @yaauie, thanks a lot for your review. I've integrated your suggestions and the PR is ready for a second round of review 🙏

@andsel andsel requested a review from yaauie November 6, 2023 18:02
@yaauie yaauie left a comment


The implementation looks clean and the spec coverage makes the behaviour clear.

We need a changelog entry and a version bump, and should consider whether to document this in the user-facing docs.

Suggestion for the changelog:

 - Adds support for propagating event processing metadata when this output is downstream of an Elastic Integration Filter and configured _without_ explicit `index`, `document_id`, or `pipeline` directives [#1155](https://github.com/logstash-plugins/logstash-output-elasticsearch/pull/1155)

I am not strongly opposed to including a user-facing blurb in the docs, but I don't think it adds much value and could lead to abuse.

@andsel andsel force-pushed the feature/use_integration_metadata_to_create_es_actions branch from d5d8180 to 8d486fa Compare November 8, 2023 11:24
@andsel andsel requested a review from yaauie November 8, 2023 11:28
@yaauie yaauie left a comment


LGTM 👍

@andsel andsel force-pushed the feature/use_integration_metadata_to_create_es_actions branch from 8d486fa to 583af0e Compare November 10, 2023 08:26
@andsel andsel merged commit 47a5169 into logstash-plugins:main Nov 10, 2023
1 check passed
karenzone pushed a commit to karenzone/logstash-output-elasticsearch that referenced this pull request Dec 1, 2023
… integration metadata and datastream is enabled (logstash-plugins#1161)

During PR logstash-plugins#1155 the resolution of the version and version_type ES parameters was moved from the index-only event action tuple creation into the common method. This change was made with the intent of collecting all integration-aware metadata fields in one place, but the common method is also used by the data stream path, so the version and version_type request parameters ended up being populated not only for normal index operations.
During an index operation on a data stream, if one of those parameters is set, Elasticsearch returns an indexing error and the request fails.

This PR moves the processing and creation of the version and version_type parameters back to their original position, splitting the capture of integration metadata into two parts.
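A rough sketch of the shape of that fix, with hypothetical method names (not the actual diff): the integration-aware id/pipeline resolution stays in the shared path used by both plain indices and data streams, while version and version_type are attached only to plain index actions:

# Shared, integration-aware parameters, used for both plain indices and data streams.
def common_event_params(event)
  params = {}
  doc_id   = resolve_document_id(event)   # explicit setting or integration metadata
  pipeline = resolve_pipeline(event)
  params[:_id]      = doc_id   if doc_id
  params[:pipeline] = pipeline if pipeline
  params
end

# Plain index actions only: version/version_type would be rejected on a data stream write.
def index_event_params(event)
  params = common_event_params(event)
  params[:version]      = @version      if @version
  params[:version_type] = @version_type if @version_type
  params
end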
Development

Successfully merging this pull request may close these issues.

Use integration's metadata fields (id, index, pipeline) when present