Add filter_path to bulk messages #1154

Merged: 6 commits merged on Oct 25, 2023


robbavey (Contributor):

This commit sets the `filter_path` query parameter when sending messages
to Elasticsearch using the bulk API. This should significantly reduce
the size of the bulk response from Elasticsearch, which should reduce
bandwidth usage and speed up response processing, since there is less
JSON to deserialize.

Resolves: #1153
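To make the size reduction concrete, here is a minimal client-side sketch of what the filter keeps. Elasticsearch does this filtering on the server; `filter_bulk_response` is a hypothetical helper that only approximates `filter_path=errors,items.*.error,items.*.status`, and the sample response is illustrative, not captured from a real cluster.

```ruby
require "json"

# Hypothetical helper: approximates, on the client side, what Elasticsearch
# returns for filter_path=errors,items.*.error,items.*.status. Each bulk item
# is a one-key hash such as {"index" => {...}}; "*" matches that action key.
KEPT_ITEM_FIELDS = %w[error status].freeze

def filter_bulk_response(response)
  items = response.fetch("items", []).map do |item|
    item.to_h { |action, result| [action, result.slice(*KEPT_ITEM_FIELDS)] }
  end
  { "errors" => response["errors"], "items" => items }
end

# Illustrative bulk response (values made up): one success, one mapping failure.
full = {
  "took" => 30,
  "errors" => true,
  "items" => [
    { "index" => { "_index" => "logs", "_id" => "1", "_version" => 1,
                   "result" => "created", "_seq_no" => 0, "_primary_term" => 1,
                   "_shards" => { "total" => 2, "successful" => 1, "failed" => 0 },
                   "status" => 201 } },
    { "index" => { "_index" => "logs", "_id" => "2", "status" => 400,
                   "error" => { "type" => "mapper_parsing_exception",
                                "reason" => "failed to parse" } } }
  ]
}

slim = filter_bulk_response(full)
puts "full: #{JSON.generate(full).bytesize} bytes, filtered: #{JSON.generate(slim).bytesize} bytes"
puts JSON.pretty_generate(slim)
```

Successful items shrink to just their status, while failed items keep the `error` object the plugin needs to decide whether to retry.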

roaksoax (Contributor) left a comment:

lgtm~!

```
@@ -176,7 +176,11 @@ def join_bulk_responses(bulk_responses)
    end

    def bulk_send(body_stream, batch_actions)
      params = compression_level? ? {:headers => {"Content-Encoding" => "gzip"}} : {}
      params = {
        :query => {"filter_path" => "errors,items.*.error,items.*.status"}
```
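As a rough illustration of the `:query` approach in this diff, the sketch below builds a params hash that carries both the optional gzip header and the `filter_path` query parameter, then form-encodes the query the way Ruby's standard library does. `bulk_request_params` and `BULK_FILTER_PATH` are hypothetical names, not the plugin's code, and the HTTP client the plugin uses may escape characters differently.

```ruby
require "uri"

# Hypothetical sketch, not the plugin's code: combine the optional gzip header
# with the filter_path query parameter in a single params hash.
BULK_FILTER_PATH = "errors,items.*.error,items.*.status".freeze

def bulk_request_params(compression_enabled)
  params = { :query => { "filter_path" => BULK_FILTER_PATH } }
  params[:headers] = { "Content-Encoding" => "gzip" } if compression_enabled
  params
end

params = bulk_request_params(true)

# Ruby's stdlib form-encodes the :query hash like this (commas become %2C,
# which the server decodes back); other HTTP clients may escape differently.
query_string = URI.encode_www_form(params[:query])
puts "POST /_bulk?#{query_string}"
```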
Member:

I guess the only side effect here is that a user who, for some strange reason, sets `bulk_path => "/_bulk?filter_path=errors"` may see their setting overwritten.

Contributor Author:

I wondered about that, and tried to think of a valid reason why someone would change the filter path other than to filter the response and reduce the payload. This feels like an implementation detail, which I think we only exposed for the monitoring use case, and we already cover the payload reduction.

Possibly debugging?

Member:

It looks like manticore (and httpclient) will allow both to coexist:

```ruby
Manticore.get("http://localhost:3333/?q=test", query: {q: "kittens"}).body
```

causes:

```
❯ nc -l 3333
GET /?q=test&q=kittens HTTP/1.1
Connection: Keep-Alive
Content-Length: 0
Host: localhost:3333
User-Agent: Manticore 0.9.1
Accept-Encoding: gzip,deflate
```

So it depends on how ES treats this case, since the RFC doesn't seem to prohibit repeated query parameters, nor does it explain what the server should do.
Either way, we should confirm how ES handles this situation and add a note to the docs about how query parameters in bulk_path interact with the ones the plugin sets, with a special mention of filter_path.
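The ambiguity can be sketched with Ruby's standard library: decoding the query string from the nc capture yields two values for the same key, and a server has to pick a policy. The three policies below (first wins, last wins, keep all) are illustrative assumptions; which one Elasticsearch applies, or whether it rejects the request instead, is exactly what the thread says still needs checking.

```ruby
require "uri"

# Query string from the nc capture: the URL's own ?q=test plus the
# query: {q: "kittens"} option that Manticore appended.
pairs = URI.decode_www_form("q=test&q=kittens")

# Three plausible ways a server could resolve the repeated key.
first_wins = pairs.each_with_object({}) { |(key, value), acc| acc[key] ||= value }
last_wins  = pairs.to_h # later pairs overwrite earlier ones
all_values = pairs.group_by(&:first).transform_values { |kvs| kvs.map(&:last) }

puts "first wins: #{first_wins['q']}"
puts "last wins:  #{last_wins['q']}"
puts "all values: #{all_values['q'].join(', ')}"
```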

Contributor Author:

I guess we have a few options:

  • Keep it as it is in the PR
  • Detect filter_path in the existing URL and use that instead of the new setting, and log appropriately.
  • Detect filter_path in the existing URL and drop it, enforcing the new setting, and log appropriately.

I'm easy with any of them, as long as we document it clearly.
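The three options can be compared side by side in a small sketch. Everything here is hypothetical: `apply_filter_path`, the strategy symbols, and the helper methods are illustrative names, not the plugin's API, and the optional block stands in for whatever logging the plugin would do.

```ruby
# Hypothetical sketch of the three options; none of these names come from the
# plugin. Inputs are bulk paths such as "/_bulk?pipeline=logs".
DEFAULT_FILTER_PATH = "errors,items.*.error,items.*.status".freeze
STRATEGIES = %i[append respect enforce].freeze

# Raw "key=value" chunks of the query string (no decoding).
def query_pairs(url)
  _path, query = url.split("?", 2)
  query.nil? ? [] : query.split("&")
end

def filter_path?(url)
  query_pairs(url).any? { |kv| kv.start_with?("filter_path=") }
end

def append_filter_path(url)
  separator = url.include?("?") ? "&" : "?"
  "#{url}#{separator}filter_path=#{DEFAULT_FILTER_PATH}"
end

def strip_filter_path(url)
  path, _query = url.split("?", 2)
  kept = query_pairs(url).reject { |kv| kv.start_with?("filter_path=") }
  kept.empty? ? path : "#{path}?#{kept.join('&')}"
end

# :append  - option 1: add ours even if the user already set one (both coexist)
# :respect - option 2: keep the user's filter_path and skip ours
# :enforce - option 3: drop the user's filter_path and apply ours
# An optional block receives a log message when a user value is kept or replaced.
def apply_filter_path(url, strategy)
  raise ArgumentError, "unknown strategy: #{strategy.inspect}" unless STRATEGIES.include?(strategy)
  return append_filter_path(url) unless filter_path?(url)

  case strategy
  when :append
    append_filter_path(url)
  when :respect
    yield "bulk_path already sets filter_path; not adding the default" if block_given?
    url
  when :enforce
    yield "replacing the filter_path from bulk_path with the default" if block_given?
    append_filter_path(strip_filter_path(url))
  end
end

apply_filter_path("/_bulk?filter_path=errors", :respect) { |msg| warn msg }
```

For example, with `bulk_path => "/_bulk?filter_path=errors"`, `:respect` returns the path unchanged and logs one message, while `:enforce` returns `/_bulk?filter_path=errors,items.*.error,items.*.status`. The merged `resolve_filter_path` code quoted in the review below behaves like `:respect`.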

CHANGELOG.md (resolved)
```
@@ -197,5 +197,14 @@ def self.setup_api_key(logger, params)
    def self.dedup_slashes(url)
      url.gsub(/\/+/, "/")
    end

    def self.resolve_filter_path(url)
      return url if url.nil? || url.match?(/(?:[&|?])filter_path=/)
```
Member:

dedup_slashes can never return nil, since it calls gsub on the string without checking whether it's nil:

Suggested change
```diff
- return url if url.nil? || url.match?(/(?:[&|?])filter_path=/)
+ return url if url.match?(/(?:[&|?])filter_path=/)
```

Comment on lines +202 to +203
```
return url if url.match?(/(?:[&|?])filter_path=/)
("#{url}#{query_param_separator(url)}filter_path=errors,items.*.error,items.*.status")
```
Member:

Can you add a note to the docs about this injection of filter_path, and about when it is skipped?

Contributor Author:

done
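Putting the snippets from this thread together gives a small runnable sketch. `dedup_slashes` and `resolve_filter_path` follow the quoted lines, written as top-level methods instead of `self.` methods; `query_param_separator` is not shown in the PR, so the version here is a hypothetical stand-in that picks `&` when the path already has a query string and `?` otherwise. The character class is written `[?&]`: inside brackets `|` has no alternation meaning, so the quoted `[&|?]` also matches a literal pipe.

```ruby
# Top-level versions of the methods quoted in this thread (the PR defines them
# as self. methods). query_param_separator is a hypothetical stand-in.
def dedup_slashes(url)
  url.gsub(/\/+/, "/")
end

def query_param_separator(url)
  url.include?("?") ? "&" : "?"
end

def resolve_filter_path(url)
  # Leave a user-supplied filter_path untouched; otherwise append the default.
  return url if url.match?(/[?&]filter_path=/)
  "#{url}#{query_param_separator(url)}filter_path=errors,items.*.error,items.*.status"
end

bulk_path = resolve_filter_path(dedup_slashes("//_bulk"))
puts bulk_path
```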

jsvd (Member) left a comment:

LGTM

jsvd merged commit 70654fb into logstash-plugins:main on Oct 25, 2023
1 check passed
Successfully merging this pull request may close these issues: Reduce ES response size through use of filter_path
4 participants