-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filter_path to bulk messages #1154
Conversation
This commit sets the `filter_path` query parameter when sending messages to Elasticsearch using the bulk API. This should significantly reduce the size of the query response from Elasticsearch, which should help reduce bandwidth usage, and improve response processing speed due to the lesser amount of JSON to deserialize Resolves: logstash-plugins#1153
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm~!
@@ -176,7 +176,11 @@ def join_bulk_responses(bulk_responses) | |||
end | |||
|
|||
def bulk_send(body_stream, batch_actions) | |||
params = compression_level? ? {:headers => {"Content-Encoding" => "gzip"}} : {} | |||
params = { | |||
:query => {"filter_path" => "errors,items.*.error,items.*.status"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the only side effect here is that a user setting bulk_path => "/filter_path=errors"
for some strange reason may see their setting overwritten
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about that, and was trying to think of a valid reason why they would change the filter path, other than to filter the response to reduce the payload - this feels like it is an implementation details, which I think we only exposed for the monitoring use case, and the payload reduction we already covered.
Possibly debugging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like manticore (and httpclient) will allow both to coexist:
Manticore.get("http://localhost:3333/?q=test", query: {q: "kittens"}).body
causes:
❯ nc -l 3333
GET /?q=test&q=kittens HTTP/1.1
Connection: Keep-Alive
Content-Length: 0
Host: localhost:3333
User-Agent: Manticore 0.9.1
Accept-Encoding: gzip,deflate
So it depends how ES chooses to treat these cases since the RFC doesn't seem to prohibit this case, nor explain what the server should do.
Either way we should confirm how ES handles this situation and add a note to the docs about how query parameters in bulk_path
interact with parameters
, w/ a special mention about filter_path
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we have options
- Keep it as it is in the PR
- Detect
filter_path
in the existing URL and use that instead of the new setting, and log appropriately. - Detect
filter_path
in the existing URL and drop it, enforcing the new setting, and log appropriately.
I'm easy with any of them, as long as we document it clearly
f06d776
to
4f706f0
Compare
85a87f4
to
6bed6dc
Compare
@@ -197,5 +197,14 @@ def self.setup_api_key(logger, params) | |||
def self.dedup_slashes(url) | |||
url.gsub(/\/+/, "/") | |||
end | |||
|
|||
def self.resolve_filter_path(url) | |||
return url if url.nil? || url.match?(/(?:[&|?])filter_path=/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dedup_slashes can never return nil since it performs gsub on a string without checking it it's nil:
return url if url.nil? || url.match?(/(?:[&|?])filter_path=/) | |
return url if url.match?(/(?:[&|?])filter_path=/) |
return url if url.match?(/(?:[&|?])filter_path=/) | ||
("#{url}#{query_param_separator(url)}filter_path=errors,items.*.error,items.*.status") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note about this injection of filter_path and it's skipping in the docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This commit sets the
filter_path
query parameter when sending messagesto Elasticsearch using the bulk API. This should significantly reduce
the size of the query response from Elasticsearch, which should help
reduce bandwidth usage, and improve response processing speed due to
the lesser amount of JSON to deserialize
Resolves: #1153