-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
geoip: extract database manager to stand-alone feature #15348
Conversation
Introduces an Elastic-licensed GeoipDatabaseManagement tool that can be used by ANY plugin running on Elastic-licensed Logstash to retrieve a subscription to a GeoIP database that ensures EULA-compliance and frequent updates, and migrates the previous Elastic-licensed code-in-Logstash-core extension to the Geoip Filter to use this new tool, requiring ZERO changes to in-the-wild versions of the plugin. The implementation of the new tool follows the previous implementation as closely as possible, but presents a new interface that ensures that a consumer can ATOMICALLY subscribe to a database path without risk that the subscriber will receive an update or expiry before it is finished applying the initial value: ~~~ ruby geoip_manager = LogStash::GeoipDatabaseManagement::Manager.instance subscription = geoip_manager.subscribe('City') subscription.observe(construct: ->(initial_dbinfo){ }, on_update: ->(updated_dbinfo){ }, on_expire: ->( _ ){ }) subscription.release! ~~~
📃 DOCS PREVIEW ✨ https://logstash_15348.docs-preview.app.elstc.co/diff |
#xpack.geoip.downloader.enabled: true | ||
#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, intentionally:
-#xpack.geoip.download.endpoint: "https://geoip.elastic.co/v1/database"
+#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database"
To align with naming internally and with the similar feature in Elasticsearch's Geoip database manager. The deprecated pathway is included in geoip_database_management/extension.rb
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add poll interval here
#xpack.geoip.downloader.poll.interval: 24h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database" | |
#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database" | |
#xpack.geoip.downloader.poll.interval: 24h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add the poll interval intentionally, and would prefer deferring that discussion until later. I found the poll interval very helpful in manual testing, but there are some very sharp edges (like our TimeValue
setting not supporting upper/lower bounds on the total value or on granularity) that limit the value of exposing this to users.
settings.register(LogStash::Setting::NullableString.new("xpack.geoip.download.endpoint")) | ||
settings.register(LogStash::Setting::String.new("xpack.geoip.downloader.endpoint", "https://geoip.elastic.co/v1/database") | ||
.with_deprecated_alias("xpack.geoip.download.endpoint")) | ||
settings.register(LogStash::Setting::TimeValue.new("xpack.geoip.downloader.poll.interval", "24h")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aligns in both name and value-type with Elasticsearch's same feature. Intentionally not exposing it publicly because there is little value in doing so, and would likely require enhancing TimeValue
with min/max bounds.
📃 DOCS PREVIEW ✨ https://logstash_15348.docs-preview.app.elstc.co/diff |
📃 DOCS PREVIEW ✨ https://logstash_15348.docs-preview.app.elstc.co/diff |
simplifies using Subscription#observe from Java
📃 DOCS PREVIEW ✨ https://logstash_15348.docs-preview.app.elstc.co/diff |
📃 DOCS PREVIEW ✨ https://logstash_15348.docs-preview.app.elstc.co/diff |
📃 DOCS PREVIEW ✨ https://logstash_15348.docs-preview.app.elstc.co/diff |
jenkins test this again please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm noticing something when the databases expire. I'm running a test with > 10 pipelines, and if one pipeline expires we see the tagging warning show an empty list of pipeline_ids:
[2023-10-19T12:52:05,469][ERROR][logstash.geoipdatabasemanagement.manager] failed to sync databases {:exception=>"Read timed out"}
[2023-10-19T12:52:05,472][ERROR][logstash.geoipdatabasemanagement.manager] The managed MaxMind GeoIP ASN database hasn't been synchronized in 5787 days and will be removed in order to remain compliant with the MaxMind EULA. Logstash is unable to get newer version from internet. Please check the network settings and allow Logstash accesses the internet to download the latest database. Alternatively you can switch to a self-managed GeoIP database service (`xpack.geoip.download.endpoint`), or configure each plugin with a self-managed database which you can download from https://dev.maxmind.com/geoip/geoip2/geolite2/
[2023-10-19T12:52:05,473][WARN ][logstash.filters.geoip.databasemanager] geoip filter plugin will stop filtering and will tag all events with the '_geoip_expired_database' tag. {:database_type=>"ASN", :pipeline_ids=>[]}
But if both expire, the second warning does list all the pipelines that are affected:
[2023-10-19T12:53:35,585][ERROR][logstash.geoipdatabasemanagement.manager] failed to sync databases {:exception=>"geoip.elastic.co: nodename nor servname provided, or not known"}
[2023-10-19T12:53:35,588][ERROR][logstash.geoipdatabasemanagement.manager] The managed MaxMind GeoIP ASN database hasn't been synchronized in 5787 days and has been removed in order to remain compliant with the MaxMind EULA. Logstash is unable to get newer version from internet. Please check the network settings and allow Logstash accesses the internet to download the latest database. Alternatively you can switch to a self-managed GeoIP database service (`xpack.geoip.download.endpoint`), or configure each plugin with a self-managed database which you can download from https://dev.maxmind.com/geoip/geoip2/geolite2/
[2023-10-19T12:53:35,589][WARN ][logstash.filters.geoip.databasemanager] geoip filter plugin will stop filtering and will tag all events with the '_geoip_expired_database' tag. {:database_type=>"ASN", :pipeline_ids=>[]}
[2023-10-19T12:53:35,593][ERROR][logstash.geoipdatabasemanagement.manager] The managed MaxMind GeoIP City database hasn't been synchronized in 5787 days and will be removed in order to remain compliant with the MaxMind EULA. Logstash is unable to get newer version from internet. Please check the network settings and allow Logstash accesses the internet to download the latest database. Alternatively you can switch to a self-managed GeoIP database service (`xpack.geoip.download.endpoint`), or configure each plugin with a self-managed database which you can download from https://dev.maxmind.com/geoip/geoip2/geolite2/
[2023-10-19T12:53:35,593][WARN ][logstash.filters.geoip.databasemanager] geoip filter plugin will stop filtering and will tag all events with the '_geoip_expired_database' tag. {:database_type=>"City", :pipeline_ids=>["pipeline_1", "pipeline_10", "pipeline_11", "pipeline_12", "pipeline_13", "pipeline_14", "pipeline_15", "pipeline_16", "pipeline_2", "pipeline_3", "pipeline_4", "pipeline_5", "pipeline_6", "pipeline_7", "test"]}
[2023-10-19T12:53:35,601][INFO ][logstash.geoipdatabasemanagement.manager] Stale database directory `/Users/joaoduarte/elastic/logstash/data/geoip_database_management/1697716159` has been deleted
I haven't yet check why this is.
DB_EXT = 'mmdb'.freeze | ||
GEOLITE = 'GeoLite2-'.freeze | ||
|
||
CITY = "City".freeze | ||
ASN = "ASN".freeze | ||
DB_TYPES = [ASN, CITY].freeze | ||
|
||
CC = "CC".freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use the ones at x-pack/lib/geoip_database_management/constants.rb ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one got tricky: the pre-existing specs also declared CITY
and ASN
etc. as references to these constants without namespacing them, so the specs used global constants that weren't available to runtime code, and with the refactor these constants were no longer referenced otherwise (or referenced a single time).
I've inlined the single-use ones, removed the unreferenced ones, and made some small changes to the specs to just use the string values. Including a module into an example group does not make that module's constants available, and I figured this was the path of least resistance. I did consider using symbols instead of strings (safely handling strings if-and-only-if they are already in the list of symbols), but the scope of that feels like a bigger change than we need.
There are separate messages for each of the subscribed databases, and there were zero pipelines subscribed to the ASN database so its message had zero, while the City database had 10 pipleines subscribed and included all 10 in its log message. Do you have thoughts on how to better present this? |
wow that was dumb. nevermind :D |
SonarQube Quality Gate |
💚 Build Succeeded
History
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tremendous refactor work here.
I've manually tested scenarios that worked correctly:
- databases expiring
- daily checks failing
- new databases showing up
- tagging of events due to database expiration
- recovery after network failure
LGTM
Release notes
Adds a general Geoip database manager that can be used by any plugin that is running an Elastic-licensed disribution of Logstash.
What does this PR do?
Introduces an Elastic-licensed GeoipDatabaseManagement tool that can be used by ANY plugin running on Elastic-licensed Logstash to retrieve a subscription to a GeoIP database that ensures EULA-compliance and frequent updates, and migrates the previous Elastic-licensed code-in-Logstash-core extension to the Geoip Filter to use this new tool, requiring ZERO changes to in-the-wild versions of the plugin.
The implementation of the new tool follows the previous implementation as closely as possible, but presents a new interface that ensures that a consumer can ATOMICALLY subscribe to a database path without risk that the subscriber will receive an update or expiry before it is finished applying the initial value:
The GeoIP Filter's extension has new specs to validate that its previously
specified behavior holds true:
Why is it important/What is the impact to the user?
While the Geoip filter has long had an extension inside Logstash core to add a GeoIP database manager for itself when running in an Elastic-licensed Logstash distro, this feature makes a comparable Geoip database manager available to any plugin, without the specific implementation details of the Geoip filter.
Checklist
Author's Checklist
How to test this PR locally
Run pipelines that include the Geoip filter as set to use a managed database, observing that databases are refreshed as expected and maintained across process restarts
Closes #15387