Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geoip: extract database manager to stand-alone feature #15348

Merged
merged 15 commits into from
Nov 6, 2023

Conversation

yaauie
Copy link
Member

@yaauie yaauie commented Sep 26, 2023

Release notes

Adds a general Geoip database manager that can be used by any plugin that is running an Elastic-licensed disribution of Logstash.

What does this PR do?

Introduces an Elastic-licensed GeoipDatabaseManagement tool that can be used by ANY plugin running on Elastic-licensed Logstash to retrieve a subscription to a GeoIP database that ensures EULA-compliance and frequent updates, and migrates the previous Elastic-licensed code-in-Logstash-core extension to the Geoip Filter to use this new tool, requiring ZERO changes to in-the-wild versions of the plugin.

The implementation of the new tool follows the previous implementation as closely as possible, but presents a new interface that ensures that a consumer can ATOMICALLY subscribe to a database path without risk that the subscriber will receive an update or expiry before it is finished applying the initial value:

geoip_manager = LogStash::GeoipDatabaseManagement::Manager.instance subscription = geoip_manager.subscribe('City')

subscription.observe(construct: ->(initial_dbinfo){ },
                     on_update: ->(updated_dbinfo){ },
                     on_expire: ->(       _      ){ })

subscription.release!

The GeoIP Filter's extension has new specs to validate that its previously
specified behavior holds true:

    LogStash::Filters::Geoip
      DatabaseManager
        initialize
          is not subscribed to the EULA manager
        subscribe database path
          when manager is disabled
            when user subscribes to explicit path
              logs about the path being configured manually
              returns user input path
              is not subscribed to the EULA manager
            when user does not specify an explict path
              logged about preparing CC
              returns the CC-licensed database
              is not subscribed to the EULA manager
          when manager is enabled
            when user does not specify an explicit path
              and EULA database is pending
                logged about preparing CC
                logs about the user implicitly accepting the MaxMind EULA
                returns the CC-licensed database
                when the manager updates the db
                  notifies the plugin
                when the manager expires the db
                  notifies the plugin
                when the manager updates a different DB
                  does not notify the plugin
                when the manager expires a different DB
                  does not notify the plugin
              and EULA database is expired
                logs about the user implicitly accepting the MaxMind EULA
                returns nil
                is subscribed for updates
                when the manager updates a different DB
                  does not notify the plugin
                when the manager updates the db
                  notifies the plugin
              and EULA database has a recent database
                logs about the user implicitly accepting the MaxMind EULA
                returns the path to the managed database
                is subscribed for updates
                when the manager updates the db
                  notifies the plugin
                when the manager expires the db
                  notifies the plugin
                when the manager updates a different DB
                  does not notify the plugin
                when the manager expires a different DB
                  does not notify the plugin
            when user subscribes to explicit path
              logs about the path being configured manually
              returns user input path
              is not subscribed to the EULA manager
        unsubscribe
          removes plugin in state
        shutdown
          unsubscribes gracefully

Why is it important/What is the impact to the user?

While the Geoip filter has long had an extension inside Logstash core to add a GeoIP database manager for itself when running in an Elastic-licensed Logstash distro, this feature makes a comparable Geoip database manager available to any plugin, without the specific implementation details of the Geoip filter.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • Self-review commentary

How to test this PR locally

Run pipelines that include the Geoip filter as set to use a managed database, observing that databases are refreshed as expected and maintained across process restarts

Closes #15387

Introduces an Elastic-licensed GeoipDatabaseManagement tool that can be used
by ANY plugin running on Elastic-licensed Logstash to retrieve a subscription
to a GeoIP database that ensures EULA-compliance and frequent updates, and
migrates the previous Elastic-licensed code-in-Logstash-core extension to
the Geoip Filter to use this new tool, requiring ZERO changes to in-the-wild
versions of the plugin.

The implementation of the new tool follows the previous implementation as
closely as possible, but presents a new interface that ensures that a
consumer can ATOMICALLY subscribe to a database path without risk that the
subscriber will receive an update or expiry before it is finished applying
the initial value:

~~~ ruby
geoip_manager = LogStash::GeoipDatabaseManagement::Manager.instance
subscription = geoip_manager.subscribe('City')

subscription.observe(construct: ->(initial_dbinfo){ },
                     on_update: ->(updated_dbinfo){ },
                     on_expire: ->(       _      ){ })

subscription.release!
~~~
@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_15348.docs-preview.app.elstc.co/diff

#xpack.geoip.downloader.enabled: true
#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, intentionally:

-#xpack.geoip.download.endpoint: "https://geoip.elastic.co/v1/database"
+#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database"

To align with naming internally and with the similar feature in Elasticsearch's Geoip database manager. The deprecated pathway is included in geoip_database_management/extension.rb.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add poll interval here
#xpack.geoip.downloader.poll.interval: 24h

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database"
#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database"
#xpack.geoip.downloader.poll.interval: 24h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add the poll interval intentionally, and would prefer deferring that discussion until later. I found the poll interval very helpful in manual testing, but there are some very sharp edges (like our TimeValue setting not supporting upper/lower bounds on the total value or on granularity) that limit the value of exposing this to users.

x-pack/lib/geoip_database_management/data_path.rb Outdated Show resolved Hide resolved
x-pack/lib/geoip_database_management/data_path.rb Outdated Show resolved Hide resolved
settings.register(LogStash::Setting::NullableString.new("xpack.geoip.download.endpoint"))
settings.register(LogStash::Setting::String.new("xpack.geoip.downloader.endpoint", "https://geoip.elastic.co/v1/database")
.with_deprecated_alias("xpack.geoip.download.endpoint"))
settings.register(LogStash::Setting::TimeValue.new("xpack.geoip.downloader.poll.interval", "24h"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aligns in both name and value-type with Elasticsearch's same feature. Intentionally not exposing it publicly because there is little value in doing so, and would likely require enhancing TimeValue with min/max bounds.

docs/index.asciidoc Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_15348.docs-preview.app.elstc.co/diff

@github-actions
Copy link
Contributor

📃 DOCS PREVIEWhttps://logstash_15348.docs-preview.app.elstc.co/diff

@github-actions
Copy link
Contributor

github-actions bot commented Oct 2, 2023

📃 DOCS PREVIEWhttps://logstash_15348.docs-preview.app.elstc.co/diff

@yaauie yaauie marked this pull request as ready for review October 2, 2023 23:07
@github-actions
Copy link
Contributor

github-actions bot commented Oct 3, 2023

📃 DOCS PREVIEWhttps://logstash_15348.docs-preview.app.elstc.co/diff

@roaksoax roaksoax requested a review from jsvd October 4, 2023 15:22
@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2023

📃 DOCS PREVIEWhttps://logstash_15348.docs-preview.app.elstc.co/diff

@yaauie
Copy link
Member Author

yaauie commented Oct 11, 2023

jenkins test this again please

Copy link
Member

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm noticing something when the databases expire. I'm running a test with > 10 pipelines, and if one pipeline expires we see the tagging warning show an empty list of pipeline_ids:

[2023-10-19T12:52:05,469][ERROR][logstash.geoipdatabasemanagement.manager] failed to sync databases {:exception=>"Read timed out"}
[2023-10-19T12:52:05,472][ERROR][logstash.geoipdatabasemanagement.manager] The managed MaxMind GeoIP ASN database hasn't been synchronized in 5787 days and will be removed in order to remain compliant with the MaxMind EULA. Logstash is unable to get newer version from internet. Please check the network settings and allow Logstash accesses the internet to download the latest database. Alternatively you can switch to a self-managed GeoIP database service (`xpack.geoip.download.endpoint`), or  configure each plugin with a self-managed database which you can download from https://dev.maxmind.com/geoip/geoip2/geolite2/ 
[2023-10-19T12:52:05,473][WARN ][logstash.filters.geoip.databasemanager] geoip filter plugin will stop filtering and will tag all events with the '_geoip_expired_database' tag. {:database_type=>"ASN", :pipeline_ids=>[]}

But if both expire, the second warning does list all the pipelines that are affected:

[2023-10-19T12:53:35,585][ERROR][logstash.geoipdatabasemanagement.manager] failed to sync databases {:exception=>"geoip.elastic.co: nodename nor servname provided, or not known"}
[2023-10-19T12:53:35,588][ERROR][logstash.geoipdatabasemanagement.manager] The managed MaxMind GeoIP ASN database hasn't been synchronized in 5787 days and has been removed in order to remain compliant with the MaxMind EULA. Logstash is unable to get newer version from internet. Please check the network settings and allow Logstash accesses the internet to download the latest database. Alternatively you can switch to a self-managed GeoIP database service (`xpack.geoip.download.endpoint`), or  configure each plugin with a self-managed database which you can download from https://dev.maxmind.com/geoip/geoip2/geolite2/ 
[2023-10-19T12:53:35,589][WARN ][logstash.filters.geoip.databasemanager] geoip filter plugin will stop filtering and will tag all events with the '_geoip_expired_database' tag. {:database_type=>"ASN", :pipeline_ids=>[]}
[2023-10-19T12:53:35,593][ERROR][logstash.geoipdatabasemanagement.manager] The managed MaxMind GeoIP City database hasn't been synchronized in 5787 days and will be removed in order to remain compliant with the MaxMind EULA. Logstash is unable to get newer version from internet. Please check the network settings and allow Logstash accesses the internet to download the latest database. Alternatively you can switch to a self-managed GeoIP database service (`xpack.geoip.download.endpoint`), or  configure each plugin with a self-managed database which you can download from https://dev.maxmind.com/geoip/geoip2/geolite2/ 
[2023-10-19T12:53:35,593][WARN ][logstash.filters.geoip.databasemanager] geoip filter plugin will stop filtering and will tag all events with the '_geoip_expired_database' tag. {:database_type=>"City", :pipeline_ids=>["pipeline_1", "pipeline_10", "pipeline_11", "pipeline_12", "pipeline_13", "pipeline_14", "pipeline_15", "pipeline_16", "pipeline_2", "pipeline_3", "pipeline_4", "pipeline_5", "pipeline_6", "pipeline_7", "test"]}
[2023-10-19T12:53:35,601][INFO ][logstash.geoipdatabasemanagement.manager] Stale database directory `/Users/joaoduarte/elastic/logstash/data/geoip_database_management/1697716159` has been deleted

I haven't yet check why this is.

Comment on lines 30 to 37
DB_EXT = 'mmdb'.freeze
GEOLITE = 'GeoLite2-'.freeze

CITY = "City".freeze
ASN = "ASN".freeze
DB_TYPES = [ASN, CITY].freeze

CC = "CC".freeze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the ones at x-pack/lib/geoip_database_management/constants.rb ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one got tricky: the pre-existing specs also declared CITY and ASN etc. as references to these constants without namespacing them, so the specs used global constants that weren't available to runtime code, and with the refactor these constants were no longer referenced otherwise (or referenced a single time).

I've inlined the single-use ones, removed the unreferenced ones, and made some small changes to the specs to just use the string values. Including a module into an example group does not make that module's constants available, and I figured this was the path of least resistance. I did consider using symbols instead of strings (safely handling strings if-and-only-if they are already in the list of symbols), but the scope of that feels like a bigger change than we need.

x-pack/lib/geoip_database_management/downloader.rb Outdated Show resolved Hide resolved
config/jvm.options Outdated Show resolved Hide resolved
x-pack/lib/geoip_database_management/manager.rb Outdated Show resolved Hide resolved
@yaauie
Copy link
Member Author

yaauie commented Oct 24, 2023

I'm noticing something when the databases expire. I'm running a test with > 10 pipelines, and if one pipeline expires we see the tagging warning show an empty list of pipeline_id [, but] if both expire, the second warning does list all the pipelines that are affected

There are separate messages for each of the subscribed databases, and there were zero pipelines subscribed to the ASN database so its message had zero, while the City database had 10 pipleines subscribed and included all 10 in its log message.

Do you have thoughts on how to better present this?

@jsvd
Copy link
Member

jsvd commented Oct 26, 2023

There are separate messages for each of the subscribed databases, and there were zero pipelines subscribed to the ASN database so its message had zero, while the City database had 10 pipelines subscribed and included all 10 in its log message.

wow that was dumb. nevermind :D

Copy link

SonarQube Quality Gate

Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

@yaauie yaauie requested a review from jsvd November 3, 2023 18:01
Copy link
Member

@jsvd jsvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tremendous refactor work here.
I've manually tested scenarios that worked correctly:

  • databases expiring
  • daily checks failing
  • new databases showing up
  • tagging of events due to database expiration
  • recovery after network failure

LGTM

@yaauie yaauie merged commit 51886b9 into elastic:main Nov 6, 2023
3 checks passed
@yaauie yaauie deleted the geoip-database-management branch November 6, 2023 17:22
@yaauie yaauie added the v8.12.0 label Nov 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor: extract GeoIP database manager to stand-alone feature
5 participants