Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Be able to export a full list of media clusters. #2024

Merged
merged 8 commits into from
Sep 10, 2024

Conversation

caiosba
Copy link
Contributor

@caiosba caiosba commented Sep 5, 2024

Description

The current export limit for media lists is 10.000 because this is the maximum size of a result window in ElasticSearch. The solution is to paginate the results.

Fixes: CV2-5205.

How has this been tested?

This is just a refactoring, the existing test should cover it.

Checklist

  • I have performed a self-review of my own code
  • I have added unit and feature tests, if the PR implements a new feature or otherwise would benefit from additional testing
  • I have added regression tests, if the PR fixes a bug
  • I have added logging, exception reporting, and custom tracing with any additional information required for debugging
  • I considered secure coding practices when writing this code. Any security concerns are noted above.
  • I have commented my code in hard-to-understand areas, if any
  • I have made needed changes to the README
  • My changes generate no new warnings
  • If I added a third party module, I included a rationale for doing so and followed our current guidelines

The current export limit for media lists is 10.000 because this is the maximum size of a result window in ElasticSearch. The solution is to paginate the results.

Fixes: CV2-5205.
Copy link
Contributor

@melsawy melsawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caiosba to get all existing items not recommended to use pagination as maximum item to get through pagination is 10.000 items but you should iterated using search_after option this is a reference to get items using search_after

@caiosba
Copy link
Contributor Author

caiosba commented Sep 5, 2024

Thanks Sawy, I'll make this change!

@DGaffney
Copy link
Contributor

DGaffney commented Sep 5, 2024

As long as this offset/limit syntax works it otherwise looks good to me:

      search.set_option('eslimit', page_size)
      search.set_option('esoffset', offset)

@caiosba caiosba requested a review from melsawy September 9, 2024 01:13
@caiosba
Copy link
Contributor Author

caiosba commented Sep 9, 2024

@caiosba to get all existing items not recommended to use pagination as maximum item to get through pagination is 10.000 items but you should iterated using search_after option this is a reference to get items using search_after

Thanks @melsawy ! Indeed, I tested it locally with more than 20.000 items and my previous approach only exported 10.000, with no alerts from ElasticSearch. I re-implemented it using search_after and then I was able to export more than 20.000 items in less than 15 minutes locally. Thanks a lot for the reference you provided, it helped me a lot!

Kindly re-review, @melsawy :)

Copy link
Contributor

@melsawy melsawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and I suggest to use do..whileLoop instead of while true to fix CC issue

@caiosba caiosba merged commit 7fed86c into develop Sep 10, 2024
10 checks passed
@caiosba caiosba deleted the fix/CV2-5205-export-full-media-list branch September 10, 2024 02:34
caiosba added a commit that referenced this pull request Sep 10, 2024
The current export limit for media lists is 10.000 because this is the maximum size of a result window in ElasticSearch. The solution is to paginate the results.

Fixes: CV2-5205.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants