Improve sqoop rucio table dumps with dynamic num-mappers #221

Open
mrceyhun opened this issue Apr 29, 2023 · 0 comments

We use sqoop --num-mappers=40 [1] for all Rucio table dump queries. However, some tables, such as rses and subscriptions, are very small, and their run time can be reduced with --num-mappers=1.
Previous experience showed that we can run two --num-mappers=40 jobs in parallel. We did not implement this for the Rucio table dumps because there was no time pressure. However, we can expect users to come with more Rucio table dump requests, so we can implement it now.

So, my suggestions are:

  • Run at most 2 Rucio table dumps in parallel.
  • Set --num-mappers=1 for "rses" and "subscriptions" tables.
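
A minimal bash sketch of the two suggestions, to make the idea concrete. It is not taken from rucio_table_dumps.sh or dbs3_full_global.sh: the names mappers_for, dump_table, run_dumps, DEFAULT_MAPPERS and MAX_PARALLEL are hypothetical, and dump_table only prints the setting it would use instead of calling sqoop. Parallelism is done in simple batches (start two dumps, wait for both, start the next two), which is an assumption about what is good enough here.

```shell
#!/bin/bash
# Hypothetical sketch; function and variable names are illustrative only.

DEFAULT_MAPPERS=40   # value currently used for every table
MAX_PARALLEL=2       # run at most this many dumps at the same time

# Print the --num-mappers value for a table:
# 1 for the small tables named in this issue, the default otherwise.
mappers_for() {
    case "$1" in
        rses | subscriptions) echo 1 ;;
        *) echo "$DEFAULT_MAPPERS" ;;
    esac
}

# Placeholder for the real sqoop call: it only reports the chosen setting.
dump_table() {
    local table=$1
    echo "table=${table} --num-mappers=$(mappers_for "$table")"
}

# Start dumps in the background, in batches of MAX_PARALLEL.
run_dumps() {
    local table running=0
    for table in "$@"; do
        dump_table "$table" &
        running=$((running + 1))
        if [ "$running" -ge "$MAX_PARALLEL" ]; then
            wait          # block until the whole batch has finished
            running=0
        fi
    done
    wait                  # collect a final, partially filled batch
}

# "some_large_table" stands in for any table that keeps the default.
run_dumps rses subscriptions some_large_table
```

Batching keeps the logic short, but a slow table holds up the next batch. If that matters, a job-slot approach that starts a new dump as soon as one finishes would use the two slots more fully.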

P.S.: dbs3_full_global.sh already uses parallel dumps, and the logic can be copied from there.

[1] https://github.com/dmwm/CMSMonitoring/blob/master/sqoop/scripts/rucio_table_dumps.sh#L49
