Refactor using parallel workers for get/put operations #5

Open · wants to merge 7 commits into master
Conversation

EXPEddrewery

- List operation adds keys to a queue
- Workers retrieve keys from the queue and copy each key from the source to the destination backend
- Queue size configuration added (defaults to 10000)
- Workers configuration added (defaults to 100)
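
A minimal sketch of what this worker-pool approach looks like, assuming a simple key/value backend interface; the type and function names below are illustrative and are not necessarily the ones used in this PR:

```go
package migrate

import (
	"log"
	"sync"
)

// Backend is an assumed minimal interface for a Vault storage backend;
// the real interfaces in this repository may differ.
type Backend interface {
	ListKeys() ([]string, error)
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
}

// Migrate lists all keys from src, pushes them onto a bounded queue, and
// runs `workers` goroutines that each pull keys and copy them to dst.
func Migrate(src, dst Backend, workers, queueSize int) error {
	keys, err := src.ListKeys()
	if err != nil {
		return err
	}

	queue := make(chan string, queueSize) // queue size, e.g. 10000 by default
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ { // worker count, e.g. 100 by default
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range queue {
				value, err := src.Get(key)
				if err != nil {
					log.Printf("get %q: %v", key, err)
					continue
				}
				if err := dst.Put(key, value); err != nil {
					log.Printf("put %q: %v", key, err)
				}
			}
		}()
	}

	// The list operation feeds the queue; the workers drain it concurrently.
	for _, key := range keys {
		queue <- key
	}
	close(queue)
	wg.Wait()
	return nil
}
```

Because the queue is bounded, the lister blocks once the buffer is full, so memory stays capped even with hundreds of thousands of keys.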
@criloz
Member

criloz commented Mar 28, 2018

hi @EXPEddrewery, thanks for taking the time to make this pull request. Unfortunately, I don't believe I can approve it, because there isn't much to gain from adding goroutines to the script; the task is bound by network I/O. But if you have a strong argument in favor of adopting goroutines, I can reconsider. Thanks!!

@EXPEddrewery
Author

Thanks for replying. I closed this because it wasn't quite ready and I haven't yet proven that it works 100%.

Our use case is a very large Vault backend (900K objects) that we are moving from S3 to DynamoDB.

It's true that network I/O is one restriction, as is the provisioned throughput on DynamoDB.

In my initial tests with the original code, it was going to take hours to transfer that many objects.

I did a test with 10 workers and the time came down to about an hour, then another test with 64 workers, which took about 15 minutes.

I had to ramp up the DynamoDB provisioned throughput to over 5,000 to do this, and even then there was throttling, but it certainly shortened the changeover time for our use case. I also used a C5 instance for maximum network bandwidth.

Once I finish testing and I'm happy with it, I'll push the final code and you can decide whether it's worth including.

I've reduced the workers default to 1, which should give the same level of performance as the current code.

@criloz
Member

criloz commented Mar 28, 2018

@EXPEddrewery those numbers look very nice; I think it's 100% worth it. I'd be ready to test it myself.

- Move some logging to debug
- Add debug logging switch to configuration
@EXPEddrewery
Author

EXPEddrewery commented Mar 29, 2018

Hi again. I've tested it a few times now with my S3 to DynamoDB migration and the numbers seem reasonably good.

I don't have any logs to provide, but essentially I've left your original logging unchanged unless you pass the new debug configuration parameter as true.
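
Roughly, the debug switch just gates the extra log lines; a minimal sketch, assuming a Config struct carries the new parameters (the field and function names here are illustrative, not necessarily what the PR uses):

```go
package migrate

import "log"

// Config carries the new knobs added in this PR; real field names may differ.
type Config struct {
	Workers   int  // parallel copy goroutines (default reduced to 1)
	QueueSize int  // bounded key queue size (default 10000)
	Debug     bool // when false, only the original log output is emitted
}

// debugf logs only when the debug configuration parameter is true.
func (c Config) debugf(format string, args ...interface{}) {
	if c.Debug {
		log.Printf("DEBUG: "+format, args...)
	}
}
```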

Below is a screenshot of the metrics from the migration test I did today with 10K write capacity on DynamoDB and 64 workers, against our Vault cluster, which has approximately 950K objects in S3.

[Screenshot: 10k_test, DynamoDB write metrics during the migration test]

It took just over 15 minutes to run on a c5.large, and as you can see from the graphs, it was throttled pretty heavily and never reached the 10K max. We'll be going with a 5K max and 64 workers for our migration once we plan some downtime.

Let me know if you would like me to add some documentation for the configuration changes.
