Refactor using parallel workers for get/put operations #5
base: master
Conversation
EXPEddrewery commented on Mar 28, 2018
- List operation adds keys to a queue
- Workers retrieve keys from the queue and copy each key from the source to the destination backend (sketched in the code after this list)
- Queue Size configuration added (defaults to 10000)
- Workers configuration added (defaults to 100)
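A worker pool along these lines might look like the sketch below. The `Backend` interface, the `migrate` function, and the in-memory demo are illustrative assumptions rather than this project's actual types; only the queue-plus-workers shape follows the description above.

```go
// Minimal sketch of the queue-plus-workers approach described in this PR.
// Backend, memBackend, and migrate are hypothetical names for illustration.
package main

import (
	"fmt"
	"log"
	"sync"
)

// Backend is a hypothetical abstraction over a Vault storage backend.
type Backend interface {
	List() ([]string, error)
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
}

// migrate lists keys from src, pushes them onto a bounded queue, and lets a
// pool of workers copy each key's value from src to dst.
func migrate(src, dst Backend, workers, queueSize int) error {
	keys := make(chan string, queueSize) // the key queue (defaults to 10000 in this PR)
	var wg sync.WaitGroup

	// Worker pool: each goroutine drains the queue and copies keys.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range keys {
				val, err := src.Get(key)
				if err != nil {
					log.Printf("get %s: %v", key, err)
					continue
				}
				if err := dst.Put(key, val); err != nil {
					log.Printf("put %s: %v", key, err)
				}
			}
		}()
	}

	// The list operation feeds the queue while workers consume it concurrently.
	all, err := src.List()
	if err != nil {
		close(keys)
		return fmt.Errorf("list: %w", err)
	}
	for _, k := range all {
		keys <- k
	}
	close(keys)
	wg.Wait()
	return nil
}

// memBackend is a toy in-memory backend used only to exercise migrate.
type memBackend struct {
	mu   sync.Mutex
	data map[string][]byte
}

func (m *memBackend) List() ([]string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make([]string, 0, len(m.data))
	for k := range m.data {
		out = append(out, k)
	}
	return out, nil
}

func (m *memBackend) Get(key string) ([]byte, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.data[key], nil
}

func (m *memBackend) Put(key string, value []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

func main() {
	src := &memBackend{data: map[string][]byte{"secret/a": []byte("1"), "secret/b": []byte("2")}}
	dst := &memBackend{data: map[string][]byte{}}
	if err := migrate(src, dst, 100, 10000); err != nil {
		log.Fatal(err)
	}
	fmt.Println("copied", len(dst.data), "keys")
}
```

Buffering the channel at `queueSize` lets the listing side run ahead of slower workers without blocking on every key, while `workers` caps how many concurrent get/put calls hit the backends at once.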
hi @EXPEddrewery, thanks for taking the time to make this pull request. Unfortunately I don't believe I can approve it, since there isn't much to gain from adding goroutines to the script; the task is bound by network I/O. But if you have a strong argument in favour of adopting goroutines, I can reconsider. Thanks!!
Thanks for replying. I closed this because it wasn't quite ready and also I haven't proven it works 100% yet. Our use case is a very large Vault backend (900K objects) that we are moving from S3 to DynamoDB. It's true that network I/O is one restriction, as is the throughput on DynamoDB. In my initial tests with the original code, it was going to take hours to transfer that many objects. I did a test with 10 workers and it came down to about an hour, and then another test with 64 workers took about 15 minutes. I had to ramp up the DynamoDB throughput to over 5000 to do this, and even then there was throttling, but it certainly shortened the changeover time for our use case. I also used a C5 instance for maximum bandwidth. Once I finish testing and I'm happy with it, I'll push the final code and you can decide whether it's worth including. I've reduced the default number of workers to 1, which should give the same level of performance as the current code.
@EXPEddrewery those numbers seem very nice, I think it is 100% worth it. I would be ready to test it myself.
- Move some logging to debug
- Add debug logging switch to configuration (sketched below)
… match current behaviour
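For illustration only, the debug switch mentioned in these commit notes could gate the noisier per-key messages roughly as follows; the `debugf` helper and the `debugEnabled` flag are hypothetical names, not the project's actual configuration field or logging code.

```go
package main

import "log"

// debugf emits a message only when the debug switch is enabled; when it is
// off, logging behaviour stays exactly as before.
func debugf(debug bool, format string, args ...interface{}) {
	if debug {
		log.Printf(format, args...)
	}
}

func main() {
	debugEnabled := true // in practice this would come from the new configuration switch
	debugf(debugEnabled, "copied key %s", "secret/example")
}
```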
Hi again. I've tested it a few times now with my S3 to DynamoDB migration and the numbers seem reasonably good. I don't have any logs to provide, but essentially I've left your original logging the same unless you pass the new debug logging switch.

Below is a screenshot of the metrics from the migration test I did today with 10K write capacity on DynamoDB and 64 workers, against our Vault cluster that has approximately 950K objects in S3. It took just over 15 minutes to run on a c5.large and, as you can see from the graphs, it was throttled pretty heavily and never reached the 10K max. We'll be going with a 5K max and 64 workers for our migration once we plan some downtime.

Let me know if you would like me to add some documentation for the configuration changes.