Background
To get good performance from AWS S3, it's necessary to parallelise requests.
The odc-aio library provides functions used by the odc-tools CLI applications, and is implemented using async Python and the aiobotocore library.
This has worked well for several years, providing good performance.
However, using async Python, and aiobotocore in particular, comes with several significant drawbacks.
Because aiobotocore works by altering the internals of botocore (the low-level library underneath boto3, the Python library for accessing AWS), it is tightly coupled to the exact botocore version in use. This greatly complicates building a Python environment, since boto3 and botocore have new releases almost every day, while aiobotocore releases only every few months.
moto is a library that allows mocking AWS services within boto3 for testing. It does not work with aiobotocore, and there is no drop-in replacement, so most tests of ODC Cloud tools rely on externally managed S3 buckets that allow anonymous access. This makes it impossible to run tests offline, and it has recently proved unreliable because access to some of those buckets has changed.
Proposal
An alternative to asynchronous functions for parallelising access to cloud resources is to use old-fashioned threads. Good S3 performance only needs somewhere between 10 and 50 parallel requests, which threads can easily handle. When used correctly, boto3 is thread safe: low-level clients can be shared between threads, whereas Session and resource objects should not be.
I think work should be put into migrating away from odc-aio and towards a threaded solution.
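A minimal sketch of what the threaded approach could look like, assuming a single boto3 S3 client shared by all worker threads (boto3 clients are thread safe). `fetch_objects` is a hypothetical helper for illustration only, not an existing odc-aio or odc-tools function, and the default of 32 workers is simply a value inside the 10-50 range suggested in the proposal.

```python
"""Sketch: fetch many S3 objects in parallel using threads instead of asyncio.

Assumption: `client` is a boto3 S3 client, or any object with the same
`get_object(Bucket=..., Key=...)` method returning {"Body": <readable>}.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterable, List


def fetch_objects(client: Any, bucket: str, keys: Iterable[str],
                  max_workers: int = 32) -> Dict[str, bytes]:
    """Download each key's body, running up to `max_workers` requests at once.

    Hypothetical helper, not part of odc-aio or odc-tools.
    """

    def fetch_one(key: str) -> bytes:
        response = client.get_object(Bucket=bucket, Key=key)
        # The body is a stream; read it fully inside the worker thread.
        return response["Body"].read()

    key_list: List[str] = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with key_list.
        bodies = list(pool.map(fetch_one, key_list))
    return dict(zip(key_list, bodies))


# Real usage (needs boto3 and AWS credentials; bucket and keys are placeholders):
#   import boto3
#   from botocore.config import Config
#   s3 = boto3.client("s3", config=Config(max_pool_connections=32))
#   data = fetch_objects(s3, "my-bucket", ["a.json", "b.json"], max_workers=32)
```

One design note: botocore keeps a pool of 10 HTTP connections per client by default, so when running more than 10 threads it is worth raising `max_pool_connections` to match `max_workers`, as in the commented usage. Because the client is passed in as an argument, tests can also substitute a stub object or a moto-mocked client without needing network access.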
History
This was raised in #332 but never got to the top of the priority list.