CRABCache replacement with S3
Stefano Belforte edited this page Mar 29, 2021 · 18 revisions
Action items and details are tracked in a GitHub project.
Here is a link to the Initial requirement list prepared to review our use case with the CERN IT Storage people (Dan van der Ster and Enrico Bocchi).
General ideas for using S3 in CRAB (possibly beyond the CRABCache replacement) and notes about S3 usage are in S3 for CRAB.
- we will create a few buckets named "CRABCacheProd", "CRABCachePreprod", etc. All objects will have an expiration time of 31 days.
- we keep the current limit at 120 MB per sandbox
- there will be no per-user quota and we will deprecate the "crab purge" command
- we will monitor usage per user (at a low rate) and take action if needed, but we expect that with a large enough storage container the system will self-regulate
- All objects will be private
- CRABServer will hold the keys via a Kubernetes secret
- operators can use something like Ansible Vault to access the keys safely from lxplus for testing/debugging/development.
- eventually it will be good to have all operations done via the CRABServer REST APIs, so that they can also be done via a browser using CMSWEB authentication.
- CRABClient and the TaskWorker will ask the REST server for a pre-signed URL whenever they want to upload a file, and will then do an HTTP POST (e.g. via curl)
- for downloads, CRABServer will fetch the object into memory and serve it to the client, so CMSWEB authentication and username/role validation are at hand.
- read-only access to non-sensitive files: since those files would be accessible to the whole world, it is unclear whether we have anything that is safe to expose this way. To be reviewed if and when CERN puts SSO in front of S3.
see the S3 guide Organizing Objects
we will have this structure:
    CRABCacheProd/<username>/sandboxes/<sandboxid>
    CRABCacheProd/<username>/<taskname>/[ClientLog.txt|TWLog.txt|DebugFiles.tgz]
so that CRABClient can check existing sandboxid's to decide whether to reuse one or upload a new one.
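To illustrate the layout above, here is a small stdlib-only sketch of how CRABClient could decide whether an upload can be skipped. The function names and the use of a content hash as the sandboxid are hypothetical, assumptions for the example, not the agreed implementation; the point is only that identical sandboxes map to the same key under `<username>/sandboxes/`.

```python
import hashlib

def sandbox_key(username, sandbox_bytes):
    """Build the object key <username>/sandboxes/<sandboxid>.

    Assumption for this sketch: the sandboxid is a content hash, so two
    identical sandbox tarballs map to the same key.
    """
    sandboxid = hashlib.sha256(sandbox_bytes).hexdigest() + ".tar.gz"
    return "%s/sandboxes/%s" % (username, sandboxid)

def needs_upload(key, existing_keys):
    """True if no object with this key is already in the cache."""
    return key not in set(existing_keys)

# Same content twice -> same key -> the second upload can be skipped.
tarball = b"fake sandbox content"
key = sandbox_key("someuser", tarball)
existing = [key]  # e.g. obtained by listing the user's sandboxes/ prefix
print(needs_upload(key, existing))                               # False
print(needs_upload(sandbox_key("someuser", b"other"), existing)) # True
```

In practice the list of existing keys would come from CRABServer (or an S3 listing of the user's prefix), and a reused sandbox simply keeps its object until the 31-day expiration.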