- S3 是一種 object storage,為許多應用 (web、mobile、IoT 等) 提供簡單的方式存取,資料備份就是其中一項應用。
參考資料:
- Getting Started – Amazon Simple Storage Service (S3) – AWS https://aws.amazon.com/s3/getting-started/ S3 有很多種用法,AWS Storage Gateway 可以做為 file server 或 local disk volume,有 AWS Management Console 可以操作檔案 (主要是建立 bucket),也有 Java、Python、Node.js、Ruby、Mobile (iOS、Android、Unity) 等不同的 SDK。
- Introduction to Amazon Simple Storage Service (S3) - Cloud Storage on AWS - YouTube https://www.youtube.com/watch?v=rKpKHulqYOQ 透過 web service interface 上傳/下載資料,可以自選 region,建立 bucket 就可以存放資料,資料會複製到多處;會有 object storage 的說法,應該是支援同一資料不同版本的關係?
- Cloud Storage Pricing – Amazon Simple Storage Service (S3) – AWS 主要分為 Storage、Request、Storage Management、Data Transfer,很意外其中 S3 Ojbect Tagging 要另外收費? #ril
- Amazon Simple Storage Service (S3) — Cloud Storage — AWS 一開始就寫著 "OBJECT STORAGE built to store and retrieve any amount of data from anywhere"。
- How to Backup files to Amazon S3 – Amazon Web Services 提到 bucket 可以啟用 versioning、logging、tags
- What is Cloud Object Storage? – AWS #ril
- Store and Retrieve a File – Amazon Web Services 操作過一次比較有感覺,登入 AWS Console、知道什麼是 bucket、object。
- How to Script the Backup of Files to Amazon S3 – AWS 前面一大段在講如何設定 IAM user #ril
- Getting Started – Amazon Simple Storage Service (S3) – AWS #ril
- How to Script the Backup of Files to Amazon S3 – AWS #ril
- Getting Started with Amazon S3 - YouTube #ril
- How to Backup files to Amazon S3 – Amazon Web Services A bucket is the container you store your files in.
- Bucket Restrictions and Limitations - Amazon Simple Storage Service #ril
- Bucket 名稱在整個 AWS 上必須是唯一的,建議採用 DNS-compliant naming??
- 雖然預設有 100 個 bucket 的上限 (可以擴充),但單一個 bucket 可以存放的 object 數量是沒有限制的,效能也不會因為使用多個 bucket 而比較好 (bucket 內不能再建 bucket)。
- DNS-compliant 是指 3 ~ 63 個字元;可以由多個 label 組成 (用
.
分開),每個 label 可以由小寫字母、數字跟-
構成,但不能由-
做為開頭。但最後又說因為 SSL 的關係,不建議在 bucket name 裡用 period (.
)。
- 在 S3 console 裡新增 folder 時提示 "When you create a folder, S3 console creates an object with the above name appended by suffix "/" and that object is displayed as a folder in the S3 console.",看起來都是 object,只是一種特殊的 object...
參考資料:
-
Working with Amazon S3 Objects - Amazon Simple Storage Service #ril
- S3 被設計為 key/value store,可以在一或多個 bucket 裡存放任意數量的 object。
- Key 是 object 的 name,對應的 value 可以 sequence of bytes (最大 5 TB)。
- Value 可以變更,透過 name + version ID 才能識別到特定的 object。
-
Object Key and Metadata - Amazon Simple Storage Service #ril
- 每個 S3 object 由 key、data 與 metadata 組成,其中 key (name) 可以在一個 bucket 裡識別出特定的 object,而 metadata 則是用來說明 data 的 key-value pairs。特別的是 metadata 只能在 upload 的當下給,一旦上傳後就不能再修改;變通的方式是建立 object 的複本,過程中再調整 metadata。
- Key 是一連串的 Unicode 字元,其 UTF-8 編碼最大不能超過 1024 bytes;但若能符合 naming guideline 之後其他應用比較不會遇到相容性的問題 (例如 sync 到檔案系統時,某些字元不能做為檔名),建議 key name 裡只用
[0-9a-zA-Z]
、!
、-
、_
、.
、*
、'
、(
及)
,例如videos/2014/birthday/video1.wmv
。 - If you anticipate that your workload against Amazon S3 will exceed 100 requests per second, follow the Amazon S3 key naming guidelines for best performance. 為什麼 key name 跟 performace 有關??
- 事實上 S3 的 data model 是偏平的 (flat structure) -- bucket 下就是 object,沒有 subbucket/subfolder 的概念,但可以像 S3 console 一樣利用 key name prefix 及 delimiter 來推導邏輯上的階層 (infer logical hierarchy)。
- S3 console 支援 folder 的概念。假設 bucket 下有 4 個 object:
Development/Projects1.xls
、Finance/statement1.pdf
、Private/taxdocument.pdf
與s3-dg.pdf
,S3 console 採用/
做為 delimiter,將Development/
、Finance/
及Private/
視為 key name prefix,顯示為 3 個資料夾,外加一個根目錄下的檔案s3-dg.pdf
。 - Characters That Might Require Special Handling 及 Characters to Avoid 在講的不是 key 不能用哪些字元,而是應用端 (application) 像是 "瀏覽器無法正確表現某些字元" 的問題,例如
-
Object Key Naming Guidelines - Object Key and Metadata - Amazon Simple Storage Service
-
You can use any UTF-8 character in an object key name. However, using certain characters in key names may cause problems with some applications and protocols. The following guidelines help you maximize compliance with DNS, web-safe characters, XML parsers, and other APIs.
或許檔名用 metadata 來記會比較好??
Safe Characters
-
The following character sets are generally safe for use in key names:
- Alphanumeric characters: 0-9, a-z, A-Z
- Special characters:
! - _ . * ' ( )
-
The following are examples of valid object key names:
4my-organization
my.great_photos-2014/jan/myvacation.jpg
videos/2014/birthday/video1.wmv
-
Important: If an object key name consists of a single period (
.
), or two periods (..
), you can’t download the object using the Amazon S3 console. To download an object with a key name of “.” or “..”, you must use the AWS CLI, AWS SDKs, or REST API.呼應上面會因 application/protocol 而異的說法。
Characters That Might Require SPECIAL HANDLING
-
The following characters in a key name might require ADDITIONAL CODE HANDLING and likely need to be URL encoded or referenced as HEX. Some of these are NON-PRINTABLE characters and your browser might not handle them, which also requires special handling:
- Ampersand ("&")
- Dollar ("$")
- ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)
- 'At' symbol ("@")
- Equals ("=")
- Semicolon (";")
- Colon (":")
- Plus ("+")
- Space – Significant sequences of spaces may be lost in some uses (especially multiple spaces)
- Comma (",")
- Question mark ("?")
所謂 "referenced as HEX" 是什麼?? 對 non-printable characters 似乎尤其重要?
Characters to Avoid
-
Avoid the following characters in a key name because of SIGNIFICANT SPECIAL HANDLING for consistency across all applications.
- Backslash ("")
- Left curly brace ("{")
- Non-printable ASCII characters (128–255 decimal characters)
- Caret ("^")
- Right curly brace ("}")
- Percent character ("%")
- Grave accent / back tick ("`")
- Right square bracket ("]")
- Quotation marks
- 'Greater Than' symbol (">")
- Left square bracket ("[")
- Tilde ("~")
- 'Less Than' symbol ("<")
- 'Pound' character ("#")
- Vertical bar / pipe ("|")
注意 non-printable character 0 ~ 31 及 128 ~ 255 的差別,前者是需要特別處理,後者則要完全避免。
-
- Bucket Restrictions and Limitations - Amazon Simple Storage Service 提到 "virtual-host style access to buckets" 及
http://MyAWSBucket.s3.amazonaws.com/yourobject
的用法。 - Virtual Hosting of Buckets - Amazon Simple Storage Service #ril
- 在 S3 console 上傳檔案後,直接用 curl 取用
https://s3-ap-northeast-1.amazonaws.com/bucketname/filename.ext
會出現 403 Forbidden,但做了 make public 之後 (不能再變成 private??),URL 不會變,但就可以存取了。
-
Object Lifecycle Management - Amazon Simple Storage Service
-
To manage your objects so that they are stored COST EFFECTIVELY throughout their LIFECYCLE, configure their lifecycle. A lifecycle configuration is a set of RULES that define ACTIONS that Amazon S3 applies to a group of objects. There are two types of actions:
-
Transition actions
Define when objects transition to another STORAGE CLASS. For example, you might choose to transition objects to the
STANDARD_IA
storage class 30 days after you created them, or archive objects to theGLACIER
storage class one year after creating them.There are costs associated with the lifecycle TRANSITION REQUESTS. For pricing information, see Amazon S3 Pricing.
按 Amazon S3 Storage Classes 的說法,不同 storage class 的差別在於 availability,所以價格也不同;最常用的資料可以留在
STANDARD
,較不常用的可以慢慢移往其他較低的 class。 -
Expiration actions
Define when objects EXPIRE. Amazon S3 DELETES expired objects on your behalf.
The lifecycle expiration COSTS depend on when you choose to expire objects. For more information, see Configuring Object Expiration. #ril
刪資料也有 cost ??
-
For more information about lifecycle rules, see Lifecycle Configuration Elements. #ril
When Should I Use Lifecycle Configuration?
-
Define lifecycle configuration rules for objects that have a well-defined lifecycle. For example:
-
If you upload periodic logs to a bucket, your application might need them for a week or a month. After that, you might want to delete them.
-
Some documents are FREQUENTLY accessed for a limited period of time. After that, they are INFREQUENTLY accessed. At some point, you might not need REAL-TIME ACCESS to them, but your organization or regulations might require you to ARCHIVE them for a specific period. After that, you can delete them.
-
You might upload some types of data to Amazon S3 primarily for ARCHIVAL PURPOSES. For example, you might archive digital media, financial and healthcare records, raw genomics sequence data, long-term database backups, and data that must be retained for REGULATORY COMPLIANCE.
-
-
With lifecycle configuration rules, you can tell Amazon S3 to transition objects to LESS EXPENSIVE storage classes, or archive or delete them.
How Do I Configure a Lifecycle?
-
A lifecycle configuration, an XML file, comprises a set of rules with predefined actions that you want Amazon S3 to perform on objects during their lifetime.
-
Amazon S3 provides a set of API operations for managing lifecycle configuration on a bucket. Amazon S3 stores the configuration as a LIFECYCLE SUBRESOURCE that is ATTACHED to your bucket. For details, see the following:
-
You can also configure the lifecycle by using the Amazon S3 console or programmatically by using the AWS SDK wrapper libraries. If you need to, you can also make the REST API calls directly. For more information, see Setting Lifecycle Configuration on a Bucket. #ril
-
-
How to Backup files to Amazon S3 – Amazon Web Services Step 5: Delete the Object and Bucket 建議沒用到的 object 就該刪除,才不會被收費。
-
S3 Console > Management 有 Lifecycle 的概念,Expire your objects: Using a lifecycle rule, you can automatically expire objects based on your retention needs or clean up incomplete multipart uploads.
-
Amazon S3 Storage Management Pricing - Cloud Storage Pricing – Amazon Simple Storage Service (S3) – AWS 這裡並沒有提到 lifecycle 相關的費用。
-
Expiring Objects: General Considerations - Amazon Simple Storage Service #ril
- amazon s3 - AWS S3: how do I see how much disk space is using - Stack Overflow thaavik:
aws s3 ls s3://mybucket --recursive --human-readable --summarize
#ril
- 安裝 Python 的
awscli
套件,執行aws configure
先給定 AWS Access Key ID 與 AWS Secret Access Key (以明碼記錄在~/.aws/credentials
),之後就可以執行aws s3 <COMMAND>
了。
-
S3 跟 CloudFront 都提供有 URL signing 的功能,但兩邊的說法不太一樣;S3 稱之為 Presigned URL,而 CloudFront 則稱之為 Signed URL。
不過 Signed URL 確實是比較通用的說法,像 Google Cloud Storage 也是稱做 Signed URL。
參考資料:
-
c# - What is difference between Pre-Signed Url and Signed Url? - Stack Overflow #ril
-
jia chen: Both S3 and CloudFront have URL signing features that work differently. However, only S3 refers to them as Pre-signed URLs CloudFront refers to them as Signed URLs and Signed Cookies.
Note the service names in the URLs in the documentation below.
- https://docs.aws.amazon.com/AmazonS3/latest/dev/PresignedUrlUploadObject.html
- https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-signed-urls.html#private-content-how-signed-urls-work
For a more in depth comparison of the different services check out the link below. If I had to guess, I would guess that AWS chose to name their signing services differently to AVOID CONFUSION. https://tutorialsdojo.com/aws-cheat-sheet-s3-pre-signed-urls-vs-cloudfront-signed-urls-vs-origin-access-identity-oai/ #ril
-
-
Share an Object with Others - Amazon Simple Storage Service
-
All objects by default are private. Only the object OWNER has permission to access these objects. However, the object owner can optionally SHARE objects with others by creating a PRESIGNED URL, using their own security credentials, to grant TIME-LIMITED permission to download the objects.
這裡的 share 指的就是 GET method。
-
When you create a presigned URL for your object, you must provide your security credentials, specify a bucket name, an object key, specify the HTTP method (
GET
to download the object) and expiration date and time. The presigned URLs are valid only for the specified duration.Anyone who receives the presigned URL can then access the object. For example, if you have a video in your bucket and both the bucket and the object are private, you can share the video with others by generating a presigned URL.
後來才發現產生 presigned URL 的過程,並不是透過 API 跟 AWS 拿,而是 SDK 自己根據 credentials 算出來的,也難怪這裡一直強調 "security credentials"。
aws-sdk-java/AmazonS3Client.java at master · aws/aws-sdk-java
public URL generatePresignedUrl(GeneratePresignedUrlRequest req) { ... Request<GeneratePresignedUrlRequest> request = createRequest( bucketName, key, req, httpMethod); request.addHandlerContext(HandlerContextKey.OPERATION_NAME, "GeneratePresignedUrl"); Signer signer = createSigner(request, bucketName, key); if (signer instanceof Presigner) { // If we have a signer which knows how to presign requests, // delegate directly to it. ((Presigner) signer).presignRequest(request, ...); } else { // Otherwise use the default presigning method, which is hardcoded // to use QueryStringSigner. presignRequest(request, ...); } // Remove the leading slash (if any) in the resource-path return ServiceUtils.convertRequestToUrl(request, true, false);
只是把 request 準備好,最後轉換成 URL 而已,難怪:
- 在 API Reference 裡完全找不到
GeneratePresignedUrl
的說明。 - 下面會有 "Anyone with valid security credentials can create a presigned URL" 的說法,暗示在產生 URL 的當下,並沒有真的去後端做什麼檢查。
Python SDK 也是;botocore/signers.py at develop · boto/botocore:
def generate_presigned_url(self, request_dict, operation_name, expires_in=3600, region_name=None, signing_name=None): request = create_request_object(request_dict) self.sign(operation_name, request, region_name, 'presign-url', expires_in, signing_name) request.prepare() return request.url
並沒有送出 request。
- 在 API Reference 裡完全找不到
-
Note: Anyone with valid security credentials can create a presigned URL. However, in order to successfully access an object, the presigned URL must be created by someone who has permission to perform the operation that the presigned URL is based upon.
The credentials that you can use to create a presigned URL include:
-
IAM instance profile: Valid up to 6 hours
這跟 EC2 才有的 IAM Role 有關,不知道它跟 Maximum CLI/API session duration 有什麼關係 (介於 1 ~ 12 hour) ??
-
AWS Security Token Service : Valid up to 36 hours when signed with permanent credentials, such as the credentials of the AWS account root user or an IAM user
-
IAM user: Valid up to 7 days when using AWS Signature Version 4
To create a presigned URL that's valid for up to 7 days, first designate IAM user credentials (the access key and secret access key) to the SDK that you're using. Then, generate a presigned URL using AWS Signature Version 4.
-
If you created a presigned URL using a TEMPORARY token, then the URL expires when the token expires, even if the URL was created with a later expiration time.
IAM instance profile 就是其中一種。
-
-
You can generate presigned URL programmatically using the AWS SDK for Java and .NET.
Generate a presigned Object URL Using the AWS SDK for Java - Amazon Simple Storage Service 的用法像是:
GeneratePresignedUrlRequest generatePresignedUrlRequest = new GeneratePresignedUrlRequest(bucketName, objectKey) .withMethod(HttpMethod.GET) .withExpiration(expiration); URL url = s3Client.generatePresignedUrl(generatePresignedUrlRequest);
跟下面 Uploading Objects Using Presigned URLs 的用法一樣,只是 HTTP method 不同而已。
-
-
Uploading Objects Using Presigned URLs - Amazon Simple Storage Service #ril
-
A presigned URL gives you access to the object identified in the URL, provided that the creator of the presigned URL has permissions to access that object.
That is, if you receive a presigned URL to upload an object, you can upload the object only if the creator of the presigned URL has the necessary permissions to upload that object.
-
All objects and buckets by default are private. The presigned URLs are useful if you want your user/customer to be able to upload a specific object to your bucket, but you don't require them to have AWS security credentials or permissions. When you create a presigned URL, you must provide your security credentials and then specify a bucket name, AN OBJECT KEY, an HTTP method (PUT for uploading objects), and an expiration date and time. The presigned URLs are valid only for the specified duration.
Upload an Object Using a Presigned URL (AWS SDK for Java) - Amazon Simple Storage Service 的用法像是:
GeneratePresignedUrlRequest generatePresignedUrlRequest = new GeneratePresignedUrlRequest(bucketName, objectKey) .withMethod(HttpMethod.PUT) .withExpiration(expiration); URL url = s3Client.generatePresignedUrl(generatePresignedUrlRequest);
-
You can generate a presigned URL programmatically using the AWS SDK for Java or the AWS SDK for .NET. If you are using Microsoft Visual Studio, you can also use AWS Explorer to generate a presigned object URL without writing any code. Anyone who receives a valid presigned URL can then programmatically upload an object.
-
Note: Anyone with valid security credentials can create a presigned URL. However, in order for you to successfully upload an object, the presigned URL must be created by someone who has permission to perform the operation that the presigned URL is based upon.
-
- 為什麼
aws s3 ls
不加s3://
也可以?? - File Commands for Amazon S3 - AWS Command Line Interface #ril
- Using Amazon S3 with the AWS Command Line Interface - AWS Command Line Interface #ril
- s3 — AWS CLI 1.14.14 Command Reference #ril
- File Commands for Amazon S3 - AWS Command Line Interface
aws s3 sync
可以同步 local folder 與 S3 bucket,例如aws s3 sync myfolder s3://mybucket/myfolder --exclude *.tmp
。 - sync — AWS CLI 1.14.14 Command Reference 有
--content-type
可以指定 MIME type,但預設會自動偵測;--dryrun
可以試跑,也可以同步兩個 S3 bucket 的資料!! 預設就是 recursive #ril
文件:
手冊: