Add key transform functions for massive rate limit improvements #205
base: master
Conversation
This changes the way the go-ds-s3 plugin stores data in S3 in an effort to drastically improve speed. Instead of storing all data under the /ipfs directory (which counts as a single prefix), each block now gets its own prefix.
* feat: add key transformation method for sharding
* fix: use offset for next to last sharding
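As an illustration (paths are examples, based on the key format described later in this PR), a block that previously sat under the single shared prefix, e.g.

/ipfs/CIQJ7IHPGOFUJT5UMXIW6CUDSNH6AVKMEOXI3UM3VLYJRZUISUMGCXQ

would instead be stored under its own two-character prefix:

CX/CIQJ7IHPGOFUJT5UMXIW6CUDSNH6AVKMEOXI3UM3VLYJRZUISUMGCXQ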
Thank you for submitting this PR!
Getting other community members to review is a great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate the priority and status of your contribution.
I wasn't able to test this unless the module name matched my repo name, so that is one thing I will change (I may need some guidance to figure out how to do this properly, as I'm new to Go!)
@MichaelMure: are you able to review this?
From conversations with @whyrusleeping, it's my understanding that the next-to-last/2 method (and other methods of filesystem sharding) was originally implemented to better scale Linux-based filesystems. @achingbrain, would you be able to share the reasons for sharding things out the way you did in js-datastore-s3? Did you run into similar scalability issues, or were you just trying to follow the existing filesystem sharding patterns of go-ipfs?

Also, @aschmahmann / @BigLep, as we move towards go-ipfs 0.12.0, do any considerations need to be taken with the desire to "Switch to raw multihashes in datastores (including datastore version upgrade)" in the 0.12.0 upgrade? If possible, I'd love to make sure any possible compatibility issues are addressed pre-emptively here.
Oh, wait, sorry. You already know that. For sharding blocks, "next to last" is probably fine, although I'd personally just use a cheap hash function.
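To illustrate that last suggestion with a sketch (not something actually proposed or implemented in this PR), a cheap-hash transform could derive the two-character prefix from a hash of the key rather than from its trailing characters, spreading keys evenly across 256 prefixes:

package main

import (
	"fmt"
	"hash/fnv"
	"strings"

	ds "github.com/ipfs/go-datastore"
)

// hashPrefix is an illustrative sketch only: it shards a datastore key by a
// cheap FNV hash, rendering one hashed byte as two hex characters to use as
// the S3 prefix.
func hashPrefix(k ds.Key) string {
	name := strings.TrimPrefix(k.String(), "/")
	h := fnv.New32a()
	h.Write([]byte(name))
	return fmt.Sprintf("%02x/%s", byte(h.Sum32()), name)
}

func main() {
	// Prints something like "7f/CIQJ7IH..." (the prefix depends on the hash).
	fmt.Println(hashPrefix(ds.NewKey("CIQJ7IHPGOFUJT5UMXIW6CUDSNH6AVKMEOXI3UM3VLYJRZUISUMGCXQ")))
}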
@jacobheun did the initial implementation so perhaps he can speak to the reason, but I get the feeling it was just following the principles set out by other datastore implementations.
This is correct; it was just following the patterns of the other JS datastore implementations at the time (2018). It's just a lightweight interface to S3. @achingbrain I believe the sharding behavior changed in ipfs/js-datastore-s3#30, but it's not managed by the S3 datastore itself.
@MichaelMure I believe this is ready for review. PTAL when you get the chance!
@BigLep, given no word so far from @MichaelMure, is there someone else who might be available to review this?
Similar response as in #195 (comment): Core IPFS maintainers aren't planning to take this on, as we aren't using this ourselves. We need to figure out a way to set up and denote community membership for this and other datastore repos. We're game to set up others as the maintainers here. Maybe pair up with @obo20 and review each other's code (@obo20 has #195 open), or ask for other volunteers in the ipfs-operators Slack channel?
}

var KeyTransforms = map[string]func(ds.Key) string{
	"default": func(k ds.Key) string {
Calling this "default" isn't very descriptive, how about "identity" or something like that?
"next-to-last/2": func(k ds.Key) string { | ||
s := k.String() | ||
offset := 1 | ||
start := len(s) - 2 - offset | ||
return s[start:start+2] + "/" + s | ||
}, |
I think there are some problems here:
datastore.NewKey("a") -> panic
datastore.NewKey("ab") -> /a//ab
datastore.NewKey("abc") -> ab//abc
datastore.NewKey("abcd") -> bc//abcd
- Panics on keys w/ len == 1, should return an error instead
- Keys w/ len == 2 have a slash in their prefix (should probably return an error instead?)
- Keys w/ len > 2 have two slashes instead of one
A small unit test would be nice here.
Thanks for this @guseggert! In practice, I am not seeing the double slash issue, and just running through the logic I don't see how that would be possible, but I'm new to Go 🙈.
However, for keys of len 2 or smaller, I agree this should be handled differently.
For keys of length 3 or more, it still seems correct given a param of 2.
All I did was add this test to print out the results; it should help you reproduce:
func TestTransforms(t *testing.T) {
transform := KeyTransforms["next-to-last/2"]
keys := []string{"ab", "abc", "abcd"}
for _, k := range keys {
fmt.Printf("datastore.NewKey(\"%s\") -> %s\n", k, transform(datastore.NewKey(k)))
}
}
Full test:
func TestTransforms(t *testing.T) {
cases := []struct {
transform string
key string
expectedS3Key string
}{
{
transform: "next-to-last/2",
key: "ab",
expectedS3Key: "ab/ab",
},
{
transform: "next-to-last/2",
key: "abc",
expectedS3Key: "ab/abc",
},
{
transform: "next-to-last/2",
key: "abcd",
expectedS3Key: "bc/abcd",
},
}
for _, c := range cases {
t.Run(fmt.Sprintf("%s %s", c.transform, c.key), func(t *testing.T) {
transform := KeyTransforms[c.transform]
dsKey := datastore.NewKey(c.key)
s3Key := transform(dsKey)
if s3Key != c.expectedS3Key {
t.Fatalf("expected %s, got %s", c.expectedS3Key, s3Key)
}
})
}
}
output:
go test -v -run TestTransforms\$ .
=== RUN TestTransforms
=== RUN TestTransforms/next-to-last/2_ab
s3_test.go:105: expected ab/ab, got /a//ab
=== RUN TestTransforms/next-to-last/2_abc
s3_test.go:105: expected ab/abc, got ab//abc
=== RUN TestTransforms/next-to-last/2_abcd
s3_test.go:105: expected bc/abcd, got bc//abcd
--- FAIL: TestTransforms (0.00s)
--- FAIL: TestTransforms/next-to-last/2_ab (0.00s)
--- FAIL: TestTransforms/next-to-last/2_abc (0.00s)
--- FAIL: TestTransforms/next-to-last/2_abcd (0.00s)
FAIL
FAIL github.com/3box/go-ds-s3 0.004s
FAIL
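For what it's worth, the double slash comes from the leading "/" that ds.Key.String() prepends (e.g. "abc" becomes "/abc", which is then appended after the transform's own "/"). Below is a minimal sketch of a corrected transform, assuming short keys simply fall back to the unsharded name since the KeyTransforms signature has no error return; the exact policy (error, padding, etc.) is a design choice for this PR:

package main

import (
	"fmt"
	"strings"

	ds "github.com/ipfs/go-datastore"
)

// nextToLast2 is a sketch, not the code in this PR: it strips the leading "/"
// that ds.Key.String() adds and only shards keys long enough to have a
// next-to-last two-character segment.
func nextToLast2(k ds.Key) string {
	name := strings.TrimPrefix(k.String(), "/")
	const length, offset = 2, 1
	if len(name) < length+offset {
		return name // too short to shard; left unsharded here
	}
	start := len(name) - length - offset
	return name[start:start+length] + "/" + name
}

func main() {
	// Prints: a -> a, ab -> ab, abc -> ab/abc, abcd -> bc/abcd
	for _, s := range []string{"a", "ab", "abc", "abcd"} {
		fmt.Printf("datastore.NewKey(%q) -> %s\n", s, nextToLast2(ds.NewKey(s)))
	}
}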
localBucketName, localBucketNameSet := os.LookupEnv("LOCAL_BUCKET_NAME")
if !localBucketNameSet {
	localBucketName = fmt.Sprintf("localbucketname%d", time.Now().UnixNano())
}
go fmt?
Region:       "local",
AccessKey:    "test",
SecretKey:    "testdslocal",
KeyTransform: "default",
Could you add tests for the other transforms?
@@ -1,7 +1,7 @@
 package main

 import (
-	plugin "github.com/ipfs/go-ds-s3/plugin"
+	plugin "github.com/3box/go-ds-s3/plugin"
plugin "github.com/3box/go-ds-s3/plugin" | |
plugin "github.com/ipfs/go-ds-s3/plugin" |
@@ -3,7 +3,7 @@ package plugin
 import (
 	"fmt"

-	s3ds "github.com/ipfs/go-ds-s3"
+	s3ds "github.com/3box/go-ds-s3"
s3ds "github.com/3box/go-ds-s3" | |
s3ds "github.com/ipfs/go-ds-s3" |
@@ -4,7 +4,7 @@ import (
 	"reflect"
 	"testing"

-	s3ds "github.com/ipfs/go-ds-s3"
+	s3ds "github.com/3box/go-ds-s3"
s3ds "github.com/3box/go-ds-s3" | |
s3ds "github.com/ipfs/go-ds-s3" |
`next-to-last/2`
- Shards by storing block data based on the second to last 2 characters of its key. E.g. `CX/CIQJ7IHPGOFUJT5UMXIW6CUDSNH6AVKMEOXI3UM3VLYJRZUISUMGCXQ`
Suggested change:
 `next-to-last/2`
 - Shards by storing block data based on the second to last 2 characters of its key. E.g. `CX/CIQJ7IHPGOFUJT5UMXIW6CUDSNH6AVKMEOXI3UM3VLYJRZUISUMGCXQ`
+- This is for compatibility with [js-datastore-s3](https://github.com/ipfs/js-datastore-s3)
For compatibility with js-datastore-s3 and also go-ds-flatfs. I don't think it's worth the note, really. Most of the datastores use some form of sharding, but next-to-last/2 happens to be the default for some.
@v-stickykeys: are you going to incorporate the feedback? This looks like a good change that is worth landing.
@v-stickykeys: are you going to be able to incorporate the feedback? We'd like to support getting this landed.
Hey @BigLep, the short answer is yes, but honestly it hasn't been at the top of my priority list and is now more of a weekend project. My PR as it stands meets our needs, but I do have another branch with improvements addressing the review, which I can try to get wrapped up and pushed in the next week.
@v-stickykeys: just checking in. Maintainers can take a look if you post a new revision with the comments incorporated.
* wip
* wip
* wip
* go back to 3box org
This would be great to have!
Hi guys. I wrote the IPFS on AWS Fargate blog for AWS, and I'm currently looking at making this S3 plugin work with the serverless architecture mentioned in the blog. It would be a shame not to merge this: S3 sharding is key to making this a viable option compared to traditional drives. Referencing this post from @mikeal, if we can achieve 40 GB/s throughput per bucket, we could run large clusters at a fraction of the cost. We can leverage S3 Infrequent Access or Intelligent-Tiering to save even more on cost. I will try this branch, and I'm willing to help move this forward. @guseggert, you've left several rounds of feedback; what is your assessment of the situation? @v-stickykeys, are you still around? @BigLep, can you add me as a contributor to this PR?
@koxon sorry that no one answered this... are you still interested in picking this up? Right now I don't see any upside in letting users configure the number of prefixes between next-to-last/2, /3, or whatever... we might as well set it to a number like 128, or whatever maximizes the usage of S3, and leave it there, unless there is a good reason to use a worse number.
Without sharding in S3, it is unreasonable from a performance and cost perspective to use it as an IPFS datastore.
This PR adds customizable sharding methods, matching the implementation of go-ds-flatfs and compatible with js-datastore-core (also js-datastore-s3).
This incorporates the suggestion made by @whyrusleeping in #195 (comment) to allow developers to decide the layout of their data on S3.
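For anyone who wants to try the branch, a minimal usage sketch. The Config field names (Region, AccessKey, SecretKey, KeyTransform) follow the test diff above; the Bucket field, the NewS3Datastore constructor, and the example values are assumptions based on go-ds-s3's existing API and may differ from the final revision:

package main

import (
	"fmt"

	s3ds "github.com/ipfs/go-ds-s3"
)

func main() {
	// Sketch only: KeyTransform is the option added by this PR; the other
	// fields and the constructor are assumed from go-ds-s3's existing API.
	conf := s3ds.Config{
		Region:       "us-east-1",
		Bucket:       "my-ipfs-blocks",
		AccessKey:    "EXAMPLE-ACCESS-KEY",
		SecretKey:    "EXAMPLE-SECRET-KEY",
		KeyTransform: "next-to-last/2", // shard each block under a two-character prefix
	}
	store, err := s3ds.NewS3Datastore(conf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("datastore ready: %T\n", store)
}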