-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Badger] Add universal database operations #6465
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6465 +/- ##
==========================================
- Coverage 41.19% 41.16% -0.04%
==========================================
Files 2052 2064 +12
Lines 182191 182649 +458
==========================================
+ Hits 75062 75183 +121
- Misses 100839 101156 +317
- Partials 6290 6310 +20
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -0,0 +1,189 @@ | |||
package operation_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test cases are taken from similar ones in storage/badger/operation/common_test.go
@@ -0,0 +1,278 @@ | |||
package operation_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test cases are taken from similar ones in storage/badger/operation/common_test.go
storage/operation/reads.go
Outdated
keyCopy := make([]byte, len(key)) | ||
copy(keyCopy, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying the key is for safety, otherwise caller might make mistake storing the key in a slice and causing problem because this iteration here might overwrite the underlying slice. Making a copy of the key could prevent caller from making such mistake.
tx := db.NewTransaction(false) | ||
iter := tx.NewIterator(options) | ||
|
||
lowerBound, upperBound := storage.StartEndPrefixToLowerUpperBound(startPrefix, endPrefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, we are applying the same approach as pebble to compute the exclusive upperBound here. This allows us to get rid of the global max
value stored in the database (see: storage/badger/operation/max.go
)
keys := [][]byte{ | ||
// before start -> not included in range | ||
{0x09, 0xff}, | ||
// within the start prefix -> included in range | ||
{0x10, 0x00}, | ||
{0x10, 0xff}, | ||
// between start and end -> included in range | ||
{0x15, 0x00}, | ||
{0x1A, 0xff}, | ||
// within the end prefix -> included in range | ||
{0x20, 0x00}, | ||
{0x20, 0xff}, | ||
// after end -> not included in range | ||
{0x21, 0x00}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same test cases to test the boundary cases
import ( | ||
// #nosec |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix linter error
} | ||
|
||
func (b *ReaderBatchWriter) Commit() error { | ||
err := b.batch.Commit(pebble.Sync) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the database operations being universal, it's easy to write benchmark to compare the performance between badger and pebble implementation.
The read performance for badger and pebble are very similar.
BenchmarkRetrieve/BadgerStorage-10 1217998 948.1 ns/op
BenchmarkRetrieve/PebbleStorage-10 1699320 725.4 ns/op
However, for writes, the benchmark shows pebble is 100x slower than badger.
I initially did:
cd storage/operation
go test -bench=BenchmarkUpsert
goos: darwin
goarch: arm64
pkg: github.com/onflow/flow-go/storage/operation
BenchmarkUpsert/BadgerStorage-10 31804 35173 ns/op
BenchmarkUpsert/PebbleStorage-10 270 4267359 ns/op
PASS
ok github.com/onflow/flow-go/storage/operation 4.886s
I tracked down to this Commit
call by adding the following code.
n1 := time.Now()
err := b.batch.Commit(pebble.Sync)
n2 := time.Now()
fmt.Println("pebbleimpl.Writer.go: Commit() time:", n2.Sub(n1).Nanoseconds())
Then I did the same to badger to record the duration for the Commit
call, and run the TestReadWrite
test case which writes a single entity data to the database, the result shows pebble is 25x slower than badger for writes (read performance is similar). I'm not sure why.
cd storage/operation
gt -run=TestReadWrite
badgerimpl.Writer.go: Commit() time: 147250
badgerimpl.Writer.go: Commit() time: 71458
badgerimpl.Writer.go: Commit() time: 72708
pebbleimpl.Writer.go: Commit() time: 3684875
pebbleimpl.Writer.go: Commit() time: 3006416
pebbleimpl.Writer.go: Commit() time: 3001083
badgerimpl.Writer.go: Commit() time: 6500
pebbleimpl.Writer.go: Commit() time: 584
PASS
ok github.com/onflow/flow-go/storage/operation 1.136s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the benchmarks and finding this regression @zhangchiqing 🙇
100x slower writes is a big difference! 😬
I think it would be worthwhile to spend some time better understanding this finding and investigating how it would impact Flow (for example, the consensus hot-path). My gut feeling is that this benchmark in isolation is overstating the actual impact we would see. If it isn't, and switching to Pebble is going to make all our writes 2 orders of magnitude slower, I'd rather know that now.
- Are there Pebble configs we can change to bring write speed down?
- Does this result hold if aspects of the environment are changed (eg. maybe it's an artifact of the test database being very small? Just guessing)
- Does this result hold if aspects of the load are changed (eg. committing a batch containing many writes)
I spent a short time during the review trying a few different iterations on BenchmarkUpsert
:
- Setting
WithKeepL0InMemory
to false (previously was true). - Setting
WithValueThreshold
to 1 byte (previously was 1kb).- Badger is different from Pebble in that it will sometimes store values directly in the LSM tree, and other times in a value log file, based on the value's size. All the values we were storing were below that 1kb threshold.
- Using larger 10kb values
Unfortunately, all of these had similar results where Pebble was still ~100x slower...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this might take some time to investigate. I will do it in a separate issue. But it should not block us from getting in with this PR, since it also allows us to move from badger transaction to badger batch updates.
Regarding the regression, Peter suspect it might be to do with the key-compression and decompression, since he did a profiling in the past and saw a lot of CPU cycles are spent on that. So might worth to try disabling compression in pebble and see if it makes a difference.
87dc8e1
to
b82106a
Compare
5c99655
to
257bde0
Compare
257bde0
to
bbfb3a1
Compare
storage/operation/reads_test.go
Outdated
|
||
func TestTraverse(t *testing.T) { | ||
dbtest.RunWithStorages(t, func(t *testing.T, r storage.Reader, withWriter dbtest.WithWriter) { | ||
keys := [][]byte{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would probably be easier to work with as a map from key -> value. If we define the key type as [2]byte
, and then do key[:]
when inserting it, that would work.
Feel free to skip this suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I like the way you structured testing so we can easily test both database backends together 💯
Besides the suggestions above, could you:
- Make sure exported types have at least a basic godoc (copying the interface documentation to structs that implement it works). I added suggestions about this in a few places, but not all.
- Make sure all public functions that return errors document any expected error types, or that "no errors are expected during normal operation".
284750a
to
93fcaa1
Compare
77b44d1
to
17972e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎸
} | ||
|
||
// Writer is an interface for batch writing to a storage backend. | ||
// It cannot be used concurrently for writing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// It cannot be used concurrently for writing. | |
// One Writer instance cannot be used concurrently by multiple goroutines. | |
// However, many goroutines can write concurrently, each using their own Writer instance. |
// other errors are exceptions | ||
// | ||
// The caller should not modify the contents of the returned slice, but it is | ||
// safe to modify the contents of the argument after Get returns. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// safe to modify the contents of the argument after Get returns. The | |
// safe to modify the contents of the `key` argument after Get returns. The |
// IterItem returns the current key-value pair, or nil if done. | ||
IterItem() IterItem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we maybe just call it Item
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Leo,
thanks for putting this PR together. I have tried my best given my time constraints to go over what I considered the most correctness-critical details and the API. It is not a full review, but I wanted to help as much as I can with ensuring correctness of the implementation and its API-safety & API-clarity.
From my perspective, there aren't any major problems, but some areas where I would request iterations
- Most of my comments are suggestions to refine some of the documentation, which is hopefully quick.
- I think we still have some minor edge cases that need to be fixed (I marked those comments with "
⚠️ "). Very well possible, that I misunderstood the code and contrary to my intuition, the code is correct. Though, if you agree that there are problems, please make sure to include tests that cover these areas in detail. - Please make sure you enforce in the implementation that
startPrefix ≤ endPrefix
and the implementation errors. I think this should be enforced on the lowest level of the implementation:badgerimpl.dbReader
andpebbleimpl.dbReader
as the Iterator-constructorsNewIter
are publicly accessible. This check is so cheap that I would rather have it replicated than not applied in one of the many code paths. - I find the API of
Exists
,Retrieve
,FindHighestAtOrBelow
in theoperations
package very convoluted. See my last two comments (here and here) including suggestions on how to improve this.
// Iterator is an interface for iterating over key-value pairs in a storage backend. | ||
type Iterator interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend to document the convention about upper and lower bound in the interface. I think the upper and lower bounds are much more than an implementation detail, because they determine the output of functions defined in the interface. While you document some of that in NewIter
below, I would recommend to primarily document this convention in the interface that is supposed to adhere to this convention.
Iterator walks over all key value pairs
(k,v)
, where the keyk
starts with a prefix in the range[a,b]
(both bounds inclusive), witha ≤ b
and an entry with keyk
exists in the database. Key-value pairs are visited in lexicographical order wrt the keyk
. If key-value pairs for the lower bounda
and/or upper boundb
exist, they are visited by the iterator.
- not sure about the requirement
a ≤ b
, but I think it is worth-while to explicitly document. But given howStartEndPrefixToLowerUpperBound
is implemented, I think it is a correctness requirement. If my interpretation is correct, constructors must enforce this!
} | ||
|
||
// PrefixUpperBound returns a key K such that all possible keys beginning with the input prefix | ||
// sort lower than K according to the byte-wise lexicographic key ordering used by Pebble. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we also apply this to badger. Hence, I think we can remove the explicit mentioning of Pebble?
// sort lower than K according to the byte-wise lexicographic key ordering used by Pebble. | |
// sort lower than K according to the byte-wise lexicographic key ordering. |
// Valid returns whether the iterator is positioned at a valid key-value pair. | ||
func (i *badgerIterator) Valid() bool { | ||
// if it's beyond the upper bound, it's invalid | ||
if !i.iter.Valid() { | ||
return false | ||
} | ||
key := i.iter.Item().Key() | ||
// "< 0" means "key < upperBound" | ||
valid := bytes.Compare(key, i.upperBound) < 0 | ||
return valid | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚠️ ❓❗
I am worried this might be incorrect:
- my understanding is that the
StartEndPrefixToLowerUpperBound
function should translate the prefix bounds to bounds on the full key values:lowerBound, upperBound := storage.StartEndPrefixToLowerUpperBound(startPrefix, endPrefix) - If we specify
startPrefix, endPrefix := []byte{0x10}, []byte{0xff}
the functionStartEndPrefixToLowerUpperBound
will turn this intonil
(empty list) for theendPrefix
. - the nil slice is lexicographically smaller than any non-empty slice
⚠️
// Next advances the iterator to the next key-value pair. | ||
Next() | ||
|
||
// IterItem returns the current key-value pair, or nil if done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
badger's Item()
implementation specifies:
⚠️ This item is only valid until Next() gets called.
I think that is a safety critical constrained, which should be highlighted in the documentation!
// if it's beyond the upper bound, it's invalid | ||
if !i.iter.Valid() { | ||
return false | ||
} | ||
key := i.iter.Item().Key() | ||
// "< 0" means "key < upperBound" | ||
valid := bytes.Compare(key, i.upperBound) < 0 | ||
return valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what is the difference between
flow-go/storage/operation/badgerimpl/iterator.go
Lines 44 to 47 in 17972e0
// if it's beyond the upper bound, it's invalid | |
if !i.iter.Valid() { | |
return false | |
} |
you are documenting in line 44:
if it's beyond the upper bound, it's invalid
so why do we do the check again for the upper bound:
flow-go/storage/operation/badgerimpl/iterator.go
Lines 48 to 51 in 17972e0
key := i.iter.Item().Key() | |
// "< 0" means "key < upperBound" | |
valid := bytes.Compare(key, i.upperBound) < 0 | |
return valid |
Please document the answer in the implementation (not as a reply to my PR comment). Thanks
// Exists returns true if a key exists in the database. | ||
// No errors are expected during normal operation. | ||
func Exists(key []byte, keyExists *bool) func(storage.Reader) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this function's signature challenging - specifically that we input a pointer to a bool
that is only set once the returned function is executed. I am sure you have good reasons for implementing it that way. Personally, I would prefer to keep code with this many indirections out of the production code base -- if possible.
If Exists
is only used for testing, maybe move it into storage/operation/dbtest
package and lets make sure we document it in detail:
// Exists returns true if a key exists in the database. | |
// No errors are expected during normal operation. | |
func Exists(key []byte, keyExists *bool) func(storage.Reader) error { | |
// Exists takes a key and a pointer to an a boolean variable `keyExists` as inputs and returns an function. | |
// When this returned function is executed (and only then), it will write into the `keyExists` whether | |
// the key exists. | |
// No errors are expected during normal operation. | |
func Exists(key []byte, keyExists *bool) func(storage.Reader) error { |
If we want to keep this functionality as part of the production implementation, I would personally find it much simpler to just include the Exists
method in the reader interface. Yes, we would duplicate the method across the two implementations - nevertheless, I would find this worth-while to remove API complexity. Suggesting something like this:
// Exists checks whether a key-value pair with the given key exists. The function returns nil,
// if the key exists and `storage.ErrNotFound` if the key does not exist.
// No other errors are expected during normal operation.
func (b dbReader) Exists(key []byte) error {
_, closer, err := b.Get(key)
_ = closer.Close()
// we only expect err to be nil or storage.ErrNotFound; all other cases are symptoms of an irrecoverable exception
if err != nil && (!errors.Is(err, storage.ErrNotFound)) {
return irrecoverable.NewExceptionf("could not load data: %w", err)
}
return err
}
// Retrieve will retrieve the binary data under the given key from the database | ||
// and decode it into the given entity. The provided entity needs to be a | ||
// pointer to an initialized entity of the correct type. | ||
// Error returns: | ||
// - storage.ErrNotFound if the key does not exist in the database | ||
// - generic error in case of unexpected failure from the database layer, or failure | ||
// to decode an existing database value | ||
func Retrieve(key []byte, entity interface{}) func(storage.Reader) error { | ||
return func(r storage.Reader) error { | ||
val, closer, err := r.Get(key) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
defer closer.Close() | ||
|
||
err = msgpack.Unmarshal(val, entity) | ||
if err != nil { | ||
return irrecoverable.NewExceptionf("could not decode entity: %w", err) | ||
} | ||
return nil | ||
} | ||
} | ||
|
||
// FindHighestAtOrBelow finds the highest key with the given prefix and | ||
// height equal to or below the given height. | ||
func FindHighestAtOrBelow(prefix []byte, height uint64, entity interface{}) func(storage.Reader) error { | ||
return func(r storage.Reader) error { | ||
if len(prefix) == 0 { | ||
return fmt.Errorf("prefix must not be empty") | ||
} | ||
|
||
key := append(prefix, EncodeKeyPart(height)...) | ||
it, err := r.NewIter(prefix, key, storage.DefaultIteratorOptions()) | ||
if err != nil { | ||
return fmt.Errorf("can not create iterator: %w", err) | ||
} | ||
defer it.Close() | ||
|
||
var highestKey []byte | ||
// find highest value below the given height | ||
for it.First(); it.Valid(); it.Next() { | ||
highestKey = it.IterItem().Key() | ||
} | ||
|
||
if len(highestKey) == 0 { | ||
return storage.ErrNotFound | ||
} | ||
|
||
// read the value of the highest key | ||
val, closer, err := r.Get(highestKey) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
defer closer.Close() | ||
|
||
err = msgpack.Unmarshal(val, entity) | ||
if err != nil { | ||
return irrecoverable.NewExceptionf("failed to decode value: %w", err) | ||
} | ||
|
||
return nil | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯
similarly with Retrieve
and FindHighestAtOrBelow
: I think the API is way too complicated. Just include those in the interface.
If you are concerned about duplicating code in the implementation for pebble and badger, you can use the decorator pattern:
- the
badgerimpl.dbReader
andpebbleimpl.dbReader
only implementGet
andNewIter
, while the interface would additionally demand the extra methodsExists
,Retrieve
,FindHighestAtOrBelow
- constructors for
badgerimpl.dbReader
andpebbleimpl.dbReader
could then wrap their concrete type with an implementation-agnostic decorator that addsExists
,Retrieve
,FindHighestAtOrBelow
Its a bit more effort on the framework, but I think it is worth-while for removing the convoluted indirections of returning a functor that sets an input in the original method, which generated the functor 🤯. Essentially the same API simplification as for the Exists
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the API of returning functions is complicated. It's worth noting that Leo is creating these APIs in this way so that they will be drop-in compatible with our existing database code, which all uses this pattern:
flow-go/storage/badger/operation/payload.go
Lines 9 to 15 in 164fa95
func InsertSeal(sealID flow.Identifier, seal *flow.Seal) func(*badger.Txn) error { | |
return insert(makePrefix(codeSeal, sealID), seal) | |
} | |
func RetrieveSeal(sealID flow.Identifier, seal *flow.Seal) func(*badger.Txn) error { | |
return retrieve(makePrefix(codeSeal, sealID), seal) | |
} |
We aren't introducing new APIs here. We're replicating APIs that already exist to make the transition smoother.
The reason the existing database code uses this pattern is to match Badger's API, for example:
func (db *DB) View(fn func(txn *Txn) error) error
I think the interface changes you're suggesting here are good ideas, but implementing them during the transition, while supporting both database backends, would be burdensome.
var highestKey []byte | ||
// find highest value below the given height | ||
for it.First(); it.Valid(); it.Next() { | ||
highestKey = it.IterItem().Key() | ||
} | ||
|
||
if len(highestKey) == 0 { | ||
return storage.ErrNotFound | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next
, the previous IterItem()
might not be valid anymore and since key
only returns a slice reference, whose underlying array might be re-used by the database (see also my comment here for context). My worry is that the same problem might extend to highestKey
.
Furthermore, I would prefer to use the Valid
method that the implementation provides to test for validity instead of len(highestKey) == 0
.
// FindHighestAtOrBelow finds the highest key with the given prefix and | ||
// height equal to or below the given height. | ||
func FindHighestAtOrBelow(prefix []byte, height uint64, entity interface{}) func(storage.Reader) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I misunderstood this function at first. Suggesting to refine the documentation:
// FindHighestAtOrBelow finds the highest key with the given prefix and | |
// height equal to or below the given height. | |
func FindHighestAtOrBelow(prefix []byte, height uint64, entity interface{}) func(storage.Reader) error { | |
// FindHighestAtOrBelow is for database entries that are indexed by block height. It is suitable to search | |
// keys with the format prefix` + `height` (where "+" denotes concatenation of binary strings). The height | |
// is encoded as Big-Endian (entries with numerically smaller height have lexicographically smaller key). | |
// The function finds the *highest* key with the given prefix and height equal to or below the given height. | |
func FindHighestAtOrBelow(prefix []byte, height uint64, entity interface{}) func(storage.Reader) error { |
} | ||
|
||
// RemoveByPrefix removes all keys with the given prefix defined by [startPrefix, endPrefix] (both inclusive). | ||
// If no keys exist with the given prefix, this is a no-op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If no keys exist with the given prefix, this is a no-op. | |
// We require that startPrefix ≤ endPrefix (otherwise this function errors). | |
// If no keys exist with the given prefix, this is a no-op. |
This PR implemented the low level database operations that can be shared by both badger and pebble implementation.
Since badger operations will be refactored to use batch updates instead of transactions, this makes the badger db operations and pebble db operations very similar and I managed to unify them.
A separate PR that refactors the approvals will demonstrate how the unified database operations can be used there. (see #6466)
This allows us to remove both the badger-specific and pebble-specific database modules, and only keep one universal version, easier to maintain.