v0.33 - Add pebble-based storage layer files #6207
return irrecoverable.NewExceptionf("could not load data: %w", err) | ||
} | ||
*keyExists = true | ||
defer closer.Close() |
Let's be mindful that golang's defer has a small but non-negligible performance overhead (to the best of my knowledge ... which might already be outdated). We have a few instances in the code where it is trivially apparent how to avoid the defer, like this one:
defer closer.Close() | |
closer.Close() |
I was curious so I benchmarked it: https://gist.github.com/jordanschalm/d4d268a39c6b2197a0c0882a4ca4dab5.
It seems like it has improved a lot in the past few years. About 5 years ago, it was 15x slower (~50ns) in a minimal benchmark. Now it's about 2x slower (2ns).
Agree with removing it in this case, but the performance overhead is negligible now for almost any use-case.
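For anyone who wants to reproduce the comparison locally, here is a minimal sketch of such a micro-benchmark. It is not the code from the linked gist; `resource`, `closeWithDefer` and `closeDirect` are hypothetical names made up for illustration, and the numbers will vary with the Go version, inlining decisions and hardware.

```go
package main

import (
	"fmt"
	"testing"
)

// resource is a hypothetical stand-in for anything with a cheap Close method.
type resource struct{ closed int }

func (r *resource) Close() error {
	r.closed++
	return nil
}

// closeWithDefer releases the resource through a deferred call.
func closeWithDefer(r *resource) {
	defer r.Close()
}

// closeDirect releases the resource with a plain call.
func closeDirect(r *resource) {
	r.Close()
}

func main() {
	r := &resource{}
	// testing.Benchmark runs each function until the timing is stable.
	deferred := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			closeWithDefer(r)
		}
	})
	direct := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			closeDirect(r)
		}
	})
	fmt.Println("defer: ", deferred)
	fmt.Println("direct:", direct)
}
```

A benchmark this small mostly measures call overhead, so it says little about code paths that also do I/O or decoding.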
defer closer.Close() | ||
|
||
err = msgpack.Unmarshal(val, &sc) | ||
err = msgpack.Unmarshal(val, sc) |
defer can be removed by shifting the Close call below line 96
defer closer.Close() | |
err = msgpack.Unmarshal(val, &sc) | |
err = msgpack.Unmarshal(val, sc) | |
err = msgpack.Unmarshal(val, sc) | |
closer.Close() |
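A minimal sketch of the suggested ordering, under stated assumptions: `store`, `release` and `retrieve` are hypothetical stand-ins rather than the PR's code, and `encoding/json` replaces msgpack so the example only needs the standard library. The point is that the value is decoded first and the closer is released right after, which matters because a Pebble `Get` result is only valid until its closer is closed.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// store is a hypothetical in-memory database that counts unreleased values.
type store struct {
	data map[string][]byte
	open int
}

// release is the closer handed out by Get; closing it marks the value released.
type release struct{ s *store }

func (r release) Close() error {
	r.s.open--
	return nil
}

// Get returns the value and a closer that must be closed after use.
func (s *store) Get(key string) ([]byte, io.Closer, error) {
	val, ok := s.data[key]
	if !ok {
		return nil, nil, errors.New("key not found")
	}
	s.open++
	return val, release{s}, nil
}

// retrieve decodes the value stored under key into sc without using defer.
func retrieve(s *store, key string, sc any) error {
	val, closer, err := s.Get(key)
	if err != nil {
		return fmt.Errorf("could not load data: %w", err)
	}
	// Decode while the value is still valid, then release it immediately.
	err = json.Unmarshal(val, sc)
	closeErr := closer.Close()
	if err != nil {
		return fmt.Errorf("could not decode data: %w", err)
	}
	return closeErr
}

func main() {
	s := &store{data: map[string][]byte{"k": []byte(`{"Height":7}`)}}
	var sc struct{ Height uint64 }
	if err := retrieve(s, "k", &sc); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("height:", sc.Height, "unreleased values:", s.open)
}
```

This only works cleanly because there is a single exit path between `Get` and `Close`; with several early returns, `defer` is usually the safer choice.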
|
||
// checkFunc is called during key iteration through the badger DB in order to | ||
// check whether we should process the given key-value pair. It can be used to | ||
// avoid loading the value if its not of interest, as well as storing the key |
// avoid loading the value if its not of interest, as well as storing the key | |
// avoid loading the value if it's not of interest, as well as storing the key |
@@ -10,14 +12,71 @@ import ( | |||
"github.com/onflow/flow-go/storage" | |||
) | |||
|
|||
type ReaderBatchWriter struct { |
Generally, I would very much appreciate at least minimal godoc for the methods. While not exported, most of the methods are called by other structs within the pebble operation package. Hence, I think documentation would be very useful.
type ReaderBatchWriter struct { | |
// ReaderBatchWriter is a batch of storage writes that also exposes functionality (see method | |
// `IndexedBatch`) for reading arbitrary keys as if the batch write had already been applied to | |
// the database. Note that this requires indexing and tracking the pending writes, hence adding | |
// pending writes to a `ReaderBatchWriter` is significantly slower than inserting into a | |
// non-indexed batch. Only use an indexed batch if you require reading from it. | |
type ReaderBatchWriter struct { |
LowerBound: prefix, | ||
UpperBound: append(prefix, ffBytes...), | ||
}) | ||
|
// } | ||
func noStore[K comparable, V any](_ K, _ V) func(pebble.Writer) error { | ||
return func(pebble.Writer) error { | ||
type storeFunc[K comparable, V any] func(key K, val V) func(storage.PebbleReaderBatchWriter) error |
Pebble.Batch includes the following in its documentation:
Batch implements the Reader interface, but only an indexed batch supports reading [...]
The batch index is implemented via an skiplist (internal/batchskl). While the skiplist implementation is very fast, inserting into an indexed batch is significantly slower than inserting into a non-indexed batch. Only use an indexed batch if you require reading from it.
Essentially for all our storage abstractions we use indexed batches:
batch: db.NewIndexedBatch(), |
so we are always paying the performance cost for supporting reads despite rarely using that feature.
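To make the trade-off concrete, here is a rough, standard-library-only sketch. The `batch` type and its constructors are hypothetical stand-ins; a plain map is used as the index, which is not how Pebble implements its skiplist-backed index. It only illustrates the idea that the read index is extra bookkeeping on every write, so it should be opt-in.

```go
package main

import (
	"errors"
	"fmt"
)

// op is one pending write.
type op struct {
	key, val string
}

// batch is a hypothetical stand-in for a write batch. The index is only
// allocated for indexed batches; this is NOT Pebble's implementation.
type batch struct {
	ops   []op
	index map[string]int // position of the latest write per key; nil if not indexed
}

func newBatch() *batch        { return &batch{} }
func newIndexedBatch() *batch { return &batch{index: make(map[string]int)} }

// Set records a pending write and, for indexed batches only, updates the index.
func (b *batch) Set(key, val string) {
	b.ops = append(b.ops, op{key, val})
	if b.index != nil {
		b.index[key] = len(b.ops) - 1
	}
}

// Get reads a pending write; it fails on a batch created without an index.
func (b *batch) Get(key string) (string, error) {
	if b.index == nil {
		return "", errors.New("batch is not indexed")
	}
	i, ok := b.index[key]
	if !ok {
		return "", fmt.Errorf("key %q not found", key)
	}
	return b.ops[i].val, nil
}

func main() {
	plain := newBatch()
	plain.Set("a", "1")
	_, err := plain.Get("a")
	fmt.Println("plain batch read:", err)

	indexed := newIndexedBatch()
	indexed.Set("a", "1")
	val, err := indexed.Get("a")
	fmt.Println("indexed batch read:", val, err)
}
```

In terms of the storage abstractions discussed here, this would mean creating a non-indexed batch by default and only switching to an indexed one for the call sites that actually read pending writes.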
// BatchStorage serves as an abstraction over batch storage, adding ability to add ability to add extra | ||
// callbacks which fire after the batch is successfully flushed. |
The comment is very unclear due to broken sentence structure and strange repetitions.
AddCallback(func()) | ||
} | ||
|
||
func OnlyWriter(fn func(pebble.Writer) error) func(PebbleReaderBatchWriter) error { |
I would rename this to onlyWrites
} | ||
|
||
// BatchStorage serves as an abstraction over batch storage, adding ability to add ability to add extra | ||
// callbacks which fire after the batch is successfully flushed. | ||
type BatchStorage interface { | ||
GetWriter() *badger.WriteBatch | ||
GetWriter() BatchWriter | ||
GetReader() Reader |
I don't think method GetReader() is used anywhere (at least my IDE claims this). Can we remove it?
} | ||
|
||
// Event retrieval does not guarantee any order, | ||
// Hence, we a sort the events for comparing the expected and actual events. |
typo
// Hence, we a sort the events for comparing the expected and actual events. | |
// Hence, we sort the events for comparing the expected and actual events. |
} | ||
|
||
// index the transaction IDs within the collection | ||
txIDs := payload.Collection.Light().Transactions |
We already computed payload.Collection.Light() above. As this has a non-negligible computational cost, we should not repeat it here.
// cluster payloads only contain a single collection, allow duplicates, | ||
// because it is valid for two competing forks to have the same payload. | ||
light := payload.Collection.Light() | ||
err := operation.InsertCollection(&light)(tx) | ||
if err != nil { | ||
return fmt.Errorf("could not insert payload collection: %w", err) | ||
} | ||
|
||
// insert constituent transactions | ||
for _, colTx := range payload.Collection.Transactions { | ||
err = operation.InsertTransaction(colTx.ID(), colTx)(tx) | ||
if err != nil { | ||
return fmt.Errorf("could not insert payload transaction: %w", err) | ||
} | ||
} |
Note that payload.Collection.Light() already computes the IDs of the transactions. This has non-negligible cost due to serialization and potentially hashing somewhat larger data structures. Let's not needlessly repeat that here. The colTx.ID() in line 172 is already pre-computed in light.
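A minimal sketch of what reusing the pre-computed IDs could look like. All types here (`Identifier`, `Transaction`, `Collection`, `LightCollection`) are simplified stand-ins rather than the flow-go definitions, the map replaces the database, and the sketch assumes that `Light()` returns the transaction IDs in the same order as `Collection.Transactions`.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Identifier is a stand-in for a 32-byte entity ID.
type Identifier [32]byte

// idCalls counts how often a transaction ID is computed.
var idCalls int

type Transaction struct{ Script string }

// ID hashes the transaction; a stand-in for the serialization and hashing cost.
func (t Transaction) ID() Identifier {
	idCalls++
	return sha256.Sum256([]byte(t.Script))
}

type Collection struct{ Transactions []Transaction }

type LightCollection struct{ Transactions []Identifier }

// Light returns the collection's transaction IDs in order (assumed).
func (c Collection) Light() LightCollection {
	ids := make([]Identifier, 0, len(c.Transactions))
	for _, tx := range c.Transactions {
		ids = append(ids, tx.ID())
	}
	return LightCollection{Transactions: ids}
}

// indexTransactions stores each transaction under its ID, taking the IDs from
// the already computed light collection instead of calling ID() again.
func indexTransactions(col Collection, light LightCollection, db map[Identifier]Transaction) {
	for i, tx := range col.Transactions {
		db[light.Transactions[i]] = tx
	}
}

func main() {
	col := Collection{Transactions: []Transaction{{"a"}, {"b"}, {"c"}}}
	light := col.Light() // computed once and reused below
	db := make(map[Identifier]Transaction)
	indexTransactions(col, light, db)
	fmt.Println("ID computations:", idCalls)
}
```

With this shape, each transaction is hashed once inside `Light()`, and both the transaction insert and the transaction-ID index can read from `light.Transactions`.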
func InsertExecutedBlock(blockID flow.Identifier) func(pebble.Writer) error { | ||
return insert(makePrefix(codeExecutedBlock), blockID) | ||
} | ||
|
||
func RetrieveExecutedBlock(blockID *flow.Identifier) func(pebble.Reader) error { | ||
return retrieve(makePrefix(codeExecutedBlock), blockID) | ||
} |
It is unclear what this is used for due to lacking documentation. Given that we are just storing some blockID, this seems to be vulnerable to stale writes.
Close in favor of #6197
This PR cherry-picked changes from #6197.
The changes picked are unused files or changes that won't change the existing behavior. Merging these changes in order to minimize the code changes to be maintained on a feature branch.