bmt, param: Introduce SectionHasher interface, implement in bmt #2021
Conversation
I've reviewed only the technical aspects. I will review functional changes in another round.
Absolutely brilliant.
//t.GetSection() = make([]byte, sw.secsize)
//copy(t.GetSection(), section)
// TODO: Consider whether the section here needs to be copied, maybe we can enforce not changing the original slice
copySection := make([]byte, sw.secsize)
why is the copying not part of `SetSection` then?
There's no member in `tree` that remembers the section size, so either we must add a member or we must pass it with the function. The latter seems clumsy.
In general I think it's a good idea to introduce as few side-effects in the lower level components as possible; the tree component could be used without any copying, after all.
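To illustrate the point, here is a minimal sketch of keeping the defensive copy in the caller rather than in the tree. The names here (`sectionWriter`, `tree`, `SetSection`, `writeSection`) are illustrative, not the actual swarm types:

```go
package sketch

// tree stays free of side effects: SetSection just stores the slice,
// it knows nothing about section sizes and performs no copying.
type tree struct {
	section []byte
}

func (t *tree) SetSection(b []byte) {
	t.section = b
}

// sectionWriter owns the section size, so it is the component that
// performs the copy before handing the slice to the tree.
type sectionWriter struct {
	secsize int
	t       *tree
}

// writeSection copies the input so that later mutation of the caller's
// slice cannot corrupt the tree's state.
func (sw *sectionWriter) writeSection(section []byte) {
	copySection := make([]byte, sw.secsize)
	copy(copySection, section)
	sw.t.SetSection(copySection)
}
```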
@@ -346,11 +362,16 @@ func testHasherCorrectness(bmt *Hasher, hasher BaseHasherFunc, d []byte, n, coun
 	if len(d) < n {
 		n = len(d)
 	}
-	binary.BigEndian.PutUint64(span, uint64(n))
+	binary.LittleEndian.PutUint64(span, uint64(n))
why LittleEndian suddenly?
I can't remember off the top of my head, but at least it's the same as in storage/types.go?
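For reference, the convention in storage/types.go prefixes the chunk data with its span as an 8-byte little-endian uint64 (as the `GenerateRandomChunk` diff below also shows). A minimal sketch of that encoding; `prependSpan` is an illustrative helper, not a swarm function:

```go
package sketch

import "encoding/binary"

// prependSpan returns data prefixed with its length serialized as an
// 8-byte little-endian span, matching the storage/types.go convention.
func prependSpan(data []byte) []byte {
	sdata := make([]byte, 8+len(data))
	binary.LittleEndian.PutUint64(sdata[:8], uint64(len(data)))
	copy(sdata[8:], data)
	return sdata
}
```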
@@ -93,7 +93,8 @@ func GenerateRandomChunk(dataSize int64) Chunk {
 	sdata := make([]byte, dataSize+8)
 	rand.Read(sdata[8:])
 	binary.LittleEndian.PutUint64(sdata[:8], uint64(dataSize))
-	hasher.ResetWithLength(sdata[:8])
+	hasher.Reset()
now this is called twice
I'm sorry, I don't understand what you mean? You mean since we actually construct the hasher, then `Reset` is redundant?
yes
The PR is great, and the plan also looks great. Make sure there is a way for the erasure coding section writer to have access to the child chunk data in order to generate parity data chunks, or is there a better way?
Benchmarks are fine after closer inspection. Thanks to @janos for the hint on stabilizing benchmark results.
LGTM implementation-wise, nice work! Business-logic-wise I am not 100% sure, as I don't have much experience with this part yet. Left just one minor question.
@@ -151,7 +151,8 @@ func TestSha3ForCorrectness(t *testing.T) {
 	rawSha3Output := rawSha3.Sum(nil)

+	sha3FromMakeFunc := MakeHashFunc(SHA3Hash)()
🌷 I know it's not part of this PR, but why not use a constructor for Hasher instead of a func? If the func is needed, maybe the builder can be extracted instead?
@pradovic The `SwarmHash` needs the length of the data it represents prepended as a 64-bit value. The BMT hash has this built in, and we extend the other types with `HashWithLength` to allow setting the length (`SetSpanBytes`, see `storage/swarmhasher.go`).
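For context, a minimal sketch of that wrapper pattern, loosely following the shape of `storage/swarmhasher.go`; the `SetSpanBytes` body here is an assumption based on the old `ResetWithLength` behavior (reset, then write the serialized span into the hash state):

```go
package sketch

import "hash"

// HashWithLength wraps a plain hash.Hash so that non-BMT hashes can
// also commit to the length of the data they represent.
type HashWithLength struct {
	hash.Hash
}

// SetSpanBytes resets the underlying hash and prepends the 8-byte span,
// so the final digest covers the span as well as the data.
func (h *HashWithLength) SetSpanBytes(span []byte) {
	h.Reset()
	h.Write(span)
}
```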
This PR is part of a series of PRs that introduces an interface that allows chaining of components that receive a data stream and generate hashes and intermediate Merkle-Tree chunks. The individual PR steps will be partitioned from #2022 (branch https://github.com/nolash/swarm/tree/filehasher-avenged ) as follows:
1. Introduce `SectionWriter`, implement this interface in `bmt`, make `AsyncHasher` standalone (this PR)
2. Move `AsyncHasher` to `file/hasher`
3. `SectionWriter` sub-component for hashing intermediate Merkle Tree levels
4. `SectionWriter` component executing the FileHasher algorithm
5. `io.Reader` and `SectionWriter` glue, and an implementation of a `SectionWriter` component that provides `Chunk` output
6. `SectionWriter` that provides encryption, along with a test utility `SectionWriter` implementation of a data cache
7. Prune `bmt.Hasher` exports wrt `AsyncHasher`
Introduce `SectionWriter` interface and implement this interface in `bmt`

The objectives of this PR are:

- make `bmt.Hasher` usable by using only the `hash.Hash` interface
- move `AsyncHasher` to a separate package
- make `storage.SwarmHash` usable outside the `storage` package
`SectionWriter` interface

The interface is defined in the package `/file`.

`hash.Hash`

Essentially the `FileHasher` is a hashing operation. Thus it makes sense that the components can be used through the same interface as other hashing components provided in `golang`.

`SetWriter`

Chains the `SectionWriter` to a subsequent `SectionWriter`. It should be optional for the `SectionWriter` to provide chaining. The method is itself chainable.

`SetSpan`

Sets the "span," meaning the amount of data represented by the data written to the `SectionWriter`. E.g. the references constituting the data of an intermediate chunk "represent" more data than the actual data bytes. For `bmt.Hasher` this was previously provided by the `ResetWithLength` call, and the lack of a separate way of setting the span made it impossible to use `bmt.Hasher` with a pure `hash.Hash` interface.

`SectionSize`

Informs the caller about the underlying `SectionSize` of the `SectionWriter`. In some cases this will be the same as for the chained `SectionWriter`; in other cases the `SectionWriter` may buffer and/or pad data, and translate the `SectionSize` accordingly.

`Branches`

Informs the caller about the underlying `Branches`, a.k.a. branch-factor, with the same rationale as for `SectionSize` above. A sketch of the assembled interface follows below.
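Putting the method descriptions above together, the interface could look roughly like this. The exact signatures, in particular `SetWriter`'s argument, are assumptions on my part, not the literal definition in `/file`:

```go
package file

import "hash"

// SectionWriter, assembled from the method descriptions above.
// Exact signatures are assumptions, not the literal definition.
type SectionWriter interface {
	hash.Hash // the standard Write/Sum/Reset of a go hasher

	// SetWriter chains this SectionWriter to a subsequent one.
	// Chaining is optional for implementations; the method is
	// itself chainable.
	SetWriter(hashFunc func() SectionWriter) SectionWriter

	// SetSpan sets the amount of data represented by the data
	// written to this SectionWriter.
	SetSpan(length int)

	// SectionSize reports the underlying section size; a buffering
	// or padding SectionWriter may translate this value.
	SectionSize() int

	// Branches reports the underlying branch factor, with the same
	// rationale as SectionSize.
	Branches() int
}
```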
`bmt` implementations

Neither `bmt` implementation currently provides any chaining, and both will raise errors on calls to `SetWriter`.

`bmt.Hasher`

Can now be used as a `hash.Hash`, where the span is merely calculated from the amount of bytes written to it. If a different span is needed, the `SetSpan` method can be used; see the usage sketch below.

Since the `SetLength` call has no practical utility for `bmt.Hasher` currently, it is ignored.

Exports are added to make it possible to move `AsyncHasher` to a separate package. Excess exports will be pruned later.
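A minimal usage sketch of `bmt.Hasher` through the plain `hash.Hash` interface, with `SetSpan` for an explicit span. The constructor names (`NewTreePool`, `New`, `PoolSize`) follow the existing `bmt` package; treat the details, especially `SetSpan`'s signature, as assumptions rather than the exact post-PR API:

```go
package main

import (
	"fmt"

	"github.com/ethersphere/swarm/bmt"
	"golang.org/x/crypto/sha3"
)

func main() {
	// 128 segments per section, as in the usual swarm configuration.
	pool := bmt.NewTreePool(sha3.NewLegacyKeccak256, 128, bmt.PoolSize)
	h := bmt.New(pool) // satisfies hash.Hash

	data := []byte("foo")

	// Without SetSpan, the span defaults to the bytes written.
	h.Reset()
	h.Write(data)
	fmt.Printf("%x\n", h.Sum(nil))

	// For e.g. an intermediate chunk, an explicit span can be set.
	h.Reset()
	h.Write(data)
	h.SetSpan(4096) // hypothetical span differing from bytes written
	fmt.Printf("%x\n", h.Sum(nil))
}
```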
`bmt.AsyncHasher`

`bmt.AsyncHasher` is now ready to be moved to a separate package. It's left in place for this PR to make it easy to see the changes that were made.

`WriteIndexed` and `SumIndexed` replace the original `Write` and `Sum` calls. It can still be used as a `bmt.Hasher` (and thus a `hash.Hash`) transparently, by using the usual `Write` and `Sum` calls. A sketch of indexed use follows below.
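A sketch of asynchronous, out-of-order section writes with `bmt.AsyncHasher`. The constructor and exact signatures (`NewAsyncWriter`, `WriteIndexed`, `SumIndexed`) are assumptions based on the description above, not verified against the branch:

```go
package main

import (
	"fmt"

	"github.com/ethersphere/swarm/bmt"
	"golang.org/x/crypto/sha3"
)

func main() {
	pool := bmt.NewTreePool(sha3.NewLegacyKeccak256, 128, bmt.PoolSize)
	h := bmt.New(pool)
	aw := h.NewAsyncWriter(false) // false: single-width sections (assumption)

	// Sections may arrive in any order; each write carries its index.
	aw.WriteIndexed(1, make([]byte, 32))
	aw.WriteIndexed(0, make([]byte, 32))

	// SumIndexed needs the total byte length, since writes are unordered.
	ref := aw.SumIndexed(nil, 64)
	fmt.Printf("%x\n", ref)
}
```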
`storage.SwarmHash`

`ResetWithLength` in the `storage.SwarmHash` interface has been changed to `SetSpanBytes`. `bmt.Hasher` provides this method, which performs the same function as `SetSpan`, albeit with an 8-byte serialized uint instead; see the sketch at the end of this description.

By the way, a bug was unearthed through the reworking of the `bmt`, in which the hash result for zero-length data was different between `RefHasher` and `bmt.Hasher` (but not `bmt.AsyncHasher`). This has been fixed.
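To illustrate the `storage.SwarmHash` change described above: `SetSpanBytes` replaces `ResetWithLength`, taking the span as an 8-byte little-endian value. A sketch, with the call order and details assumed rather than copied from the branch:

```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/ethersphere/swarm/storage"
)

func main() {
	// MakeHashFunc returns a SwarmHash constructor, as seen in the
	// TestSha3ForCorrectness diff above.
	hasher := storage.MakeHashFunc(storage.BMTHash)()

	data := []byte("foo")
	span := make([]byte, 8)
	binary.LittleEndian.PutUint64(span, uint64(len(data)))

	hasher.Reset()
	hasher.SetSpanBytes(span) // formerly hasher.ResetWithLength(span)
	hasher.Write(data)
	fmt.Printf("%x\n", hasher.Sum(nil))
}
```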