This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

bmt, param: Introduce SectionHasher interface, implement in bmt #2021

Merged (16 commits) on Feb 8, 2020


@nolash (Contributor) commented Dec 10, 2019

This PR is part of a series of PRs that introduces an interface that allows chaining of components that receive a data stream and generate hashes and intermediate Merkle-Tree chunks. The individual PR steps will be partitioned from #2022 (branch https://github.com/nolash/swarm/tree/filehasher-avenged ) as follows:

  1. Introduce SectionWriter, implement this interface in bmt, make AsyncHasher standalone (this PR)
  2. Move AsyncHasher to file/hasher
  3. Add reference implementation of the Filehasher algorithm
  4. Add an implementation of a SectionWriter sub-component for hashing intermediate Merkle Tree levels.
  5. Add an implementation of a SectionWriter component executing the FileHasher algorithm
  6. Add a "splitter" that bridges io.Reader and SectionWriter, and an implementation of a SectionWriter component that provides Chunk output.
  7. Add an implementation of SectionWriter that provides encryption, along with a test-utility SectionWriter implementation of a data cache.
  8. Evaluate and prune bmt.Hasher exports w.r.t. AsyncHasher

Introduce SectionWriter interface and implement this interface in bmt

The objectives of this PR are:

  • Introduce the interface
  • Implement interface in bmt.Hasher
  • Enable use of bmt.Hasher through the hash.Hash interface alone
  • Prepare for moving AsyncHasher to separate package
  • Avoid any dependencies on storage.SwarmHash outside the storage package

SectionWriter interface

The interface is defined in the file package (/file):

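type SectionWriter interface {
        hash.Hash                                           // Write, Sum, Reset, Size, BlockSize
        SetWriter(hashFunc SectionWriterFunc) SectionWriter // chain a subsequent SectionWriter (optional)
        SetLength(length int)                               // currently ignored by bmt.Hasher
        SetSpan(length int)                                 // set the amount of data represented
        SectionSize() int                                   // section size of this SectionWriter
        Branches() int                                      // branch factor of this SectionWriter
}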

hash.Hash

The FileHasher is essentially a hashing operation, so it makes sense for its components to be usable through the same interface as the other hashing components provided in Go.

SetWriter

Chains the SectionWriter to a subsequent SectionWriter. Providing chaining should be optional for a SectionWriter. The method is itself chainable, since it returns a SectionWriter.
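
A minimal sketch of chaining, under stated assumptions: the sectionWriter interface below is a cut-down, hypothetical stand-in for SectionWriter, and sectionWriterFunc is assumed to be a zero-argument constructor, since the real SectionWriterFunc signature is not shown here. SetWriter returns its receiver so further calls can be chained, matching the SectionWriter return type in the interface.

```go
package main

import "fmt"

// sectionWriter is a hypothetical, cut-down stand-in for SectionWriter:
// only Write and the chaining method are modelled.
type sectionWriter interface {
	Write(p []byte) (int, error)
	SetWriter(next sectionWriterFunc) sectionWriter
}

// sectionWriterFunc is assumed to construct the next writer in the chain.
type sectionWriterFunc func() sectionWriter

// tap records each write and forwards it to the next writer, if any.
type tap struct {
	name string
	log  *[]string
	next sectionWriter
}

func (t *tap) Write(p []byte) (int, error) {
	*t.log = append(*t.log, fmt.Sprintf("%s:%d", t.name, len(p)))
	if t.next != nil {
		return t.next.Write(p)
	}
	return len(p), nil
}

// SetWriter installs the next writer and returns the receiver.
func (t *tap) SetWriter(next sectionWriterFunc) sectionWriter {
	t.next = next()
	return t
}

func main() {
	var log []string
	second := func() sectionWriter { return &tap{name: "second", log: &log} }
	first := (&tap{name: "first", log: &log}).SetWriter(second)
	first.Write(make([]byte, 32))
	fmt.Println(log) // [first:32 second:32]
}
```

A writer that does not support chaining, like the bmt implementations described below, would simply refuse in SetWriter instead of storing the next writer.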

SetSpan

Sets the "span", i.e. the amount of data represented by the data written to the SectionWriter. For example, the references that make up the data of an intermediate chunk "represent" more data than the bytes they occupy. For bmt.Hasher the span was previously provided by the ResetWithLength call, and the lack of a separate way to set it made it impossible to use bmt.Hasher through a plain hash.Hash interface.
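
A small worked example of the gap between written bytes and span. It assumes Swarm's usual parameters (32-byte sections, 128 branches, hence 4096-byte chunks), which this PR does not state, and serializes the span as an 8-byte little-endian integer, as GenerateRandomChunk does in the storage diff further down this page.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Assumed parameters (not stated in this PR): 32-byte sections, i.e. one
// reference per section, and 128 branches, giving 4096-byte chunks.
const (
	sectionSize = 32
	branches    = 128
	chunkSize   = sectionSize * branches
)

// intermediateSpan returns how many bytes of file data an intermediate
// chunk represents when each of its refs points to a child of childSpan bytes.
func intermediateSpan(refs int, childSpan uint64) uint64 {
	return uint64(refs) * childSpan
}

// spanBytes serializes a span as an 8-byte little-endian uint64.
func spanBytes(span uint64) []byte {
	b := make([]byte, 8)
	binary.LittleEndian.PutUint64(b, span)
	return b
}

func main() {
	payload := branches * sectionSize             // bytes actually written: 4096
	span := intermediateSpan(branches, chunkSize) // bytes represented: 524288
	fmt.Println(payload, span, spanBytes(span))   // 4096 524288 [0 0 8 0 0 0 0 0]
}
```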

SectionSize

Informs the caller about the underlying section size of the SectionWriter. In some cases this will be the same as for the chained SectionWriter; in others the SectionWriter may buffer and/or pad data, and translate the section size accordingly.

Branches

Informs the caller about the underlying branch factor (Branches), with the same rationale as for SectionSize above.
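
As a hypothetical illustration of that translation, a writer that buffers two of its successor's sections at a time could report a doubled section size and a halved branch count, so a caller relying only on SectionSize() and Branches() still derives the same chunk capacity. The 32/128 values are assumed Swarm-like defaults, and base/pairBuffer are illustrative types, not part of this PR.

```go
package main

import "fmt"

// sizer models only the SectionSize/Branches part of SectionWriter.
type sizer interface {
	SectionSize() int
	Branches() int
}

// base is a hypothetical leaf writer using assumed Swarm-like defaults.
type base struct{}

func (base) SectionSize() int { return 32 }
func (base) Branches() int    { return 128 }

// pairBuffer is a hypothetical writer that buffers two of the next writer's
// sections before forwarding them, so it reports twice the section size and
// half the branches; the resulting chunk capacity is unchanged.
type pairBuffer struct{ next sizer }

func (p pairBuffer) SectionSize() int { return 2 * p.next.SectionSize() }
func (p pairBuffer) Branches() int    { return p.next.Branches() / 2 }

// chunkCapacity is derived without knowing the concrete writer type.
func chunkCapacity(s sizer) int { return s.SectionSize() * s.Branches() }

func main() {
	var w sizer = pairBuffer{next: base{}}
	fmt.Println(w.SectionSize(), w.Branches(), chunkCapacity(w)) // 64 64 4096
}
```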

bmt implementations

Neither bmt implementation currently provides any chaining; both raise errors on calls to SetWriter.

bmt.Hasher

Can now be used as a plain hash.Hash, in which case the span is simply calculated from the number of bytes written to it. If a different span is needed, it can be set with SetSpan.

Since SetLength currently has no practical use for bmt.Hasher, the call is ignored.

Exports are added to make it possible to move AsyncHasher to a separate package. Excess exports will be pruned later.

bmt.AsyncHasher

bmt.AsyncHasher is now ready to be moved to a separate package. It's left in place in this PR to make it easy to see the changes that were made.

WriteIndexed and SumIndexed replace the original index-aware Write and Sum methods, so AsyncHasher can still be used transparently as a bmt.Hasher (and thus a hash.Hash) through the usual Write and Sum calls.

storage.SwarmHash

ResetWithLength in the storage.SwarmHash interface has been changed to SetSpanBytes. bmt.Hasher provides this method, which performs the same function as SetSpan, albeit taking the span as an 8-byte serialized uint.


By the way, reworking the bmt unearthed a bug: the hash result for zero-length data differed between RefHasher and bmt.Hasher (but not bmt.AsyncHasher). This has been fixed.

@janos (Member) left a comment

I've reviewed only the technical aspects. I will review functional changes in another round.

param/io.go (outdated, resolved)
param/hash.go (outdated, resolved)
bmt/bmt.go (outdated, resolved)
bmt/bmt.go (outdated, resolved)
bmt/bmt.go (outdated, resolved)
bmt/bmt.go (resolved)
bmt/bmt.go (resolved)
bmt/bmt.go (resolved)
@zelig (Member) left a comment

Absolutely brilliant.

//t.GetSection() = make([]byte, sw.secsize)
//copy(t.GetSection(), section)
// TODO: Consider whether the section here needs to be copied, maybe we can enforce not change the original slice
copySection := make([]byte, sw.secsize)

Member

why is the copying not part of SetSection then?

Contributor Author

There's no member in tree that remembers the section size, so either we must add a member or we must pass it with the function. The latter seems clumsy.

In general I think it's a good idea to introduce as few side-effects in the lower level components as possible; the tree component could be used without any copying, after all.

bmt/bmt.go (outdated, resolved)
bmt/bmt.go (outdated, resolved)
bmt/bmt.go (resolved)
bmt/bmt.go (resolved)
@@ -346,11 +362,16 @@ func testHasherCorrectness(bmt *Hasher, hasher BaseHasherFunc, d []byte, n, coun
if len(d) < n {
n = len(d)
}
-binary.BigEndian.PutUint64(span, uint64(n))
+binary.LittleEndian.PutUint64(span, uint64(n))

Member

why LittleEndian suddenly?

Contributor Author

I can't remember off the top of my head, but at least it's the same as in storage/types.go?
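
A quick illustration of why the byte order matters here: the same span encodes to different bytes under the two orders, so the test's span presumably has to match the little-endian encoding used in storage (see GenerateRandomChunk further down) for the hashes to agree.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encode serializes n with both byte orders for comparison.
func encode(n uint64) (be, le []byte) {
	be = make([]byte, 8)
	le = make([]byte, 8)
	binary.BigEndian.PutUint64(be, n)
	binary.LittleEndian.PutUint64(le, n)
	return be, le
}

func main() {
	be, le := encode(4096)
	fmt.Println(be) // [0 0 0 0 0 0 16 0]
	fmt.Println(le) // [0 16 0 0 0 0 0 0]
}
```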

bmt/bmt_test.go (resolved)
@@ -93,7 +93,8 @@ func GenerateRandomChunk(dataSize int64) Chunk {
sdata := make([]byte, dataSize+8)
rand.Read(sdata[8:])
binary.LittleEndian.PutUint64(sdata[:8], uint64(dataSize))
-hasher.ResetWithLength(sdata[:8])
+hasher.Reset()

Member

now this is called twice

@nolash (Contributor Author), Feb 3, 2020

I'm sorry, I don't understand what you mean. Do you mean that since we have just constructed the hasher, Reset is redundant?

Member

yes

@zelig (Member) commented Jan 10, 2020

PR is great
fix spurious return as per https://travis-ci.org/ethersphere/swarm/jobs/626578539#L495

the plan also looks great.

Make sure there is a way for an erasure-coding section writer to access the child chunk data in order to generate parity data chunks. Or is there a better way?

@nolash (Contributor Author) commented Feb 3, 2020

@zelig One of the async tests suddenly failed locally before the last commit adc45db. I will have to investigate :/

@nolash (Contributor Author) commented Feb 7, 2020

Benchmarks are fine after closer inspection. Thanks to @janos for hint on stabilizing benchmark results.

@pradovic (Contributor) left a comment

LGTM implementation-wise, nice work! Business-logic-wise I am not 100% sure, as I don't have much experience with this part yet. Left just one minor question.

@@ -151,7 +151,8 @@ func TestSha3ForCorrectness(t *testing.T) {
rawSha3Output := rawSha3.Sum(nil)

sha3FromMakeFunc := MakeHashFunc(SHA3Hash)()

Contributor

🌷 I know it's not part of this PR, but why not use a constructor for Hasher instead of a func? If the func is needed, maybe the builder could be extracted instead?

@nolash (Contributor Author), Feb 7, 2020

@pradovic The SwarmHash needs the length of the data it represents prepended as a 64-bit value. The BMT hash has this built in, and we extend the other types with HashWithLength to allow setting the length (SetSpanBytes, see storage/swarmhasher.go).

@nolash merged commit ac0845d into ethersphere:master on Feb 8, 2020