Create base storage zome to make up for Holochain deficiencies. #183

Open
adaburrows opened this issue Mar 30, 2024 · 0 comments

While https://github.com/holochain-open-dev/file-storage exists and it is a decent proof of concept, it has a little too much focus on storing files specifically. We need something that stores binary blobs which could be used as a basis for either storing UI components, larger pieces of data in the space zome, or files.

blob - the complete binary data we want to store
sub-blob - a chunk of the binary blob
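To make the blob/sub-blob split concrete, here is a minimal, dependency-free sketch (not Holochain-specific) of chunking a blob into fixed-size sub-blobs. The 1 MiB chunk size is an assumption for illustration; a real zome would have to pick a size that respects Holochain's entry-size limits.

```rust
// Illustrative chunk size: 1 MiB. This is an assumed value, not a
// Holochain constant; the actual limit depends on entry-size constraints.
const CHUNK_SIZE: usize = 1024 * 1024;

// Split a blob into sub-blobs of at most CHUNK_SIZE bytes each.
// The final chunk may be shorter than CHUNK_SIZE.
fn chunk_blob(blob: &[u8]) -> Vec<Vec<u8>> {
    blob.chunks(CHUNK_SIZE).map(|c| c.to_vec()).collect()
}

fn main() {
    let blob = vec![0u8; CHUNK_SIZE * 2 + 10];
    let chunks = chunk_blob(&blob);
    assert_eq!(chunks.len(), 3);      // two full chunks plus a 10-byte tail
    assert_eq!(chunks[2].len(), 10);
}
```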

At the minimum we need something that stores a sequence of sub-blobs to the DHT in a way that makes it easy to verify the data was properly reconstructed.

At the least we need to store the size, a SHA-256 hash, and an ordered list of sub-blob hashes.
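A hypothetical shape for that minimal design, with placeholder types: the 32-byte arrays stand in for Holochain `EntryHash`es and a real SHA-256 digest (which would come from something like the `sha2` crate, omitted here to keep the sketch dependency-free):

```rust
// Placeholder for a Holochain EntryHash; assumed 32 bytes for illustration.
type SubBlobHash = [u8; 32];

struct BlobHeader {
    size: u64,                   // total byte length of the original blob
    sha256: [u8; 32],            // digest of the whole blob, checked after reassembly
    sub_blobs: Vec<SubBlobHash>, // ordered: concatenation order defines the blob
}

// Cheap structural check before the (omitted) digest comparison:
// do the fetched chunks reassemble to the advertised count and size?
fn reassembles(header: &BlobHeader, chunks: &[Vec<u8>]) -> bool {
    chunks.len() == header.sub_blobs.len()
        && chunks.iter().map(|c| c.len() as u64).sum::<u64>() == header.size
}

fn main() {
    let header = BlobHeader {
        size: 5,
        sha256: [0u8; 32],
        sub_blobs: vec![[0u8; 32]; 2],
    };
    let chunks = vec![vec![1, 2, 3], vec![4, 5]];
    assert!(reassembles(&header, &chunks));
}
```

The final verification step would hash the concatenated chunks and compare against `sha256`, giving the end-to-end guarantee the issue asks for.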

At most we create a modified Merkle tree as described in Haber & Stornetta, 1997, with the header describing the size of the data and pointing to the Merkle tree, where the leaf nodes point to the sub-blobs (since they are content addressed by hash, they satisfy the requirements of the Merkle tree). This Merkle tree is very easy to implement in Holochain since each data type gets a hash for free, so we just need to implement a node structure which has two holohash entries and build the tree up from the hashes of the sub-blobs.

However, considering the way things are stored on the DHT, it might be more efficient to just store the whole Merkle tree of hashes as an array minus the leaf nodes, which can be stored in a separate array. That way validation of the whole Merkle tree can optionally happen.
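The "whole tree as an array" idea can be sketched as follows. Given the leaf hashes (the sub-blob hashes, which the DHT provides for free), build the internal nodes bottom-up and store them in one flat `Vec`, root last, keeping the leaves in their own array as suggested. `DefaultHasher` here is purely a stand-in for the real hash function:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for hashing two child hashes into a parent node.
// A real implementation would use SHA-256 or Holochain's hashing.
fn combine(a: u64, b: u64) -> u64 {
    let mut h = DefaultHasher::new();
    a.hash(&mut h);
    b.hash(&mut h);
    h.finish()
}

// Build all internal levels bottom-up from the leaf hashes, returning them
// as one flat array with the root as the last element. An odd node at the
// end of a level is carried up unchanged.
fn build_internal_nodes(leaves: &[u64]) -> Vec<u64> {
    let mut nodes = Vec::new();
    let mut level: Vec<u64> = leaves.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|p| if p.len() == 2 { combine(p[0], p[1]) } else { p[0] })
            .collect();
        nodes.extend_from_slice(&level);
    }
    nodes
}

fn main() {
    let leaves = [1u64, 2, 3, 4];
    let internal = build_internal_nodes(&leaves);
    // 4 leaves -> 2 internal nodes plus 1 root = 3 stored nodes
    assert_eq!(internal.len(), 3);
    let root = *internal.last().unwrap();
    assert_eq!(root, combine(combine(1, 2), combine(3, 4)));
}
```

Storing the internal nodes this way lets a validator recompute the tree from the leaf array on demand, rather than fetching one node entry at a time from the DHT.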

The SHA-256 is probably good enough along with the list of sub-blob hashes. Anything more is feeling like overkill.

Any additional metadata about what is stored can be provided by the object that links to the root node of the blob.

Is there a good reason why splitting the blob into sub-blobs needs to happen on the client?

  • Thinking through this, it seems like the client side is the most reliable way to ensure everything persists without failing. We can't upload the whole blob to the source chain (or some other accessible scratch area) and then piece it out into sub-blobs, because of size constraints. Even if a large file did fit in memory, we might not be able to store all the sub-blobs to the DHT before the request timed out.

If we have time, we should implement a service worker for chunking and uploading blobs, along with a counterpart for fetching the chunks, reconstructing the file, and verifying it. For other purposes, like streaming media, we will likely need a more relaxed way of fetching the chunks and appending them to the buffer.

For a streaming video example see: https://webtorrent.io/bundle.js (renderMedia); https://github.com/webtorrent/webtorrent/blob/master/lib/file.js
For MediaSource docs see: https://code.pieces.app/blog/the-media-source-extension-javascript-api-the-foundation-of-streaming-on-the-web; https://eyevinntechnology.medium.com/how-to-build-your-own-streaming-video-html-player-6ee85d4d078a

@adaburrows adaburrows converted this from a draft issue Mar 30, 2024