Can't pin large files uploaded with chunks #4864

Open
tmm360 opened this issue Oct 12, 2024 · 8 comments
Labels
needs-triaging new issues that need triaging

Comments

@tmm360
Contributor

tmm360 commented Oct 12, 2024

Context

Bee 2.2.0

Summary

Even when using tags to upload with sessions, it is not possible to pin a large file by providing all of its chunks.

Expected behavior

I expect that, when using tags, chunks are preserved on the node until the tag is released. A pinning request on the root hash, with the tag still alive and all chunks uploaded, should succeed.

Actual behavior

Instead, even with the tag still alive, a subsequent pinning request may fail with a 404 error, making pinning of chunk-based uploads practically impossible.

Steps to reproduce

Upload a large file chunk by chunk to a connected node, using tags. A subsequent request to pin the root may fail with a 404 error code.

Possible solution

@tmm360 tmm360 added the needs-triaging new issues that need triaging label Oct 12, 2024
@ldeffenb
Collaborator

ldeffenb commented Oct 12, 2024

Try using the swarm-pin: true header on your upload. That will pin all chunks as soon as they are created in the uploading node, regardless of the results from the push operation.

AFAIK, tags have nothing to do with chunk retention; they only serve as status feedback, with counters showing the state of the upload. Judging from all of the source code that I've read, tags do not affect uploading, pushing, propagation, retention, or pinning. This seems like a case of misunderstanding what tags are.

Oh, and one advantage of the swarm-pin header is that the uploading node will pin ALL chunks which is important if you are using a non-zero swarm-redundancy-level. I don't know of anything (yet) that provides any visibility to the redundant root chunks of an upload except pinning them with swarm-pin: true.
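For illustration, here is a minimal sketch of an upload request carrying the swarm-pin: true header. It only builds the request and does not send it. The node address (localhost:1633), the swarm-postage-batch-id header, and the helper name build_pinned_upload are assumptions not stated in this thread; check them against the Bee API docs for your version.

```python
import urllib.request

# Assumption: a local Bee node with its API on the default port.
BEE_API = "http://localhost:1633"


def build_pinned_upload(data: bytes, batch_id: str,
                        redundancy_level: int = 0) -> urllib.request.Request:
    """Build (but do not send) a POST /bytes request that pins on upload."""
    headers = {
        "Content-Type": "application/octet-stream",
        # Assumption: uploads need a postage batch id supplied in this header.
        "swarm-postage-batch-id": batch_id,
        # Ask the uploading node to pin every chunk it creates.
        "swarm-pin": "true",
    }
    if redundancy_level > 0:
        # Only set when redundancy is wanted; swarm-pin then also covers
        # the redundant chunks on the uploading node.
        headers["swarm-redundancy-level"] = str(redundancy_level)
    return urllib.request.Request(
        f"{BEE_API}/bytes", data=data, headers=headers, method="POST"
    )


if __name__ == "__main__":
    req = build_pinned_upload(b"hello swarm", batch_id="<your-batch-id>")
    print(req.get_method(), req.full_url)
    # To actually upload, pass req to urllib.request.urlopen() with a node running.
```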

@tmm360
Contributor Author

tmm360 commented Oct 13, 2024

Thanks, you are right! There is probably a misunderstanding of what a tag really is. But this raises other questions:

  • What is the point of having tags on single-chunk uploads, then? Chunks are atomic elements, and I already know whether a tag has been processed or not
  • A feature to keep all the chunks on the node until the root chunk is pinned is missing
  • If I pin a file uploaded with bzz, the pin is made on the root, and a traversal process will try to find all the child chunks, to keep them recursively. But this obviously doesn't happen when pinning single chunks. When the node tries to prune a chunk from local storage, I understand that if a parent chunk is pinned, the chunk itself will be kept. Is this also valid for chunks that have been pinned individually? Or in other words: is the pinning tree of chunks built during traversal, or is it verified each time a chunk is checked for removal?

@ldeffenb
Collaborator

ldeffenb commented Oct 13, 2024

Remember, there's an ability to explicitly create a tag (https://docs.ethswarm.org/api/#tag/Tag/paths/~1tags/post) and supply it on each individual /chunks upload with the swarm-tag header. This allows all of the individual uploads to be tracked and accounted for with a single tag. I use this feature on the /bytes uploads to aggregate a bunch of individual files into a single tracking tag.
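A rough sketch of that flow: create one tag, then attach it to every /chunks upload via the swarm-tag header. The requests are only built, not sent. The node address, the swarm-postage-batch-id header, the "uid" field in the tag response, and all helper names are assumptions for illustration. The chunk payload is assumed to be in whatever wire format /chunks expects.

```python
import json
import urllib.request

# Assumption: a local Bee node with its API on the default port.
BEE_API = "http://localhost:1633"


def build_create_tag() -> urllib.request.Request:
    """Build (but do not send) the POST /tags request that creates a tag."""
    return urllib.request.Request(f"{BEE_API}/tags", data=b"", method="POST")


def parse_tag_uid(response_body: bytes) -> int:
    """Read the tag id from the response; the "uid" field name is an assumption."""
    return int(json.loads(response_body)["uid"])


def build_chunk_upload(chunk: bytes, batch_id: str,
                       tag_uid: int) -> urllib.request.Request:
    """Build (but do not send) a POST /chunks request tracked by one shared tag."""
    headers = {
        "Content-Type": "application/octet-stream",
        # Assumption: uploads need a postage batch id supplied in this header.
        "swarm-postage-batch-id": batch_id,
        # Every chunk upload carries the same tag so the counters aggregate.
        "swarm-tag": str(tag_uid),
    }
    return urllib.request.Request(
        f"{BEE_API}/chunks", data=chunk, headers=headers, method="POST"
    )


if __name__ == "__main__":
    uid = parse_tag_uid(b'{"uid": 12345}')  # stand-in for a real /tags response
    for chunk in (b"chunk-one", b"chunk-two"):
        req = build_chunk_upload(chunk, "<your-batch-id>", uid)
        print(req.get_method(), req.full_url, req.get_header("Swarm-tag"))
```

Note that, as described above, the tag only tracks progress; it does not keep the chunks on the node.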

You are correct that there is no support for pinning individual chunks in a node. IMHO, this is lacking in the current pinning implementation.

When using the swarm-pin: true header on a /bzz upload, it only appears that the pin is made on the root. In the actual implementation, each individual generated chunk is invisibly pinned and these pins are gathered together under the final root reference which becomes visible in the /pins query API. But when using swarm-pin: true, no traversal is required.

But if you do a /bzz or /bytes upload without the swarm-pin header, then you are correct, a full recursive traversal must succeed to create the desired pin. And if any of those component chunks are unavailable for any reason, the /pins request will fail.
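To make the failure mode concrete, here is a toy model of a traversal-based pin. It is not Bee's code: the store is a plain dict from a chunk reference to the references it contains, and the names collect_for_pin and MissingChunk are invented for this sketch. The point is only that a single missing chunk anywhere in the tree makes the whole pin fail.

```python
from typing import Dict, List, Set


class MissingChunk(Exception):
    """Raised when a referenced chunk is not available in the local store."""


def collect_for_pin(root: str, store: Dict[str, List[str]]) -> Set[str]:
    """Walk from root through all child references, failing on any gap.

    `store` maps a chunk reference to the references it contains
    (an empty list for leaf chunks).
    """
    seen: Set[str] = set()
    stack = [root]
    while stack:
        ref = stack.pop()
        if ref in seen:
            continue
        if ref not in store:
            # One unavailable chunk aborts the whole pin.
            raise MissingChunk(ref)
        seen.add(ref)
        stack.extend(store[ref])
    return seen


if __name__ == "__main__":
    store = {"root": ["a", "b"], "a": [], "b": ["c"], "c": []}
    print(sorted(collect_for_pin("root", store)))
    del store["c"]  # simulate a chunk that is no longer held locally
    try:
        collect_for_pin("root", store)
    except MissingChunk as exc:
        print("pin failed, missing chunk:", exc)
```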

And a detail on the implementation of pruning: in actual fact, the pinning store is a parent/child relationship with the root reference as the parent and every single chunk reference as the children. The prune only needs to check whether the chunk is in the pin children store. Well, it doesn't really check, but the sharky store has a reference counter that prevents pinned chunks from being pruned, even if they are no longer in the reserve or cache. Pinned chunks are retained in the sharky index lookup until their reference count goes to zero (e.g. when they are unpinned and also not in any other store). I am currently using this feature in a 2.1.0 node to be able to access locally pinned OSM chunks whose stamp has already expired.
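The parent/child layout and the reference counting can be pictured with a small toy model. This is a sketch of the idea only, not the actual pinning store or sharky code; the class and method names are made up, and it counts only pin holders, ignoring the reserve and cache.

```python
from collections import Counter
from typing import Dict, Iterable, Set


class ToyPinStore:
    """Toy model: root reference -> child chunks, plus a per-chunk refcount."""

    def __init__(self) -> None:
        self.pins: Dict[str, Set[str]] = {}  # parent (root) -> children
        self.refcount: Counter = Counter()   # chunk -> number of pins holding it

    def pin(self, root: str, chunks: Iterable[str]) -> None:
        if root in self.pins:
            return  # already pinned; do not count twice
        children = set(chunks) | {root}
        self.pins[root] = children
        for ref in children:
            self.refcount[ref] += 1

    def unpin(self, root: str) -> None:
        for ref in self.pins.pop(root, set()):
            self.refcount[ref] -= 1
            if self.refcount[ref] == 0:
                del self.refcount[ref]

    def can_prune(self, ref: str) -> bool:
        # A chunk may be dropped only when nothing references it any more.
        return self.refcount[ref] == 0


if __name__ == "__main__":
    store = ToyPinStore()
    store.pin("root1", ["a", "b"])
    store.pin("root2", ["b", "c"])
    store.unpin("root1")
    # "a" is free to go; "b" is still held by root2.
    print(store.can_prune("a"), store.can_prune("b"))
```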

I did implement a hack on the GET /pins/{reference} API to return information on all child chunks under the parent reference, so I know that data structure all too well.

Oh, and one other difference between swarm-pin: true during the upload and a POST /pins/{reference} on the resulting root is that the former pins all root redundancy chunks and the latter only seems to get the traversal-discovered chunks which doesn't include (at this time) the root redundancy SOCs. Also a problem with the current /pins implementation (and /stewardship) IMHO.

@ldeffenb
Collaborator

ldeffenb commented Oct 13, 2024

ldeffenb@4ad599f is the commit of my enhanced /pins API support. Note that the /pins/{reference} is very hacky in that I disabled most of the information that I originally returned because I haven't brought the required lower-level support over from my 2.1.0-hacks yet.

This commit also adds offset and limit support to the /pins query similar to that used in the /tag query. This was required for me to query all of the pinned files without exhausting the node's memory.

https://docs.ethswarm.org/api/#tag/Tag/paths/~1tags/get
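A client-side sketch of how offset/limit paging keeps memory bounded: results are pulled one page at a time instead of in a single response. The fetch_page callable is a placeholder for whatever performs the actual HTTP call; the helper name and the page shape (a list of references) are assumptions, not part of any Bee API.

```python
from typing import Callable, Iterator, List


def iter_paged(fetch_page: Callable[[int, int], List[str]],
               limit: int = 100) -> Iterator[str]:
    """Yield items page by page using offset/limit style paging.

    `fetch_page(offset, limit)` is a caller-supplied function that returns
    at most `limit` items starting at `offset`.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            return  # a short page means there is nothing left
        offset += len(page)


if __name__ == "__main__":
    refs = [f"ref{i}" for i in range(7)]

    def fake_fetch(offset: int, limit: int) -> List[str]:
        # Stand-in for an HTTP request with ?offset=...&limit=...
        return refs[offset:offset + limit]

    for ref in iter_paged(fake_fetch, limit=3):
        print(ref)
```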

@ldeffenb
Collaborator

Oh, and if you want to see the /pins and /stewardship APIs break at their finest, upload an individual /chunks that "smells" like a manifest node or a root BMT chunk, but where none of the chunks referenced by that node exist. Then try to POST /pins/{reference} that chunk. It will always fail because the traverser cannot iterate through it completely. Ugly IMHO.

@tmm360
Copy link
Contributor Author

tmm360 commented Oct 13, 2024

Thanks for your assistance! Unfortunately for me, I don't have advanced Go knowledge. I'm just learning to read it, so I've tried to understand something about the implementation from the code, but getting into the details would require too much time at the moment.

I'm trying to implement a reverse proxy in C# with a chunk cache on a db to implement chunk pinning, because it is a faster solution for me. A lot of the code has already been implemented in my bee.net project. At least I can debug on the fly any kind of problem I encounter.
This will also solve other problems, like pinning corruption when the node crashes for some reason...

Anyway, I'm leaving this issue open because a better pinning implementation is really needed

@martinconic
Contributor

"... with chunk cache on ..."

There is an open PR for repairing pins: #4836

4 participants
@ldeffenb @tmm360 @martinconic and others