Can't pin large files uploaded with chunks #4864

tmm360 · 2024-10-12T15:08:12Z

Context

Bee 2.2.0

Summary

Even when using tags to upload with sessions, it is not possible to pin large files by providing all the chunks.

Expected behavior

I expect that, when using tags, chunks are preserved on the node until the tag is released. A pinning request on the root hash, with the tag alive and all chunks uploaded, should succeed.

Actual behavior

Instead, even with the tag alive, a subsequent pinning request may fail with a 404 error, making pinning with chunks effectively impossible.

Steps to reproduce

Try to upload a large file as chunks to a connected node, using tags. A subsequent request to pin the root may fail with a 404 error code.

Possible solution

ldeffenb · 2024-10-12T22:52:48Z

Try using the swarm-pin: true header on your upload. That will pin all chunks as soon as they are created in the uploading node, regardless of the results from the push operation.

AFAIK, tags have nothing to do with chunk retention but only serve as status feedback, with counters showing the state of the upload. Judging from all of the source code that I've read, tags do not affect uploading, pushing, propagation, retention, or pinning. Seems like a case of misunderstanding what tags are.

Oh, and one advantage of the swarm-pin header is that the uploading node will pin ALL chunks which is important if you are using a non-zero swarm-redundancy-level. I don't know of anything (yet) that provides any visibility to the redundant root chunks of an upload except pinning them with swarm-pin: true.
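For illustration, here is a minimal sketch of an upload request carrying the swarm-pin header. It only builds the request and does not send it. The node address (localhost:1633), the choice of the /bytes endpoint, and the helper name build_pinned_upload are assumptions for this example; the batch ID is a placeholder you would replace with your own.

```python
import urllib.request

BEE_API = "http://localhost:1633"  # assumption: Bee API on its default local port


def build_pinned_upload(data: bytes, batch_id: str,
                        redundancy_level: int = 0) -> urllib.request.Request:
    """Build (but do not send) a POST /bytes request that asks the node to pin on upload."""
    headers = {
        "Content-Type": "application/octet-stream",
        "Swarm-Postage-Batch-Id": batch_id,
        "Swarm-Pin": "true",  # pin every chunk as it is created on the uploading node
    }
    if redundancy_level > 0:
        # Only set when redundancy is wanted, as discussed in the comment above.
        headers["Swarm-Redundancy-Level"] = str(redundancy_level)
    return urllib.request.Request(
        f"{BEE_API}/bytes", data=data, headers=headers, method="POST"
    )


req = build_pinned_upload(b"hello swarm", batch_id="<your-postage-batch-id>")
print(req.get_method(), req.full_url)
# Sending needs a running node and a valid postage batch:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read())
```

Note that urllib stores header names in capitalized form (for example "Swarm-pin"); HTTP header names are case-insensitive, so this does not change what the node receives.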

tmm360 · 2024-10-13T14:50:51Z

Thanks, you are right! There probably is a misunderstanding of what a tag really is. But this raises other questions:

What's the point of having tags on single-chunk uploads, then? Chunks are atomic elements, and I already know whether a tag has been processed or not.
A feature to keep all the chunks on the node until the root chunk is pinned is missing.
If I pin a file uploaded with bzz, the pin is made on the root, and a traversal process will try to find all the child chunks in order to keep them recursively. But this obviously doesn't happen when pinning single chunks. When the node tries to prune a chunk from local storage, I understand that if a parent chunk is pinned, the chunk itself will be kept. Is this also valid for chunks that have been pinned individually? In other words: is the pinning tree of chunks built during traversal, or is it verified each time a chunk is checked for removal?

ldeffenb · 2024-10-13T20:02:10Z

Remember, there's an ability to explicitly create a tag (https://docs.ethswarm.org/api/#tag/Tag/paths/~1tags/post) and supply it on each individual /chunks upload with the swarm-tag header. This allows all of the individual uploads to be tracked and accounted for with a single tag. I use this feature on the /bytes uploads to aggregate a bunch of individual files into a single tracking tag.
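A rough sketch of that flow, for readers who want to see the requests side by side: create one tag, then attach it to every /chunks upload. The requests are only built here, not sent. The node address, the helper names, and the "uid" field read from the tag response are assumptions for this example; the body layout (8-byte little-endian span followed by up to 4096 bytes of payload) is shown for a leaf chunk only.

```python
import json
import urllib.request

BEE_API = "http://localhost:1633"  # assumption: Bee API on its default local port
MAX_PAYLOAD = 4096                 # maximum payload bytes in one chunk


def build_create_tag() -> urllib.request.Request:
    """Build the POST /tags request that creates a new, empty tag."""
    return urllib.request.Request(f"{BEE_API}/tags", data=b"", method="POST")


def tag_uid_from_response(body: bytes) -> int:
    """Extract the tag id, assuming the JSON response carries it as "uid"."""
    return int(json.loads(body)["uid"])


def build_chunk_upload(payload: bytes, tag_uid: int,
                       batch_id: str) -> urllib.request.Request:
    """Build a POST /chunks request for one leaf chunk, tracked by an existing tag."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("a single chunk holds at most 4096 payload bytes")
    span = len(payload).to_bytes(8, "little")  # leaf span = payload length
    headers = {
        "Content-Type": "application/octet-stream",
        "Swarm-Postage-Batch-Id": batch_id,
        "Swarm-Tag": str(tag_uid),  # same tag on every chunk of the session
    }
    return urllib.request.Request(
        f"{BEE_API}/chunks", data=span + payload, headers=headers, method="POST"
    )


uid = tag_uid_from_response(b'{"uid": 42}')  # sample response body, not real output
req = build_chunk_upload(b"first chunk", uid, batch_id="<your-postage-batch-id>")
print(req.full_url, len(req.data))
```

As the thread points out, the tag here only aggregates progress counters across the uploads; it does not by itself keep the chunks on the node.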

You are correct that there is no support for pinning individual chunks in a node. IMHO, this is lacking in the current pinning implementation.

When using the swarm-pin: true header on a /bzz upload, it only appears that the pin is made on the root. In the actual implementation, each individual generated chunk is invisibly pinned and these pins are gathered together under the final root reference which becomes visible in the /pins query API. But when using swarm-pin: true, no traversal is required.

But if you do a /bzz or /bytes upload without the swarm-pin header, then you are correct, a full recursive traversal must succeed to create the desired pin. And if any of those component chunks are unavailable for any reason, the /pins request will fail.
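To make the failure mode concrete, here is a toy model of a traversal-based pin. It is not Bee's code: the local store is a plain dict mapping a reference to the references it points at, and the function name collect_for_pin is made up for this sketch. The point is only that a single missing chunk anywhere under the root makes the whole pin fail.

```python
def collect_for_pin(store: dict[str, list[str]], root: str) -> set[str]:
    """Walk every chunk reachable from root; fail if any of them is not stored locally."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        ref = stack.pop()
        if ref in seen:
            continue
        if ref not in store:
            # Toy stand-in for the 404 returned by the pin request.
            raise LookupError(f"chunk {ref} not found locally")
        seen.add(ref)
        stack.extend(store[ref])  # descend into child references
    return seen


# Complete tree: the pin can be built.
complete = {"root": ["a", "b"], "a": [], "b": []}
print(sorted(collect_for_pin(complete, "root")))

# Same tree with "b" missing locally: the traversal, and so the pin, fails.
partial = {"root": ["a", "b"], "a": []}
try:
    collect_for_pin(partial, "root")
except LookupError as err:
    print("pin failed:", err)
```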

And a detail on the implementation of pruning: in actual fact, the pinning store is a parent/child relationship, with the root reference as the parent and every single chunk reference as the children. The prune only needs to check if the chunk is in the pin children store. Well, it doesn't really check, but the sharky store has a reference counter that prevents pinned chunks from being pruned, even if they are no longer in the reserve or cache. Pinned chunks are retained in the sharky index lookup until their reference count goes to zero (e.g. when they are unpinned and also not in any other store). I am currently using this feature in a 2.1.0 node to be able to access locally pinned OSM chunks whose stamp has already expired.
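The reference-counting idea can be sketched with a toy model. This is not the sharky implementation; the class and method names are hypothetical and the only thing it demonstrates is that a chunk stays retrievable while any holder (cache, reserve, or a pin) still counts a reference to it.

```python
from collections import Counter


class ToyChunkStore:
    """Toy model: a chunk is kept for as long as its reference count is above zero."""

    def __init__(self) -> None:
        self.refcount: Counter = Counter()
        self.pins: dict[str, set[str]] = {}  # root reference -> child chunk references

    def put(self, ref: str) -> None:
        """Some store (for example the cache or the reserve) takes a reference."""
        self.refcount[ref] += 1

    def release(self, ref: str) -> None:
        """That store drops its reference, for example on eviction."""
        self._decrement(ref)

    def pin(self, root: str, chunk_refs: list[str]) -> None:
        """Record the parent/child collection and take one reference per chunk."""
        self.pins[root] = set(chunk_refs)
        for ref in self.pins[root]:
            self.refcount[ref] += 1

    def unpin(self, root: str) -> None:
        """Drop the collection and the references it was holding."""
        for ref in self.pins.pop(root):
            self._decrement(ref)

    def has(self, ref: str) -> bool:
        return self.refcount[ref] > 0

    def _decrement(self, ref: str) -> None:
        if self.refcount[ref] > 0:
            self.refcount[ref] -= 1
            if self.refcount[ref] == 0:
                del self.refcount[ref]  # nothing holds the chunk any more


store = ToyChunkStore()
store.put("a")                      # chunk arrives via an upload
store.pin("root", ["root", "a"])    # pin collection under the root reference
store.release("a")                  # evicted from the cache
print(store.has("a"))               # still held by the pin
store.unpin("root")
print(store.has("a"))               # no holder left
```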

I did implement a hack on the GET /pins/{reference} API to return information on all child chunks under the parent reference, so I know that data structure all too well.

Oh, and one other difference between swarm-pin: true during the upload and a POST /pins/{reference} on the resulting root is that the former pins all root redundancy chunks and the latter only seems to get the traversal-discovered chunks which doesn't include (at this time) the root redundancy SOCs. Also a problem with the current /pins implementation (and /stewardship) IMHO.

ldeffenb · 2024-10-13T20:08:51Z

ldeffenb@4ad599f is the commit of my enhanced /pins API support. Note that the /pins/{reference} is very hacky in that I disabled most of the information that I originally returned because I haven't brought the required lower-level support over from my 2.1.0-hacks yet.

This commit also adds offset and limit support to the /pins query similar to that used in the /tag query. This was required for me to query all of the pinned files without exhausting the node's memory.

https://docs.ethswarm.org/api/#tag/Tag/paths/~1tags/get
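As a sketch of how offset and limit keep memory use bounded, the loop below walks a listing page by page instead of loading it all at once. The fetch function is passed in so that nothing here depends on a particular endpoint; page_url and iterate_pages are hypothetical helper names, and offset/limit on /pins exists only in the commit mentioned above, not necessarily in a stock node.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode


def page_url(base: str, path: str, offset: int, limit: int) -> str:
    """Build a listing URL with offset and limit query parameters."""
    return f"{base}{path}?{urlencode({'offset': offset, 'limit': limit})}"


def iterate_pages(fetch_page: Callable[[int, int], list],
                  limit: int = 100) -> Iterator:
    """Yield items one page at a time until a short (or empty) page is returned."""
    offset = 0
    while True:
        items = fetch_page(offset, limit)
        yield from items
        if len(items) < limit:
            return
        offset += limit


# Stand-in for an HTTP call: serve slices of an in-memory list.
fake_listing = [f"ref-{i}" for i in range(250)]
fetch = lambda offset, limit: fake_listing[offset:offset + limit]

print(page_url("http://localhost:1633", "/tags", 100, 100))
print(sum(1 for _ in iterate_pages(fetch, limit=100)))
```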

ldeffenb · 2024-10-13T20:13:02Z

Oh, and if you want to see the /pins and /stewardship APIs break at their finest, upload an individual /chunks that "smells" like a manifest node or a root BMT chunk, but where none of the chunks referenced by that node exist. Then try to POST /pins/{reference} that chunk. It will always fail because the traverser cannot iterate through it completely. Ugly IMHO.

tmm360 · 2024-10-13T23:20:24Z

Thanks for your assistance! Unfortunately for me, I don't have advanced Go knowledge. I'm just learning to read it, so I've tried to understand something about the implementation from the code, but getting the details would require too much time at the moment.

I'm trying to implement, in C#, a reverse proxy with a chunk cache on a db to implement chunk pinning, because it is a faster solution for me. A lot of the code has already been implemented in my bee.net project. At least I can debug on the fly any kind of problem I encounter.
This will also solve other problems, like pinning corruption when the node crashes for some reason...

Anyway, I'm leaving this issue open because a better pinning implementation is really needed.

martinconic · 2024-10-28T08:21:15Z

"with chunk cache on"

There is an open PR for repair pin: #4836

tmm360 added the needs-triaging label (new issues that need triaging) Oct 12, 2024

nikipapadatou assigned martinconic Oct 23, 2024

tmm360 mentioned this issue Nov 26, 2024

add swarm video hashes efdevcon/monorepo#23

Merged
