Add baseName to ExternalContent #71

Merged
merged 25 commits into from
Aug 17, 2023
Commits
ee1d3d2
Update rsa modulus endianess to match most protocols
matheus23 Jul 12, 2023
a8bb40e
Whoops, fix `low-endian` -> `little-endian`
matheus23 Jul 12, 2023
9fd4361
Typo
matheus23 Jul 12, 2023
7ef230d
Merge remote-tracking branch 'origin/main' into matheus23/update-rsa-…
matheus23 Jul 17, 2023
631912b
Remove `namefilter.md`
matheus23 Jul 17, 2023
fff9a19
Write verification algorithm
matheus23 Jul 17, 2023
7178e3a
Remove `TODO` from the allowed words list...
matheus23 Jul 17, 2023
9f224d8
Change from SHA3 to Blake3, more domain separation
matheus23 Jul 17, 2023
fbbb759
More domain separation strings.
matheus23 Jul 17, 2023
a60ea33
Fix `hashToPrime` usages
matheus23 Jul 17, 2023
24522ae
Spelling
matheus23 Jul 17, 2023
173d26d
Switch from AES-GCM to XChaCha20-Poly1305
matheus23 Jul 18, 2023
cda33eb
Fix constant
matheus23 Jul 18, 2023
f04a352
Remove `blockCount` restriction
matheus23 Jul 18, 2023
16c7c1a
Update rationale
matheus23 Jul 18, 2023
8a6fdd8
Woords
matheus23 Jul 18, 2023
cef8600
Add `baseName` to `ExternalContent`
matheus23 Jul 18, 2023
637f1da
Small clarification
matheus23 Jul 19, 2023
a3a64b1
Merge remote-tracking branch 'origin/main' into matheus23/external-co…
matheus23 Aug 11, 2023
aba0699
Expand on Section 3.1.4
matheus23 Aug 11, 2023
64f84bd
Improve references to `baseName` and `name`
matheus23 Aug 11, 2023
91c1628
Use "its" instead of "the".
matheus23 Aug 11, 2023
1526a99
Add "kiB" as a valid word
matheus23 Aug 11, 2023
01db354
Try using "KB" instead of "kB"
matheus23 Aug 11, 2023
eecd1eb
Add "KiB" as word
matheus23 Aug 11, 2023
1 change: 1 addition & 0 deletions .github/workflows/words-to-ignore.txt
@@ -81,6 +81,7 @@ ethereum
exponentiate
extractable
golang
+KiB
idempotence like omnipotence
inline like outline
little-endian
25 changes: 17 additions & 8 deletions spec/private-wnfs.md
@@ -171,6 +171,7 @@ type InlineContent = {
 type ExternalContent = {
   "external": {
     key: Key
+    baseName: NameAccumulator
     blockSize: Uint64 // in bytes, at max 262,104
     blockCount: Uint64
   }
@@ -207,11 +208,19 @@ If the `previous` links contain more than one element, then some CIDs MAY refer

### 3.1.4 Private File

-Private file content has two variants: inlined or externalized. Externalized content is held as a separate node in the bucket. Inlined content is kept alongside (and thus is decrypted with) the header.
+Private file content has two variants: inlined or externalized. Externalized content is stored in blocks separate from the private file block. Inlined content is kept alongside (and thus is decrypted with) the private file block itself.
+
+This makes inline content only suitable for small files, when the content size is much smaller than the IPLD maximum block size (256KiB).
+
+The advantage of inline content is that there's no need for computing `NameAccumulator`s for external content blocks, but the downside is that upon copying a file, you also need to copy the inline content and re-encrypt it with a new key.
+
+It is a sensible default to make use of inline content for file sizes below a certain size threshold, e.g. 10KB.
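
The trade-off described above can be sketched as a small decision helper. This is only an illustration, not part of the spec: `planContent`, `ContentPlan` and both constants are hypothetical names, the threshold assumes "10KB" means 10,000 bytes, and external content is assumed to use the maximum block size of 262,104 bytes that the spec recommends.

```typescript
// Hypothetical threshold: the text's example value, read here as 10,000 bytes.
const INLINE_THRESHOLD_BYTES = 10000;
// Maximum external block size from the spec: 2^18 - 40 = 262,104 bytes.
const MAX_BLOCK_SIZE_BYTES = 2 ** 18 - 40;

// Hypothetical result type describing how the content would be stored.
type ContentPlan =
  | { kind: "inline" }
  | { kind: "external"; blockSize: number; blockCount: number };

// Small files stay inline; everything else is split into external blocks.
function planContent(sizeInBytes: number): ContentPlan {
  if (sizeInBytes < INLINE_THRESHOLD_BYTES) {
    return { kind: "inline" };
  }
  const blockSize = MAX_BLOCK_SIZE_BYTES;
  // The last block may be only partially filled, hence the rounding up.
  const blockCount = Math.ceil(sizeInBytes / blockSize);
  return { kind: "external", blockSize, blockCount };
}
```

Under these assumptions a 512-byte note would be stored inline, while a file of 1,000,000 bytes would be split into four external blocks.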

#### 3.1.4.1 Externalized Content
Member


Nit: it would be good to expand section 3.1.4. I've had to re-load a bunch of context about the difference between inline and external content, which means that it will very likely be confusing to someone new coming in.

Member


Specifically some examples of when you'd want one or the other would be helpful

Member Author


I tried to expand the section a little bit 👀
LMK what you think

Member


Super great! Thanks :)


-Since external content blocks are separate from the header, they MUST have a unique `NameAccumulator` derived from a random key (to avoid forcing lookups to go through the header). If the key were derived from the header's key, then the file would be re-encrypted e.g. every time the metadata changed. See [sharded file content access] algorithm for more detail.
+Since external content blocks are separate from the file's header, each of them MUST have a `NameAccumulator` that differs from the file's `name` in its header. These names may be built on an arbitrary `baseName`. In the normal case, the `baseName` is RECOMMENDED to be the file's `name` from its header with one additional name segment: the externalized content's encryption `key`, hashed to a prime.
+However, the `baseName` is allowed to be anything else, for instance to support copying or moving a file to a different location without having to re-encrypt all of its data.
+The [sharded file content access] algorithm contains more information about how to derive each externalized block's name from this `baseName`.
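
The recommended derivation can be sketched as follows. This is a toy model, not the spec's implementation: `ToyName` only records segments instead of being a real accumulator, `recommendedBaseName` and the `HashToPrime` type are hypothetical, the hash function is injected rather than implemented, the domain separation string is left as a caller-supplied placeholder because this section does not state it, and the output length of 32 is assumed by analogy with the block label derivation.

```typescript
// Toy stand-in for the spec's NameAccumulator: it only records segments in order.
class ToyName {
  readonly segments: readonly string[];

  constructor(segments: readonly string[] = []) {
    this.segments = segments;
  }

  // Returns a new name with `segment` added; the receiver is left unchanged.
  add(segment: string): ToyName {
    return new ToyName([...this.segments, segment]);
  }
}

// Hypothetical signature, mirroring how hashToPrime is called in the spec.
type HashToPrime = (domainSeparation: string, data: Uint8Array, byteLength: number) => string;

// Recommended default: the file's name plus the content key hashed to a prime.
function recommendedBaseName(
  fileName: ToyName,
  key: Uint8Array,
  domainSeparation: string, // placeholder, not the spec's actual string
  hashToPrime: HashToPrime
): ToyName {
  return fileName.add(hashToPrime(domainSeparation, key, 32));
}

// Fake hash for demonstration only; a real one must produce a prime.
const fakeHashToPrime: HashToPrime = (domain, data) =>
  `${domain}:${Array.from(data).join(".")}`;

const fileName = new ToyName(["root", "docs", "file"]);
const baseName = recommendedBaseName(
  fileName,
  new Uint8Array([1, 2, 3]),
  "example/domain",
  fakeHashToPrime
);
```

Because `baseName` is stored in the external content record rather than recomputed from the header, a copied or moved file can keep the old `baseName` and its existing blocks even though its own `name` changes.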

The block size MUST be at least 1 and at maximum $2^{18} - 40 = 262,104$ bytes, as the maximum block size for IPLD is usually $2^{18}$, but 24 initialization vector bytes and 16 authentication tag bytes need to be added to each ciphertext. It is RECOMMENDED to use the maximum block size. An externalized content block is laid out like this:

@@ -262,7 +271,7 @@ However, developers should be aware that such operations wouldn't check the inva

#### 3.1.6.1 Temporal Key

-Temporal keys give temporal read access to a certain node and its descendants. It MUST be derived from the skip ratchet for that node, incremented to the relevant revision number. This limits the reader to reading from a their earliest ratchet and forward, but never earlier revisions than that. The derivation algorithm MUST be the skip ratchet [key derivation algorithm][/spec/skip-ratchet.md#21-Key-Derivation] with the domain separation string `wnfs/1.0/revision segment derivation from ratchet`.
+A temporal key gives temporal read access to a certain node and its descendants. It MUST be derived from the skip ratchet for that node, incremented to the relevant revision number. This limits the reader to reading from their earliest ratchet onward, but never revisions earlier than that. The derivation algorithm MUST be the skip ratchet [key derivation algorithm][skip ratchet key derivation] with the domain separation string `wnfs/1.0/revision segment derivation from ratchet`.

When added to a private directory, it MUST be encrypted with [AES-KWP] and the private directory's temporal key. This prevents readers with only a snapshot key from gaining revision read access.

@@ -457,19 +466,19 @@ Consider the following diagram. An agent may only have access to some nodes, but

`getShards : PrivateFile -> Array<NameAccumulator>`

-To calculate the array of HAMT labels for [externalized content], add `key` and `concat(key, encode(i))` for each block index `i` of external content to the file's name like so:
+To calculate the array of HAMT labels for [externalized content], add a name segment derived from `concat([key, encode(i)])` for each block index `i` of external content to the external file content's `baseName`, like so:

```ts
-function* shardLabels(key: Key, count: Uint64, name: NameAccumulator): Iterable<NameAccumulator> {
-  for (let i = 0; i < count; i++) {
+function* shardLabels(key: Key, blockCount: Uint64, baseName: NameAccumulator): Iterable<NameAccumulator> {
+  for (let i = 0; i < blockCount; i++) {
     // add returns `baseName` with the parameter added as a name segment
-    yield name.add(hashToPrime("wnfs/1.0/segment derivation for file block", concat(key, encode(i)), 32))
+    yield baseName.add(hashToPrime("wnfs/1.0/segment derivation for file block", concat([key, encode(i)]), 32))
   }
 }
```

+- `key`, `blockCount` and `baseName` are fetched from the `PrivateFile`'s external file content record,
 - `concat` denotes byte array concatenation,
-- `name` is the `NameAccumulator` from the private file's header,
- `encode` is a function that maps a block index to a little-endian byte array encoding of a 64-bit unsigned integer.
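
The two byte-level helpers described in the list above could look roughly like this. It is a sketch under assumptions: block indices are modelled as JavaScript numbers, which are only exact up to $2^{53} - 1$ rather than the full 64-bit range, and `blockSegmentPreimage` is a hypothetical helper that builds only the input to `hashToPrime`, not the prime itself.

```typescript
// Little-endian byte encoding of a block index as a 64-bit unsigned integer.
// Assumes `index` is a non-negative safe integer.
function encode(index: number): Uint8Array {
  const bytes = new Uint8Array(8);
  let rest = index;
  for (let i = 0; i < 8; i++) {
    bytes[i] = rest % 256; // least significant byte first
    rest = Math.floor(rest / 256);
  }
  return bytes;
}

// Byte array concatenation.
function concat(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((length, part) => length + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

// Hypothetical helper: the bytes that would be hashed to a prime for block `index`.
function blockSegmentPreimage(key: Uint8Array, index: number): Uint8Array {
  return concat([key, encode(index)]);
}
```

Every preimage is exactly eight bytes longer than the key, so two different block indices always produce different inputs to `hashToPrime`.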

## 4.5 Merge