Canonicalise node ID representation #261

joshuakarp · 2021-10-22T07:07:09Z

Specification

We'd like to canonicalise the representation of a keynode's node ID.

Recall that the node ID of a node is based on the public key of a keynode: it is a public key fingerprint. At the moment, we produce a SHA-256 hash of the public key, resulting in an array of 32 bytes. This array of bytes can be currently seen as the primitive canonicalised representation of the node ID.

The representation needs to provide:

fixed size encodings: the Kademlia system for node ID resolution requires a fixed-size representation of a node ID, such that we can perform bitwise XOR operations between node IDs.
URL-safe characters

Previously, we were using base64 as the encoding choice for a human-readable version of the node ID. This produced fixed-size node IDs of 44 characters, but used unsafe characters (e.g. /, +).

After that, we were then looking at base58btc. This has the advantage of no longer producing URL-unsafe characters, as well as some quality-of-life improvements by not including visually-similar characters. However, it can produce different sized node IDs. See below:

import * as keysUtils from './src/keys/utils';
import {
  util as forgeUtil,
} from 'node-forge';
import { base58btc } from "multiformats/bases/base58";

async function main() {
  const pemPublicKey43 = `-----BEGIN PUBLIC KEY-----
  MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA0dTpU0P4AMR3bWs9t4KQ
  WWkQZZsjqMLj4Hr2c3n6bYhqCl4e69P7tFUG6mrIGomUb/JCqZ5cik+IyDoKz/rw
  wB6vJ51ajT4nLgiCgKbiMeU3gdlXKDpJSBxi85sFFjprbmJiO41r2ZezE7bS62Sl
  k8vkPI2IPhH4H/Zun60iO0lRBMd/aWSZ03hNZHGBe1JVTzkoxLneaNOd5XRQ90KO
  TPButem2+5tU91oQsFH4VZONHuJbHEuYeiJUbFtgKNvwDe8w/0BmDoV5r9R7oEjI
  busgF0SQv6DkQILL2PfhJAsYQdY3azd07q3Nhm4YgQJg3r7XbN4+goPUwCr3073d
  rM1Ulc2zh9Oukx7iqzlGjjQWJ7q6bd2ewn/E9dlQDNc8KP4Fq08aMo1bRGSX1Pl8
  h5E5lr/FjKgNmxLwe+gBqo/iMvGYISo6IedEpw2xLcWxQrZ6yu9MzCi+6l6mr/M7
  PQqye8FzMXMxaaqIBi0WUkgg3qTynqAM/jde64GcBdLv4UfkbSkgSRlJb5srmcrc
  jckqpjeP4yRO0ex0ynkMMi78LT0yG3E/XYqFx+XD6EGEpEKMbVPwBmOyEapRbVVI
  DLBEmR6e6WuvdnyiJksCUpZhuDrVyJuVOImL8FJMFtanpPxwZe3FpNaGJQxa0m1d
  VQE/QpKzYZL0pMhBhHr/JBkCAwEAAQ==
  -----END PUBLIC KEY-----`;
  const pemPublicKey44 = `-----BEGIN PUBLIC KEY-----
  MIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEA3SABwHq1vJ9ydCl2eW/Q
  r3guKAPdesaxkm85k4EhJ2ywvDcw0wjh39KbcU5ymexxDGzKa3DkhrGzzHbQmS37
  Ee0Nvm6VR2PI+rhmRtJAVztAPPTMj3YHJXaqpoffILXt508jAnvQ38477tZ8hxSR
  /NfZ2OYCLJrCENXvIy6NOPWBwcXPJUL02fDPpaZsvcnoLpIptOnIEuQXL3YJHjKd
  w6yYT4g0sy0MSqp8NFPzzlcY4k5gKO1KR/G2/o2DFErd0n2a7k+SY/ChuUaZpk6U
  q88zfLSRdUCM4pe469ZRrlz7lRoQUemBrmDx0g00MS9GsSEgOZFz3xqt+/3jE+vg
  vt9Ul4929z+9onK5NvjwVR3gRDAl9IFW6P3YVdnyPUo3tNkgQ3Q9Pew/cmulGVf3
  js/DiS9QVjivscfWGbVnkRPuNlUQ0WX4mNEDtYUEGsuYIRkgvBwHYFO1TGQWn3P3
  vEfUg61LzP8WHoYxb/nmYcd14t8zt9reQnCOk+/ON+Btp1Q1ZhIiy7uxG0IGE7m2
  Nawvql9YGFheFC56TYnikm3jfw4boyaqwCeVmqRBAe/NCXcIzDK76AmEvfjzitOl
  2f8rtFMcRE9LFzAYf0bGMbtLEWmE3tMbxDeO3B+DXKtUjeRun6TWca1biyK8niI0
  xsGTzpaAygxgj5DHF8OjP9ECAwEAAQ==
  -----END PUBLIC KEY-----`;
  const publicKey43 = keysUtils.publicKeyFromPem(pemPublicKey43);
  const publicKey44 = keysUtils.publicKeyFromPem(pemPublicKey44);

  const bytes43 = keysUtils.publicKeyToFingerprintBytes(publicKey43);
  const bytes44 = keysUtils.publicKeyToFingerprintBytes(publicKey44);
  console.log('publicKeyToFingerprintBytes length 43:', bytes43.length);
  console.log('publicKeyToFingerprintBytes length 44:', bytes44.length);
  const fTypedArray43 = forgeUtil.binary.raw.decode(bytes43);
  const fTypedArray44 = forgeUtil.binary.raw.decode(bytes44);
  console.log('fTypedArray 43:', fTypedArray43);
  console.log('fTypedArray 44:', fTypedArray44);
  console.log('Hex string 43:', Buffer.from(fTypedArray43).toString('hex'));
  console.log(base58btc.baseEncode(fTypedArray43));
  console.log(keysUtils.publicKeyToFingerprint(publicKey43));

}

main();

[nix-shell:~/Documents/MatrixAI/js-polykey]$ npm run ts-node -- nodeIds.ts 

> @matrixai/[email protected] ts-node /home/josh/Documents/MatrixAI/js-polykey
> ts-node -r tsconfig-paths/register "nodeIds.ts"

publicKeyToFingerprintBytes length 43: 32
publicKeyToFingerprintBytes length 44: 32
fTypedArray 43: Uint8Array(32) [
  10,  45,  47,  45, 125, 109,  40,  40,
  20, 160,  60, 163, 101, 100, 189, 222,
  38,  35, 246, 172,  73, 251, 232, 221,
  13,  97, 139, 151, 240, 251, 129, 201
]
fTypedArray 44: Uint8Array(32) [
  233, 254, 155, 220,   4, 146, 225,  43,
  130, 201,  87,  50,  52, 141, 221, 156,
   15, 162, 131,  58, 120, 191, 146,  36,
   49, 133,  50,  92, 203,  95, 232,  13
]
Hex string 43: 0a2d2f2d7d6d282814a03ca36564bdde2623f6ac49fbe8dd0d618b97f0fb81c9
gj3DUVjzAGHvmP9ZRFB5geWLVRwXUSdEUtMcVvxozLx
gj3DUVjzAGHvmP9ZRFB5geWLVRwXUSdEUtMcVvxozLx

As a temporary solution, we've adopted a base32hex encoding, producing a fixed fixed 53 character node ID, such as v6n7m9vuf44pfqq133re8v91ju7a8968h4nkirjt8r7r9pbcoqqq0 (with a prepended v character from multibase): https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_711237922

Additional context

Refactor NodeId as a proxy/singleton instance #254 - will need to look at this issue concurrently
discussions on base58 and base58btc

Tasks

Choose a standard means of representing the node ID
Change all current usages of node ID to be this new standard (e.g. within NodeGraph (kademlia), across GRPC, etc)

The text was updated successfully, but these errors were encountered:

CMCDragonkai · 2021-10-22T07:29:19Z

My proposal is to just use a class NodeId extends Uint8Array object. We can also use permaproxy as well if we want to refer to an primitive or Uint8Array.

Then any kind of encoding can be used, and we always use NodeId as the official type.

We knock out the problem of dereferencing node ids improperly, and also ensure that all domains are use the byte-form canonical version of NodeId and not some encoded form.

The exact encoding algorithm we use has some important features needed:

base64 has characters that cause problems like /
do you need lexicographic order? if so you have to use base32hex or one of the encodings talked about here: Which of the multibase formats preserves lexicographic order? multiformats/js-multiformats#124 (comment)
do you want padding? As per the answer https://stackoverflow.com/a/26632221/582917, the reason padding is used or not used depends on whether there will be a concatenation of encoded node ids without any kind of structure indicating separation, ideally we don't want padding

If we do choose base58btc, that's fine, as long as the above 3 problems are resolved, and no domains in PK are using string/encoded form of node id, and they are all using NodeId as a type, whether that is a special singleton object, or a permaproxy.

CMCDragonkai · 2021-10-22T07:30:49Z

Remember with this issue done, it does not matter what the encoded node id lengths are, and there is no guarantee that the all encoded node id lengths are the same length and this is fine cause it doesn't matter anymore.

At the moment in the source code, there's a need to temporarily ensure that the base encoding we are using is producing a fixed length. That's base32hex. But this is only because so many domains like kademlia, and others all assume a string base id that is of a fixed length.

joshuakarp · 2021-10-25T03:59:26Z

Some interesting discussions that came up with Brian and I whilst we've been debugging the tests.

Previously, the node ID's "canonicalised" representation was its string representation (as a base64/base58 string). Instead, we're now using the actual, internal 32 bytes of the node ID for all calculations in the nodes domain. See https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_712546331 for more details.

joshuakarp · 2021-10-26T02:09:08Z

This work on the node ID could also relate to #168. When we eventually move to Ed25519 keys, then the node ID will be the 32 byte sequence of the Ed25519 public key.

CMCDragonkai · 2021-10-26T06:09:49Z

As talked about in #269, we need to support multibase encoded Node IDs as these will be used by PK_SEED_NODES env variable and related configuration. Right now I believe @joshuakarp has already made the changes to nodes domain to use the 32 byte Uint8Array, which means we actually can reuse base58btc now, but there are other domains that are still using string-encoded Node IDs, all of those will require refactoring.

Note that we do not need ordered base encoding for Node Ids, Node Ids do not have any order.

One question I have is whether this blocks testnet deployment? If not, then this can come after we have testnet working, and I don't think it blocks general publishing and distribution of PK CLI #268. If so this can be pushed to the icebox.

joshuakarp · 2021-10-26T23:39:11Z

No, I don't think it does block testnet deployment, nor #268, so this can be pushed back.

joshuakarp · 2021-10-26T23:40:31Z

For this issue, I think we need to do a bit more speccing out of the requirements for our string-encoded node ID. e.g. what's needed from a CLI point of view? What would we want a user to see as the representation of their node ID? Do we have length constraints on this? etc etc

CMCDragonkai · 2021-10-27T00:07:28Z

Yes we would need to pick a specific node ID encoding to use by default. There's a robustness principle at play here. Specific outputs, diverse inputs. So if at each human rendering location or just external printing is needed like logging, then base58 can be used, while we can take as input any base, and decode it to bytes. What base encoding does wireguard use for their ed25519 keys?

On 27 October 2021 10:40:42 am AEDT, Josh ***@***.***> wrote: For this issue, I think we need to do a bit more speccing out of the requirements for our string-encoded node ID. e.g. what's needed from a CLI point of view? What would we want a user to see as the representation of their node ID? Do we have length constraints on this? etc etc -- You are receiving this because you commented. Reply to this email directly or view it on GitHub: #261 (comment)

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.

joshuakarp · 2021-10-27T01:33:14Z

Seems like Wireguard uses base64 encodings: https://www.wireguard.com/quickstart/#key-generation

Interestingly, seems like the standard for SSH keys is also base64: https://datatracker.ietf.org/doc/html/rfc4716#section-3.4

joshuakarp · 2021-10-27T01:38:19Z

Perhaps there's also merit in standardising this with our claims on the sigchain with base64url (removes unsafe URL characters for transfer): https://datatracker.ietf.org/doc/html/rfc7515

CMCDragonkai · 2021-11-04T01:45:54Z

Note that ErrorInvalidId should be removed. Parsing whether an id is valid or not is supposed to happen on the boundaries of the program, not internally. So it should be part of parsing functions. Internally the typesystem should ensure that we have valid ids and we should not need to have runtime exceptions for this.

joshuakarp · 2021-11-29T04:53:05Z

As a refresher, it's worthwhile to remember that we've already done some form of canonicalisation of the node ID, as discussed here: #261 (comment). See the following commit too: 5f382e2

joshuakarp · 2021-12-20T00:12:35Z

It will also potentially be worthwhile to undertake #299 (the review of the GenericIdTypes.ts 'module') alongside this work.

CMCDragonkai · 2022-01-21T00:35:02Z

With #318, we can use the raw typed array/buffer form of NodeId without needing to encode it first, this works by using the Id from js-id. The canonical version of NodeId is the byte-representation.

Encoded representations of NodeId can be used elsewhere where string form/human readable is needed. And in those cases, we would use base32hex in order to preserve lexicographic order.

CMCDragonkai · 2022-01-21T00:50:44Z

Example of encoding the public key fingerprint as Id: #254 (comment)

joshuakarp added development Standard development design Requires design discussion Requires discussion labels Oct 22, 2021

joshuakarp assigned joshuakarp and tegefaulkes Oct 25, 2021

CMCDragonkai mentioned this issue Oct 26, 2021

Embed Trusted Default Seed Nodes into src/config.ts for bootstrapping the P2P network #269

Closed

joshuakarp added the epic Big issue with multiple subissues label Nov 29, 2021

This was referenced Nov 29, 2021

1.0.0 Feature Freeze #291

Closed

Investigate and fix random NodeGraph test failures #274

Closed

CMCDragonkai mentioned this issue Dec 2, 2021

Introducing vault sharing and the restrictions of pulling and cloning with vault permissions #266

Merged

15 tasks

tegefaulkes mentioned this issue Jan 20, 2022

Change NodeId from encoded string to an opaque alias of Id which is a typed array from js-id #318

Merged

13 tasks

tegefaulkes closed this as completed in #318 Jan 28, 2022

CMCDragonkai added the r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy label Jul 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Canonicalise node ID representation #261

Canonicalise node ID representation #261

joshuakarp commented Oct 22, 2021 •

edited

Loading

CMCDragonkai commented Oct 22, 2021

CMCDragonkai commented Oct 22, 2021 •

edited

Loading

joshuakarp commented Oct 25, 2021

joshuakarp commented Oct 26, 2021

CMCDragonkai commented Oct 26, 2021

joshuakarp commented Oct 26, 2021

joshuakarp commented Oct 26, 2021

CMCDragonkai commented Oct 27, 2021 via email

joshuakarp commented Oct 27, 2021

joshuakarp commented Oct 27, 2021

CMCDragonkai commented Nov 4, 2021

joshuakarp commented Nov 29, 2021

joshuakarp commented Dec 20, 2021

CMCDragonkai commented Jan 21, 2022

CMCDragonkai commented Jan 21, 2022

Canonicalise node ID representation #261

Canonicalise node ID representation #261

Comments

joshuakarp commented Oct 22, 2021 • edited Loading

Specification

Additional context

Tasks

CMCDragonkai commented Oct 22, 2021

CMCDragonkai commented Oct 22, 2021 • edited Loading

joshuakarp commented Oct 25, 2021

joshuakarp commented Oct 26, 2021

CMCDragonkai commented Oct 26, 2021

joshuakarp commented Oct 26, 2021

joshuakarp commented Oct 26, 2021

CMCDragonkai commented Oct 27, 2021 via email

joshuakarp commented Oct 27, 2021

joshuakarp commented Oct 27, 2021

CMCDragonkai commented Nov 4, 2021

joshuakarp commented Nov 29, 2021

joshuakarp commented Dec 20, 2021

CMCDragonkai commented Jan 21, 2022

CMCDragonkai commented Jan 21, 2022

joshuakarp commented Oct 22, 2021 •

edited

Loading

CMCDragonkai commented Oct 22, 2021 •

edited

Loading