Hyperbee URLs #21

Open
RangerMauve opened this issue Nov 18, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@RangerMauve
Owner

Based on discussions on Discord

  • Use a regular hyper:// URL
  • / in the URL is mapped through the sub command, e.g. /foo/bar => sub(foo).get(bar)
  • Keys can be URI encoded; that's how you can have a literal / in a key (as %2F)
  • When you GET a key, you get the raw value back
  • Content-type is determined from the key's file extension
    • e.g. `foo/bar.txt` => `text/plain`
  • Having a / at the end of a path triggers a directory listing
    • By default do an HTML listing like in Hyperdrive
    • Use Accept header to enable JSON
      • {[urlEncodedKey]: "urlEncodedValue"}?
      • [[urlEncodedKey, urlEncodedValue]]?
      • [{key: urlEncodedKey, value: urlEncodedValue}] ?
      • [{key: urlEncodedKey, value: byteArray}] ?
      • Opt into byte array?
    • Enable EventSource via text/event-stream
    • Some sort of binary format with length prefixed streams?
    • Use gt and lt query string params for search
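
The GET-versus-listing rule and the extension-based content type from the list above could be sketched roughly as follows. This is only an illustration: `classifyRequest` and the small `MIME_BY_EXTENSION` table are hypothetical names, not part of any existing fetch handler, and the fallback to `application/octet-stream` is an assumption.

```javascript
// Hypothetical extension table; a real handler would likely use a fuller MIME database.
const MIME_BY_EXTENSION = new Map([
  ['txt', 'text/plain'],
  ['json', 'application/json'],
  ['html', 'text/html']
])

// Classify a URL pathname as a listing request (trailing slash) or a single-key GET.
function classifyRequest (pathname) {
  if (pathname.endsWith('/')) {
    return { type: 'list', prefix: pathname }
  }
  const lastSegment = pathname.slice(pathname.lastIndexOf('/') + 1)
  const dot = lastSegment.lastIndexOf('.')
  const extension = dot === -1 ? '' : lastSegment.slice(dot + 1).toLowerCase()
  // Assumption: keys without a known extension are served as generic binary.
  const contentType = MIME_BY_EXTENSION.get(extension) || 'application/octet-stream'
  return { type: 'get', key: pathname, contentType }
}
```

With this sketch, `classifyRequest('/foo/bar.txt')` is treated as a GET with `text/plain`, while `classifyRequest('/foo/')` is treated as a listing.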

cc @KyGost @pfrazee

Conversation Log
(05:29:02 PM) Mauve: 
@pfrazee So far, I was thinking we could detect hypercore vs hyperdrive using headers when doing a GET, and provide a way to get hyper://hyperbeeaddress/path/to/key.

One question is whether it'd make sense to convert the / in the path to delimiters that'd usually be used in leveldb. Not certain yet.

Also wondering what binary encoding would look like in the URL, somehow detecting hex strings?

Also, what does it mean to "read a folder" in a hyperbee. Should it list all the keys with that prefix?

For prefix searches, I think it could be useful to have querystring parameters for gt and lt when listing.

Finally, I think it'd make sense to use some sort of content type encoding when returning multiple results so you can get the key and value stuff out, maybe take note from the EventSource API?
(05:29:40 PM) Mauve: 
cc @Kyran if you have thoughts on this
(05:31:06 PM) Mauve: 
Also, I was thinking of encouraging people to use file endings in their keys and doing mimetype detection. So if you're storing a JSON key you could use /file.json for the key name to give apps a hint. This is the approach EarthStar is taking IIRC
(05:31:27 PM) Kyran: 
I think that about sums it up. If I think of anything to add I'll add :-)


---


(05:33:42 PM) Kyran: 
@Mauve (if using Pidgin atm) @pfrazee edited message.
(05:34:02 PM) pfrazee: 
ah let me repost
(05:34:09 PM) pfrazee: 
converting the / to the path delim makes sense to me. I guess that might cause problems if / is actually in the path but I feel like that's just something we'll have to live with?
(05:35:15 PM) pfrazee: 
binary key encoding... also a pain. Hmph. Yeah I guess we just have to choose some kind of encoding
(05:35:17 PM) Mauve: 
 @Kyran I can see edits, thankfully. 😁 Yeah, I think the other gotcha with converting slashes is which delimiter we should go with since different projects might use different schemes
(05:36:07 PM) pfrazee: 
there's a PR to standardize it https://github.com/mafintosh/hyperbee/pull/8
(05:36:11 PM) Mauve: 
 @pfrazee What if we don't do anything special for keys to start and use the slashes as plain values, and if you want to search it can only be done on `/` with query strings?
(05:36:54 PM) Mauve: 
And further, what about using hex encoding for binary and treating everything with `0x` as hex keys or something
(05:39:12 PM) pfrazee: 
would there be any merit to only using query parameters?
(05:39:21 PM) pfrazee: 
we might have a little more flexibility
(05:39:32 PM) Mauve: 
Only using query strings for keys in general?
(05:39:55 PM) pfrazee: 
yeah. Would we lose anything by doing that?
(05:39:59 PM) Mauve: 
Hmmm. I dunno. I kinda like the idea of having a key in the pathname since it "makes sense"
(05:40:03 PM) pfrazee: 
right
(05:40:06 PM) Mauve: 
Functionally it should be fine though
(05:40:27 PM) Mauve: 
If you use slashes in your keys, then you can do useful things with the URL API
(05:40:53 PM) Mauve: 
Like `url = 'hyper://name/foo/bar', otherURL = new URL('./fizzbuzz.txt', url)`
(05:41:04 PM) Mauve: 
I've been using this pattern a lot lately
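
For reference, the relative-URL pattern mentioned here can be tried directly with the standard `URL` class (for example in Node.js). The host `name` is just the placeholder from the message, not a real address.

```javascript
// 'name' stands in for a real hyperbee address.
const base = 'hyper://name/foo/bar'

// Resolving './fizzbuzz.txt' replaces the last path segment of the base URL.
const other = new URL('./fizzbuzz.txt', base)

console.log(other.href) // hyper://name/foo/fizzbuzz.txt
```

This only works because the key uses `/` as its separator, which is the argument being made for keeping keys in the pathname.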
(05:41:07 PM) pfrazee: 
yeah that's true
(05:41:50 PM) pfrazee: 
okay back to your other points...
(05:42:04 PM) pfrazee: 
0x prefixes for hex seems pretty sensible
(05:42:27 PM) pfrazee: 
same with the searches using query strings
(05:43:11 PM) pfrazee: 
yeah for reading "a folder," I feel like listing the keys is pretty sensible
(05:43:38 PM) pfrazee: 
is it possible for a keyname that's a sub's prefix to have a value?
(05:44:16 PM) Mauve: 
One thing that'll be weird is that folder traversal doesn't make sense for hyperbee. Like, you could list everything in the folder and any sub folders, but getting just the sub folder names is hard
(05:44:24 PM) Mauve: 
Yeah that could be possible.
(05:44:34 PM) Mauve: 
It'd be similar to the index.html stuff in hyperdrive
(05:45:11 PM) pfrazee: 
yeah agree on the subfolder thing, that is a little tricky
(05:45:44 PM) Mauve: 
It could be fine to treat it like a recursive directory listing, IMO. 🤷
(05:45:55 PM) Mauve: 
It's just a bit quirky compared to hyperdrive
(05:45:56 PM) pfrazee: 
what I meant with the prefix key having a value thing is, if I have /foo/bar = "hi" and /foo = "hello" then GET /foo probably needs to give "hello" and not list subkeys
(05:46:06 PM) Mauve: 
Yeah agreed
(05:46:32 PM) pfrazee: 
would it make sense to never list subkeys on GET because of that possibility?
(05:46:43 PM) pfrazee: 
or should it list subkeys if it has no value itself?
(05:46:49 PM) Kyran: 
Maybe a different kind of request for subkeys?
(05:46:58 PM) pfrazee: 
that's what it makes me wonder
(05:47:05 PM) Kyran: 
Hmm
(05:47:13 PM) pfrazee: 
could always do a new method name like LIST
(05:47:14 PM) Mauve: 
What about never listing subkeys unless you use something in the querystring params?
(05:47:22 PM) pfrazee: 
or that
(05:47:40 PM) Kyran: 
That makes sense I guess
(05:47:40 PM) Kyran: 
EDIT: That makes sense
(05:48:16 PM) Mauve: 
Like `/prefix/?gt=&lt=foobar`
(05:48:29 PM) Mauve: 
So that'd search between `/prefix/` and `/prefix/foobar`
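
A rough sketch of how such a listing URL might be turned into absolute key bounds, assuming (as in the example above) that each bound is simply the path prefix followed by the parameter value. `boundsFromListingURL` is a hypothetical helper name.

```javascript
// Turn a listing URL into absolute key bounds by prefixing gt/lt with the pathname.
function boundsFromListingURL (href) {
  const url = new URL(href)
  const prefix = url.pathname
  const bounds = {}
  for (const name of ['gt', 'lt']) {
    // An empty value such as "gt=" still counts as present and means "the prefix itself".
    if (url.searchParams.has(name)) {
      bounds[name] = prefix + url.searchParams.get(name)
    }
  }
  return bounds
}

console.log(boundsFromListingURL('hyper://name/prefix/?gt=&lt=foobar'))
// { gt: '/prefix/', lt: '/prefix/foobar' }
```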
(05:48:32 PM) Kyran: 
Ah!
(05:48:42 PM) Kyran: 
What if it was if the path is suffixed by / it lists
(05:48:49 PM) Kyran: 
If no suffix it gives value?
(05:48:55 PM) Kyran: 
Does that make any sense?
(05:49:10 PM) Mauve: 
That'd run into the issue pfrazee had above where you might have keys that end in `/` which you'd want to be able to access
(05:49:25 PM) pfrazee: 
aren't we kind of hosed on that either way?
(05:49:45 PM) pfrazee: 
if we substitute / for the delim then you can never have / in a keyname right?
(05:50:00 PM) Mauve: 
I was thinking `/` in this case isn't treated specially
(05:50:06 PM) pfrazee: 
I wonder if we could percent-encode / to differentiate
(05:50:25 PM) pfrazee: 
%2F
(05:50:41 PM) Mauve: 
In the case where you only do lists when you have gt and lt in the querystring, the `/` doesn't mean anything special
(05:50:46 PM) pfrazee: 
so if you did /foo%2Fbar that would lookup foo/bar not sub(foo).get(bar)
(05:51:18 PM) pfrazee: 
@Mauve that's true but we need to solve this for the value-get case anyway, right?
(05:52:02 PM) Mauve: 
Ah, I guess if hyperbee has special sub functionality it'd make sense to use `/` for value getting.
(05:52:29 PM) pfrazee: 
that's pending an unmerged PR but I think they intend to merge that
(05:52:58 PM) Mauve: 
I think it'd make most sense for `/` in the URL to be subs, and `%2F` to be literal slashes in the key
(05:53:06 PM) pfrazee: 
yeah same
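
The rule agreed on here (`/` separates subs, `%2F` is a literal slash inside a key) could look something like the sketch below. `splitKeyPath` is a hypothetical helper, and how the resulting `subs` are applied to a Hyperbee instance is left out, since sub support was still an unmerged PR at this point.

```javascript
// Split a URL pathname into sub names and a final key.
// '/' separates segments; percent-escapes (including %2F) are decoded per segment,
// so an escaped slash ends up inside a single segment.
function splitKeyPath (pathname) {
  const segments = pathname
    .split('/')
    .filter((segment) => segment.length > 0)
    .map((segment) => decodeURIComponent(segment))
  const key = segments.pop()
  return { subs: segments, key }
}

console.log(splitKeyPath('/foo/bar')) // { subs: [ 'foo' ], key: 'bar' }
console.log(splitKeyPath('/foo%2Fbar')) // { subs: [], key: 'foo/bar' }
```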
(05:53:17 PM) Mauve: 
So maybe for binary data we could stick to URL encoding?
(05:53:33 PM) Mauve: 
No need for the 0x thing
(05:53:42 PM) pfrazee: 
oh that's...huh I have no idea what that would look like
(05:54:23 PM) pfrazee: 
% encode just does a direct byte->number translation right?
(05:54:33 PM) Mauve: 
No clue, actually. 😂
(05:54:40 PM) pfrazee: 
haha yeah I gotta look that up
(05:54:55 PM) Kyran: 
hahaha what encoding would be used if not hex.
Why the need for the 0x prefix btw?
(05:55:04 PM) Mauve: 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI
(05:55:08 PM) Kyran: 
Hypercore addresses don't use them do they?
(05:55:16 PM) Mauve: 
 @Kyran Sometimes people use non-unicode keys for leveldb
(05:55:47 PM) Kyran: 
Yeah but if you are encoding it from binary to hex that's not an issue?
(05:55:54 PM) pfrazee: 
if that's right then I assume Buffer([0,1,2,3]) would be encoded as /%00%01%02%03
(05:56:31 PM) Kyran: 
O
(05:57:44 PM) Mauve: 
https://www.ietf.org/rfc/rfc2396.txt
(05:57:58 PM) Kyran: 
I'm a bit inexperienced here-- why would you need to separate values? Why wouldn't you be able to reference that as binary or hex?
URI encoding is kinda yuck if not needed
(05:57:58 PM) Mauve: 

2.4.1. Escaped Encoding

An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.

   escaped     = "%" hex hex
   hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                         "a" | "b" | "c" | "d" | "e" | "f"
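
To make the `Buffer([0,1,2,3])` example concrete, here is a minimal sketch of escaping every byte of a binary key as a `%XX` triplet and reading it back. `percentEncodeBytes` and `percentDecodeToBytes` are hypothetical helpers; note that `encodeURIComponent` is not used, since it operates on strings rather than raw bytes.

```javascript
// Escape every byte as "%XX" (uppercase hex), regardless of whether it is printable.
function percentEncodeBytes (bytes) {
  let out = ''
  for (const byte of bytes) {
    out += '%' + byte.toString(16).padStart(2, '0').toUpperCase()
  }
  return out
}

// Decode a percent-encoded key back to bytes.
// Assumption: anything that is not an escape triplet must be plain ASCII.
function percentDecodeToBytes (text) {
  const out = []
  for (let i = 0; i < text.length; i++) {
    if (text[i] === '%') {
      const hex = text.slice(i + 1, i + 3)
      if (!/^[0-9a-fA-F]{2}$/.test(hex)) throw new Error('Malformed escape at index ' + i)
      out.push(parseInt(hex, 16))
      i += 2
    } else {
      const code = text.charCodeAt(i)
      if (code > 0x7f) throw new Error('Unescaped non-ASCII character at index ' + i)
      out.push(code)
    }
  }
  return Uint8Array.from(out)
}

console.log(percentEncodeBytes(Uint8Array.from([0, 1, 2, 3]))) // %00%01%02%03
```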


(05:58:30 PM) Mauve: 
 @Kyran We can't have arbitrary binary in a URL key so we need to encode it somehow
(05:58:38 PM) Kyran: 
Oh! Nevermind I'm being silly. This is to facilitate both ascii and binary yeah?
(05:59:03 PM) Mauve: 
Kinda yeah
(05:59:12 PM) Mauve: 
I think this URL escaped encoding would work fine
(05:59:22 PM) pfrazee: 
yeah I think this is probably the "right way" to do it
(05:59:25 PM) Mauve: 
Something custom would make life harder for other implementations
(05:59:26 PM) Mauve: 
Yeah
(05:59:33 PM) Kyran: 
That's fair. I agree.
(05:59:48 PM) pfrazee: 
ok cool, so the question about listing keys vs getting value
(06:00:02 PM) pfrazee: 
the trailing slash idea seems pretty sensible to me
(06:00:22 PM) Mauve: 
Yeah, if the slash is treated specially, I think that's an elegant solution
(06:00:57 PM) pfrazee: 
@Mauve what was your point about the encoding and EventSource?
(06:01:36 PM) Mauve: 
 @pfrazee I think that listing as HTML is useful for when browsing, but when an application wants to GET the data, something more structured is useful
(06:01:52 PM) pfrazee: 
yeah agree
(06:02:07 PM) Mauve: 
In dat-fetch you can opt into reading a directory as JSON with the `ACCEPT` header, I wasn't sure what would be best for a hyperbee
(06:02:16 PM) Mauve: 
JSON would be obvious, but it's bad for binary values
(06:02:38 PM) pfrazee: 
I was going to suggest that yeah, so arguably accept=text/html  could also give a UI for rendering it
(06:03:14 PM) pfrazee: 
there's a mimetype for generic binary, application/octet-stream
(06:03:24 PM) Mauve: 
That makes sense when you get a single value
(06:03:34 PM) Mauve: 
But when you list it's key-value pairs
(06:03:51 PM) pfrazee: 
hmm
(06:04:01 PM) Mauve: 
lol
(06:04:04 PM) Mauve: 
URI encoding it?
(06:04:10 PM) pfrazee: 
hah
(06:04:14 PM) Mauve: 
https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events
(06:04:44 PM) Mauve: 
Something loosely based on text/event-stream
(06:05:34 PM) pfrazee: 
hmm. I'm sure people have had to deal with this in the past
(06:05:38 PM) Mauve: 
Like

key: urlencodedkey
value: urlencodedvalue

(06:06:00 PM) Kyran: 
How is the key delimiter stored in binary keys?
(06:06:00 PM) Kyran: 
EDIT: How is the key delimiter stored in binary keys? (once subs are added)
(06:06:18 PM) Mauve: 
Same as the URLs!
(06:06:26 PM) Kyran: 
Huh?
(06:06:45 PM) pfrazee: 
maf has dealt with so much streaming stuff I imagine he'd be a good person to ask about this
(06:06:57 PM) Mauve: 


key: url/encoded/key
value: urlencoded%AAvalue

key: url/encoded/key2
value: urlencoded%AAvalue2
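
A small sketch of writing and reading this block format, assuming keys and values are already URL-encoded so they cannot contain raw newlines. `formatEntries` and `parseEntries` are hypothetical names; this follows the loose `key:`/`value:` layout shown above rather than the exact `text/event-stream` field names.

```javascript
// Serialize entries as "key: ...\nvalue: ...\n\n" blocks.
function formatEntries (entries) {
  return entries
    .map(({ key, value }) => `key: ${key}\nvalue: ${value}\n\n`)
    .join('')
}

// Parse the blocks back into { key, value } objects.
function parseEntries (text) {
  const entries = []
  for (const block of text.split('\n\n')) {
    if (block.trim() === '') continue
    const entry = {}
    for (const line of block.split('\n')) {
      const separator = line.indexOf(': ')
      if (separator === -1) continue
      entry[line.slice(0, separator)] = line.slice(separator + 2)
    }
    entries.push(entry)
  }
  return entries
}
```

Because each block ends with a blank line, a reader can emit entries as they arrive instead of waiting for the whole listing.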


(06:07:14 PM) pfrazee: 
@Mauve that is a sensible option for sure
(06:07:21 PM) Kyran: 
Perhaps one could simply respond with an octet-stream with a delimiter between key and value and a double delimiter between key and key
(06:07:36 PM) pfrazee: 
also has a nice upside of human-readability
(06:07:49 PM) Kyran: 
@Mauve would that just be plaintext like that?
(06:07:52 PM) pfrazee: 
@Kyran the question there is whether we can be sure the delim wouldnt show up in the actual data
(06:08:39 PM) Kyran: 
@pfrazee that's why I was wondering how the true delimiter works (the one we can distinguish to be different from $2F)
(06:08:39 PM) Kyran: 
EDIT: @pfrazee that's why I was wondering how the true delimiter works (the one we can distinguish to be different from %2F)
(06:08:56 PM) pfrazee: 
@Mauve for a text-based encoding, I think either something like what you're proposing or just a JSON array (which is thus not ideal for streaming reads)
(06:09:25 PM) Mauve: 
OMG, if we literally use the event source format we could load a series of events from a hyperbee using EventSource. Like if we could do a live listing we could have it notify us of changes
(06:09:37 PM) Mauve: 
JSON array could be fine too
(06:09:39 PM) pfrazee: 
that's true
(06:09:46 PM) Mauve: 
Like, an array of bytes?
(06:09:49 PM) Mauve: 
We could do both
(06:10:04 PM) Mauve: 
Or neither. 😛
(06:10:08 PM) pfrazee: 
@Kyran if you're doing text encoding then you have a little more control of the output because you're transforming it
(06:11:03 PM) pfrazee: 
@Mauve tbh I dont have any strong opinions here so whatever yall think is best. One other thing to consider is, for binary listing responses using something like protobufs
(06:11:56 PM) pfrazee: 
I sort of get the feeling that SSE is on the way out but it's not a bad option for this
(06:12:01 PM) Mauve: 
Yeah... Protobufs would be great but then it wouldn't be usable in the browser without a library
(06:12:07 PM) pfrazee: 
right
(06:12:21 PM) Mauve: 
I was gonna use SSE for a bunch of stuff in hyperdrive and hypercore actually
(06:12:40 PM) pfrazee: 
yeah seems fine, SSE is pretty cool
(06:13:06 PM) Mauve: 
Electron doesn't give us any hope for custom websocket protocols so it's the only way I can get streaming data out in a friendly API
(06:13:10 PM) pfrazee: 
one other option for binary responses is just to cook up a format where you write lengths and then values
(06:13:20 PM) Mauve: 
Though I suppose we have readable streams in fetch these days...
(06:13:44 PM) Mauve: 
Yeah, length-prefixed streams could be decent for getting data
(06:14:06 PM) pfrazee: 
yeah <varint><key><varint><value><varint><key>... etc
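
One possible reading of the `<varint><key><varint><value>` idea, sketched with an unsigned LEB128-style varint (7 bits per byte, high bit set on every byte except the last). The varint flavour and the helper names `encodeVarint`, `encodeFrames` and `decodeFrames` are assumptions; the conversation does not pick a specific encoding.

```javascript
// Encode a non-negative safe integer as a varint (least significant group first).
function encodeVarint (n) {
  const bytes = []
  while (n > 0x7f) {
    bytes.push((n & 0x7f) | 0x80)
    n = Math.floor(n / 128)
  }
  bytes.push(n)
  return bytes
}

// Write each key and value (Uint8Arrays) as <length varint><bytes>.
function encodeFrames (entries) {
  const out = []
  for (const { key, value } of entries) {
    for (const field of [key, value]) {
      // Spreading is fine for a sketch; large values would want chunked copying.
      out.push(...encodeVarint(field.length), ...field)
    }
  }
  return Uint8Array.from(out)
}

// Read fields back and pair them up as key, value, key, value, ...
function decodeFrames (bytes) {
  const fields = []
  let offset = 0
  while (offset < bytes.length) {
    let length = 0
    let scale = 1
    let byte
    do {
      byte = bytes[offset++]
      length += (byte & 0x7f) * scale
      scale *= 128
    } while (byte & 0x80)
    fields.push(bytes.slice(offset, offset + length))
    offset += length
  }
  const entries = []
  for (let i = 0; i + 1 < fields.length; i += 2) {
    entries.push({ key: fields[i], value: fields[i + 1] })
  }
  return entries
}
```

Unlike a delimiter-based format, length prefixes never collide with bytes inside the keys or values, which addresses the delimiter concern raised earlier.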
(06:14:21 PM) Mauve: 
K, what about this: Start with HTML, add JSON with a flag for binary arrays / strings, then figure out event source or some binary thing?
(06:14:34 PM) pfrazee: 
sure sounds good to me
(06:15:00 PM) Kyran: 
That works. What's the default going to be? HTML like for Hyperdrive?
(06:15:48 PM) Mauve: 
For dat-fetch I was thinking of doing the same thing as Hyperdrive with the Accept header
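
A minimal sketch of the Accept-header switch described here: HTML by default, JSON when the client asks for it. `pickListingFormat` is a hypothetical helper, and it deliberately ignores `q` weights, which a fuller implementation would probably honour.

```javascript
// Pick the listing format from an Accept header value.
// Defaults to HTML; returns JSON only when application/json is explicitly listed.
function pickListingFormat (acceptHeader) {
  const types = (acceptHeader || '')
    .split(',')
    .map((part) => part.split(';')[0].trim().toLowerCase())
  return types.includes('application/json') ? 'json' : 'html'
}

console.log(pickListingFormat('application/json')) // json
console.log(pickListingFormat('text/html,application/xhtml+xml')) // html
```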
(06:16:18 PM) Kyran: 
Yup. That makes sense.
(06:16:18 PM) Kyran: 
EDIT: Yup. That makes sense. :-)

(06:24:08 PM) Mauve: 
 @pfrazee Kyran: Mind if I copy this convo to a GitHub issue?
(06:24:24 PM) pfrazee: 
@Mauve go for it
(06:24:26 PM) Kyran: 
Sounds good
(06:24:26 PM) Kyran: 
EDIT: @Mauve Sounds good
@RangerMauve RangerMauve added the enhancement New feature or request label Nov 18, 2020
@DeltaF1
Contributor

DeltaF1 commented Dec 18, 2020

@KyGost @pfrazee
This sounds like a pretty sensible interface in line with the way the other fetch handlers work. In hyperdrive getting a directory listing only shows the keys, not the keys and the values. Is it possible that it would be better to only list sub-keys and not their values?

Is there a standardized list of useful range query parameters like gt lte etc.? It might be good to make sure that any future database-type protocols with range queries have similar semantics.

@RangerMauve
Owner Author

I think making values optional is something we could shove into the querystring too.

I'm not sure what standardized would mean with regards to query parameters, tbh.

I was thinking we could support all the stuff inside Hyperbee's createReadStream API. https://github.com/mafintosh/hyperbee#stream--dbcreatereadstreamoptions
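
As a rough illustration, query parameters could be mapped onto the range options documented for `createReadStream` (`gt`, `gte`, `lt`, `lte`, `reverse`, `limit`). Using the same names in the querystring is an assumption, and `readStreamOptionsFromQuery` is a hypothetical helper, not an existing API.

```javascript
// Build a createReadStream-style options object from a URL query string.
function readStreamOptionsFromQuery (search) {
  const params = new URLSearchParams(search)
  const options = {}
  for (const name of ['gt', 'gte', 'lt', 'lte']) {
    if (params.has(name)) options[name] = params.get(name)
  }
  // Assumption: "reverse" is on unless explicitly set to "false".
  if (params.has('reverse')) options.reverse = params.get('reverse') !== 'false'
  if (params.has('limit')) {
    const limit = Number.parseInt(params.get('limit'), 10)
    if (Number.isInteger(limit)) options.limit = limit
  }
  return options
}
```

The resulting object could then be passed to `db.createReadStream(options)`, with unknown parameters simply ignored.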

@RangerMauve
Owner Author

Working on earthstar-fetch, which has similar design constraints to hyperbee since it's also based on B+ trees for querying.

earthstar-project/earthstar-fetch#1

@RangerMauve
Owner Author

Might be good to use hyper+bee://
