Standardize support for streaming uploads #28

Open
jonhoo opened this issue May 19, 2013 · 32 comments

Comments

@jonhoo

jonhoo commented May 19, 2013

As @felixge pointed out in #26, it would be good to have a standardized way of providing URL endpoints where a client can retrieve a file that is currently being uploaded that will stay open until the entire file has been sent.

Following the decision in #26 to replace Offset with Content-Length, clients will by default be getting only the bytes that have been uploaded at the time of the request. A conforming client might be able to detect the Entity-Length header and keep the connection open to stream more bytes, but it would be good to define the protocol in such a way that "normal" HTTP clients would be able to request a file being uploaded and receive the entire file too.

One way of achieving this might be to change the default behavior of HEAD and GET requests to serve Content-Length = Entity-Length and stream the file to the client, but add a request flag for a client to send if it wishes to get only the uploaded bytes and not wait for the rest. Something like Accept: incomplete, except with a more appropriate header field (Accept is only for content types).

@vayam
Member

vayam commented May 19, 2013

+1 to use standard byte range requests. Linking to @felixge's gist

@jonhoo
Author

jonhoo commented May 20, 2013

@vayam that wouldn't allow streaming a download to tus-unaware clients though...

@vayam
Member

vayam commented May 20, 2013

@jonhoo I see your point.
I prefer 'Entity-*' . How about Entity-Receive: available
Here is one attempt to describe the flow. Let me know what you guys think.

Upload Client

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Entity-Receive: available

Response:

HTTP/1.1 200 Ok
Content-Length: 70

For clients interested in downloading whatever is available:

GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Entity-Receive: available

Response:

HTTP/1.1 200 Ok
Content-Length: 70

bytes

Download client (Standard HTTP client - browser/curl)

waits until the file is downloaded

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org

Response:

HTTP/1.1 200 Ok
Content-Length: 100


GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org

Response:

HTTP/1.1 200 Ok
Content-Length: 100

bytes

Advanced Downloader

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org


Response
HTTP/1.1 200 Ok
Accept-Ranges: bytes
Content-Length: 100


GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=0-

Response:

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 100
Content-Range: bytes 0-99/100

bytes

>> Connection dropped after receiving 70 bytes



GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=70-

Response:

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 100
Content-Range: bytes 70-99/100

bytes


Advanced Downloader to receive available bytes

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Entity-Receive: available


Response:

HTTP/1.1 200 Ok
Accept-Ranges: bytes
Content-Length: 70

GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Entity-Receive: available
Range: bytes=0-

Response:

HTTP/1.1 200 Ok
Content-Range: bytes 0-69/70
Content-Length: 70

bytes

@jonhoo
Author

jonhoo commented May 20, 2013

The flows you indicate correspond mostly to the kind of flow I had in mind too. Some points though:

  • Surely Accept-Ranges: bytes would always need to be present as the server does not know if the client might check for it or not?
  • For the Connection dropped after receiving 70 bytes, shouldn't Content-Range be 0-69/100, not 0-99/100? And for that use-case I'd say Content-Length should be 70, not 100, no?
  • Entity-Receive: available doesn't seem entirely intuitive to me either as it could be read as "Entity-Receive is available" rather than "Receive the part of the entity that is available". I'm not sure any special headers apart from Range (and maybe Content-Range) are needed...

How about something like this?

# Incomplete file
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org

HTTP/1.1 200 Ok
Content-Length: 70
Entity-Length: 100
Accept-Ranges: bytes

# Get only bytes the server has
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=0-69

HTTP/1.1 200 OK
# Not a 206 because it's not a partial reply considering
# the user is only asking for bytes we have
Content-Length: 70
Content-Range: bytes 0-69/70

bytes

# Stream (and what will happen to regular curl-like clients)
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org

HTTP/1.1 200 OK
Content-Length: 100

bytes
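A minimal sketch, in Python, of how a server might interpret the Range header in the requests above. The function name parse_byte_range and its size parameter are hypothetical and not part of tus; size is whichever length the server decides to expose (the bytes received so far, or the full entity length). Only the single-range forms used in this thread (bytes=0-69, bytes=70-) are handled, and anything else is treated as unsatisfiable.

```python
import re
from typing import Optional, Tuple

# Matches "bytes=<first>-" and "bytes=<first>-<last>" only (assumption:
# suffix ranges and multi-range requests are out of scope for this sketch).
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def parse_byte_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return the inclusive (first, last) byte positions, or None.

    `size` is the number of bytes the server is willing to serve.
    None means the header is malformed or cannot be satisfied.
    """
    match = _RANGE_RE.match(header.strip())
    if match is None:
        return None
    first = int(match.group(1))
    # An open-ended range ("70-") runs to the last byte we can serve.
    last = int(match.group(2)) if match.group(2) else size - 1
    # Never promise bytes beyond what is being exposed.
    last = min(last, size - 1)
    if first > last:
        return None
    return first, last
```

With 70 bytes received, parse_byte_range("bytes=0-69", 70) gives (0, 69), while parse_byte_range("bytes=70-", 70) gives None because no byte 70 exists yet.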

@vayam
Member

vayam commented May 21, 2013

Surely Accept-Ranges: bytes would always need to be present as the server does not know if the client might check for it or not?

Accept-Ranges is optional for server to implement. I should have probably mentioned it is an extension.

For the Connection dropped after receiving 70 bytes, shouldn't Content-Range be 0-69/100, not 0-99/100? And for that use-case I'd say Content-Length should be 70, not 100, no?

Range-based requests should work the same as a plain GET, because standard video players will make range-based requests if the server supports them.

Entity-Receive: available doesn't seem entirely intuitive to me either as it could be read as "Entity-Receive is available" rather than "Receive the part of the entity that is available".

Agreed. No Custom Request Headers

How about something like this?
Incomplete file
HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org

HTTP/1.1 200 Ok
Content-Length: 70
Entity-Length: 100
Accept-Ranges: bytes

Shouldn't it be
HTTP/1.1 200 Ok
Content-Length: 100
Entity-Received: 70 <- or any better name to indicate actual bytes received

Get only bytes the server has
GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=0-69

Yes that would work on a server that supports Range based requests.

HTTP/1.1 200 OK
Not a 206 because it's not a partial reply considering
the user is only asking for bytes we have

Not true. All standard implementations - Akamai, S3 return 206 for Range: bytes=0- even if they are sending all data. You can check with any standard HTML5 video player.

It would be nice to have GET with range and without to be consistent. Because if the same url is passed to video player, it will do range requests if server supports it.
Eg: <video src="http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216">

For the video to play it has to return Content-Length = Entity-Length

If server supports range requests and you know how many bytes were actually received, you can do byte range

GET /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Range: bytes=0-69 

Response
HTTP/1.1 200 Ok
Content-Range: bytes 0-69/100
Content-Length: 70

@felixge @jonhoo The more I think about it, all we need is a better name for Offset to indicate the actual bytes received.
We can recommend that servers support standard HTTP 1.1 Range requests to allow partial downloads and video player seek support.
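As a rough illustration of the range response discussed here, the following Python sketch builds the status and headers for a satisfied byte range, with the last number of Content-Range set to the full entity length. It uses 206, following the point above about what Akamai and S3 return. The helper name partial_content_headers is hypothetical, and first/last are assumed to be already validated against the bytes the server actually holds.

```python
from typing import Dict, Tuple


def partial_content_headers(
    first: int, last: int, entity_length: int
) -> Tuple[int, Dict[str, str]]:
    """Build status and headers for bytes first..last (inclusive)."""
    if not 0 <= first <= last < entity_length:
        raise ValueError("range lies outside the entity")
    headers = {
        "Accept-Ranges": "bytes",
        # The part after "/" is the size of the whole entity,
        # not the number of bytes in this response.
        "Content-Range": f"bytes {first}-{last}/{entity_length}",
        # Content-Length describes only the body of this response.
        "Content-Length": str(last - first + 1),
    }
    return 206, headers
```

For the 70-of-100-bytes case in this thread, partial_content_headers(0, 69, 100) yields Content-Range: bytes 0-69/100 and Content-Length: 70.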

@jonhoo
Author

jonhoo commented May 21, 2013

Shouldn't it be
HTTP/1.1 200 Ok
Content-Length: 100
Entity-Received: 70 <- or Any better name to indicates actual bytes received

No, as we discussed in #26, Content-Length should indicate the size of the content present in the body of the reply (or for HEAD, the length of the content that would have been present in the body of the reply for the corresponding GET). Entity-Length is a made up header that should indicate the size of the "real" entity once it has finished uploading.

I do see your point that now HEAD and GET return different values, so perhaps swapping them around might be appropriate. That is, say that Entity-Length is always the length of the body of the response and Content-Length is the length of the full object...

Not true. All standard implementations - Akamai, S3 return 206 for Range: bytes=0- even if they are sending all data.

Ah, ok, I wasn't aware. Fair enough - 206 it is then. I'm not sure I agree with that interpretation of the standard, but in this case it might be better to follow the de facto standard.

For the video to play it has to return Content-Length = Entity-Length

Are you sure about this?

If server supports range requests and you know how many bytes were actually received, you can do byte range

That example there seems good to me. Making the last number of Content-Range be the total size of the entity is a good way of doing it. I think, for consistency, we might want to add in Entity-Length there as well to make it consistent with HEAD and GET without byte ranges

@vayam
Member

vayam commented May 21, 2013

I do see your point that now HEAD and GET return different values, so perhaps swapping them around might be appropriate. That is, say that Entity-Length is always the length of the body of the response and Content-Length is the length of the full object...

Entity-Length already means length of full object
Either we have Entity-Received or something equivalent or keep Offset as is.

For the video to play it has to return Content-Length = Entity-Length

Are you sure about this?

Yes

@jonhoo
Author

jonhoo commented May 21, 2013

Ah, sorry, I misread Entity-Received as Entity-Length. That makes much more sense now. Yes, I think that might be a good way of doing it. Essentially this means that we're always sending Content-Length as the size of the full entity, so I suppose we could actually get rid of Entity-Length altogether?

The end result of doing it that way would be that the default will always be to download the full file even if that means waiting for the server to receive all the bits. Services that want only the available bits would then use a Range request to get only those bytes based on the value of Entity-Received. That seems a good solution to me. @felixge ?

@vayam
Member

vayam commented May 21, 2013

The reason we have Entity-Length is we have to send Content-Length: 0 during file creation

Request:

POST /files HTTP/1.1
Host: tus.example.org
Content-Length: 0
Entity-Length: 100

Response:

HTTP/1.1 201 Created
Location: http://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216

@jonhoo
Author

jonhoo commented May 21, 2013

Okay, fair enough, but it doesn't seem to be needed for anything download related?

@vayam
Member

vayam commented May 21, 2013

Yes that is correct.

@felixge
Contributor

felixge commented May 22, 2013

Ok, to summarize:

  • rename: Final-Length -> Entity-Length
  • keep: Offset for PATCH requests and HEAD/GET responses (do we keep the name?)
  • Content-Length for GET/HEAD is always Entity-Length

Did I miss anything?

@jonhoo
Author

jonhoo commented May 22, 2013

I still think Offset in HEAD/GET responses is misleading. Something like Entity-Received as suggested by @vayam above seems more appropriate. For PATCH, Offset makes sense, but wouldn't Range be more appropriate so we don't have to come up with our own specialized header?

Also, the bits about being able to request only certain ranges of a file (that is, bits that the server already has) should probably be mentioned in the spec?

Apart from that I think you have everything.

@vayam
Member

vayam commented May 22, 2013

Ok, to summarize:
rename: Final-Length -> Entity-Length

Yes

keep: Offset for PATCH requests and HEAD/GET responses (do we keep the name?)

Not sure. Not entirely convinced with Entity-Received Header. Unless you can come up with a better one

Content-Length for GET/HEAD is always Entity-Length

Yes

Did I miss anything?

Nope

@vayam
Member

vayam commented May 22, 2013

@jonhoo

For PATCH, Offset makes sense, but wouldn't Range be more appropriate so we don't have to come up with our own specialized header?

Can you come up with a valid byte range request wherein the server responds with "bytes received so far", without breaking our assumption that Content-Length = Entity-Length? Because I couldn't come up with one.

@jonhoo
Author

jonhoo commented May 22, 2013

@vayam not sure I understand your question? The server already includes Entity-Received (or something like it) in the HEAD, so the client knows how much data the server has. Couldn't it then use Range: 70-/100 in the PATCH request to indicate that it's uploading bytes 70 and onwards?

@vayam
Member

vayam commented May 22, 2013

The reasoning behind Offset
Related #2 and @felixge's summary

@jonhoo
Author

jonhoo commented May 22, 2013

Ah, fair enough. If Range is standardized to only be meaningful to GET and Offset is on the track for becoming a standard then I'd say go ahead with Offset for PATCH - it makes more intuitive sense than Range anyway.

I still believe Offset is the wrong header to use for GET/HEAD though, as we're not serving data from an offset in those requests, rather we're only serving data up to a certain point in the stream. Here a header like Entity-Received seems much more correct.
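To make the PATCH side concrete, here is a small Python sketch of a client resuming an upload with Offset. It assumes the client has already learned how many bytes the server holds (through whichever response header ends up being chosen) and that a PATCH carries Offset plus the remaining bytes. The function name build_resume_patch is hypothetical.

```python
from typing import Dict, Tuple


def build_resume_patch(data: bytes, server_offset: int) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a PATCH that resumes at server_offset."""
    if not 0 <= server_offset <= len(data):
        raise ValueError("server offset lies outside the local file")
    # Send only the bytes the server does not have yet.
    body = data[server_offset:]
    headers = {
        "Offset": str(server_offset),
        "Content-Length": str(len(body)),
    }
    return headers, body
```

For a 100-byte file of which the server has 70 bytes, this produces Offset: 70 and a 30-byte body.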

@vayam
Member

vayam commented May 23, 2013

@jonhoo I agree Entity-Received is more intuitive for GET and HEAD
@felixge What are your thoughts on this?

@jonhoo
Author

jonhoo commented May 23, 2013

How about Entity-Available over Entity-Received?

@vayam
Member

vayam commented May 23, 2013

Sounds good to me. We should clarify that it is the bytes received or available at the time HEAD/GET request was made.

@felixge
Contributor

felixge commented May 25, 2013

@jonhoo what's the benefit of Entity-Available / Entity-Received over the Offset header for HEAD/GET responses? Maybe it's just me, but I find both names to be more confusing.

@jonhoo
Author

jonhoo commented May 27, 2013

In my opinion, Offset doesn't make sense as a response to a HEAD/GET as the value describes the amount of content available, not an offset into that content.

@vayam
Member

vayam commented Jun 10, 2013

I did some experiments with tusd and the brewtus node server. It is not trivial to implement Content-Length == Entity-Length for GET requests while an upload is in progress. To stream the URL smoothly, the GET response has to be throttled to the current upload speed. This can potentially lead to buggy GET implementations. A simple GET implementation like this and this would cause timeouts.

Here is my test

Upload to tus.io demo site using tuspy

python tuspy.py -f ~/vayam-dev/Largefile.mov 
http://master.tus.io/files/d949a91388ef1d7ab4e74d5203f57ebd
{'content-length': '0', 'access-control-allow-methods': 'HEAD,GET,PUT,POST,PATCH,DELETE', 'access-control-expose-headers': 'Location, Range, Content-Disposition, Offset', 'date': 'Mon, 10 Jun 2013 03:47:19 GMT', 'access-control-allow-origin': '*', 'access-control-allow-headers': 'Origin, X-Requested-With, Content-Type, Accept, Content-Disposition, Final-Length, Offset', 'offset': '0'}

Now issue a download while upload is in progress.

curl -v "http://master.tus.io/files/d949a91388ef1d7ab4e74d5203f57ebd" > /dev/null 
* About to connect() to master.tus.io port 80 (#0)
*   Trying 54.235.134.243...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* connected
* Connected to master.tus.io (54.235.134.243) port 80 (#0)
> GET /files/d949a91388ef1d7ab4e74d5203f57ebd HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: master.tus.io
> Accept: */*
> 
< HTTP/1.1 200 OK
< Access-Control-Allow-Headers: Origin, X-Requested-With, Content-Type, Accept, Content-Disposition, Final-Length, Offset
< Access-Control-Allow-Methods: HEAD,GET,PUT,POST,PATCH,DELETE
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Location, Range, Content-Disposition, Offset
< Content-Length: 4058518957
< Date: Mon, 10 Jun 2013 03:47:49 GMT
< Offset: 0
< Content-Type: application/octet-stream
< 
{ [data not shown]
  0 3870M    0 6159k    0     0   487k      0  2:15:22  0:00:12  2:15:10  842k* transfer closed with 4051146053 bytes remaining to read
  0 3870M    0 7200k    0     0   542k      0  2:01:39  0:00:13  2:01:26  839k
* Closing connection #0
curl: (18) transfer closed with 4051146053 bytes remaining to read <------------------Request times out

@felixge, @jonhoo I suggest we keep it simple. Make Content-Length == Offset and Add Entity-Length header to GET and HEAD like we discussed in #26
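For reference, the throttling described above could look roughly like the Python sketch below: a generator that serves a file while it is still growing, waiting for the upload to catch up instead of closing the connection early. This is not how tusd or brewtus are implemented. The name stream_growing_file, the received callback (returning the number of bytes uploaded so far) and the timeout values are all assumptions for illustration.

```python
import time
from typing import BinaryIO, Callable, Iterator


def stream_growing_file(
    source: BinaryIO,
    entity_length: int,
    received: Callable[[], int],
    poll_interval: float = 0.1,
    stall_timeout: float = 30.0,
    chunk_size: int = 64 * 1024,
) -> Iterator[bytes]:
    """Yield the entity's bytes in order, pacing output to the upload."""
    sent = 0
    last_progress = time.monotonic()
    while sent < entity_length:
        # How many bytes have been uploaded but not yet sent downstream.
        available = min(received(), entity_length) - sent
        chunk = b""
        if available > 0:
            source.seek(sent)
            chunk = source.read(min(available, chunk_size))
        if chunk:
            sent += len(chunk)
            last_progress = time.monotonic()
            yield chunk
            continue
        # Nothing new to send: give up if the upload has stalled for too
        # long, otherwise wait for the uploader to make progress.
        if time.monotonic() - last_progress > stall_timeout:
            raise TimeoutError("upload stalled before the entity was complete")
        time.sleep(poll_interval)
```

A server would send Content-Length: entity_length up front and write each yielded chunk to the response. The download can then never run ahead of the upload, at the cost of holding the connection open.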

@jonhoo
Author

jonhoo commented Jun 10, 2013

@vayam in a way the timeout you show above has the same end result as having Content-Length == Offset - you only download as much of the file as available at the time. Keeping it the way we've discussed thus has the same behavior for legacy clients, and it allows a server to implement streaming downloads if it so chooses?

@vayam
Member

vayam commented Jun 10, 2013

@jonhoo the issue is that the Offset at the time of the response would be less than the actual content downloaded. The download size would depend on upload and download speeds.

@felixge, how about we get rid of Offset in the response headers and keep Content-Length == Entity-Length by default?
Add a new header, Streaming: off, for getting only the bytes received by the server.

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Streaming: off

Response:

HTTP/1.1 200 Ok
Content-Length: 70

By default

HEAD /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Streaming: on <--- Optional / on by default

Response:

HTTP/1.1 200 Ok
Content-Length: 100

@felixge, @jonhoo can we discuss this on IRC? I am usually available in the morning, EST.

@jonhoo
Author

jonhoo commented Jun 10, 2013

Wouldn't it then make more sense as we decided above to include Entity-Available in the response - that way conforming clients could decide what behavior they want based on whether only parts of the file are available.

They could either choose to do a streaming download by requesting the whole file and not timing out, or they could choose to do a non-streaming download by only downloading bytes 0-70 using Range. Legacy clients will time out if the upload is not faster than their download speed, but I'm not sure this is really unexpected behavior?
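The choice described above could be sketched on the client side like this, in Python. It assumes a HEAD response where Content-Length is the full entity size and Entity-Available (the name floated earlier in this thread, not a settled header) is the number of bytes the server currently has. The function name download_request_headers is hypothetical.

```python
from typing import Dict, Mapping


def download_request_headers(
    head_response: Mapping[str, str], streaming: bool
) -> Dict[str, str]:
    """Pick the extra GET headers for a streaming or non-streaming download."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in head_response.items()}
    total = int(lowered["content-length"])
    # If the header is absent, assume the upload is complete.
    available = int(lowered.get("entity-available", total))
    if streaming or available >= total:
        # Plain GET: the server streams until the whole entity is sent.
        return {}
    if available == 0:
        raise ValueError("no bytes are available yet")
    # Non-streaming: ask only for the bytes the server already holds.
    return {"Range": f"bytes=0-{available - 1}"}
```

With 70 of 100 bytes available, a non-streaming client would send Range: bytes=0-69, while a streaming client sends no Range header and simply keeps the connection open.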

@vayam
Member

vayam commented Jun 11, 2013

I have been in two minds about this. Both have advantages and disadvantages. For now, I will throttle GET requests to keep my downloads smooth while upload is in progress.

@felixge
Contributor

felixge commented Jul 8, 2013

@vayam sorry for the lack of activity on the project lately. A few things have changed on my end, which means I won't be able to continue with the project for a while : /. Is there any chance you might be interested in taking over the project? If so @kvz and @tim-kos would be happy to help with anything you might need.

@vayam
Member

vayam commented Jul 8, 2013

@felixge you started something awesome here. I would love to take it forward. Let us talk over skype (vnarenv) or on irc(vayam) about the details. Let me know what times work for you. Excited about working with @kvz and @tim-kos.

@felixge
Contributor

felixge commented Jul 9, 2013

@vayam added you on skype, looking forward to having a chat!

@Acconut
Member

Acconut commented Dec 16, 2014

This seems to be an interesting idea in conjunction with long running uploads but also adds additional problems when looking at uploads with unknown-size (see #16) and non-contiguous data streams (see #3).

Until we have finished the original aim of tus, uploading data, I would like to move this to the backlog.
