HTTP server #40

Open
drtamermansour opened this issue Aug 3, 2015 · 9 comments

Comments

@drtamermansour
Member

For the sake of the horse transcriptome project, I should set up some UCSC track hubs. These hubs require a publicly accessible web server. The annotation files are small, but if we decide to include BAM files (which I highly recommend), we will of course need much more disk space.

For now I am using "Genome Browser in the Box", a local virtual machine that simulates the UCSC website, so nothing urgent.

@ctb
Member

ctb commented Aug 3, 2015

Can you host these on figshare or GitHub, do you know?

@mr-c
Contributor

mr-c commented Aug 3, 2015

I'm told that the BAM files are in the 20+ gigabyte range

Michael R. Crusoe: Programmer & Bioinformatician
The lab for Data Intensive Biology; University of California, Davis
https://impactstory.org/MichaelRCrusoe http://twitter.com/biocrusoe

@ctb
Member

ctb commented Aug 4, 2015

Well, so what I'm really asking is: what file serving properties does the Web
server need? HTTP doesn't provide random access to files, so can we use
things like figshare (which can host 20 GB files, yes), or GitHub LFS,
or S3?

Tamer, pointers to tech details would be useful - thanks!

@mr-c
Contributor

mr-c commented Aug 4, 2015

? HTTP has allowed retrieval of specific parts of a file (byte serving, via range requests) since HTTP/1.1:
https://en.wikipedia.org/wiki/Byte_serving
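
To make byte serving concrete: the client sends a `Range` header, and a server that honours it replies `206 Partial Content` with only those bytes plus a `Content-Range` header. Below is a minimal Python sketch of the server-side arithmetic for a single range. `resolve_byte_range` and `content_range` are hypothetical helper names, and multi-range (multipart) requests are deliberately left out.

```python
import re
from typing import Optional, Tuple

# One byte range: "bytes=first-last", "bytes=first-" or "bytes=-suffix".
_SINGLE_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def resolve_byte_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Map a single-range Range header onto a resource of `size` bytes.

    Returns the inclusive (first, last) byte positions to send, or None when
    the header is malformed or the range cannot be satisfied.
    """
    m = _SINGLE_RANGE.match(header.strip())
    if m is None:
        return None  # unknown unit, multiple ranges, or bad syntax
    first_s, last_s = m.groups()
    if not first_s and not last_s:
        return None  # "bytes=-" names no bytes at all
    if not first_s:
        # Suffix range: the final N bytes of the resource.
        suffix = int(last_s)
        if suffix == 0 or size == 0:
            return None
        return max(size - suffix, 0), size - 1
    first = int(first_s)
    if first >= size:
        return None  # starts at or past the end of the resource
    # An open-ended or over-long range is clipped to the last byte.
    last = size - 1 if not last_s else min(int(last_s), size - 1)
    if last < first:
        return None
    return first, last


def content_range(first: int, last: int, size: int) -> str:
    """Content-Range value that accompanies a 206 Partial Content reply."""
    return f"bytes {first}-{last}/{size}"


# A request starting at byte 1024 of a 25,479,526-byte file:
first, last = resolve_byte_range("bytes=1024-", 25479526)
print(content_range(first, last, 25479526))  # bytes 1024-25479525/25479526
print(last - first + 1)                      # 25478502
```

Those figures match the "Length: 25479526 (24M), 25478502 (24M) remaining" line in the figshare wget transcript further down this thread, where the download started 1K (1024 bytes) into the file.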

@drtamermansour
Member Author

UCSC track hubs require certain compressed data formats. According to their documentation, the browser retrieves only the required parts of these files from the HTTP server, as Michael said.
I will try Figshare and see how it works.
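
Roughly, such formats work over plain HTTP because they carry an index. The browser looks up which byte span holds the region being displayed and requests only those bytes. The Python sketch below shows that lookup with a deliberately simplified, hypothetical index. It is not the real BAM/.bai or bigBed layout, which use more elaborate binned indexes. `range_header_for_region` and the checkpoint list are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# HYPOTHETICAL toy index: sorted (genomic_position, byte_offset) checkpoints.
# Each entry says "the first record starting at or after genomic_position
# begins at byte_offset". The first checkpoint is assumed to be position 0,
# and positions and offsets are assumed to be strictly increasing.
ToyIndex = List[Tuple[int, int]]


def range_header_for_region(index: ToyIndex, start: int, end: int,
                            file_size: int) -> str:
    """Range header covering every record whose start lies in [start, end).

    The span may include a few extra records at either edge; that is the
    price of a coarse index, and the client simply discards them.
    """
    if start >= end:
        raise ValueError("empty region")
    positions = [pos for pos, _ in index]
    # Last checkpoint at or before `start`: reading can begin at its offset.
    i = max(bisect_right(positions, start) - 1, 0)
    first_byte = index[i][1]
    # First checkpoint at or after `end`: every wanted record lies before it.
    j = bisect_left(positions, end)
    last_byte = index[j][1] - 1 if j < len(index) else file_size - 1
    return f"bytes={first_byte}-{last_byte}"


toy = [(0, 0), (10_000, 5_000), (20_000, 12_000), (30_000, 20_000)]
print(range_header_for_region(toy, 12_000, 18_000, file_size=25_000))
# bytes=5000-11999: 7,000 of the file's 25,000 bytes are requested
```

The same idea scales to multi-gigabyte BAM files. With an index, viewing one locus costs a few small range requests rather than a full download.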

@ctb
Member

ctb commented Aug 5, 2015

Great, please let me know; I'm interested in this for a variety of reasons
that go beyond horse :).

MRC: I did not know that HTTP 1.1 allowed random access, cool. I wonder
if S3 supports it?

@mr-c
Contributor

mr-c commented Aug 5, 2015

Yes, HTTP 1.1 is near universal :-)

@mr-c
Contributor

mr-c commented Aug 5, 2015

FigShare & S3 both support range requests (look for 206 Partial Content below)

(jessie)mcrusoe@localhost:~/git/khmer$ wget --start-pos 1K http://figshare.com/download/file/2143250
--2015-08-05 11:16:17--  http://figshare.com/download/file/2143250
Resolving figshare.com (figshare.com)... 54.154.140.224
Connecting to figshare.com (figshare.com)|54.154.140.224|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://files.figshare.com/2143250/khmer_1.4.1.tar.gz [following]
--2015-08-05 11:16:17--  http://files.figshare.com/2143250/khmer_1.4.1.tar.gz
Resolving files.figshare.com (files.figshare.com)... 54.231.132.13
Connecting to files.figshare.com (files.figshare.com)|54.231.132.13|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 25479526 (24M), 25478502 (24M) remaining [application/octet-stream]
Saving to: '2143250'

2143250             100%[==============================>]  24.30M  2.60MB/s   in 13s

2015-08-05 11:16:31 (1.82 MB/s) - '2143250' saved [25479526/25479526]
(jessie)mcrusoe@localhost:~/git/khmer$ wget --start-pos 1K http://s3.amazonaws.com/cloudman/fs-archives/galaxyFS-dev-latest.tar.gz
--2015-08-05 11:19:05--  http://s3.amazonaws.com/cloudman/fs-archives/galaxyFS-dev-latest.tar.gz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.14.16
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.14.16|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 183026757 (175M), 183025733 (175M) remaining [application/gzip]
Saving to: 'galaxyFS-dev-latest.tar.gz'
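
The same check can be scripted instead of read off wget's output. Here is a minimal sketch using only Python's standard library; the helper name `supports_range_requests` is hypothetical. It asks for a single byte (`Range: bytes=0-0`) and treats `206 Partial Content` as the sign of support, the same signal highlighted above. A server that ignores `Range` answers `200` instead.

```python
import urllib.request


def supports_range_requests(url: str, timeout: float = 30.0) -> bool:
    """Return True if `url` answers a one-byte Range request with 206.

    A server that ignores Range replies 200 and would start sending the whole
    file; the response is closed without reading the body, so the probe stays
    cheap either way. Network and HTTP errors (404, 403, ...) propagate.
    """
    # Ask for just the first byte of the file.
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    # urlopen follows redirects such as figshare's 302 to files.figshare.com;
    # urllib's default redirect handler copies the Range header to the new
    # request, so the status checked here comes from the final server.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 206
```

Checking the real status code is more dependable than sending a HEAD request and looking for `Accept-Ranges: bytes`, because servers are not required to send that header. With network access, `supports_range_requests("http://s3.amazonaws.com/cloudman/fs-archives/galaxyFS-dev-latest.tar.gz")` would repeat the S3 check above, though URLs from 2015 may no longer resolve.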

@drtamermansour
Member Author

I tried using iPlant data storage. It looks like a natural home for this kind of genomic data, and they provide the iCommand package, which allows adding and removing data directly in the cloud.
Unfortunately, it seems that their server does not support byte-range requests. This is what UCSC displays:

[Screenshot (2015-09-06): what UCSC displays for the iPlant-hosted files]
