attempt txt input update #44

Open
wants to merge 4 commits into base: dev

Conversation

sanghoonio
Member

This allows create_igd_f() to take a .txt file listing BED files as the filelist input, rather than requiring a directory.

@nleroy917
Member

Oh so this will basically say:

If filelist is an actual file on disk, read it in; otherwise, assume it's a directory and loop through it.
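
A minimal sketch of that branching, for illustration only. The function name `collect_bed_paths` is hypothetical and is not taken from the gtars source, and it assumes the list file holds one path per line:

```rust
use std::fs;
use std::io::{self, BufRead, BufReader};
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the actual gtars implementation).
/// If `filelist` is a regular file, treat it as a text file with one
/// BED path per line; otherwise treat it as a directory and list its files.
fn collect_bed_paths(filelist: &str) -> io::Result<Vec<PathBuf>> {
    let path = Path::new(filelist);
    let mut beds = Vec::new();

    if path.is_file() {
        // Text-file case: read line by line, skipping blank lines.
        let reader = BufReader::new(fs::File::open(path)?);
        for line in reader.lines() {
            let line = line?;
            let trimmed = line.trim();
            if !trimmed.is_empty() {
                beds.push(PathBuf::from(trimmed));
            }
        }
    } else {
        // Directory case: collect every regular file in the directory.
        // read_dir returns an error if the path is not a directory.
        for entry in fs::read_dir(path)? {
            let entry_path = entry?.path();
            if entry_path.is_file() {
                beds.push(entry_path);
            }
        }
        // read_dir order is platform dependent, so sort for stable output.
        beds.sort();
    }

    Ok(beds)
}
```

Calling `collect_bed_paths("bed_paths.txt")` or `collect_bed_paths("path/to/beds/")` would then return the same kind of list either way, so the rest of the build step does not need to know which input form was used.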

@nleroy917
Member

I feel like it would be better to just have it read from stdin, right? That way you can do:

cat filelist.txt | gtars igd

or

ls path/to/files/*.bed.gz | gtars igd

@sanghoonio
Member Author

For usability, yeah. I just added the .txt file input for now because it seemed easier to add. I'll try stdin inputs.

@donaldcampbelljr
Member

I feel like it would be better to just have it read from stdin, right? That way you can do:

cat filelist.txt | gtars igd

or

ls path/to/files/*.bed.gz | gtars igd

The C version builds from a directory or a .txt file, which is why I suggested it (to continue towards total feature parity). Is reading from stdin objectively better?

@sanghoonio
Member Author

From an R bindings perspective, having to write a .txt file from R for Rust to read might slow things down negligibly compared with passing in BED file paths directly. With stdin, though, we can input multiple paths like below, right?

echo path/to/files/file1.bed.gz path/to/files/file2.bed.gz path/to/files/file3.bed.gz | gtars igd

@donaldcampbelljr
Member

From an R bindings perspective, having to write a .txt file from R for Rust to read might slow things down negligibly compared with passing in BED file paths directly. With stdin, though, we can input multiple paths like below, right?

echo path/to/files/file1.bed.gz path/to/files/file2.bed.gz path/to/files/file3.bed.gz | gtars igd

Ah ok. Got it. Thanks.

@nleroy917
Member

The C version builds from a directory or a .txt file, which is why I suggested it (to continue towards total feature parity). Is reading from stdin objectively better?

I'm not too sure which is better. I think it's more in line with the "UNIX philosophy" to be able to pipe things around, but I'm not too opinionated one way or the other.

@donaldcampbelljr
Member

I fixed this and the functionality now works for reading a .txt file.

@donaldcampbelljr
Member

And I added stdin as an option so you can do this:

echo "/home/drc/GITHUB/gtars/gtars/tests/data/igd_file_list/igd_bed_file_1.bed" | cargo run igd create --output /home/drc/IGD_TEST/output/ --filelist -

And it works.
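
For readers following along, here is a rough sketch of how a `--filelist -` convention could be handled. The names `read_paths` and `paths_from_filelist_arg` are hypothetical and not taken from the gtars source. Splitting on whitespace is an assumption made here so that both newline-separated and space-separated input work; it would break paths that contain spaces.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::PathBuf;

/// Hypothetical helper: read whitespace-separated paths from any buffered reader.
/// Assumption: paths contain no spaces, so splitting on whitespace is safe.
fn read_paths<R: BufRead>(reader: R) -> io::Result<Vec<PathBuf>> {
    let mut paths = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // Blank lines produce no tokens and are skipped implicitly.
        for token in line.split_whitespace() {
            paths.push(PathBuf::from(token));
        }
    }
    Ok(paths)
}

/// Hypothetical dispatcher: "-" means stdin, anything else is opened as a list file.
fn paths_from_filelist_arg(arg: &str) -> io::Result<Vec<PathBuf>> {
    if arg == "-" {
        let stdin = io::stdin();
        read_paths(stdin.lock())
    } else {
        read_paths(BufReader::new(File::open(arg)?))
    }
}
```

Keeping the parsing generic over `BufRead` means the stdin path and the .txt path share one code path, and the parser can be exercised in tests with an in-memory buffer instead of a real pipe.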

Please confirm this meets your needs and then merge to dev when you are ready.

@donaldcampbelljr left a comment
Member

Just make sure this works like you expect it to.

@sanghoonio
Member Author

And I added stdin as an option so you can do this:

echo "/home/drc/GITHUB/gtars/gtars/tests/data/igd_file_list/igd_bed_file_1.bed" | cargo run igd create --output /home/drc/IGD_TEST/output/ --filelist -

And it works.

Please confirm this meets your needs and then merge to dev when you are ready.

Looks like it creates a .igd with .txt input, but the .igd output doesn't seem to work with search. A .igd created from a BED file directory doesn't work with search either unless I switch back to dev and recompile. Can you let me know if you can reproduce this?

sam@Sams-MacBook-Pro release % ./gtars igd create --output /Users/sam/Documents/Work/episcope/.test/igd/ --filelist /Users/sam/Documents/Work/episcope/.test/bed_paths_10.txt

Number of Bed Files found:
10
Temporary Tiles:
 nCtgs (igd.nctg): 63, nRegions (igd.total): 4840383, nTiles (nt): 191146
IGD saved to: /Users/sam/Documents/Work/episcope/.test/igd/igd_database.igd
Total Intervals: 4644521, l_avg: 690.1135
nctg:63  nbp:16384
sam@Sams-MacBook-Pro release % ./gtars igd search --database /Users/sam/Documents/Work/episcope/.test/igd/igd_database.igd --query /Users/sam/Documents/Work/episcope/.test/bed2.bed

thread 'main' panicked at src/igd/search.rs:434:41:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@donaldcampbelljr
Member

I can't reproduce the above error. However, on the dev branch of IGD, I notice a bug where creating an .igd from a directory and then searching it returns 0 hits.

However, if I create an igd from a single source file and query that source file against the igd, it appears to be working fine.

I will have to investigate this early next week. I suspect the issues may be related.

@sanghoonio
Member Author

Hmm, I'm still getting the same error after pulling and recompiling. Are there differences between Mac and Linux when compiling Rust that could cause something like this?

3 participants