Potential issue in identifying inconsistent delimiters #16

Open
npb596 opened this issue Jan 18, 2022 · 1 comment

npb596 commented Jan 18, 2022

I think this is a neat tool and was trying it out today. I'm curious about one potential issue. If the first 3 fields are separated by tabs but any subsequent fields are separated by another delimiter (e.g. a space), bedqc gives green checks all around but incorrectly identifies the file as having 3 fields.

Files where the first 4 fields are tab-delimited and the remaining fields are space-delimited seem to run through bedtools intersect (and, I assume, other functions) fine, so this may rarely be a problem. Still, I assume this kind of inconsistent delimitation happens often (for example, when extra columns are appended to a BED file), and it may help for bedqc to report it explicitly. I apologize if I'm simply missing something or if this is a trivial point.
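To make the case concrete, here is a minimal Python sketch of one way such a line could be flagged. It is not bedqc's code: the function names `field_counts` and `has_mixed_delimiters` are hypothetical, and the check is only a heuristic that compares the field count from splitting on tabs with the count from splitting on any run of spaces or tabs.

```python
import re


def field_counts(line):
    """Return (tab_fields, whitespace_fields) for one BED data line."""
    line = line.rstrip("\r\n")
    # Fields seen by a parser that splits on tabs only.
    tab_fields = len(line.split("\t"))
    # Fields seen by a parser that splits on any run of spaces or tabs.
    whitespace_fields = len(re.split(r"[ \t]+", line.strip(" \t")))
    return tab_fields, whitespace_fields


def has_mixed_delimiters(line):
    """Heuristic: True when a tab-delimited line also uses spaces as separators."""
    tab_fields, whitespace_fields = field_counts(line)
    # A purely space-delimited line has tab_fields == 1 and is not flagged.
    return tab_fields > 1 and tab_fields != whitespace_fields


if __name__ == "__main__":
    example = "chr1\t100\t200 geneA 0 +"
    print(field_counts(example), has_mixed_delimiters(example))
```

The heuristic also flags tab-delimited lines with empty fields or with spaces inside a field, so a real check would need to decide how to treat those. Header lines (`track`, `browser`, `#` comments) would have to be skipped before calling it.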

Contributor

brwnj commented Jan 19, 2022

BED does allow the delimiter to change throughout the file as long as it's either space or tab, but right now my main concern is successfully grabbing the first 3 columns in order to continue the analysis of the intervals. In this case, the files passed to the bedtools command have already been standardized as tab-delimited and sorted. This particular issue will have to be addressed in the function where I allow the user to download a "fixed" version of their file. Of course, that depends on finding these more complex issues in the first place...
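As an illustration of what "standardized as tab-delimited and sorted" could look like, here is a rough Python sketch. It is an assumption about the general approach, not the tool's actual implementation; the function name `standardize` is hypothetical, and the sort order (chromosome as a string, then numeric start and end) is simply one common convention.

```python
import re


def standardize(lines):
    """Return BED data lines re-joined with tabs and sorted by chrom, start, end."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = re.split(r"[ \t]+", line)
        if fields[0] in ("track", "browser"):
            continue  # skip header lines
        records.append(fields)
    # Chromosome sorts lexicographically; start and end sort numerically.
    records.sort(key=lambda f: (f[0], int(f[1]), int(f[2])))
    return ["\t".join(f) for f in records]


if __name__ == "__main__":
    messy = ["chr2 5 10", "chr1\t100\t200 geneA", "# comment", "chr1\t50\t60"]
    for row in standardize(messy):
        print(row)
```

This sketch raises an error on lines with fewer than 3 fields or non-integer coordinates, and it treats every space as a separator, so a name field that contains spaces would be split into several columns.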

I'm attempting to account for most of the cases outlined by the Hoffman group (https://github.com/hoffmangroup/acidbio) and we're both working from the official specs at:

https://samtools.github.io/hts-specs/BEDv1.pdf
