Can't find transcript ids from fasta in bed #8

Open
XMTian opened this issue Dec 9, 2021 · 4 comments


@XMTian

XMTian commented Dec 9, 2021

Hi,

I tried to use how_are_we_stranded_here to determine the strandedness of my RNA-seq data.

Here is my command:

check_strandedness --gtf Midas.annotation.v2.6.gtf -r1 Aast_14_trimmed_R1.fq.gz -r2 Aast_14_trimmed_R2.fq.gz -fa Midas.2.6.transcripts.fa

Here is what the GTF file looks like:

head Midas.annotation.v2.6.gtf

1 funannotate transcript 6208 30525 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 6208 6471 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 7462 7555 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 16749 16855 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 17474 17511 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 17707 17785 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 17863 17957 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 18669 18714 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 18793 18939 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";
1 funannotate exon 19209 19291 . + . transcript_id "Midas_000001-T2"; gene_id "Midas_000001";

And here is what Midas.2.6.transcripts.fa looks like:

>Midas_000001-T1
AAAAAAAAAAAAAAAAAAAAAAATTCGGCTTATACTGCAGCTTAGAAGTCGTGCTCAGAGGAACGAGCTTTATCCCACTGCATTTCGGGCAGGTTAGAAGTCGTGCCGAGTAACGAAAAAAAAGAGGGAATTCGTGTTCATGAGCACGAATCAATAGATTAAATTTCGTT
>Midas_000001-T2
CTGACGAAGTGTCCTGATGCGATCCACGTCCCATTTTTACGACGGAGACCGCTTGCTGACGGCAGCCGGTTGCCATAACGTCCCTCCAGCAACATGTCCTCGTCGGGGGCGCAGCGTGTCGGTCCCGCGGCGGCTTTCCCAGAGAACCAGGGCGGTGCGGCGGCGGCAGG
>Midas_000002-T1
CATAACTCAAAAAACCATCAGTGAAGTCACAGCTGCTAGTCTTTGGAGCAGATCTCAATAAAAACCACAGCTTGTCCTCATCTCATGGCCTCATCCTCTCTCTGCAGCTGCAGGCGTGGTGGACTTGAACCTGTGCAACATCCGGGACATGGAGGTCATCGAGCTGAGCA
>Midas_000003-T1
GTGTCATTCACAATAAAAATCAGAGCAGCTTGGACTTGCATCCAAACAGAACACCATCAAAGTAGAAGAAGCACCAGATGACGACCTGTAGCTTTTCAAACACCTGAAGTTCAGCTTTCTGACAGGAGCCATCAGGCCTGAACCTCCAGCACTGCCTAAGGAGCCTTCAA
>Midas_000004-T1
ATGACAGAGAGAGATAGACTAGAGCTGAGGAGACCGCCATGGAGAGAGAGAGGACAGAGAAGAGCGAGACAGAGAGCGATACATACAGAGAGAGAGACCGCCAGAGAGAGAGACCCCGACCGAGAGAGCGGCGACGACCGAGAGGCGCGCACAACAGCACATAGAGAGAG
>Midas_000005-T1
CTGGACTGGACCAGAGCGACATCATGAAGCTGCTGAGACACGGCATCTACACTCTGCTGGTAATTTGCAGTGTATTGTGGGCTTCTTGCTCCAAGGTTAAAGCTGAATCATCTCCTGGATGTGACACCACCTTGACGTTCTCCTCAGAATTGAGCACCTTGACTGAAGGA
>Midas_000006-T1
GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGCGAGCGAGCGAGCGAGCACGCGCACGCAGCAGGTCAAACTCGCTGACAACAGGTCCAAAGACCCGGACAGAAAAACAATAGAAGAGAGTAGAAAATGAGTGGTGCAG
>Midas_000007-T1
GTAAGCCAGCAGTCTGAGAGTGAAGTGCTCTGTTGGGGTGATACAGTACTATGAGGTCTTTGAAATAAGATGGGGCCTGATTATTCAAGACCTTTTTGCATTTGTTCTGGGACCCCTGTGTCCTCATGCATGGAAATACAGAGGCGAGGGGAGAAGCAGCACCACCACCT

But I got the error below. I checked the transcript IDs, and they are all in the FASTA file.

Results stored in: stranded_test_Aast_14_trimmed_R1_2
converting gtf to bed
Checking if fasta headers and bed file transcript_ids match...
Can't find transcript ids from Midas.2.6.transcripts.fa in stranded_test_Aast_14_trimmed_R1_2/Midas.annotation.v2.6.bed
Trying to converting fasta header format to match transcript ids to the BED file...
Can't find any of the first 10 BED transcript_ids in fasta file... Check that these match

Could you help me figure it out?

Best,
Xiaomeng
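One way to check a mismatch like this beyond eyeballing the first few records is to compare the two ID sets programmatically. The following is a minimal sketch, not part of how_are_we_stranded_here: the function names are hypothetical, and the trailing space in the sample FASTA header is an invented example of a hidden character, not something known to be in the files above. Printing IDs with `repr` makes such characters visible.

```python
import re

def gtf_transcript_ids(lines):
    """Collect transcript_id values from the attribute column of GTF lines."""
    pattern = re.compile(r'transcript_id "([^"]*)"')
    ids = set()
    for line in lines:
        if line.startswith("#"):  # skip GTF comment lines
            continue
        match = pattern.search(line)
        if match:
            ids.add(match.group(1))
    return ids

def fasta_header_ids(lines):
    """Collect the raw text after '>' in FASTA headers, dropping only the newline."""
    return {line[1:].rstrip("\n") for line in lines if line.startswith(">")}

def compare_ids(gtf_ids, fasta_ids):
    """Return (shared, only_in_gtf, only_in_fasta) as sorted lists."""
    return (
        sorted(gtf_ids & fasta_ids),
        sorted(gtf_ids - fasta_ids),
        sorted(fasta_ids - gtf_ids),
    )

# Sample input modeled on the files in this issue.
gtf_lines = [
    '1\tfunannotate\texon\t6208\t6471\t.\t+\t.\t'
    'transcript_id "Midas_000001-T2"; gene_id "Midas_000001";\n',
]
fasta_lines = [
    ">Midas_000001-T1\n",
    "AAAAAAAA\n",
    ">Midas_000001-T2 \n",  # hypothetical trailing space, for illustration only
    "CTGACGAA\n",
]

shared, only_gtf, only_fasta = compare_ids(
    gtf_transcript_ids(gtf_lines), fasta_header_ids(fasta_lines)
)
print("shared:", [repr(i) for i in shared])
print("only in GTF:", [repr(i) for i in only_gtf])
print("only in FASTA:", [repr(i) for i in only_fasta])
```

To run this on real files, pass open file handles instead of the sample lists, for example `gtf_transcript_ids(open("Midas.annotation.v2.6.gtf"))`. An ID that appears in both "only in" lists and differs only by invisible characters points to a formatting problem rather than a genuinely missing transcript.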

@LauraPugh

Hi,

I get the same error. Did you find a solution?

@OlivierBakker

Hi,

Just bumping this, as I am running into the same issue and there doesn't seem to be a solution. I re-generated the cDNA FASTA from the Ensembl 99 GTF file. I also manually checked the first 10 records, and they are definitely in there. Would love an update on this.

@signalbash
Owner

I think this was an issue with formatting. I've edited the code to strip whitespace and it tentatively works locally on the Midas transcriptome/gtf examples you gave.
0fedf92
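To illustrate why stripping whitespace can matter here, the sketch below shows an exact string comparison failing on an ID that looks identical by eye, and succeeding once both sides are normalized. This is not the code from commit 0fedf92; `normalize_id` is a hypothetical helper, and the trailing space and `\r\n` line ending in the sample header are assumed for the sake of the example.

```python
def normalize_id(raw):
    """Strip surrounding whitespace and a leading '>', then keep the first token.

    Returns an empty string if nothing is left after stripping.
    """
    tokens = raw.strip().lstrip(">").split()
    return tokens[0] if tokens else ""

# An ID as it might be read from the BED file, and a FASTA header line
# carrying a trailing space and a Windows line ending (both hypothetical).
bed_id = "Midas_000001-T2"
fasta_header = ">Midas_000001-T2 \r\n"

raw_match = fasta_header[1:] == bed_id
normalized_match = normalize_id(fasta_header) == normalize_id(bed_id)

print("raw comparison matches:", raw_match)
print("normalized comparison matches:", normalized_match)
```

Keeping only the first whitespace-delimited token also handles headers that carry a description after the ID, which is a common FASTA convention.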

@KristinaGagalova

KristinaGagalova commented Nov 16, 2023

Hi,
I am using the pip version of the tool and I am having the same issue. Would it be possible to include this bug fix in all released versions? The commit you pointed to works well.
