
Dealing with unknown nucleotides (nnn../NNN...) in the reference transcript sequence "transcripts.13cds10.transcripts.fa" #2

Open
marcasriv opened this issue Apr 4, 2019 · 0 comments


Hi again,

I've encountered the following error when trying to train a model on my own data. I ran:

python /home/marina/git-repos/iXnos/iXnos/reproduce_scripts/28mer_models.py \
        s28_cod_n5p4_nt_n15p14 \
        /home/marina/git-repos/iXnos/iXnos/expts/dmso /home/marina/git-repos/iXnos/iXnos/expts/dmso/process/dmso.transcript.mapped.wts.sam \
        /home/marina/git-repos/iXnos/iXnos/genome_data/crigri.transcripts.13cds10.lengths.txt /home/marina/git-repos/iXnos/iXnos/genome_data/crigri.transcripts.13cds10.fa \
        /home/marina/git-repos/iXnos/iXnos/expts/dmso/process/tr_set_bounds.size.28.28.trunc.20.20.min_cts.200.min_cod.100.top.300.txt /home/marina/git-repos/iXnos/iXnos/expts/dmso/process/te_set_bounds.size.28.28.trunc.20.20.min_cts.200.min_cod.100.top.300.txt \
        /home/marina/git-repos/iXnos/iXnos/expts/dmso/process/outputs.size.28.28.txt 35 \
        32

which outputs:

s28_cod_n5p4_nt_n15p14
[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
[-15, -14, -13, -12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
Traceback (most recent call last):
  File "/home/marina/git-repos/iXnos/iXnos/reproduce_scripts/28mer_models.py", line 34, in <module>
    nonlinearity="tanh", widths=[200], update_method="nesterov")
  File "/home/marina/git-repos/iXnos/iXnos/iXnos/interface.py", line 404, in make_lasagne_feedforward_nn
    filter_max=filter_max, filter_pct=filter_pct, filter_test=filter_test)
  File "/home/marina/git-repos/iXnos/iXnos/iXnos/process.py", line 1118, in load_lasagne_data
    max_struc_width=max_struc_width, aa_feats=aa_feats)
  File "/home/marina/git-repos/iXnos/iXnos/iXnos/process.py", line 1158, in get_data_matrices_lasagne
    max_struc_width=max_struc_width, aa_feats=aa_feats)
  File "/home/marina/git-repos/iXnos/iXnos/iXnos/process.py", line 672, in get_X
    for gene in sorted_genes for A_site in codon_set[gene]])
  File "/home/marina/git-repos/iXnos/iXnos/iXnos/process.py", line 1248, in get_rel_cod_feats
    features[i*64 + cod2id[cod]] = 1
KeyError: 'nnn'
makefile:600: recipe for target '/home/marina/git-repos/iXnos/iXnos/expts/dmso/lasagne_nn/s28_cod_n5p4_nt_n15p14/init_data/init_data.pkl' failed
make: *** [/home/marina/git-repos/iXnos/iXnos/expts/dmso/lasagne_nn/s28_cod_n5p4_nt_n15p14/init_data/init_data.pkl] Error 1

I understand that the KeyError: 'nnn' is raised because my reference transcriptome file contains unknown nucleotides (runs of n), and the codon 'nnn' is not a key in the Python dictionary of codons (cod2id). I had thought of simply removing the transcripts that contain unknown nucleotides from the reference and re-mapping. However, I checked some of the reference transcriptome files provided in iXnos/genome_data and found that the one for the Iwasaki experiment (human.transcripts.13cds10.transcripts.fa) also contains runs of n. I was able to run the Iwasaki models successfully on my system, so I was wondering whether you ran into the same issue and, if so, whether you could share any insight into how to solve it.
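
To see which transcripts are affected before deciding whether to drop them, something like the following minimal sketch could list them. It is written for Python 3, assumes a plain FASTA file, and treats any character other than A, C, G or T as unknown. The helper names read_fasta and transcripts_with_unknown_bases are hypothetical and not part of iXnos, and the inline sequences are made-up examples rather than real transcripts.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def transcripts_with_unknown_bases(handle, allowed="ACGT"):
    """Map transcript name -> number of bases outside `allowed` (case-insensitive)."""
    allowed = set(allowed)
    flagged = {}
    for name, seq in read_fasta(handle):
        count = sum(1 for base in seq.upper() if base not in allowed)
        if count:
            flagged[name] = count
    return flagged


# Made-up example input; for a real file, pass open("reference.fa") instead.
example = io.StringIO(">tx1\nATGGCC\nTAA\n>tx2\nATGnnnTAA\n>tx3\nATGNCCTGA\n")
flagged = transcripts_with_unknown_bases(example)
print(flagged)
```

Comparing the flagged names against the genes listed in the training and test set files would show whether any transcript with unknown bases is actually used for training, which might differ between my data and the Iwasaki data.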

Thank you very much for your kind help.

Best,

Marina
