Problem with vg inject in version 1.59 #4415

Open
NMarthe opened this issue Oct 8, 2024 · 2 comments

@NMarthe

NMarthe commented Oct 8, 2024

1. What were you trying to do?
I was trying to project coordinates from a linear genome onto the graph using vg inject, and then project them from the graph onto a target linear genome using vg surject.
I converted a BED file with my coordinates to a BAM file using bedtools bedtobam. Then I used the BAM file with vg inject to output a GAF file. The GAF file was used by vg surject to output a SAM file.

2. What did you want to happen?
I wanted the resulting SAM file to have the sequence corresponding to my coordinates in the target genome.

3. What actually happened?
With vg version 1.58 it worked as expected, but with vg versions 1.59 and 1.60 the sequence field in the SAM file contains a string that is not DNA, and I don't know what it corresponds to. The strange sequence in versions 1.59 and 1.60 has the same length as the DNA sequence in version 1.58.
When I compared the GAF files produced by vg inject, the same problem already appears there, so vg inject seems to be responsible.

Here is an example of the strange sequence in the SAM (field 10) and GAF (field 12) files:

1.58 SAM file:
TTTCTTGACCTCTGCTACTGCCATTCTGTACGGCCTCTTGTAAATTGGGGCGGTTCCTGGTGCCAAATCAATCCGGAATTCAATCTCTCTCTTGGGTGGCATAGTAGTGAGGTCCTCAGGGAACACCTCTGGATACTCGCAGACTATGGGGATATCTTCCAATTTCCTCCAACTCTTCTCCTCGGTGACCACCGAATTCTCCACCTCAATCTGATTTAAGGAGATTCCCTGCTTCAGTGCGACTGGTGACTTGTACACCACTGTTTCCTCTTTCTCATTGGTCAAGGTGACCGTGCGATTCGCACAATCAATGACACCCTTGAACTTAGTCAGCCAATCCAT

1.60 SAM file:
RNRRRGRC========N=AA====MCM=V=VRVGSAVGSWVSSRCRMCGMCRMCM=VRSBSMSKSAVMVMSWSSSWSRSGCRMCGMCRMCM=SRVTV=VCSRVMVMSRSGMHSVSRSBSR========S=W=====V=HC=A==D=GD=A==========R=BA====D=CK====CYGAGMCYGMRCCYGVGDCYGAGMCYGMGDCYGVMDCYGARSCYGARCCYGARMCYGVGCCYGMRMCYGAGDCYGMRMCYGVGHCYGARMMYMACYRGRSCYRGGDCYGARSCYGMGDCYGVRMCYRGRVCYGVRMCYRGRMCYRGRMCYGMRVCYRGRMCYRGRC
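
A quick way to find affected records is to scan field 10 of the SAM output for anything that is not a plain DNA base. The following is a minimal sketch, not part of vg: the function names are made up for illustration, and it assumes that a valid SEQ contains only A, C, G, T, N (any case) or `*` for "no sequence".

```python
# Sketch: flag SAM records whose SEQ field (column 10) contains non-DNA symbols.
# Assumption: a valid SEQ uses only A, C, G, T, N (any case), or "*" when absent.
DNA_ALPHABET = set("ACGTN")


def unexpected_symbols(seq):
    """Return the sorted list of characters in seq that are not plain DNA bases."""
    if seq == "*":
        return []
    return sorted(set(seq.upper()) - DNA_ALPHABET)


def flag_sam_records(lines):
    """Yield (read_name, unexpected_symbols) for each suspicious alignment line."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment record
        bad = unexpected_symbols(fields[9])
        if bad:
            yield fields[0], bad


# Possible usage on the surject output from this report:
#   with open("nb_allFeatures_renamed_filter_vg.sam") as handle:
#       for name, bad in flag_sam_records(handle):
#           print(name, "".join(bad))
```

Run on the two excerpts above, the 1.58 sequence produces no unexpected symbols, while the 1.60 sequence is flagged for symbols such as `=`, `R` and `M`.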

1.58 GAF file:
cs:Z:*CT*TT*CT*TC*GT*CT*TG*AA*GC:1*TT*CC*GT*TG*TC*GT*GA*CC*TT*GG*CC*CC*AA*TT*TT*CC*TT:1*TT*AA*CC*GG*GG*CC:1*TT*CC*TT*TT*GG*TT*AA*AA*AA*TT*TT:1*GG:1*GG*CC*GG*GG*TT*TT*CC:1*TT*GG*GG*TT*GG:1*CC*AA*AA*AA*TT*CC*AA*AA*TT:1*CC*GG*GG*AA*AA*TT*TT*CC*AA*AA*TT:1*TT*CC*TT*CC*TT*CC*TT*TT*GG*GG*GG*TT*GG*GG*CC*AA*TT*AA*GG*TT*AA*GG*TT*GG*AA*GG*GG*TT*CC:1*TT*CC*AA*GG*GG*GG*AA*AA*CC*AA*CC*CC*TT*CC*TT*GG*GG*AA*TT*AA*CC*TT*CC*GG*CC*AA*GG*AA*CC*TT*AA*TT*GG*GG*GG*GG*AA*TT*AA*TT*CC*TT*TT*CC*CC*AA*AA*TT:1*TT*CC*CC*TT*CC*CC*AA*AA*CC*TT*CC*TT*TT*CC*TT*CC*CC*TT*CC*GG*GG*TT*GG*AA*CC*CC*AA*CC*CC*GG*AA*AA*TT*TT*CC*TT*CC*CC*AA*CC*CC*TT*CC*AA*AA*TT*CC*TT*GG*AA*TT*TT*TT*AA*AA*GG*GG*AA*GG*AA*TT*TT:1*CC*CC*TT*GG*CC*TT*TT*CC:1*GG*TT:1*CC:3*TT:1*GG*TT:1*AA*CC*TT*TT:1*TT*AA*CC*AA*CC*CC:2*TT*GG*TT*TT*TT:1*CC*TT*CC*TT*TT*TT*CC*TT*CC*AA*TT*TT*GG*GG*TT:1*AA*AA*GG:1*TT*GG*AA*CC*CC:1*TT*GG*CC*GG*AA*TT*TT:1*GG*CC*AA*CC*AA*AA*TT*CC*AA*AA*TT*GG*AA*CC*AA*CC*CC:1*TT*TT*GG*AA:2*TT*TT*AA:1*TT:1*AA:1*CC*CC:1*AA*TT*CC*CC*AA*TT

1.60 GAF file:
cs:Z:*CR*TN*CR*TR*GR*CG*TR*AC*G=*C=*T=*C=*G=*T=*T=*G=*GN*C=*TA*GA*C=*C=*A=*T=*TM:1*TM*G=*TV*A=*CV*GR*GV*CG*CS*TA*CV*TG*TS*GW*TV*AS*AS*AR*TC*TR*GM*GC:1*GM:1*GR*GM*TC*TM*C=*CV*TR*GS*GB*TS*GM*CS*CK*AS:1*AV*TM*CV*AM*AS*TW*CS*CS*GS*GW*AS*AR*TS*TG:1*AR*AM*TC*CG*TM:1*TR*CM*TC*CM*T=*TS*GR*GV*GT*TV*G=*GV:1*AS*TR*AV*GM*TV*AM*GS*TR*GS*AG*GM*GH*TS*CV*CS*TR*CS*AB*GS*GR*G=*A=*A=*C=*A=*C=*C=*T=*CS*T=*GW*G=*A=*T=*A=*C=*TV*C=*GH:1*A=*GA*A=*C=*TD*A=*TG*GD*G=*GA*G=*A=*T=*A=*T=*C=*T=*T=*C=*C=*AR*A=*TB*TA*T=*C=*C=*T=*CD*C=*AC*AK*C=*T=*C=*T=*TC*CY*TG*CA*CG*TM:1*GY:1*TM*GR*AC:1*CY*AG*CV*CG*GD*AC*AY*TG*TA*CG*TM:1*CY*AG*CM*CG*TD:1*AY*AG*TV*CM*TD*GC*AY*TG*TA*TR*AS*AC*GY:2*GR*AC*TC*TY*CG*CA*CR*TM*GC*CY*TG*TV*CG*AC*GC*TY:1*CM*GR*AM:1*TY:1*GA*TG*GD*AC*CY*TG*TM*GR*TM*AC*CY*AG*CV*CG*AH:1*TY:1*TA*TR*TM*CM*CY*TM*CA*TC*TY*TR*CG*TR*CS*AC*TY*TR:2*TD:1*AY*AG*GA*GR*TS*GC*AY*CG*CM:1*TD*GC*CY:1*AV*TR*TM:1*GY*CR*AG*CR*AV*AC*TY*CG*AV*AR*TM*GC*AY*CR*AG*CR*CM:1*TY*TR:1*AR*AM:1*TY*TG*AM*GR*TV:1*AY*GR*CG*CR*AM*AC*TY*CR*CG*AR*TC
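
The same check can be applied to the GAF output. The sketch below assumes the cs tag follows the minimap2-style short grammar, where `:<n>` is a run of n matches and `*<ref><query>` is one substitution; whether vg inject follows that grammar exactly is an assumption here, and the function names are illustrative only. Under that reading, the first `*` operations in the two excerpts above share the same first symbol (C, T, C, T, G, ...) and differ only in the second, which would be the query base.

```python
import re

# Sketch: pull the query-side symbol out of each substitution in a cs:Z: tag.
# Assumption: minimap2-style short cs grammar, i.e. ":<n>" is a run of n matches
# and "*<ref><query>" is a single substitution. Insertions ("+...") and
# deletions ("-...") are tokenized but otherwise ignored.
CS_TOKEN = re.compile(r":(\d+)|\*(.)(.)|\+([A-Za-z]+)|-([A-Za-z]+)")


def substitution_pairs(cs_tag):
    """Return [(ref_symbol, query_symbol), ...] for every '*' operation."""
    body = cs_tag[len("cs:Z:"):] if cs_tag.startswith("cs:Z:") else cs_tag
    pairs = []
    pos = 0
    while pos < len(body):
        match = CS_TOKEN.match(body, pos)
        if match is None:
            raise ValueError(f"unparseable cs tag at offset {pos}: {body[pos:pos + 10]!r}")
        if match.group(2) is not None:
            pairs.append((match.group(2), match.group(3)))
        pos = match.end()
    return pairs


def suspicious_query_symbols(cs_tag, alphabet="ACGTN"):
    """Return the sorted query-side symbols that fall outside the expected alphabet."""
    return sorted({q for _, q in substitution_pairs(cs_tag) if q.upper() not in alphabet})
```

Applied to the start of the 1.60 tag (`*CR*TN*CR*TR*GR*CG*TR*AC*G=`), the query-side symbols are `RNRRRGRC=`, which matches the start of the strange SAM sequence.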

5. What data and command can the vg dev team use to make the problem happen?
The data is available here: https://filesender.renater.fr/?s=download&token=a1a584c0-46e3-4ff9-8eef-20c916237644

vg inject -o gaf -x NewMGC_Paths.xg nb_allFeatures_renamed_filter_vg.bam > nb_allFeatures_renamed_filter_vg.gaf

vg surject --xg-name NewMGC_Paths.xg --gaf-input --into-paths refpath_vg -s nb_allFeatures_renamed_filter_vg.gaf > nb_allFeatures_renamed_filter_vg.sam
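
To see which records change between versions, the two commands above can be run once with each vg binary and the resulting SAM files compared by read name. This is a rough sketch under two assumptions: read names are unique within each file (if not, the last record for a name wins), and the two output files have been saved under separate names such as `out_1.58.sam` and `out_1.60.sam`, which are hypothetical.

```python
def read_seqs(lines):
    """Map read name -> SEQ (column 10) for each alignment line of a SAM file."""
    seqs = {}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 11:
            seqs[fields[0]] = fields[9]
    return seqs


def differing_reads(old_lines, new_lines):
    """Return the sorted names of reads found in both inputs whose SEQ differs."""
    old, new = read_seqs(old_lines), read_seqs(new_lines)
    return sorted(name for name in old.keys() & new.keys() if old[name] != new[name])


# Possible usage (file names are hypothetical):
#   with open("out_1.58.sam") as old, open("out_1.60.sam") as new:
#       for name in differing_reads(old, new):
#           print(name)
```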

6. What does running vg version say?

vg version v1.60.0 "Annicco"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by root@buildkitsandbox

or

vg version v1.58.0 "Cartari"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by root@buildkitsandbox

Thank you for your help.

@jeizenga
Contributor

@NMarthe Apologies for the delay in getting around to this. The link you provided for the data seems to have expired. Is it possible to renew it?

@NMarthe
Author

NMarthe commented Nov 19, 2024

No problem. The link works for me, and it says the expiration date is December 4th, 2024. I updated the link a few weeks ago, so maybe you still have the old (expired) link? Here is the new link: https://filesender.renater.fr/?s=download&token=a1a584c0-46e3-4ff9-8eef-20c916237644
