Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updategff produces wrong output if parts of the scaffolds are used in the AGP file #188

Open
taprs opened this issue Sep 13, 2024 · 0 comments

Comments

@taprs
Copy link

taprs commented Sep 13, 2024

Hi!

Not sure if that was the indended use of the updategff command (or even the AGP format), but I was trying to update annotation based on an AGP file where I introduced a break into the chromosome and got corrupted GFF file. Minimal example follows:

echo $'## gff-version 3.1\nbibaboba\ttest\tgene\t5\t6\t.\t+\t.\tID=sas' > example.gff
echo $'## agp-version 2.1\nbiba\t1\t4\t1\tW\tbibaboba\t1\t4\t+\nboba\t1\t4\t1\tW\tbibaboba\t5\t8\t+' > example.agp

$cat  example.gff

## gff-version 3.1
bibaboba        test    gene    5       6       .       +       .       ID=sas


$cat example.agp

## agp-version 2.1
biba    1       4       1       W       bibaboba        1       4       +
boba    1       4       1       W       bibaboba        5       8       +

Obviously, positions 5 and 6 of bibaboba should then be converted into positions 1 and 2 of boba. But what I get is

$ ragtag.py updategff example.gff example.agp
Fri Sep 13 11:10:44 2024 --- VERSION: RagTag v2.1.0
Fri Sep 13 11:10:44 2024 --- CMD: ragtag.py updategff example.gff example.agp
## gff-version 3.1
boba    test    gene    5       6       .       +       .       ID=sas
Fri Sep 13 11:10:44 2024 --- INFO: Goodbye

The coordinates do not get transformed.

For completeness, I attach the solution that leaves me with the correct coordinates (for BED files though):

#!/usr/bin/gawk -f
 BEGIN{
 help="\
    This script updates a BED file using the transformation defined by an AGP file. \n\
    That is, BED in coordinates of contigs gets transformed to coordinates of scaffolds. \n\
    Pleaze bgzip and tabix the BED file before operation. \n\
    EXAMPLE: updategff.awk file.agp file.bed.gz > file_transform.bed"
   if (ARGC < 2) {print help; exit 1}
   OFS="\t"
 }

 # Exploit tabix to query bed files

 NR==FNR && !/^#/ && $5=="W" {
   cmd="tabix " ARGV[2] " " $6":"$7"-"$8
   while ( cmd | getline bedline > 0 ) {
    split(bedline, bed)
    if (bed[2] < $7 || bed[3] > $8) { next }
    if ($9=="+"){
     printf "%s\t%s\t%s", $1, bed[2]-$7+$2, bed[3]-$7+$2
    } else {
     printf "%s\t%s\t%s", $1, $8-bed[3]+$2, $8-bed[2]+$2                                                                                                                           }
   if (length(bed) > 3) {
     for (i=4;i<=length(bed);i++) {
      printf "\t%s", bed[i]
      print ""
     }
    }
   }
   close(cmd)
 }
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant