different percent identity/alignment length for identical sequences on opposite strands #201
Comments
Thanks for the great write-up of this issue. The differences you see are interesting. I don't quite have time to look in depth right now, but one thing you could try is to add the option. StarAMR runs blast against the ResFinder database, which can contain many sequences for the same gene with minor variations. StarAMR then attempts to pick the "best" blast hit using logic based on percent identity and length. The logic for this is here: staramr/staramr/blast/results/BlastResultsParser.py (lines 115 to 122 in d8370b7).
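To make the idea of picking a "best" hit concrete, here is a minimal sketch. It is not the code from BlastResultsParser.py: the `Hit` class, the `pick_best` function, and both ranking orders are hypothetical, chosen only to show that the reported hit can change depending on whether percent identity or alignment length is compared first.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical container for one blast hit (not a StarAMR class)."""
    gene: str
    pident: float   # percent identity reported by blast
    length: int     # alignment length in nucleotides


def pick_best(hits, identity_first=True):
    """Return the top-ranked hit under an assumed ordering."""
    if identity_first:
        return max(hits, key=lambda h: (h.pident, h.length))
    return max(hits, key=lambda h: (h.length, h.pident))


# The two qacE results reported in this issue.
hits = [
    Hit("qacE_1_X68232", 100.0, 282),
    Hit("qacE_1_X68232", 99.65, 285),
]

best_by_identity = pick_best(hits, identity_first=True)
best_by_length = pick_best(hits, identity_first=False)
print(best_by_identity)
print(best_by_length)
```

Under these assumed orderings the two criteria disagree for the pair of results in this issue, so the tie-breaking rule is one place worth checking.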
Maybe something strange is going on with how the blast hit to report is being selected? The
Thanks for the quick reply. I tried running with the
It's possibly a blast issue. You can run with verbose turned on. I tested this by adding qacE_1_X68232 to the beginning and end of an example genome (reverse complementing the second copy), and here are the records I get:
That is, they have identical percent identity and length. Maybe there is some particular sequence around one of the copies of your qacE gene that is causing blast to report a slightly different HSP?
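A test construct like the one described above can be built in a few lines. This is only a sketch of the approach: the sequences below are short placeholders rather than the real qacE_1_X68232 or genome sequences, and the function names are made up for illustration.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flank_with_gene(genome, gene):
    """Put the gene at the start (forward) and at the end (reverse strand)."""
    return gene + genome + reverse_complement(gene)


# Placeholder sequences, NOT the real gene or genome.
gene = "ATGAAAGGCTGGCTT"
genome = "TTTTCCCCGGGGAAAA"

construct = flank_with_gene(genome, gene)
print(construct)
```

Because both copies come from the same string, any difference in the reported hits for such a construct would have to come from the alignment step rather than from the sequences themselves.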
I will take a look at the flanking sequences. This is my output from running the above plasmids with
A pattern I've noticed is that this only happens in cases (qacE, blaOXA-10, catB3) where the match covers less than 100% of the reference sequence length. Could that be influencing this discrepancy? If I align my plasmid sequence against the 333 bp reference sequence of X68232 in the online NCBI blastn interface, I get two hits, one on each strand, both with 100% identity over 282/333 nucleotides, which I would interpret as correct.
Thanks. We will have to investigate this further later on, as we don't have the resources to look into this issue right now. My suspicion, though, is that this comes down to differences in blast parameters or blast versions. You could try directly running the version of blast used by StarAMR to look at the output more closely.
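One way to follow that suggestion is to call `blastn` directly and print the strand of each hit. The sketch below rests on several assumptions: the file names are hypothetical, the reference gene is used as the query with the plasmid as the subject, and default `blastn` parameters are used, which may not match what StarAMR passes. It only runs blast when `blastn` is found on the PATH.

```python
import shutil
import subprocess

# Tabular output columns, including the subject strand.
FIELDS = "qseqid sseqid pident length mismatch gapopen qstart qend sstart send sstrand"


def blastn_command(query, subject):
    """Build a blastn command line (default parameters are an assumption)."""
    return ["blastn", "-query", query, "-subject", subject,
            "-outfmt", "6 " + FIELDS]


def run_blastn(query, subject):
    """Run blastn if it is installed; return its version and tabular output."""
    if shutil.which("blastn") is None:
        return None
    version = subprocess.run(["blastn", "-version"], capture_output=True,
                             text=True, check=True).stdout
    result = subprocess.run(blastn_command(query, subject),
                            capture_output=True, text=True, check=True)
    return version, result.stdout


# Hypothetical file names for the reference gene and the plasmid.
cmd = blastn_command("qacE_1_X68232.fasta", "plasmid.fasta")
print(" ".join(cmd))
```

Recording the `blastn -version` output alongside the hits makes it easier to compare results across machines.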
Hello,
I've come across an odd discrepancy: I have a plasmid with two copies of the qacE gene (X68232) on opposite strands, and I'm getting a different percent identity and alignment length against the reference for each copy, despite the nucleotide sequences being identical. The output I get from StarAMR/ResFinder is the following:
Initially this suggested to me that these are two distinct sequences: one on the forward strand with 100% identity over 282 nt, and one on the reverse strand with 99.65% identity over 285 nt. However, I aligned both qacE sequences to the X68232 reference to find where the sequences differed and found the following:
It seems the alignment ends at 282 nt for the forward strand, while on the reverse strand the next three bases are included, giving a single mismatch over the 285 nt length. Even though the forward strand has the same sequence, its last three nucleotides are excluded. Given that the nucleotide sequences are identical on both strands, I would expect the ResFinder output to be the same for both. Is this an artifact of StarAMR or is this a BLAST issue?
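The reported numbers are consistent with that reading, as a quick calculation shows. This sketch assumes ungapped alignments, so the 285 nt alignment with one mismatch has 284 matching positions, and it uses the 333 bp reference length for X68232 mentioned elsewhere in this thread.

```python
def percent(numerator, denominator):
    """Percentage rounded to two decimals, as in the reported output."""
    return round(100.0 * numerator / denominator, 2)


REF_LENGTH = 333  # length of the X68232 reference, as stated in the thread

# Forward strand: 282 matches over a 282 nt alignment.
forward_identity = percent(282, 282)
forward_coverage = percent(282, REF_LENGTH)

# Reverse strand: three extra bases, one of them a mismatch (assumed ungapped).
reverse_identity = percent(284, 285)
reverse_coverage = percent(285, REF_LENGTH)

print(forward_identity, forward_coverage)
print(reverse_identity, reverse_coverage)
```

So the two results differ only in whether the alignment was extended by three bases across one mismatch, not in the underlying sequence.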
I'm also noticing a similar pattern with blaOXA-10 (J03427): I get 100% identity over 768/801 nt of the reference when blaOXA-10 is on the reverse strand, but 99.24% identity over 788/801 nt of the reference when blaOXA-10 is on the forward strand. The forward and reverse sequences are 100% identical in both cases.
plasmid file: plasmid.fasta.txt
version = 0.10.0 (same issue occurs in v0.9.1)
resfinder_gene_drug_version = 072621
resfinder_db_date = Tue, 24 May 2022 06:51