
Some questions about the alignment length calculation #42

Open
samuell opened this issue Dec 19, 2024 · 1 comment

@samuell
Contributor

samuell commented Dec 19, 2024

Hi,

I'm trying to understand the alignment length calculation in get_align_len().

Based on the definition of CIGAR_OPS_ALL:

CIGAR_OPS_ALL = [0, 1, 2, 4]

...together with the definitions of these integer values in the pysam docs, I gather that get_align_len() counts all of the following:

  • Matches
  • Insertions
  • Deletions
  • Softclips
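As I read it, the calculation amounts to roughly the following. This is only a hypothetical re-implementation, not the project's actual code: it assumes the CIGAR is available as pysam-style `cigartuples` (a list of `(op_code, length)` pairs), and `get_align_len_sketch` is a made-up name.

```python
# Op codes counted by get_align_len(), as quoted above:
# 0 = M (alignment match), 1 = I (insertion), 2 = D (deletion), 4 = S (soft clip)
CIGAR_OPS_ALL = [0, 1, 2, 4]


def get_align_len_sketch(cigartuples):
    """Hypothetical sketch: sum the lengths of all CIGAR operations
    whose numeric code is listed in CIGAR_OPS_ALL."""
    return sum(length for op, length in cigartuples if op in CIGAR_OPS_ALL)


# 3S 10M 2I 5M 1D 4M 2S, written as (op_code, length) tuples
cigar = [(4, 3), (0, 10), (1, 2), (0, 5), (2, 1), (0, 4), (4, 2)]
print(get_align_len_sketch(cigar))  # 27
```

Note that with this definition hard clips (5) and reference skips (3) would not contribute to the length.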

Based on this, I've got a few questions:

  1. What is the main motivation for not using pysam.AlignedSegment.query_alignment_length directly?
  2. I notice that query_alignment_length does not count deletions, which I understand might be one reason why you did not want to use it?
  3. I'm still a little confused as to why you are not including mismatches in your calculation, which is something that is done in query_alignment_length?
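To make questions 1 and 2 more concrete, here is a rough comparison of which operations each measure sums. It is a sketch under the assumption that, for a valid CIGAR, `query_alignment_length` corresponds to the M, I, = and X operations (query bases excluding soft clips) and `reference_length` to M, D, N, = and X; the helper `sum_ops` and the example CIGAR are hypothetical.

```python
# Numeric CIGAR codes as used in pysam's cigartuples:
# 0=M, 1=I, 2=D, 3=N, 4=S, 5=H, 6=P, 7='=', 8=X
QUERY_ALIGNED_OPS = {0, 1, 7, 8}  # query-consuming ops, soft clips excluded
REFERENCE_OPS = {0, 2, 3, 7, 8}   # reference-consuming ops
CIGAR_OPS_ALL = {0, 1, 2, 4}      # ops counted by get_align_len(), per the issue


def sum_ops(cigartuples, ops):
    """Sum the lengths of the CIGAR operations whose code is in `ops`."""
    return sum(length for op, length in cigartuples if op in ops)


# 3S 10M 2I 5M 1D 4M 2S
cigar = [(4, 3), (0, 10), (1, 2), (0, 5), (2, 1), (0, 4), (4, 2)]
print(sum_ops(cigar, QUERY_ALIGNED_OPS))  # 21, akin to query_alignment_length
print(sum_ops(cigar, REFERENCE_OPS))      # 20, akin to reference_length
print(sum_ops(cigar, CIGAR_OPS_ALL))      # 27, counts I, D and S together
```

So the get_align_len() style count differs from query_alignment_length both by including the deletion (1 base) and by including the soft clips (5 bases).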

Sorry if I have missed some explanation of these implementation details somewhere!

Also, to be clear, I'm probably just ignorant of how the calculation should be done, so I'm mostly asking to understand this better!

samuell changed the title from "Question about the alignment length calculation" to "Some questions about the alignment length calculation" on Dec 19, 2024
@samuell
Contributor Author

samuell commented Dec 19, 2024

> I'm still a little confused as to why you are not including mismatches in your calculation, which is something that is done in query_alignment_length?

Ah, regarding 3. above, I have now learned that the original CIGAR convention is apparently assumed here, where the M operation covers both matches and mismatches, and explicit mismatches (the X operation) are not used. So that explains that.
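For illustration, here is what that convention means in practice, assuming the same pysam numeric op codes. `parse_cigar` and `count_ops` are hypothetical helpers: if an aligner emitted extended CIGAR strings with `=` (7) and `X` (8) instead of `M`, a fixed list like `CIGAR_OPS_ALL = [0, 1, 2, 4]` would not count those bases at all.

```python
import re

# Map CIGAR letters to the numeric codes used in pysam's cigartuples
CIGAR_CODES = {"M": 0, "I": 1, "D": 2, "N": 3, "S": 4, "H": 5, "P": 6, "=": 7, "X": 8}
CIGAR_OPS_ALL = [0, 1, 2, 4]  # as quoted in the issue


def parse_cigar(cigar):
    """Parse a CIGAR string such as '3S10M' into (op_code, length) tuples."""
    return [(CIGAR_CODES[op], int(n)) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def count_ops(cigar, ops):
    """Sum the lengths of the operations in `cigar` whose code is in `ops`."""
    return sum(length for op, length in parse_cigar(cigar) if op in ops)


# The same 10-base alignment, written with classic vs extended CIGAR ops
print(count_ops("10M", CIGAR_OPS_ALL))              # 10
print(count_ops("4=1X5=", CIGAR_OPS_ALL))           # 0: codes 7 and 8 are not listed
print(count_ops("4=1X5=", CIGAR_OPS_ALL + [7, 8]))  # 10
```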

Still curious about questions 1 and 2, if you have the time though :)
