Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Internally assigning Z for HMM class #57

Open
erfanshekarriz opened this issue Dec 9, 2023 · 3 comments
Open

[Feature Request] Internally assigning Z for HMM class #57

erfanshekarriz opened this issue Dec 9, 2023 · 3 comments

Comments

@erfanshekarriz
Copy link

erfanshekarriz commented Dec 9, 2023

Hello!

Thank you for the awesome and much-needed package.

I wanted to ask if there could be an internal way to determine the Z value or if it can be stored as an internal attribute of the plan7. HMMFile class. For example :

from pyhmmer.plan7 import HMMFile
from pyhmmer.plan7 import SequenceFile

with HMMFile("path/to/hmm/db.hmm") as hmm_fiel:
   hmms = hmm_file.read()
   Z_val = hmms.Z 

   with SequenceFile(proteins, digital=True) as seqs:
      prots = seq_file.read_block()
      pli = Pipeline(hmms.alphabet, Z=Z_val, bit_cutoffs="gathering")
      hits = pli.search_hmm(hmms, prots)

Sometimes I am working with hmm databases where I don't know the z value in advance. I usually find out using

z_score=$(grep -c "NAME" ${input.db})

but it would be nice if it could be easily internally handled!

Best,

Erfan

@erfanshekarriz erfanshekarriz changed the title [Feature Request] Internally calculating Z value [Feature Request] Internally assigning Z for HMMFile class Dec 9, 2023
@erfanshekarriz erfanshekarriz changed the title [Feature Request] Internally assigning Z for HMMFile class [Feature Request] Internally assigning Z for HMM class Dec 9, 2023
@erfanshekarriz
Copy link
Author

erfanshekarriz commented Dec 9, 2023

I've found an easy workaround with:

 with HMMFile(hmmdb) as hmm_file:
   hmms = list(hmm_file)
   Z_val = len(hmms)

but I'll still keep the issue open since an attribute for the class might make it more intuitive.

Also, correct me if I'm wrong, but can we pass the Z_val and bit_cuttoffs to pyhmmer.hmmer.hmmsearch as follow:

pyhmmer.hmmer.hmmsearch(hmms, prots, cpus=cores_n, Z=Z_val, bit_cutoffs="gatherings")

Not sure because I couldn't find it anywhere in the documentation. I could only find inputs of Z and bit_cutoff in Pipeline class but would be nice to also include in the HMMER class options too.

Thanks!

@althonos
Copy link
Owner

I wanted to ask if there could be an internal way to determine the Z value or if it can be stored as an internal attribute of the plan7. HMMFile class.

Unfortunately I this cannot be done in PyHMMER, because I'm wrapping the original HMMER implementation and HMM file format, so I cannot change what is getting serialized and stored in there. I would recommend that you just store the Z value somewhere else if you can control the HMMs beings used (such as versioned Pfam).

I've found an easy workaround ...

The big problem here is that you load all HMMs into a list when you're only interested in counting them 😉 If you need to pre-compute the Z value, consider:

 with HMMFile(hmmdb) as hmm_file:
   Z = sum(1 for hmm in hmm_file)

@althonos
Copy link
Owner

Oh and regarding this:

Not sure because I couldn't find it anywhere in the documentation. I could only find inputs of Z and bit_cutoff in Pipeline class but would be nice to also include in the HMMER class options too.

The hmmsearch documentation states that:

Any additional arguments passed to the hmmsearch function will be passed transparently to the Pipeline to be created.

Which is what happens there; by passing additional keyword arguments, you configure the Pipeline and make it use the provided Z and bit_cutoffs arguments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants