Questions about FDR with the results of BUSTED #1767

Open
SWei2333 opened this issue Nov 25, 2024 · 5 comments

Comments

@SWei2333

Dear professor,

I performed BUSTED analysis on 7,401 genes to identify convergent positively selected genes. Due to the species I selected for my study, I ended up with very few significant p-values. When I applied FDR correction, many ideal candidate genes were excluded. May I ask if it is acceptable to proceed with the analysis using the raw p-values, or is it better to rely on the q-values obtained after FDR correction?


@SWei2333
Author

image

@spond
Member

spond commented Nov 25, 2024

Dear @SWei2333,

  1. How many sequences do you have? The fraction of significant results looks pretty low.
  2. What BUSTED options did you use (and which version did you run)?
  3. It looks like you may simply have unavoidable power limitations. Instead of running genome-wide screens, if you have a set of candidate genes beforehand, you might want to restrict testing to those genes. Relaxing test stringency is tempting, but if you do that, you can no longer control false positives. I would at least continue using FDR.

Best,
Sergei
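
For readers following along, the FDR correction discussed here is commonly done with the Benjamini-Hochberg procedure. Below is a minimal sketch of that procedure in plain Python; the thread does not say which FDR method was actually used, so treat Benjamini-Hochberg as an assumption, and note that the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-gene p-values, for illustration only.
example_p = [0.001, 0.04, 0.03, 0.5]
example_q = benjamini_hochberg(example_p)
```

Genes would then be reported when their q-value falls below the chosen FDR level, rather than by raw p-value.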

@SWei2333
Author

Dear Professor,

  1. I have a total of 7,401 genes.

  2. My command (HyPhy 2.5.42):
    hyphy BUSTED --alignment ../one2one.filter.fa/ENSMUST00000000001.4.flt.cds.fa --tree ./one2one.unroot.FG.tree/ENSMUST00000000001.4.unroot.FG.nwk --branches Foreground --output ./result/ENSMUST00000000001.4.json

  3. I would like to ask why reducing the number of genes included in the analysis would affect the p-values. Isn’t each gene analyzed independently in HyPhy?

  4. I am not sure whether the small number of significant genes is due to my species selection. My gene tree is structured in such a way that it can only be divided into four groups based on traits. (Branches with color are set as foreground branches.)
    image
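
Since the command above writes one JSON file per gene into `./result/`, the per-gene p-values have to be gathered before any multiple-testing correction. A rough sketch of that step follows. It assumes each BUSTED JSON stores the likelihood ratio test under a `"test results"` object with a `"p-value"` field; check this against your own output files, since the layout may differ between HyPhy versions. The function name is hypothetical.

```python
import json
from pathlib import Path


def collect_busted_pvalues(result_dir):
    """Map each gene ID (file name without .json) to its BUSTED p-value."""
    pvalues = {}
    for path in sorted(Path(result_dir).glob("*.json")):
        with open(path) as handle:
            result = json.load(handle)
        # Assumption: the LRT p-value sits under "test results" -> "p-value".
        test = result.get("test results") or {}
        if "p-value" in test:
            pvalues[path.stem] = float(test["p-value"])
    return pvalues
```

Calling `collect_busted_pvalues("./result")` would return a dictionary keyed by names such as `ENSMUST00000000001.4`; files without the expected field are skipped rather than guessed at.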

@spond
Member

spond commented Nov 25, 2024

Dear @SWei2333,

It does seem like you simply have a small-sample situation: relatively few branches, which are also relatively short. I would suggest

  1. Get the latest hyphy version, because I did a lot of work on BUSTED in the last year.
  2. For smaller datasets, it sometimes helps to simplify the model:

hyphy busted --rates 2 --syn-rates 2 --starting-points 5 --branches Foreground ...

  3. If you reduce the number of genes you tested, the p-values themselves don't change, but you have to correct for fewer tests.
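
To make the last point concrete, here is a small worked illustration, assuming the Benjamini-Hochberg procedure is the FDR method in use (the thread does not name one). The candidate-set size of 100 is hypothetical; 7,401 is the number of genes reported above.

```latex
% Benjamini-Hochberg with m tests: sort the p-values, then adjust
p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}, \qquad
q_{(k)} = \min_{j \ge k} \frac{m}{j}\, p_{(j)}

% Threshold for the top-ranked gene on its own (k = 1), at FDR level 0.05
m = 7401:\quad p_{(1)} \le \frac{0.05}{7401} \approx 6.8 \times 10^{-6}
\qquad
m = 100:\quad p_{(1)} \le \frac{0.05}{100} = 5 \times 10^{-4}
```

The raw p-value of a gene is the same in both cases; only the number of tests m entering the correction changes. This only holds up if the candidate set is fixed before looking at the results.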

Best,
Sergei
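
The simplified-model settings suggested above could be applied to every gene with a small wrapper. The sketch below only builds the argument list; it reuses the flags and the file-naming pattern that appear in this thread, while the function name and directory arguments are hypothetical.

```python
from pathlib import Path


def busted_command(gene_id, aln_dir, tree_dir, out_dir):
    """Build the argument list for one gene using the simplified-model flags."""
    return [
        "hyphy", "busted",
        "--alignment", str(Path(aln_dir) / f"{gene_id}.flt.cds.fa"),
        "--tree", str(Path(tree_dir) / f"{gene_id}.unroot.FG.nwk"),
        "--branches", "Foreground",
        "--rates", "2",
        "--syn-rates", "2",
        "--starting-points", "5",
        "--output", str(Path(out_dir) / f"{gene_id}.json"),
    ]


cmd = busted_command(
    "ENSMUST00000000001.4",
    "../one2one.filter.fa",
    "./one2one.unroot.FG.tree",
    "./result",
)
# To actually run it (requires HyPhy on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems with file names, and looping over gene IDs then becomes a matter of calling `busted_command` once per gene.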

@SWei2333
Author

SWei2333 commented Nov 26, 2024 via email
