Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GSEApy #8

Open
yulianatan opened this issue Apr 8, 2021 · 0 comments
Open

GSEApy #8

yulianatan opened this issue Apr 8, 2021 · 0 comments

Comments

@yulianatan
Copy link

yulianatan commented Apr 8, 2021

trying to run gsea on for loop for 43 subclusters, but sometimes the kernel died during es/nes calculation. not sure what is the issue, but was thinking probably due to the memory usage because it ran OK when I reduce the permutation from 1000 to 100. Thinking of running it outside of the notebook (1000 permutation is the default settings). how do I effectively transform this code so I can run it with sbatch?

import gseapy as gp
import numpy as np
import pandas as pd

main_dir = "/projects/robson-lab/research/endometriosis/"
sample_id = "Endo-Tissue-EC19001-EC20015"

subclusters = pd.read_csv(f"{main_dir}analysis/{sample_id}/DEG/edgeR-input/subtypelist.csv", sep=",", index_col=0)["0"].tolist() #subclusters contains the name of each subtypes, eg ['CTL', 'NK1', 'mid-secretory',...]

for clustername in subclusters:
    deg = pd.read_csv(f"{main_dir}analysis/{sample_id}/DEG/edgeR-output/{clustername}-DEG.csv", 
                      sep=",", index_col=0)
    deg = deg[deg.FDR < 1e-03]
    pairs = deg.loc[:,deg.columns.str.contains("logFC")].columns.tolist()
    
    for pair in pairs:
        get_list = deg[pair].reset_index()
        get_list.rename(columns={"index":"names",pair:"logfoldchanges"},inplace=True)
        get_list.sort_values(by="logfoldchanges",ascending=False,inplace=True)
        res = gp.prerank(rnk=get_list, gene_sets="GO_Biological_Process_2018",
                         outdir=f"{main_dir}/analysis/{sample_id}/DEG/GSEA-output/{clustername}-{pair}/",
                         no_plot=True,verbose=False,permutation_num=1000)
        terms = res.res2d[res.res2d.fdr<0.05].sort_values(by="nes",ascending=False)
        terms.to_csv(f"{main_dir}/analysis/{sample_id}/DEG/GSEA-output/GOBP-filtered/gsea_{clustername}-{pair}.csv")`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant