You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
trying to run gsea on for loop for 43 subclusters, but sometimes the kernel died during es/nes calculation. not sure what is the issue, but was thinking probably due to the memory usage because it ran OK when I reduce the permutation from 1000 to 100. Thinking of running it outside of the notebook (1000 permutation is the default settings). how do I effectively transform this code so I can run it with sbatch?
import gseapy as gp
import numpy as np
import pandas as pd
main_dir = "/projects/robson-lab/research/endometriosis/"
sample_id = "Endo-Tissue-EC19001-EC20015"
subclusters = pd.read_csv(f"{main_dir}analysis/{sample_id}/DEG/edgeR-input/subtypelist.csv", sep=",", index_col=0)["0"].tolist() #subclusters contains the name of each subtypes, eg ['CTL', 'NK1', 'mid-secretory',...]
for clustername in subclusters:
deg = pd.read_csv(f"{main_dir}analysis/{sample_id}/DEG/edgeR-output/{clustername}-DEG.csv",
sep=",", index_col=0)
deg = deg[deg.FDR < 1e-03]
pairs = deg.loc[:,deg.columns.str.contains("logFC")].columns.tolist()
for pair in pairs:
get_list = deg[pair].reset_index()
get_list.rename(columns={"index":"names",pair:"logfoldchanges"},inplace=True)
get_list.sort_values(by="logfoldchanges",ascending=False,inplace=True)
res = gp.prerank(rnk=get_list, gene_sets="GO_Biological_Process_2018",
outdir=f"{main_dir}/analysis/{sample_id}/DEG/GSEA-output/{clustername}-{pair}/",
no_plot=True,verbose=False,permutation_num=1000)
terms = res.res2d[res.res2d.fdr<0.05].sort_values(by="nes",ascending=False)
terms.to_csv(f"{main_dir}/analysis/{sample_id}/DEG/GSEA-output/GOBP-filtered/gsea_{clustername}-{pair}.csv")`
The text was updated successfully, but these errors were encountered:
trying to run gsea on for loop for 43 subclusters, but sometimes the kernel died during es/nes calculation. not sure what is the issue, but was thinking probably due to the memory usage because it ran OK when I reduce the permutation from 1000 to 100. Thinking of running it outside of the notebook (1000 permutation is the default settings). how do I effectively transform this code so I can run it with sbatch?
The text was updated successfully, but these errors were encountered: