dataset Required #1

Open
yasirniazi opened this issue Nov 12, 2020 · 6 comments

@yasirniazi

Hi,
I hope you are well.
I have just started working with this project and want to run the code to understand the complete workflow. For that, I need the all_data.txt file.
I could not follow the dataset link that is given, so could you kindly provide the complete dataset needed to run the source code?
Thanks

@mdahasan
Owner

mdahasan commented Nov 13, 2020 via email

@scchess

scchess commented Mar 20, 2022

Thanks @mdahasan . Can I please also get a copy of the gene_snp_frequency.txt file missing in the repo? Thanks!

@mdahasan
Owner

Hi @scchess, I apologize: it's been many years, and this project isn't actively maintained.
I looked for the file you requested, but I can't find it. (Also, this is poor Python code from my early work, and not my best.)
However, looking at the code, I think the gene_snp_frequency.txt file is a product of the 1_data_preprocess.py script. Check this line:

all_sample_cancer_snp_data[gene_index] += 1
This should be the per-gene SNP count. I'm not sure why it isn't stored in a file called gene_snp_frequency.txt, but you can probably just write all_sample_cancer_snp_data to a file named gene_snp_frequency.txt, and that should work.
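A minimal sketch of that suggestion follows. It assumes all_sample_cancer_snp_data is a list of counts indexed by gene_index and that a parallel list of gene names is available; gene_names, the toy values, and the helper write_gene_snp_frequency are hypothetical, and the tab-separated "gene, count" layout is an assumption rather than something confirmed from 1_data_preprocess.py.

```python
from typing import Sequence


def write_gene_snp_frequency(
    gene_names: Sequence[str],
    counts: Sequence[int],
    path: str = "gene_snp_frequency.txt",
) -> None:
    """Write one "gene<TAB>count" line per gene (assumed file layout)."""
    if len(gene_names) != len(counts):
        raise ValueError("gene_names and counts must have the same length")
    with open(path, "w") as out:
        for gene, count in zip(gene_names, counts):
            out.write(f"{gene}\t{count}\n")


# Toy stand-ins for the real data (hypothetical values, not from the repo).
gene_names = ["GENE_A", "GENE_B", "GENE_C"]
all_sample_cancer_snp_data = [0] * len(gene_names)

# Mimic the counting line quoted above: one increment per observed SNP.
for gene_index in [0, 2, 2]:
    all_sample_cancer_snp_data[gene_index] += 1

write_gene_snp_frequency(gene_names, all_sample_cancer_snp_data)
```

In the real script, the call would go after the loop that fills all_sample_cancer_snp_data, using whatever gene list that script already builds.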

Again, I apologize for the inconvenience. As I said, it's old work from an ignorant Python coder.

@scchess

scchess commented Apr 16, 2022

What about this?

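import sys
import pandas as pd

# Assumed input: a tab-separated table with one column per gene
# and a "Cancer_type" label column.
df = pd.read_csv(sys.argv[1], sep="\t")
# Sum each gene column across all rows, leaving out the label column.
sums = df.drop(columns=["Cancer_type"], errors="ignore").sum(axis=0)
with open("gene_snp_frequency.txt", "w") as w:
    for gene, count in sums.items():
        w.write(f"{gene}\t{count}\n")
print("Generated: gene_snp_frequency.txt")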

@mdahasan
Owner

I can't say for sure whether it'll work, but it seems like it should. The file gene_snp_frequency.txt is just the gene name and the corresponding SNP count for that gene across all samples, so it should be pretty straightforward.
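To sanity-check a generated file against that description, something like the sketch below could help. It assumes one tab-separated "gene, count" pair per line; the helper read_gene_snp_frequency and the toy gene names are hypothetical, and the downstream scripts may parse the file differently.

```python
import os
import tempfile


def read_gene_snp_frequency(path):
    """Parse "gene<TAB>count" lines into a dict mapping gene -> count."""
    counts = {}
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_number}: expected 2 tab-separated fields"
                )
            gene, value = fields
            counts[gene] = float(value)  # raises ValueError if not numeric
    return counts


# Toy file with made-up genes, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "gene_snp_frequency.txt")
    with open(toy_path, "w") as handle:
        handle.write("GENE_A\t3\nGENE_B\t0\n")
    toy_counts = read_gene_snp_frequency(toy_path)

print(toy_counts)
```

A malformed line raises ValueError with its line number, which makes it easy to spot a leftover label column or a wrong separator.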

@scchess

scchess commented Apr 17, 2022

Thanks. It looks like the generated gene_snp_frequency.txt file works. However, running 6_feature_selection_with_mi.py fails with an error about a missing All_Class_feature_MI_down.txt file. I'm not sure how to generate this file.
