dataset Required #1

yasirniazi · 2020-11-12T13:01:45Z

Hi,
I hope you are well.
I have just started working with this project and want to run the code to understand the complete workflow. For that, I need the all_data.txt file.
I could not make sense of the dataset link that is provided, so could you kindly share the complete dataset needed to run the source code?
Thanks

mdahasan · 2020-11-13T01:43:18Z

Hello, you can download the data from my shared drive: https://drive.google.com/file/d/1DWmKMHkZtmu7S-DuPF3IRTdnWR9gfyWj/view?usp=sharing

Hope this helps. Thank you.


-- *Md. Abid Hasan Ph.D.* Algorithms and Computational Biology Lab Department of Computer Science and Engineering Bourns College of Engineering University of California Riverside, CA 92521 Principal Scientist I Bioinformatics Roche Sequencing Solutions, Inc. Pleasanton, CA 94588

scchess · 2022-03-20T20:53:03Z

Thanks @mdahasan . Can I please also get a copy of the gene_snp_frequency.txt file missing in the repo? Thanks!

mdahasan · 2022-03-20T21:33:38Z

Hi @scchess, I apologize, it's been many years, and this project isn't actively maintained.
I looked for the file you requested, but I can't find it. (Also, this is poor Python code, my early work and not my best.)
However, I did look at the code. I think the gene_snp_frequency.txt file is a product of the 1_data_preprocess.py script. If you check this line:

mClass---Multiple-cancer-classification/1_data_preprocess.py

Line 71 in 387ead1

all_sample_cancer_snp_data[gene_index] += 1

This should be the "per gene SNP count". I'm not sure why it isn't stored in a file called gene_snp_frequency.txt, but maybe you can just write all_sample_cancer_snp_data to a file named gene_snp_frequency.txt, and that should work.

Again, I apologize for the inconvenience. As I said, it's an old work from an ignorant python coder.
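The suggestion above (write the per-gene counts out under the name gene_snp_frequency.txt) could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes all_sample_cancer_snp_data is a sequence of counts indexed by gene_index, that a parallel list of gene names is available (called gene_names here, a hypothetical name), and that later scripts expect one tab-separated "gene, count" pair per line. None of this is confirmed by the repository.

```python
from typing import Sequence


def write_gene_snp_frequency(
    gene_names: Sequence[str],
    snp_counts: Sequence[int],
    path: str = "gene_snp_frequency.txt",
) -> None:
    """Write one "gene<TAB>count" line per gene.

    gene_names and the tab-separated layout are assumptions made for
    this sketch; snp_counts stands in for all_sample_cancer_snp_data.
    """
    if len(gene_names) != len(snp_counts):
        raise ValueError("gene_names and snp_counts must have the same length")
    with open(path, "w") as out:
        for gene, count in zip(gene_names, snp_counts):
            out.write(f"{gene}\t{count}\n")
```

In 1_data_preprocess.py this would presumably be called once, after the loop that increments all_sample_cancer_snp_data has finished.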

scchess · 2022-04-16T20:59:03Z

What about this?

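import sys

import pandas as pd

# Read the tab-separated sample-by-gene table passed as the first argument.
df = pd.read_csv(sys.argv[1], sep="\t")
# Drop the label column (if present), then sum each gene column over all samples.
sums = df.drop(columns=["Cancer_type"], errors="ignore").sum(axis=0)
with open("gene_snp_frequency.txt", "w") as w:
    for gene, count in sums.items():
        w.write(f"{gene}\t{count}\n")
print("Generated: gene_snp_frequency.txt")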

mdahasan · 2022-04-16T22:13:07Z

I can't say for sure whether it'll work or not, but it seems like it should. The file gene_snp_frequency.txt simply contains each gene name and the corresponding SNP count for that gene across all samples. It should be pretty straightforward.
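Since the file is described as just a gene name plus its SNP count across all samples, a small loader can double as a sanity check on a regenerated copy. The sketch below assumes a two-column, tab-separated layout with no header line; that layout is an assumption and is not confirmed by the repository, and read_gene_snp_frequency is a hypothetical helper name.

```python
from typing import Dict


def read_gene_snp_frequency(path: str = "gene_snp_frequency.txt") -> Dict[str, int]:
    """Load "gene<TAB>count" lines into a dict, failing on malformed lines."""
    counts: Dict[str, int] = {}
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(f"line {line_number}: expected 2 tab-separated fields")
            gene, count = fields
            # Accept counts written as "3" or "3.0" (assumed possible formats).
            counts[gene] = int(float(count))
    return counts
```

If this raises on a freshly generated file, the writer and the expected format probably disagree.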

scchess · 2022-04-17T00:39:31Z

Thanks. It looks like the generated gene_snp_frequency.txt file works. However, running 6_feature_selection_with_mi.py fails with an error about a missing All_Class_feature_MI_down.txt file. I'm not sure how to generate this file.
