A comprehensive analysis of differential gene expression matrix from The Cancer Genome Atlas (TCGA) repository.
- The installation of R software and packages for counting raw reads
- Use the MS windows generic text editor "Notepad" and if you want to use more efficient text editor then install either a "TextPad" or "Notepad++".
- Broad GDAC firehose (https://gdac.broadinstitute.org/)
-
Before starting the analysis, always make a separate directory so that they may not replace the files of the previous analysis i.e. "C:/Users/User_name/Desktop/GitHub/HNSCC_DEGs/" on the Desktop.
-
In this turoial, you will be using the mRNAseq_preprocess archived file of Head and Neck squamous cell carcinoma obtained from the Broad GDAC firehose. Decompress the downloaded file and use the "HNSC.mRNAseq_raw_counts.txt" file as a source for "raw_counts.txt".
-
For assigning features to the raw counts, also retrieve the clinical archived file that is termed as "Clinical_Pick_Tier1". After decompressing the file, import "All_CDEs.txt" into MS Excel application and transpose the data frame of the sheet. Also remember that you should look and remove
-
Prepare a meta_data.txt from the clinical
-
Download mRNAseq_preprocess dataset. In this turoial, we will be using the mRNAseq_preprocess file of Head and Neck squamous cell carcinoma.
-
In addition, we will also need to download clinical dataset which includes information related to the age, sample type, gender, cancer stage etc of the sample IDs.
-
Create a metafile from the clinical dataset
- Change the directory path to your dataset location.
- Customize the name of input and output file names as you like.
- Run the script step by step.
- Let's see the score using statistical cutoff of each gene in the resulting list.