Add analysis module (#22)
* analysis

* finalize notebook, docs

* docs

* gray 2D umap

* docs, umaps

* customized colors

* update docs
roshankern authored Sep 29, 2022
1 parent 19bfa5b commit 9f5312b
Showing 12 changed files with 1,276 additions and 8 deletions.
4 changes: 2 additions & 2 deletions 3.normalize_data/raw_data_umaps.ipynb
@@ -20,7 +20,7 @@
"import sys\n",
"sys.path.append(\"../utils\")\n",
"from load_utils import compile_mitocheck_batch_data, split_data\n",
- "from analysis_utils import get_2D_umap_embeddings, show_2D_umap"
+ "from analysis_utils import get_2D_umap_embeddings, show_2D_umap_from_embeddings"
]
},
{
@@ -539,7 +539,7 @@
"\n",
"for metadata_field in metadata_fields:\n",
" metadata = metadata_dataframe[metadata_field]\n",
-  show_2D_umap(x_data, y_data, metadata, f\"{results_dir}/controls_{metadata_field}.png\")"
+  show_2D_umap_from_embeddings(x_data, y_data, metadata, f\"{results_dir}/controls_{metadata_field}.png\")"
]
}
],
33 changes: 33 additions & 0 deletions 4.analyze_data/README.md
@@ -0,0 +1,33 @@
# 4. Analyze Data

In this module, we analyze the normalized training data features from [3.normalize_data](../3.normalize_data/normalized_data/training_data.csv.gz).

### Feature Analysis

We use [UMAP](https://github.com/lmcinnes/umap) for analysis of features.
UMAP was introduced in [McInnes, L, Healy, J, 2018](https://arxiv.org/abs/1802.03426) as a manifold learning technique for dimension reduction.
We use UMAP to reduce the feature data from 1280 features to 1, 2, and 3 dimensions.
We use [Matplotlib](https://matplotlib.org/) to visualize the 1D, 2D, and 3D UMAPs.
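
As a rough illustration of the reduction step, the sketch below calls the `UMAP` class from the `umap-learn` package. The helper names `reduce_with_umap` and `demo`, the `random_state` default, and the random stand-in data are assumptions made for this example; the notebook's actual code may differ.

```python
def reduce_with_umap(features, n_components, random_state=0):
    """Reduce an (n_samples, n_features) array to n_components dimensions.

    Hypothetical helper for illustration, not part of this repository's utils.
    """
    if n_components not in (1, 2, 3):
        raise ValueError("n_components must be 1, 2, or 3 to match the plots in this module")

    # Imported here so the function can be defined even when umap-learn is missing.
    import umap

    reducer = umap.UMAP(n_components=n_components, random_state=random_state)
    # fit_transform returns an array of shape (n_samples, n_components).
    return reducer.fit_transform(features)


def demo():
    """Embed random stand-in data (200 rows x 1280 features) in 1, 2, and 3 dimensions."""
    import numpy as np

    features = np.random.default_rng(0).normal(size=(200, 1280))
    for n_components in (1, 2, 3):
        embedding = reduce_with_umap(features, n_components)
        print(n_components, embedding.shape)
```

Calling `demo()` requires `numpy` and `umap-learn`; fixing `random_state` makes repeated runs comparable at the cost of UMAP's parallelism.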

For each reduction with UMAP, we create two types of visualizations.
The first visualization colors all points by their phenotypic class.
The second visualization colors points for only certain phenotypic classes, with all other phenotypic classes being colored gray.
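
The two coloring schemes can be sketched as a single function that assigns one color per point. The function name `class_colors`, the palette, the gray value, and the placeholder class names `"A"`, `"B"`, `"C"` are all assumptions for illustration, not the notebook's actual implementation.

```python
from itertools import cycle

GRAY = "#b0b0b0"  # assumed gray for de-emphasized classes
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def class_colors(labels, highlighted=None):
    """Return one color per label.

    With highlighted=None, every class gets its own color (first visualization).
    Otherwise only classes in `highlighted` keep their color and all other
    classes are drawn in gray (second visualization).
    """
    classes = sorted(set(labels))
    # Cycle through the palette so any number of classes receives a color.
    color_of = dict(zip(classes, cycle(PALETTE)))
    if highlighted is not None:
        keep = set(highlighted)
        color_of = {c: (color if c in keep else GRAY) for c, color in color_of.items()}
    return [color_of[label] for label in labels]


labels = ["A", "B", "C", "A"]  # placeholder class names
all_colored = class_colors(labels)
only_b = class_colors(labels, highlighted=["B"])
```

The returned list can be passed as the `c` argument of Matplotlib's `Axes.scatter`. Deriving both schemes from the same class-to-color mapping keeps a highlighted class the same color in both plots.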

**Note:** The phenotypic classes colored in the second visualization can be changed with the `classes_2` variable in [analyze_data.ipynb](analyze_data.ipynb).

## Step 1: Analyze Data

Use the commands below to analyze training data.
All UMAPs will be saved to [umaps/](umaps/).

```sh
# Make sure you are located in 4.analyze_data
cd 4.analyze_data

# Activate mitocheck_data conda environment
conda activate mitocheck_data

# Analyze data
bash analyze_data.sh
```