Skip to content

Unsupervised machine learning of WSI

License

Notifications You must be signed in to change notification settings

BiCroLab/nucleAI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

nucleAI

machine learning on WSI

Project Organization

.
├── data
│   ├── covds		<- temporary dir storing covariance descriptors per sample
│   └── features	<- temportary dir storing morphological features per sample
├── docs		<- Documentation about the project
├── models		<- Trained and serialized models, model predictions, or model summaries
├── notebooks		<- jupyter notebooks used in this project
└── py			<- python scripts used in this project

The requirements.txt file list the working environment used when the pipeline was run, though not all packages listed in the file might be necessary.

List of main scripts:

./scripts.sh		<- used to generate features and descriptors for a given cancer type
./script.intensity.sh	<- called by scripts.sh used either to generate features or descriptors
./covd2umap.sh		<- used to get the umap projection of a collection of descriptors
./py/test.covd_with_intensity_parallelOverPatches.py	   <- called by script.intensity.sh, used to get the morphological features
./py/test.mask2descriptor.py				   <- called by script.intensity.sh, used to get the descriptors
./py/test.covd_2dProjection.py				   <- called by covd2umap.sh, used to get the umap projection for a collection of descriptors

To run the pipeline, make sure that the raw svs files and the polygon data are located in a directory with the same structure as the original one:

/media/garner1/hdd2/svs_BRCA   <- example dirpath to BRCA svs samples, with subdirectory per each sample
/media/garner1/hdd2/TCGA_polygons/BRCA/TCGA-05-4245-01Z-00-DX1...svs.tar.gz <- example dirpath to single BRCA sample polygon file

Also, make sure that the anaconda folder is located as expected in the script.intensity.sh (open the script and check where it has been hard-coded)

Then open the scripts.sh file and edit the type of cancer that you want to process (BRCA, LUAD, etc.). Save and exit.

Then run

bash scripts.sh

The data folder will contain the output for both morphological features and covariance descriptors.

To get the UMAP representation of the descriptors see the content of covd2umap.sh to have examples for different cancer types, or to get the global TCGA projection. At the moment of writing these notes a bug seems to affet the umap-learn library when run on numbers of samples exceeding few towsand elements: lmcinnes/umap#496

The UMAP projection still works when run on individual cancer-types.

About

Unsupervised machine learning of WSI

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published