A simple package to train deep learning models on ultrasound data for COVID-19.
The library itself has few dependencies (see setup.py) with loose requirements.
To run the code, just install the package pocovidnet
in editable mode:
git clone https://github.com/jannisborn/covid19_pocus_ultrasound.git
cd covid19_pocus_ultrasound/pocovidnet/
pip install -e .
NOTE: The repository is constantly updated with new data. If you want to reproduce the results of our paper, use the repo's state at the arxiv tag:
git checkout tags/arxiv
Then, please follow the instructions of the README.md at that state (see here).
If you want to use the latest version of the database, read below:
First, we have to merge the videos and images to create an image dataset.
You can use the script build_image_dataset.py
to copy the data from the pocus images and pocus videos folders. It will copy the images automatically and process all videos (read the frames and save every x-th frame, depending on the framerate supplied in the arguments).
Note: In the script, it is hard-coded that only convex POCUS data is taken, and only the classes covid, pneumonia and regular (there is not enough data for viral yet). You can change this selection in the script.
From the directory of this README, execute:
python3 scripts/build_image_dataset.py
Now, your data folder should contain a new folder image_dataset with subfolders covid, pneumonia, regular and viral, or a subset of those depending on your selection.
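For illustration, here is a minimal sketch of the frame-sampling idea described above (assuming OpenCV; the folder names and sampling rate are placeholders, not the actual implementation of build_image_dataset.py):

```python
import cv2
import os

def sample_frames(video_path, out_dir, every_x_frame=5):
    """Save every x-th frame of a video as an image (illustrative sketch only)."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx, saved = 0, 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if idx % every_x_frame == 0:
            name = os.path.splitext(os.path.basename(video_path))[0]
            cv2.imwrite(os.path.join(out_dir, f"{name}_frame{idx}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Example call (hypothetical paths):
# sample_frames("../data/pocus_videos/convex/example.mp4", "../data/image_dataset/covid")
```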
NOTE: The vast majority of the data we have gathered so far is available in the data folder. Unfortunately, not all data used to train/evaluate the model is in this repo, as we do not have the right to host/distribute the data from Butterfly.
However, we provide a script that automatically processes the data. To reproduce the experiments from the paper, please first complete the following steps:
- Visit the COVID-19 ultrasound_gallery of Butterfly, scroll to the bottom and download the videos (we accessed this source on 17.04.2020 to train our models; please note that we have no control over whether Butterfly will keep this data online. Feel free to notify us if you observe any changes).
- Place the .zip folder into the data folder and cd into the data folder.
- Run:
sh parse_butterfly.sh
NOTE: This step requires that you have installed the pocovidnet package before (see "Installation").
The butterfly images should now be added to data/image_dataset.
The next step is to perform the data split. You can use the script cross_val_splitter.py to perform a 5-fold cross validation (it will use the data from data/image_dataset by default):
From the directory of this README, execute:
python3 scripts/cross_val_splitter.py --splits 5
Now, your data folder should contain a new folder cross_validation with folders fold_1, fold_2, etc. Each folder contains only the test data for that specific fold.
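For illustration only, here is a minimal sketch of how such a per-class split could be produced (using sklearn's KFold; the folder names are assumptions and this is not the actual logic of cross_val_splitter.py, which may additionally keep frames from the same video in the same fold):

```python
import os
import shutil
from sklearn.model_selection import KFold

DATA_DIR = "../data/image_dataset"     # assumed input layout: one subfolder per class
OUT_DIR = "../data/cross_validation"   # output: fold_1 ... fold_5, each with class subfolders

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for cls in os.listdir(DATA_DIR):
    files = sorted(os.listdir(os.path.join(DATA_DIR, cls)))
    for fold, (_, test_idx) in enumerate(kf.split(files), start=1):
        fold_dir = os.path.join(OUT_DIR, f"fold_{fold}", cls)
        os.makedirs(fold_dir, exist_ok=True)
        for i in test_idx:
            shutil.copy(os.path.join(DATA_DIR, cls, files[i]), fold_dir)
```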
If you want to add data from an uninformative class, see here.
As described above, the data from Butterfly must be downloaded manually. We provide an automatic script to add the videos to the data/pocus_videos/convex folder:
Assuming that you have already downloaded and unzipped the Butterfly folder and renamed it to butterfly, cd into the data folder.
Then run:
python ../pocovidnet/scripts/process_butterfly_videos.py
Now all usable butterfly videos should be added to data/pocus_videos/convex.
Afterwards, you can train the model by running:
python3 scripts/train_covid19.py --data_dir ../data/cross_validation/ --fold 0 --epochs 2
NOTE: train_covid19.py will automatically use the data from all other folds for training.
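The idea of holding out one fold for testing and training on all remaining folds can be sketched as follows (illustrative only; this is not the actual code of train_covid19.py, and the mapping of --fold 0 to the folder fold_1 is an assumption):

```python
import os
from glob import glob

def get_train_test_files(cv_dir="../data/cross_validation", test_fold=0, n_folds=5):
    """Collect test files from the held-out fold and train files from all other folds."""
    train, test = [], []
    for fold in range(1, n_folds + 1):
        # each fold folder is assumed to contain one subfolder per class
        files = glob(os.path.join(cv_dir, f"fold_{fold}", "*", "*"))
        if fold == test_fold + 1:  # assumed mapping: --fold 0 -> fold_1
            test.extend(files)
        else:
            train.extend(files)
    return train, test
```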
Given a pre-trained model, it can be evaluated on a cross-validation split (--data) with the following command:
python scripts/test.py [-h] [--data DATA] [--weights WEIGHTS] [--m_id M_ID] [--classes CLASSES] [--folds FOLDS] [--save_path SAVE_PATH]
We have explored methods for video classification to exploit the temporal information in the videos. With the following instructions, one can train and evaluate a video classifier based on 3D convolutions.
python scripts/eval_vid_classifier.py [-h] [--json ../data/video_input_data/cross_val.json] [--genesis_weights GENESIS_WEIGHTS][--cam_weights CAM_WEIGHTS] [--videos ../data/pocus_videos/convex]
A json file is provided that corresponds to the cross-validation split in data/cross_validation. To train a 3D CNN on a split, cd into the folder of this README and run:
python scripts/video_classification.py --output models --fold 0 --epoch 40
The models will be saved to the directory specified in the --output flag.
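For readers unfamiliar with 3D convolutions on video clips, here is a minimal, self-contained sketch of such an architecture (using tf.keras; the input shape, depth and hyperparameters are assumptions for illustration, not the configuration used in video_classification.py):

```python
import tensorflow as tf

def build_3d_cnn(n_frames=5, height=224, width=224, n_classes=3):
    """Tiny 3D CNN: convolves jointly over (time, height, width) of a short clip."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_frames, height, width, 3)),
        tf.keras.layers.Conv3D(16, kernel_size=3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling3D(pool_size=(1, 2, 2)),
        tf.keras.layers.Conv3D(32, kernel_size=3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling3D(pool_size=(2, 2, 2)),
        tf.keras.layers.GlobalAveragePooling3D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

model = build_3d_cnn()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```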
Current results (5-fold CV) are:
Model | Accuracy | Balanced accuracy |
---|---|---|
VGG | 89.7% ± 5% | 89.6% ± 5% |
VGG-CAM | 89.5% ± 2% | 88.1% ± 3% |
NASNetMobile | 75.7% ± 9% | 71.1% ± 7% |
To access the pre-trained models, have a look here. The default configuration in the evaluation class Evaluator in evaluate_covid19.py uses the vgg_base model, which is stored in the Google Drive folder trained_models_vgg. You can place the 5 folders named fold_1 ... fold_5 into pocovidnet/trained_models and should then be ready to use the Evaluator class.
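A minimal usage sketch of the Evaluator class (the constructor arguments and return format are assumptions for illustration; please check the class in evaluate_covid19.py for the exact interface):

```python
import cv2
from pocovidnet.evaluate_covid19 import Evaluator

# Hypothetical usage: exact constructor arguments may differ in your version.
evaluator = Evaluator(ensemble=True)  # assumed to load weights from pocovidnet/trained_models

image = cv2.imread("path/to/ultrasound_frame.png")  # placeholder path
prediction = evaluator(image)  # assumed to return class scores for the input frame
print(prediction)
```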
- If you experience problems with the code, please open an issue.
- If you have questions about the project, please reach out: [email protected].
The paper is available here.
If you build upon our work or find it useful, please cite our paper:
@article{born2020pocovid,
title={POCOVID-Net: Automatic Detection of COVID-19 From a New Lung Ultrasound Imaging Dataset (POCUS)},
author={Born, Jannis and Br{\"a}ndle, Gabriel and Cossio, Manuel and Disdier, Marion and Goulet, Julie and Roulin, J{\'e}r{\'e}mie and Wiedemann, Nina},
journal={arXiv preprint arXiv:2004.12084},
year={2020}
}