The objective is to build a model that can classify two major acute ischemic stroke etiology subtypes:
- CE (Cardioembolic)
- LAA (Large Artery Atherosclerosis).
The dataset for this competition comprises over a thousand high-resolution whole-slide digital pathology images. Each slide depicts a blood clot from a patient who had experienced an acute ischemic stroke. The slides in the training and test sets depict clots with an etiology (that is, origin) known to be either CE (Cardioembolic) or LAA (Large Artery Atherosclerosis).
train.csv:
- `image_id`: A unique identifier for this instance, having the form `{patient_id}_{image_num}`. Corresponds to the image `{image_id}.tif`.
- `center_id`: Identifies the medical center where the slide was obtained.
- `patient_id`: Identifies the patient from whom the slide was obtained.
- `image_num`: Enumerates images of clots obtained from the same patient.
- `label`: The etiology of the clot, either `CE` or `LAA`. This field is the classification target.
The first few rows look like this:
image_id | center_id | patient_id | image_num | label |
---|---|---|---|---|
008e5c_0 | 11 | 008e5c | 0 | CE |
00c058_0 | 11 | 00c058 | 0 | LAA |
026c97_0 | 4 | 026c97 | 0 | CE |
049194_0 | 5 | 049194 | 0 | CE |
049194_1 | 5 | 049194 | 1 | CE |
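As a rough illustration of how these fields relate, the sketch below parses the example rows with Python's standard `csv` module and checks that every `image_id` equals `{patient_id}_{image_num}`. The `SAMPLE` string and the `load_rows` helper are made up for this example and are not part of the competition data or my pipeline. Every field is kept as a string, so an identifier such as `049194` keeps its leading zero.

```python
import csv
import io
from collections import Counter

# Illustrative copy of the example rows, not the real train.csv.
SAMPLE = """image_id,center_id,patient_id,image_num,label
008e5c_0,11,008e5c,0,CE
00c058_0,11,00c058,0,LAA
026c97_0,4,026c97,0,CE
049194_0,5,049194,0,CE
049194_1,5,049194,1,CE
"""


def load_rows(text):
    """Parse CSV text and verify image_id == {patient_id}_{image_num}."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        expected = f"{row['patient_id']}_{row['image_num']}"
        if row["image_id"] != expected:
            raise ValueError(f"unexpected image_id: {row['image_id']!r}")
    return rows


rows = load_rows(SAMPLE)

# A patient can contribute several slides, so count slides per patient.
images_per_patient = Counter(row["patient_id"] for row in rows)
label_counts = Counter(row["label"] for row in rows)

print(images_per_patient)
print(label_counts)
```

Because one patient can have several slides, counting per `patient_id` rather than per `image_id` gives a clearer picture of how many independent cases there are.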
The training WSIs (whole-slide images) are very large files because of their high resolution. I was able to shrink the dataset from ~241 gigabytes to a few gigabytes. The preprocessing steps can be summarized as follows:
- Load the large .tif WSI
- Crop the WSI using the PyVips smart crop with the attention strategy
- Resize the image to the specified width x height
- Delete parts of the image that contain low signal
- Export as JPEG with quality set to 100
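The steps above could be sketched roughly as follows. This is a minimal sketch and not the exact code I ran: the function names, the crop and output sizes, and both thresholds in `is_low_signal` are assumptions chosen for illustration. In particular, the write-up does not define "low signal", so the helper simply treats a tile as low signal when almost all of its grayscale pixels are bright background.

```python
def is_low_signal(tile_pixels, background_level=230, min_tissue_fraction=0.05):
    """Return True when a tile is mostly bright background.

    tile_pixels is a flat sequence of 0-255 grayscale values.
    Both thresholds are illustrative guesses, not tuned values.
    """
    tissue = sum(1 for p in tile_pixels if p < background_level)
    return tissue / len(tile_pixels) < min_tissue_fraction


def preprocess_slide(tif_path, jpeg_path, crop_size=20000, out_size=1024):
    """Load a .tif slide, smart-crop it, resize it, and export a JPEG.

    crop_size and out_size are hypothetical defaults.
    """
    import pyvips  # imported lazily so the helper above runs without libvips

    image = pyvips.Image.new_from_file(tif_path)

    # The crop window must fit inside the slide.
    side = min(crop_size, image.width, image.height)

    # Keep the square region that libvips' attention strategy scores highest.
    cropped = image.smartcrop(side, side, interesting="attention")

    # Scale the crop down to roughly out_size x out_size pixels.
    resized = cropped.resize(out_size / side)

    # Q is the JPEG quality setting (1-100).
    resized.write_to_file(jpeg_path, Q=100)


# Mostly white tile: low signal. Half-dark tile: kept.
print(is_low_signal([255] * 100))
print(is_low_signal([255] * 50 + [100] * 50))
```

A call such as `preprocess_slide("008e5c_0.tif", "008e5c_0.jpg")` would then write one reduced image per slide. How the low-signal check is applied (per tile, per row, or per region) is left open here.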
I was only able to submit two entries for evaluation because of time constraints and problems loading images without running out of memory. First, I tried AutoGluon with swin_large_patch4_window7_224. Second, I used Keras with TensorFlow to apply transfer learning and fine-tuning, starting from EfficientNet B4 with NoisyStudent + RandAugment pre-trained weights.
I attempted to use MONAI, fastMONAI, PathML, and cuCIM, but I ran into problems loading the WSIs properly (memory constraints or unknown errors) or found the processing too slow. However, these libraries appear promising, and I would like to experiment with them again in the future.
Additionally, this challenge introduced me to the concept of MIL (multiple instance learning) and how it can be used to train on WSIs: working with unmodified tiles instead of whole slides eases the memory constraints. Finally, I plan on going through the winning solutions to understand other approaches to tackling this challenge.
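To make the MIL idea concrete: a slide is treated as a "bag" of tiles, and only the bag carries a label. The toy sketch below shows the two pieces involved, splitting a slide into tile coordinates and pooling per-tile scores into one slide-level score. It is not taken from any competition solution. The function names, the tile size, and the tile probabilities are all invented for illustration, and real MIL models usually learn the pooling step (for example with attention) instead of using a fixed mean or max.

```python
def tile_coordinates(width, height, tile_size):
    """Top-left corners of non-overlapping tiles that fit fully inside the slide."""
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


def aggregate_bag(tile_probs, method="mean"):
    """Pool per-tile probabilities into a single slide-level probability."""
    if not tile_probs:
        raise ValueError("a bag needs at least one tile")
    if method == "mean":
        return sum(tile_probs) / len(tile_probs)
    if method == "max":
        return max(tile_probs)
    raise ValueError(f"unknown pooling method: {method!r}")


# A 1000 x 600 image cut into 256-pixel tiles gives a 3 x 2 grid.
coords = tile_coordinates(1000, 600, 256)

# Hypothetical per-tile probabilities of the CE class for one slide.
bag = [0.2, 0.9, 0.4]
slide_prob_mean = aggregate_bag(bag, "mean")
slide_prob_max = aggregate_bag(bag, "max")

print(len(coords), slide_prob_mean, slide_prob_max)
```

Only one tile needs to be in memory at a time, which is why this approach sidesteps the memory problems of loading a full slide.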
There were a total of 896 teams competing, with 1,025 competitors and 6,980 entries. Based on the final results, my model placed 240/888, within the top 28% of submissions.