This is the instruction for training MASA. You can train MASA with any raw images you have collected and turn your detector into a multiple object tracker. MASA training consists of two steps: (1) using SAM to segment every object in the raw images; (2) training MASA with those segments. We describe these two steps in detail below.
SA-1B is a huge dataset containing raw images from diverse open-world scenarios. In the paper, we use a subset of 500K images sampled from SA-1B to train the default MASA tracker.
You can download the SA-1B dataset from here. Create a folder to store SAM's data, e.g., data/sam/. Then, extract the images into one folder, e.g., data/sam/batch0/.
Since SA-1B already provides exhaustive segmentations generated by SAM, we can use them directly. For other raw images, you can run the SAM-H model to automatically segment every object in the images and save the results. We give an example on COCO images below.
(a). Download the 500K image name list from here. Then, put it at data/sam/sam_annotations/jsons/sa1b_coco_fmt_iminfo_500k.json.
(b). Run the following script to convert the annotations into COCO format.
python tools/format_conversion/convert_sa1b_to_coco.py --img_list data/sam/sam_annotations/jsons/sa1b_coco_fmt_iminfo_500k.json --input_directory data/sam/batch0 --output_folder data/sam/sam_annotations/jsons/
After running the script, you will get two json files in the data/sam/sam_annotations/jsons/ folder. One contains the segment (mask) annotations; the other contains the bounding boxes. The bounding boxes are extracted from the segments. The latter file is much smaller than the former, so we use it to train MASA. However, some advanced augmentation techniques, such as copy-and-paste, may require the mask annotations, so we provide both of them.
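For reference, a box annotation is simply the tight bounding box of the corresponding mask. The sketch below is only an illustration of that relationship (the actual logic lives in tools/format_conversion/convert_sa1b_to_coco.py); it assumes pycocotools is installed, which SA-1B's RLE masks are compatible with.

```python
# Illustrative only: derive an XYWH bounding box from a segment.
import numpy as np
from pycocotools import mask as mask_utils

def rle_to_bbox(rle):
    """Return [x, y, w, h] for a COCO-style RLE segmentation dict."""
    return mask_utils.toBbox(rle).tolist()

def binary_mask_to_bbox(binary_mask: np.ndarray):
    """Return [x, y, w, h] for an HxW boolean mask."""
    ys, xs = np.where(binary_mask)
    if len(xs) == 0:
        return [0, 0, 0, 0]  # empty mask
    x0, y0, x1, y1 = xs.min(), ys.min(), xs.max(), ys.max()
    return [int(x0), int(y0), int(x1 - x0 + 1), int(y1 - y0 + 1)]
```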
You can also use any custom raw images to train your tracker. We give an example below using COCO images.
You can download the COCO dataset from here. Create a folder to store the data, e.g., data/coco/. Then, extract the images into one folder, e.g., data/coco/images/.
You can use the SAM-H model to automatically get the segments of every object. Specifically, you can run this script to generate the segments for the raw images.
You need to install SAM first, following the instructions in the original repo. Then, you will get the annotations in the SAM format. To convert the SAM format into COCO format, you can use the script we provide above (tools/format_conversion/convert_sa1b_to_coco.py).
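If you prefer to script the generation step yourself, the sketch below shows the general idea with the official segment-anything package and its SamAutomaticMaskGenerator. The checkpoint path, image folder, and output layout are assumptions for illustration; the provided script remains the reference.

```python
# Minimal sketch: generate SAM-style segments for raw images and save them as JSON.
# Assumes the official `segment_anything` package is installed and the SAM-H
# checkpoint has been downloaded; all paths below are examples, not fixed names.
import json
import os

import cv2
import numpy as np
from pycocotools import mask as mask_utils
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="saved_models/pretrain_weights/sam_vit_h_4b8939.pth")
sam.to("cuda")
mask_generator = SamAutomaticMaskGenerator(sam)

img_dir, out_dir = "data/coco/images", "data/coco/sam_annotations"  # example layout
os.makedirs(out_dir, exist_ok=True)

for name in sorted(os.listdir(img_dir)):
    image = cv2.cvtColor(cv2.imread(os.path.join(img_dir, name)), cv2.COLOR_BGR2RGB)
    anns = mask_generator.generate(image)  # dicts with 'segmentation', 'bbox', 'area', ...
    for ann in anns:
        # Encode each binary mask as COCO RLE so the JSON stays compact.
        rle = mask_utils.encode(np.asfortranarray(ann["segmentation"].astype(np.uint8)))
        rle["counts"] = rle["counts"].decode("utf-8")
        ann["segmentation"] = rle
    with open(os.path.join(out_dir, os.path.splitext(name)[0] + ".json"), "w") as f:
        json.dump({"image": name, "annotations": anns}, f)
```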
After generating the segments, you can train MASA with them. We provide the training script in tools/train.py. You can run the following command to train MASA.
Here is an example of multi-GPU training using Grounding-DINO:
- Download the pre-trained Grounding-DINO weights from here. Then, put them in the saved_models/pretrain_weights/ folder.
- Run the following command to train MASA with 8 GPUs:
tools/dist_train.sh configs/masa-gdino/masa_gdino_swinb_train.py 8 --work-dir saved_models/masa_gdino/
Training with other models is similar. You can find the configuration files in the configs/masa-gdino/ folder, and you can modify the configuration files for other models. If you want to train the SAM-based models, we provide the converted SAM ViT-B (sam_vit_b_01ec64_mmdet.pth) and ViT-H (sam_vit_h_4b8939_mmdet.pth) weights here. You can download them and put them in the saved_models/pretrain_weights/ folder. Then, modify the configuration files to use these weights.
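As a rough illustration, pointing a config at the converted weights is usually a one-line change in an mmdetection-style Python config. The base config name below is an assumption; check the actual files under configs/ for the exact field to set (e.g., load_from vs. a backbone init_cfg checkpoint).

```python
# Illustrative config snippet (file and field names are examples, not the
# repo's exact config): start from a SAM-based MASA config and load the
# converted SAM ViT-B weights downloaded above.
_base_ = ['./masa_sam_vitb_train.py']  # assumed base config name

load_from = 'saved_models/pretrain_weights/sam_vit_b_01ec64_mmdet.pth'
```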