Training 2D pose model with combined datasets COCO, AI, OccHuman #858
Replies: 4 comments 1 reply
-
In my experience, finetuning the model on the target dataset (after jointly training on multiple datasets) always improves performance. Different joint-training schemes, however, produce similar results: it does not matter whether you use the union of the different keypoint sets (the first approach mentioned) or their intersection (the second approach mentioned).
-
@jin-s13 thank you for your quick response. I have one concern regarding the first approach of joint training on multiple datasets with supervision disabled on unlabeled keypoints. Do you think the lack of supervision on the unlabeled keypoints could confuse the training? Take a sample image from AI Challenger as an example: the eyes are visible in the image, but the eye keypoints are not labeled, so the supervision from the eyes is missing. In contrast, when a sample image comes from COCO during the same training run, the eye keypoints are available and supervision is enabled. In short, even when both eyes are visible, the supervision is inconsistent from image to image, sometimes available and sometimes not. Best
-
I think it is ok to skip that supervision, i.e. to set zero loss for the unlabeled keypoints.
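In practice this masking can be implemented with a per-keypoint target weight that is 1 for labeled and 0 for unlabeled keypoints. A minimal PyTorch sketch of the idea (function and tensor names here are illustrative, not the actual MMPose API):

```python
import torch

def masked_heatmap_loss(pred, target, target_weight):
    """MSE heatmap loss that zeroes out unlabeled keypoints.

    pred, target: (N, K, H, W) heatmaps
    target_weight: (N, K) tensor, 1 for labeled keypoints, 0 for unlabeled
    """
    # per-keypoint squared error, averaged over the heatmap
    loss = ((pred - target) ** 2).mean(dim=(2, 3))  # shape (N, K)
    # mask out keypoints with no annotation so they contribute zero loss
    loss = loss * target_weight
    # normalize by the number of supervised keypoints to keep the scale stable
    return loss.sum() / target_weight.sum().clamp(min=1)
```

Because the mask multiplies the loss before the backward pass, unlabeled keypoints receive no gradient at all, regardless of which dataset the sample came from.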
-
@jin-s13
-
Hi MMPose Team,
I would like to ask a question about training a 2D pose model with combined datasets COCO, AI, OccHuman, etc.
Combining these datasets yields a much larger amount of training data, so I think it would be interesting to discuss various approaches to the problem. Below, I summarize two main approaches based on my research.
One-stage training by disabling training supervision on unlabeled keypoints: first convert all datasets to the same format and mark missing keypoints as unlabeled. For example, AI has 14 keypoints while COCO has 17. The combined dataset will use the 17-keypoint format, with the COCO keypoints missing from AI set to unlabeled. During training, the supervision on unlabeled keypoints is disabled.
Two-stage training: find a shared/minimum keypoint format across all datasets and perform an initial training on this shared set. In the second stage, change the number of output channels and fine-tune on a target dataset. For example, to combine COCO and AI, we can first train a model on the 14-keypoint shared format. In the second step, we fine-tune the pre-trained model from the first step, but with 17 output channels, on the COCO dataset.
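The format conversion in the first approach amounts to an index remapping plus visibility flags, assuming COCO-style `[x, y, visibility]` triplets. A sketch of such a converter; the index map below is only illustrative and should be checked against the actual dataset definitions before use:

```python
import numpy as np

# Illustrative index map from AI Challenger (14 kpts) to COCO (17 kpts).
# Verify these indices against the real dataset metadata before relying on them.
AIC_TO_COCO = {
    0: 6,   # right shoulder
    1: 8,   # right elbow
    2: 10,  # right wrist
    3: 5,   # left shoulder
    4: 7,   # left elbow
    5: 9,   # left wrist
    6: 12,  # right hip
    7: 14,  # right knee
    8: 16,  # right ankle
    9: 11,  # left hip
    10: 13, # left knee
    11: 15, # left ankle
    # AIC head-top (12) and neck (13) have no COCO counterpart and are dropped
}

def aic_to_coco(kpts_aic):
    """Convert (14, 3) AIC keypoints [x, y, vis] to the (17, 3) COCO layout.

    Keypoints that COCO defines but AIC lacks (nose, eyes, ears) stay at
    visibility 0, i.e. unlabeled, so their loss can later be masked out.
    """
    kpts_coco = np.zeros((17, 3), dtype=np.float32)
    for src, dst in AIC_TO_COCO.items():
        kpts_coco[dst] = kpts_aic[src]
    return kpts_coco
```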
The second approach is less convenient because it requires multiple stages, but it seems to work, as tested in this research. The first approach is more convenient, but I am not sure whether it works.
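For the second approach, the channel change in stage two can be handled by loading the stage-one weights with the mismatched head filtered out. A toy PyTorch sketch (the model here is a placeholder, not an actual MMPose architecture):

```python
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    """Toy stand-in for a pose model: backbone plus a 1x1 prediction head."""
    def __init__(self, in_channels=256, num_keypoints=14):
        super().__init__()
        self.backbone = nn.Conv2d(3, in_channels, 3, padding=1)  # placeholder backbone
        self.final_layer = nn.Conv2d(in_channels, num_keypoints, 1)

    def forward(self, x):
        return self.final_layer(torch.relu(self.backbone(x)))

# Stage 1: model trained on the shared 14-keypoint format
pretrained = PoseHead(num_keypoints=14)
state = pretrained.state_dict()

# Stage 2: new model with 17 output channels for COCO fine-tuning.
# Drop the mismatched head weights and load the rest non-strictly,
# leaving the new 17-channel head randomly initialized.
model = PoseHead(num_keypoints=17)
state = {k: v for k, v in state.items() if not k.startswith("final_layer")}
missing, unexpected = model.load_state_dict(state, strict=False)
```

`strict=False` reports the freshly initialized head in `missing`, which is a convenient sanity check that only the intended layers were replaced.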
Could you please share your opinion on the first approach, or any other joint-training solutions you can think of?
I would really appreciate it.
Best
Khanh Ha