
Changes required to run experiments #3

Open
wants to merge 9 commits into master

Conversation

@botcs commented Oct 27, 2019

Hi @krumo @yuhuayc,

A few modifications and fixes were required to run tools/train_net.py with the provided config files: mostly syntax updates, removal of Detectron dependencies, and copying of the conversion scripts.

The aim was to change nothing crucial that could alter the outcome of the experiments.

After following the instructions for the sim10k -> cityscapes experiment, running configs/da_faster_rcnn/e2e_da_faster_rcnn_R_50_C4_sim10k.yaml resulted in the following scores:

bbox: ('AP', 2.7570837762239852e-05), ('AP50', 0.00010808941156124346), ('AP75', 0.0) ...

The log file can be found here.

Thank you for your time,
Csabi

@botcs (Author) commented Oct 29, 2019

Differential test of Domain Adaptation module:

For reference, to make sure that these modifications didn't ruin the underlying model's performance (i.e. with the domain adaptation turned off during training), I have evaluated the scores on the training data sim10k_cocostyle and the test data cityscapes_car_val_cocostyle:

DOMAIN_ADAPTATION_ON: False

2019-10-29 11:41:26,611 maskrcnn_benchmark.inference INFO: Start evaluation on sim10k_cocostyle dataset(10000 images).
100%|███████████████████████████████████████████████████████████| 10000/10000 [35:35<00:00,  4.68it/s]
2019-10-29 12:17:01,888 maskrcnn_benchmark.inference INFO: Total inference time: 0:35:35.276306 (0.21352763056755067 s / img per device, on 1 devices)
2019-10-29 12:17:02,605 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-10-29 12:17:02,605 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-10-29 12:17:03,988 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=1.31s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=30.67s).
Accumulating evaluation results...
DONE (t=3.07s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.540
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.774
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.569
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.160
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.626
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.898
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.156
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.553
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.566
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.223
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.662
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.920
2019-10-29 12:17:41,270 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.5401814552497484), ('AP50', 0.7744829630955286), ('AP75', 0.5689471584971065), ('APs', 0.1601824173229022), ('APm', 0.626169522564484), ('APl', 0.8979002271831785)]))])
2019-10-29 12:17:41,526 maskrcnn_benchmark.inference INFO: Start evaluation on cityscapes_car_val_cocostyle dataset(500 images).
100%|███████████████████████████████████████████████████████████████| 500/500 [00:52<00:00,  9.56it/s]
2019-10-29 12:18:33,806 maskrcnn_benchmark.inference INFO: Total inference time: 0:00:52.279926 (0.10455985116958619 s / img per device, on 1 devices)
2019-10-29 12:18:33,834 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-10-29 12:18:33,834 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-10-29 12:18:33,862 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=1.23s).
Accumulating evaluation results...
DONE (t=0.06s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.002
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.003
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.002
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.008
2019-10-29 12:18:35,220 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 0.00011697497180736782), ('AP50', 0.0006127225712633681), ('AP75', 2.195341485368049e-05), ('APs', 0.0020909364999688804), ('APm', 6.360421556555598e-05), ('APl', 0.00026351227760679067)]))])

DOMAIN_ADAPTATION_ON: True

2019-10-29 14:55:51,764 maskrcnn_benchmark.inference INFO: Start evaluation on sim10k_cocostyle dataset(10000 images).
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [39:28<00:00,  4.22it/s]
2019-10-29 15:35:20,046 maskrcnn_benchmark.inference INFO: Total inference time: 0:39:28.282218 (0.23682822177410126 s / img per device, on 1 devices)
2019-10-29 15:35:20,955 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-10-29 15:35:20,956 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-10-29 15:35:23,878 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=6.06s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=125.60s).
Accumulating evaluation results...
DONE (t=8.89s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.009
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.009
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.022
2019-10-29 15:37:54,941 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 1.7166901112905856e-05), ('AP50', 9.456993660554297e-05), ('AP75', 2.3919286758761237e-06), ('APs', 0.0), ('APm', 1.437278834962424e-05), ('APl', 6.19350441388492e-05)]))])
2019-10-29 15:37:55,669 maskrcnn_benchmark.inference INFO: Start evaluation on cityscapes_car_val_cocostyle dataset(500 images).
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [01:59<00:00,  4.18it/s]
2019-10-29 15:39:55,157 maskrcnn_benchmark.inference INFO: Total inference time: 0:01:59.487850 (0.23897570037841798 s / img per device, on 1 devices)
2019-10-29 15:39:55,197 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-10-29 15:39:55,197 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-10-29 15:39:55,281 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=0.15s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=9.13s).
Accumulating evaluation results...
DONE (t=0.34s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.002
2019-10-29 15:40:05,425 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 8.143605149258735e-07), ('AP50', 3.612181721638052e-06), ('AP75', 4.3048718880888285e-07), ('APs', 4.143609627234912e-05), ('APm', 4.2675902321716617e-07), ('APl', 1.381882607238703e-06)]))])

Conclusion

  1. Minor syntax changes were required to run with currently supported libraries. These modifications were imported from the base repository, maskrcnn-benchmark, and did not break training performance.

  2. Turning on the domain adaptation module results in complete failure in both the source and the target domain.

@krumo (Owner) commented Oct 30, 2019

@botcs Thanks for your testing! Unfortunately I'm busy with another project now and I cannot test your changes in the coming weeks. Your training loss seems normal. However, your performance on the target domain both with and without domain adaptation looks weird. As stated in the original paper, even with a VGG16 backbone, the network without adaptation could still achieve 30.12 car AP@50 on the sim10k->cityscapes task. So I think the problems might be:

  1. There might be something wrong with your sim10k or cityscapes car annotation
  2. My code's compatibility with PyTorch 1.3 is bad

I will review your change as soon as I complete my current project. Thanks for your patience.

If you are interested, here is my training log for sim10k->cityscapes car detection with domain adaptation. Hope it can help you to find what might be going wrong.

@botcs (Author) commented Oct 30, 2019

There might be something wrong with your sim10k or cityscapes car annotation

I downloaded the dataset from the original sources and used the provided conversion scripts. The scores obtained with the domain adaptation head turned off indicate that the converted dataset is correct. I can also evaluate the model using Cityscapes as the training dataset, or I can visualize the dataset with the annotations, but I believe that the labels are correct.

My code's compatibility with PyTorch 1.3 is bad

No, apparently it's not. All syntax differences have been resolved; the only minor difference between PyTorch 1.0 and 1.3 is caused by the LR scheduler, and that's negligible. The code base functions properly when the domain adaptation head is turned off or the lambda parameters are set to 0.

If you are interested, here is my training log for sim10k->cityscapes car detection with domain adaptation. Hope it can help you to find what might be going wrong.

Could you please upload the complete log file? This log stops at 40840 iterations. Also, if you could upload the checkpoint file for the trained model as well, that would be great!

Thanks,
Csabi

@krumo (Owner) commented Oct 30, 2019

I can also evaluate the model using Cityscapes as the training dataset, or I can visualize the dataset with the annotations, but I believe that the labels are correct.

I'm still doubtful about it... Would you mind sharing your annotation files with me via email? Just in case you didn't realize it, the Cityscapes conversion script in this repo generates a json file with 9 classes (including background), and class car is mapped to class 3. For the Sim10k->Cityscapes car detection task, you should generate a json file with only 2 classes (class 0 for background and class 1 for car) for both datasets. If your annotations are correct, then things will be tricky and I will have to reproduce your problems first to understand what might be going wrong.
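
For illustration, a minimal sketch of the expected car-only annotation skeleton (the field contents here are illustrative assumptions, not the output of the conversion scripts):

```python
# Sketch of a car-only COCO-style annotation skeleton: a single foreground
# class, car -> category_id 1 (background stays implicit as class 0).
caronly_skeleton = {
    "categories": [{"id": 1, "name": "car"}],
    "images": [],        # per image: {"id", "file_name", "width", "height"}
    "annotations": [],   # per box: {"id", "image_id", "category_id": 1, "bbox": [x, y, w, h], ...}
}
```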

Could you please upload the complete log file? This log stops at 40840 iterations.

Actually, the log file is complete. GitHub Gist truncates large files for viewing; you can check the complete file by downloading it.

Also, if you could upload the checkpoint file for the trained model as well, that would be great!

You could find the final checkpoint here.

@botcs (Author) commented Oct 31, 2019

Just in case you didn't realize it, the Cityscapes conversion script in this repo generates a json file with 9 classes (including background), and class car is mapped to class 3

That is not the case. Following instructions from the README:

Follow the example in Detectron-DA-Faster-RCNN to download dataset and generate coco style annotation files

In Detectron-DA-Faster-RCNN the usage example covers the sim10k -> cityscapes adaptation case. That repository contains two dedicated converter scripts, tools/convert_cityscapes_to_caronly_coco.py and tools/convert_sim10k_to_coco.py. In this PR I originally included the latter but left out the former; in commit 251b7b1 I added tools/convert_cityscapes_to_caronly_coco.py as well, as it can be found in the Detectron implementation. The point is that this script produces COCO-format instance entries with category_id=1 everywhere (which is expected), so it is not an indexing problem. After inspecting both json files manually, I found that both use category_id=1 for every instance.

A minor issue: with the tools/convert_cityscapes_to_caronly_coco.py script the instance IDs start from 0 instead of 1, which causes pycocotools to fail hideously (failure reproduction, related issue).
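
A minimal workaround sketch for the 0-based IDs (the file name below is hypothetical; it just shifts every annotation id so they start from 1):

```python
import json

path = "cityscapes_caronly_val_cocostyle.json"  # hypothetical file name

with open(path) as f:
    coco = json.load(f)

# pycocotools expects positive instance ids, so shift the 0-based ids by one
for ann in coco["annotations"]:
    ann["id"] += 1

with open(path, "w") as f:
    json.dump(coco, f)
```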

You could find the final checkpoint here.

I have downloaded and tested the checkpoint with the following results:

2019-10-30 23:42:01,224 maskrcnn_benchmark.inference INFO: Start evaluation on cityscapes_car_val_cocostyle dataset(500 images).
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [01:56<00:00,  4.31it/s]
2019-10-30 23:43:57,313 maskrcnn_benchmark.inference INFO: Total inference time: 0:01:56.089670 (0.23217934036254884 s / img per device, on 1 devices)
2019-10-30 23:43:57,355 maskrcnn_benchmark.inference INFO: Preparing results for COCO format
2019-10-30 23:43:57,355 maskrcnn_benchmark.inference INFO: Preparing bbox results
2019-10-30 23:43:57,387 maskrcnn_benchmark.inference INFO: Evaluating predictions
Loading and preparing results...
DONE (t=0.09s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=2.81s).
Accumulating evaluation results...
DONE (t=0.10s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.004
2019-10-30 23:44:00,500 maskrcnn_benchmark.inference INFO: OrderedDict([('bbox', OrderedDict([('AP', 3.39123121909629e-05), ('AP50', 0.00017958264992158225), ('AP75', 0.0), ('APs', 0.00099009900990099), ('APm', 0.0004950495049504951), ('APl', 7.021372016765286e-05)]))])

I have also inspected the inference/cityscapes_car_val_cocostyle/predictions.pth file and this is the prediction for the first image:

In [1]: a = torch.load("inference/cityscapes_car_val_cocostyle/predictions.pth")

In [2]: a[0].bbox                                                                                                                                                                                             
Out[2]: 
tensor([[ 197.4827,  266.2302,  341.2820,  316.0907],
        [1055.1331,  127.2901, 1195.9763,  379.5427],
        [ 299.5657,  271.6073,  319.3306,  276.8076],
        [ 158.1660,  304.7715,  189.4633,  320.7888],
        [ 289.7717,  273.4324,  326.4760,  285.6581],
        [ 157.2610,  286.9877,  208.0251,  318.0790],
        [   3.9300,  245.0801,  203.0042,  467.7722]])

In [9]: a[0].get_field("labels")                                                                                                                                                                              
Out[9]: tensor([1, 1, 1, 1, 1, 1, 1])

In [10]: a[0].get_field("scores")                                                                                                                                                                             
Out[10]: tensor([1.0000, 0.9998, 0.1903, 0.0805, 0.0599, 0.5537, 0.9981])

@botcs (Author) commented Nov 1, 2019

I've investigated the instance entries in the converted COCO-format jsons. The two scripts behave differently:

  • tools/cityscapes/convert_cityscapes_to_coco.py -> category_id for cars = 3 | BBOX mode = (x, y, w+1, h+1)
  • ../Detectron-DA-Faster-RCNN/tools/convert_cityscapes_to_caronly_coco.py -> category_id for cars = 1 | BBOX mode = (xmin, ymin, xmax, ymax) WRONG!

With the former conversion script, using the provided weight file, the scores are:
OrderedDict([('bbox', OrderedDict([('AP', 0.21999040175783008), ('AP50', 0.40802113381174443), ('AP75', 0.21781745864063617), ('APs', 0.02603636904496653), ('APm', 0.2231147873458656), ('APl', 0.5169057723877354)]))])

For comparison, the results in the provided log file were:
OrderedDict([('bbox', OrderedDict([('AP', 0.219860040789306), ('AP50', 0.4077909470910257), ('AP75', 0.21723063709934554), ('APs', 0.02603636904496653), ('APm', 0.2225690552234578), ('APl', 0.5171064121665854)]))])

I would assume that this difference in performance is due to the +1 implementation artifact around the box edges, which comes from Detectron. I will check it out soon.
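
For reference, this is the conversion being discussed, as a minimal sketch (not the repo's code): COCO expects [x, y, w, h], and the Detectron scripts additionally add +1 to the width and height.

```python
def xyxy_to_xywh(box, detectron_style=True):
    """Convert an [xmin, ymin, xmax, ymax] box to COCO [x, y, w, h].

    With detectron_style=True the width/height get the +1 offset used by the
    Detectron conversion scripts mentioned above.
    """
    xmin, ymin, xmax, ymax = box
    pad = 1 if detectron_style else 0
    return [xmin, ymin, xmax - xmin + pad, ymax - ymin + pad]
```

If the boxes are instead left in xyxy mode, the training and evaluation code read xmax/ymax as width/height, which would be consistent with the near-zero scores above.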

@botcs (Author) commented Nov 1, 2019

So after adding +1 to the W and H parameters of the boxes, the results are the same:
OrderedDict([('bbox', OrderedDict([('AP', 0.21999040175783008), ('AP50', 0.40802113381174443), ('AP75', 0.21781745864063617), ('APs', 0.02603636904496653), ('APm', 0.2231147873458656), ('APl', 0.5169057723877354)]))])

After training the network locally using the corresponding sim10k config file, the scores are the following:
OrderedDict([('bbox', OrderedDict([('AP', 0.20045069303623747), ('AP50', 0.3822008332007441), ('AP75', 0.19762480184020723), ('APs', 0.023155550948632732), ('APm', 0.18710966470216675), ('APl', 0.49617430549587216)]))])

These are definitely lower than the checkpoint's scores, so I'd like to find out where the difference comes from. Another observation: if the source data also had issues with the XYXY vs XYWH format, the result on the target dataset wouldn't be this high.

What I'm really curious about is the following: why did the performance drop on the source domain when the domain adaptation was turned on?

@krumo (Owner) commented Nov 3, 2019

@botcs Sorry for the late reply. I have checked your annotations and found that you were using xyxy mode instead of xywh mode for the cityscapes_caronly annotations (and xywh mode for the sim10k annotations). You said that you generated this file using the conversion code in Detectron-DA-Faster-RCNN, but I ran the same code and got a json file in xywh mode. Could you tell me how you got the conversion code in your commit?

It seems that your weird results are caused by the xyxy mode in your annotations. After changing the annotations, can you achieve performance similar to what is reported in the paper?

I think this PR contains too many discussions unrelated to the original topic now. If you have any further questions, could you consider opening new issues? I think it would also benefit others who might be interested. Thanks for your experiments!

@botcs (Author) commented Nov 3, 2019

Could you tell me how you got the conversion code in your commit?

cp ../Detectron-DA-Faster-RCNN/tools/convert_cityscapes_to_caronly_coco.py tools/. I ran the script this way: python ../Detectron-DA-Faster-RCNN/tools/convert_cityscapes_to_caronly_coco.py --dataset cityscapes_car_only --outdir Detectron-DA-Faster-RCNN-cityscapes/ --datadir /home/csbotos/datasets/cityscapes/ ... And I have found the issue: while removing the detectron dependency I copied the utility function poly_to_box(poly) without converting xyxy to xywh. Mea culpa, I'm sorry. But still, there are some further topics to discuss:

I think this PR contains too many discussions unrelated to the original topic now.

I agree. What separate issues can you think of? My idea would be to investigate the following:

  • [this PR] How to make this repo run out of the box, without any additional modifications? Conversion scripts should be provided, to keep the experiments self-contained.
  • [maybe related to this PR] Why does the training end up with lower performance than the reference checkpoint file (0.382 vs 0.408)?
  • [maybe related to this PR] Why does running the same reference checkpoint file locally give a higher score than the one reported in the reference log file (0.4080 vs 0.4078)?
  • Was it necessary to remove the git history of the base GH repository maskrcnn-benchmark? Many issues have been fixed in the base repo that could have been easily merged here if the git histories weren't unrelated.
  • [most important] Why does an invalid bounding-box format on the target domain cause a catastrophic performance drop on the source domain when the domain adaptation's lambda coefficient is non-zero? It implies that the target domain labels influence the training, which is clearly forbidden.

@krumo (Owner) commented Nov 19, 2019

@botcs Sorry for the late reply and thank you for your patience. Now I have time to discuss this with you.

[this PR] How to make this repo run out of the box, without any additional modifications? Conversion scripts should be provided, to keep the experiments self-contained.

I totally agree with you. A conversion code will be released soon.

Why does the training end up with lower performance than the reference checkpoint file (0.382 vs 0.408)?

Adversarial training is very unstable. It's quite normal that the performance of your final checkpoints varies a lot across trials and that the best performance is obtained before training stops. As far as I know, there is still no effective and robust method to stabilize adversarial training. So here are my suggestions:

  1. Run the same experimental setting several times to see what performance the algorithm can achieve.
  2. Or follow the setting in the ICCV 2019 work Constructing Self-motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach, and randomly pick several images from the target domain training set as a validation set for model selection.

Why does running the same reference checkpoint file locally give a higher score than the one reported in the reference log file (0.4080 vs 0.4078)?

I ran the checkpoint on my generated annotation file again and got the same results. I think the difference might come from the different annotation files. As mentioned above, I will share the conversion code as soon as possible.

Was it necessary to remove the git history of the base GH repository maskrcnn-benchmark? Many issues have been fixed in the base repo that could have been easily merged here if the git histories weren't unrelated.

In hindsight I agree it was unnecessary. However, when I began to develop this repo, I assumed maskrcnn_benchmark was designed as a base framework for further instance segmentation/panoptic segmentation algorithms. I was worried that adding new features might change the APIs and that it would be difficult to incorporate new commits from maskrcnn_benchmark, because you have to work out whether those commits conflict with the old code. Now, with the advent of [detectron2](https://github.com/facebookresearch/detectron2), I think there won't be large changes to maskrcnn_benchmark in the future. I would consider updating this repo to the latest stable version of maskrcnn_benchmark, but I have to admit I would not give it a high priority.

Why does an invalid bounding-box format on the target domain cause a catastrophic performance drop on the source domain when the domain adaptation's lambda coefficient is non-zero? It implies that the target domain labels influence the training, which is clearly forbidden.

I have avoided any use of the target domain labels in the implementation. Plus, I generate an annotation file by assigning a fake bbox of size (0, 0, image_width, image_height) to each image as the target domain training samples, and I still get 40.2 mAP@50 in the end.
I also tested the checkpoint provided above on my source training set and got 75.0 mAP@50, which is lower than your 77.4 mAP@50. I think it might be a reasonable drop due to generalization. What's your opinion?
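
For clarity, a rough sketch of that fake-annotation setup (standard COCO field names; this is an illustration of the idea, not the actual code used):

```python
def make_dummy_target_annotations(images):
    """Build a COCO-style dict where every target-domain image gets a single
    dummy full-image "car" box, so no real target labels enter training.

    `images` is a list of dicts with "id", "file_name", "width", "height".
    """
    annotations = []
    for ann_id, img in enumerate(images, start=1):
        annotations.append({
            "id": ann_id,
            "image_id": img["id"],
            "category_id": 1,                              # car
            "bbox": [0, 0, img["width"], img["height"]],   # full-image dummy box
            "area": img["width"] * img["height"],
            "iscrowd": 0,
        })
    return {
        "categories": [{"id": 1, "name": "car"}],
        "images": images,
        "annotations": annotations,
    }
```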

@krumo krumo self-requested a review November 20, 2019 03:08
@botcs (Author) commented Nov 20, 2019

Adversarial training is very unstable. It's quite normal that the performance of your final checkpoints varies a lot across trials and that the best performance is obtained before training stops. As far as I know, there is still no effective and robust method to stabilize adversarial training.

I have evaluated the model every 100 iterations and got this result:
DA-Faster-RCNN evaluation curve (train: sim10k, val: cityscapes)
It shows that the car AP can be 0.44 at one iteration and drop below 0.35 a few iterations later. The point where we decrease the learning rate can be crucial.

Or follow the setting in the ICCV 2019 work Constructing Self-motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach, and randomly pick several images from the target domain training set as a validation set for model selection.

I was just using this method, but it's good to know that there's a study about it. Thank you!

I have avoided any use of the target domain labels in the implementation.

While in the DA heads definition the annotations are optional, in the loss definition the annotations are a hard requirement.
I cannot find anything in the roi head loss computation that would prevent the target labels from being used. But there must be something blocking the training signal from the target domain, otherwise the target scores should be way better :) @krumo could you point out where you mask out the gradients coming from the target domain?

Plus, I generate an annotation file by assigning a fake bbox of size (0, 0, image_width, image_height) to each image as the target domain training samples, and I still get 40.2 mAP@50 in the end.

I did not fully understand this part. Could you put it another way?

@krumo (Owner) commented Nov 20, 2019

could you point out where you mask out the gradients coming from the target domain?

I filter out the target data loss computation at

and
# domain_masks keeps only the source-domain samples, so the detection losses
# below are computed on source data only
class_logits = class_logits[domain_masks, :]
box_regression = box_regression[domain_masks, :]
labels = labels[domain_masks]
regression_targets = regression_targets[domain_masks, :]

The reason I need the target annotation variable in the detection network is that I use it to store the domain labels. I admit it's confusing and I am considering improving it.

@botcs (Author) commented Nov 20, 2019

Thanks, that's really helpful to know.

Could you please also explain why you do the permutation here?

da_img_per_level = da_img_per_level.permute(0, 2, 3, 1)  # (N, C, H, W) -> (N, H, W, C)
da_img_label_per_level = torch.zeros_like(da_img_per_level, dtype=torch.float32)
da_img_label_per_level[masks, :] = 1
da_img_per_level = da_img_per_level.reshape(N, -1)
da_img_label_per_level = da_img_label_per_level.reshape(N, -1)

@botcs (Author) commented Nov 20, 2019

Also, here I think the concatenation has to be carried out on dim=1 (after flattening):

da_img_flattened = torch.cat(da_img_flattened, dim=0)
da_img_labels_flattened = torch.cat(da_img_labels_flattened, dim=0)

Because otherwise you would have vectors of different lengths after flattening (one per backbone stage), e.g.:

torch.Size([2, 32768])
torch.Size([2, 8192])
torch.Size([2, 2048])
torch.Size([2, 512])
torch.Size([2, 128])

This issue may not have come up before because FPN is turned off in your config files. Is that correct?
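
A quick standalone check of the shape mismatch (a sketch using the per-level sizes listed above):

```python
import torch

# Per-level flattened image-level domain logits, matching the shapes above
levels = [torch.randn(2, n) for n in (32768, 8192, 2048, 512, 128)]

# torch.cat(levels, dim=0) fails here: sizes must match in every dimension
# except the concatenation one, and dim 1 differs across levels.
flattened = torch.cat(levels, dim=1)   # works after flattening -> [2, 43648]
```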
