Add InternImage-H Results #64

Open · wants to merge 10 commits into main
20 changes: 20 additions & 0 deletions GETTING_STARTED.md
@@ -26,6 +26,26 @@ python train_net.py --dist-url 'tcp://127.0.0.1:50163' \
OUTPUT_DIR outputs/ade20k_swin_large WANDB.NAME ade20k_swin_large
```

### Training on Multiple Nodes

```bash
### Node 1
python train_net.py --dist-url <URL> \
--num-gpus 8 \
--num-machines 2 \
--machine-rank 0 \
--config-file configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896_1024.yaml \
OUTPUT_DIR outputs/ade20k_intern_image_huge WANDB.NAME ade20k_intern_image_huge

### Node 2
python train_net.py --dist-url <URL> \
--num-gpus 8 \
--num-machines 2 \
--machine-rank 1 \
--config-file configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896_1024.yaml \
OUTPUT_DIR outputs/ade20k_intern_image_huge WANDB.NAME ade20k_intern_image_huge
```
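
The `<URL>` value must be identical on every node and must point at the machine launched with `--machine-rank 0`. A minimal sketch, assuming node 1 is reachable at `192.168.1.1` and reusing port `50163` from the single-node example above:

```bash
# Same rendezvous address on both nodes; node 1 (--machine-rank 0) hosts it.
URL='tcp://192.168.1.1:50163'   # assumed IP and port; substitute your node 1 address
```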

## Evaluation

- You need to pass the value of the `task` token; `task` must be one of `[panoptic, semantic, instance]` (see the sketch below).
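
A hedged example of evaluating the ADE20K InternImage-H checkpoint (the `--eval-only`, `MODEL.IS_TRAIN False`, and `MODEL.TEST.TASK` pieces follow the detectron2-style conventions used elsewhere in this repo; substitute your own checkpoint path):

```bash
python train_net.py --dist-url 'tcp://127.0.0.1:50163' \
    --num-gpus 8 \
    --config-file configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896.yaml \
    --eval-only MODEL.IS_TRAIN False \
    MODEL.WEIGHTS 896x896_250_16_intern_image_h_oneformer_ade20k_160k.pth \
    MODEL.TEST.TASK panoptic
```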
10 changes: 9 additions & 1 deletion INSTALL.md
@@ -58,4 +58,12 @@ We use an environment with the following specifications, packages and dependencies:
sh make.sh
cd ../../../..
```


- Set up the CUDA kernel for DCNv3. Requires CUDA to be installed.

```bash
# Setup DCNv3
cd oneformer/modeling/backbone/ops_dcnv3
sh make.sh
cd ../../../..
```
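
To confirm the kernel built and installed correctly, a quick import check can be run afterwards. This is a sketch that assumes the compiled extension is importable as `DCNv3`, as in the upstream InternImage ops; adjust the module name if your build differs:

```bash
python -c "import torch, DCNv3; print('DCNv3 ops loaded; CUDA available:', torch.cuda.is_available())"
```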
8 changes: 7 additions & 1 deletion README.md
@@ -9,7 +9,7 @@

<sup>&dagger;</sup> Equal Contribution

-[[`Project Page`](https://praeclarumjj3.github.io/oneformer/)] [[`arXiv`](https://arxiv.org/abs/2211.06220)] [[`pdf`](https://arxiv.org/pdf/2211.06220.pdf)] [[`BibTeX`](#4citation)]
+[[`Project Page`](https://praeclarumjj3.github.io/oneformer/)] [[`arXiv`](https://arxiv.org/abs/2211.06220)] [[`pdf`](https://openaccess.thecvf.com/content/CVPR2023/papers/Jain_OneFormer_One_Transformer_To_Rule_Universal_Image_Segmentation_CVPR_2023_paper.pdf)] [[`Slides`](https://drive.google.com/file/d/12XhiOXD08_LwzBwosoLVk7i8D45V8YfW/view?usp=sharing)] [[`Poster`](https://drive.google.com/file/d/1-U3hCYVNVht26NM-zbE87p1V4idc5bCt/view?usp=sharing)] [[`BibTeX`](#4citation)]

This repo contains the code for our paper **OneFormer: One Transformer to Rule Universal Image Segmentation**.

@@ -38,6 +38,7 @@

## News

- **[July 6, 2023]**: OneFormer achieves SOTA performance on COCO panoptic segmentation with **60.0 PQ**, on ADE20K panoptic segmentation with **54.5 PQ**, and on Cityscapes instance segmentation with **50.6 AP**. We publicly release the corresponding models with the InternImage-H backbone!
- **[February 27, 2023]**: OneFormer is accepted to CVPR 2023!
- **[January 26, 2023]**: OneFormer sets new SOTA performance on the Mapillary Vistas val (both panoptic & semantic segmentation) and Cityscapes test (panoptic segmentation) sets. We’ve released the checkpoints too!
- **[January 19, 2023]**: OneFormer is now available as a part of the 🤗 **HuggingFace [transformers](https://huggingface.co/docs/transformers/main/en/model_doc/oneformer) library** and **[model hub](https://huggingface.co/models?filter=oneformer)**! 🚀
@@ -97,6 +98,8 @@
| Method | Backbone | Crop Size | PQ | AP | mIoU <br> (s.s) | mIoU <br> (ms+flip) | #params | config | Checkpoint |
| :---:| :---: | :---: | :---: | :---: |:---:| :---:| :---: | :---: | :---: |
| OneFormer | DiNAT-L<sup>&dagger;</sup> | 1280&times;1280 | 51.5 | 37.1 | 58.3 | 58.7 | 223M | [config](configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_1280x1280.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/1280x1280_250_16_dinat_l_oneformer_ade20k_160k.pth) |
| OneFormer (COCO-Pretrained) | DiNAT-L<sup>&dagger;</sup> | 1280&times;1280 | 53.4 | 40.2 | 58.4 | 58.8 | 223M | [config](configs/ade20k/dinat/coco_pretrain_oneformer_dinat_large_bs16_160k_1280x1280_coco_pretrain.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/coco_pretrain_1280x1280_150_16_dinat_l_oneformer_ade20k_160k.pth) &#124; [pretrained](https://shi-labs.com/projects/oneformer/coco/150_16_dinat_l_oneformer_coco_100ep.pth) |
| OneFormer | ConvNeXt-XL<sup>&dagger;</sup> | 640&times;640 | 50.1 | 36.3 | 57.4 | 58.8 | 372M | [config](configs/ade20k/convnext/oneformer_convnext_xlarge_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_convnext_xl_oneformer_ade20k_160k.pth) |
| OneFormer (emb_dim=256) | InternImage-H<sup>&dagger;</sup> | 896&times;896 | 54.5 | 40.2 | 60.4 | 60.8 | 1.10B | [config](configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/896x896_250_16_intern_image_h_oneformer_ade20k_160k.pth) |
| OneFormer (emb_dim=1024, COCO-Pretrained) | InternImage-H<sup>&dagger;</sup> | 896&times;896 | 55.5 | 44.2 | 60.7 | 60.7 | 1.35B | [config](configs/ade20k/coco_pretrain_intern_image/oneformer_intern_image_huge_bs16_160k_896x896.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/coco_pretrain_896x896_250_16_intern_image_h_oneformer_ade20k_160k.pth) |

### Cityscapes

@@ -108,13 +111,15 @@
| Method | Backbone | PQ | AP | mIoU <br> (s.s) | mIoU <br> (ms+flip) | #params | config | Checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| OneFormer | DiNAT-L<sup>&dagger;</sup> | 67.6 | 45.6 | 83.1 | 84.0 | 223M | [config](configs/cityscapes/dinat/oneformer_dinat_large_bs16_90k.yaml) | [model](https://shi-labs.com/projects/oneformer/cityscapes/250_16_dinat_l_oneformer_cityscapes_90k.pth) |
| OneFormer | ConvNeXt-XL<sup>&dagger;</sup> | 68.4 | 46.7 | 83.6 | 84.6 | 372M | [config](configs/cityscapes/convnext/oneformer_convnext_xlarge_bs16_90k.yaml) | [model](https://shi-labs.com/projects/oneformer/cityscapes/250_16_convnext_xl_oneformer_cityscapes_90k.pth) |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-XL<sup>&dagger;</sup> | 69.7 | 48.9 | 84.5 | 85.8 | 372M | [config](configs/cityscapes/convnext/mapillary_pretrain_oneformer_convnext_xlarge_bs16_90k.yaml) | [model](https://shi-labs.com/projects/oneformer/cityscapes/mapillary_pretrain_250_16_convnext_xl_oneformer_cityscapes_90k.pth) &#124; [pretrained](https://shi-labs.com/projects/oneformer/mapillary/mapillary_pretrain_250_16_convnext_xl_oneformer_mapillary_300k.pth) |
| OneFormer (emb_dim=256) | InternImage-H<sup>&dagger;</sup> | 70.6 | 50.6 | 85.1 | 85.7 | 1.10B | [config](configs/cityscapes/intern_image/oneformer_intern_image_huge_bs16_90k.yaml) | [model](https://shi-labs.com/projects/oneformer/cityscapes/250_16_intern_image_h_oneformer_cityscapes_90k.pth) |

### COCO

| Method | Backbone | PQ | PQ<sup>Th</sup> | PQ<sup>St</sup> | AP | mIoU | #params | config | Checkpoint |
| :---:| :---: | :---: | :---: | :---: |:---:| :---:| :---: | :---: | :---: |
| OneFormer | Swin-L<sup>&dagger;</sup> | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | [config](configs/coco/swin/oneformer_swin_large_bs16_100ep.yaml) | [model](https://shi-labs.com/projects/oneformer/coco/150_16_swin_l_oneformer_coco_100ep.pth) |
| OneFormer | DiNAT-L<sup>&dagger;</sup> | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | [config](configs/coco/dinat/oneformer_dinat_large_bs16_100ep.yaml) | [model](https://shi-labs.com/projects/oneformer/coco/150_16_dinat_l_oneformer_coco_100ep.pth) |
| OneFormer (emb_dim=1024) | InternImage-H<sup>&dagger;</sup> | 60.0 | 67.1 | 49.2 | 52.0 | 68.8 | 1.35B | [config](configs/coco/intern_image/oneformer_intern_image_huge_bs16_100ep_1024.yaml) | [model](https://shi-labs.com/projects/oneformer/coco/250_16_intern_image_h_oneformer_coco_100ep_1024.pth) |

### Mapillary Vistas

@@ -123,6 +128,7 @@
| Method | Backbone | PQ | mIoU <br> (s.s) | mIoU <br> (ms+flip) | #params | config | Checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| OneFormer | Swin-L<sup>&dagger;</sup> | 46.7 | 62.9 | 64.1 | 219M | [config](configs/mapillary_vistas/swin/oneformer_swin_large_bs16_300k.yaml) | [model](https://shi-labs.com/projects/oneformer/mapillary/250_16_swin_l_oneformer_mapillary_300k.pth) |
| OneFormer | ConvNeXt-L<sup>&dagger;</sup> | 47.9 | 63.2 | 63.8 | 220M | [config](configs/mapillary_vistas/convnext/oneformer_convnext_large_bs16_300k.yaml) | [model](https://shi-labs.com/projects/oneformer/mapillary/250_16_convnext_l_oneformer_mapillary_300k.pth) |
| OneFormer | DiNAT-L<sup>&dagger;</sup> | 47.8 | 64.0 | 64.9 | 223M | [config](configs/mapillary_vistas/dinat/oneformer_dinat_large_bs16_300k.yaml) | [model](https://shi-labs.com/projects/oneformer/mapillary/250_16_dinat_l_oneformer_mapillary_300k.pth) |
| OneFormer (emb_dim=1024) | InternImage-H<sup>&dagger;</sup> | 52.9 | 67.3 | 67.5 | 1.35B | [config](configs/mapillary_vistas/intern_image/oneformer_intern_image_huge_bs16_300k_1024.yaml) | [model](https://shi-labs.com/projects/oneformer/mapillary/250_16_intern_image_h_oneformer_mapillary_300k_1024.pth) |


## Citation
@@ -0,0 +1,54 @@
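# ADE20K, 160k-iteration schedule: InternImage-H backbone with emb_dim=1024, initialized from the
# COCO-trained OneFormer checkpoint; matches the "OneFormer (emb_dim=1024, COCO-Pretrained)" ADE20K entry in README.md.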
_BASE_: ../oneformer_R50_bs16_160k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  SEM_SEG_HEAD:
    CONVS_DIM: 1024
    MASK_DIM: 1024
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "250_16_intern_image_h_oneformer_coco_100ep_1024.pth"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    HIDDEN_DIM: 1024
    NUM_OBJECT_QUERIES: 250
    NHEADS: 32
    DIM_FEEDFORWARD: 4096
  TEXT_ENCODER:
    WIDTH: 1024
    CONTEXT_LENGTH: 77
    N_CTX: 16
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 896) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 896
  MAX_SIZE_TRAIN: 3584
  MAX_SIZE_TEST: 3584
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (896, 896)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 896  # used in dataset mapper
  FORMAT: "RGB"
TEST:
  DETECTIONS_PER_IMAGE: 250
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [448, 678, 896, 1120, 1344, 1568]
    MAX_SIZE: 6272
    FLIP: True
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.00004
  AMP:
    ENABLED: False
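
The `!!python/object/apply:eval` entry above is a detectron2-style trick that computes the multi-scale training sizes at config-load time, sampling (`"choice"`) from 0.5x to 2.0x of the 896 base resolution. Expanded, it yields:

```bash
python -c "print([int(x * 0.1 * 896) for x in range(5, 21)])"
# [448, 537, 627, 716, 806, 896, 985, 1075, 1164, 1254, 1344, 1433, 1523, 1612, 1702, 1792]
```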
@@ -0,0 +1,39 @@
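# ADE20K, 160k-iteration schedule: InternImage-H backbone with the default emb_dim=256, initialized from
# ImageNet-22k joint pretraining; matches the "OneFormer (emb_dim=256)" ADE20K entry in README.md.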
_BASE_: ../oneformer_R50_bs16_160k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "internimage_h_jointto22k_384.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    NUM_OBJECT_QUERIES: 250
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 896) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 896
  MAX_SIZE_TRAIN: 3584
  MAX_SIZE_TEST: 3584
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (896, 896)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 896  # used in dataset mapper
  FORMAT: "RGB"
TEST:
  DETECTIONS_PER_IMAGE: 250
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [448, 678, 896, 1120, 1344, 1568]
    MAX_SIZE: 6272
    FLIP: True
@@ -0,0 +1,54 @@
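# Cityscapes, 90k-iteration schedule: InternImage-H backbone with emb_dim=1024, initialized from the
# Mapillary Vistas-trained OneFormer checkpoint.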
_BASE_: ../oneformer_R50_bs16_90k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  SEM_SEG_HEAD:
    CONVS_DIM: 1024
    MASK_DIM: 1024
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "mapillary_pretrain_250_16_intern_image_h_oneformer_mapillary_300k_1024.pth"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    HIDDEN_DIM: 1024
    NUM_OBJECT_QUERIES: 250
    NHEADS: 32
    DIM_FEEDFORWARD: 4096
  TEXT_ENCODER:
    WIDTH: 1024
    CONTEXT_LENGTH: 77
    N_CTX: 16
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 896) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 896
  MAX_SIZE_TRAIN: 3584
  MAX_SIZE_TEST: 3584
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (896, 896)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 896  # used in dataset mapper
  FORMAT: "RGB"
TEST:
  DETECTIONS_PER_IMAGE: 250
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [448, 678, 896, 1120, 1344, 1568]
    MAX_SIZE: 6272
    FLIP: True
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.00004
@@ -0,0 +1,19 @@
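# Cityscapes, 90k-iteration schedule: InternImage-H backbone with the default emb_dim=256; matches the
# "OneFormer (emb_dim=256)" Cityscapes entry in README.md. Everything not set here inherits from the base R50 config.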
_BASE_: ../oneformer_R50_bs16_90k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "internimage_h_jointto22k_384.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    NUM_OBJECT_QUERIES: 250
TEST:
  DETECTIONS_PER_IMAGE: 250
@@ -0,0 +1,37 @@
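# COCO, ~100-epoch schedule: InternImage-H backbone with emb_dim=1024; matches the
# "OneFormer (emb_dim=1024)" COCO entry in README.md.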
_BASE_: ../oneformer_R50_bs16_50ep.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  SEM_SEG_HEAD:
    NAME: "OneFormerHead"
    CONVS_DIM: 1024
    MASK_DIM: 1024
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "internimage_h_jointto22k_384.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    HIDDEN_DIM: 1024
    NUM_OBJECT_QUERIES: 250
    NHEADS: 32
    DIM_FEEDFORWARD: 4096
  TEXT_ENCODER:
    WIDTH: 1024
    CONTEXT_LENGTH: 77
    N_CTX: 16
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.00004
  STEPS: (655556, 735184)
  MAX_ITER: 737500
  AMP:
    ENABLED: False
TEST:
  DETECTIONS_PER_IMAGE: 250
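
The solver schedule above corresponds to roughly 100 epochs: with COCO train2017 rounded to 118k images at 16 images per batch,

```bash
python -c "print(118000 * 100 // 16)"   # 737500 == MAX_ITER (train2017 actually has 118,287 images, so "100 epochs" is approximate)
```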
@@ -0,0 +1,32 @@
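# Mapillary Vistas, 300k-iteration schedule: InternImage-H backbone with emb_dim=1024; matches the
# "OneFormer (emb_dim=1024)" Mapillary Vistas entry in README.md.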
_BASE_: ../oneformer_R50_bs16_300k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  SEM_SEG_HEAD:
    CONVS_DIM: 1024
    MASK_DIM: 1024
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "pretrain/internimage_h_jointto22k_384.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    HIDDEN_DIM: 1024
    NUM_OBJECT_QUERIES: 250
    NHEADS: 32
    DIM_FEEDFORWARD: 4096
  TEXT_ENCODER:
    WIDTH: 1024
    CONTEXT_LENGTH: 77
    N_CTX: 16
TEST:
  DETECTIONS_PER_IMAGE: 250
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.00002
2 changes: 2 additions & 0 deletions demo/demo.py
@@ -29,6 +29,7 @@
     add_swin_config,
     add_dinat_config,
     add_convnext_config,
+    add_internimage_config
 )
 from predictor import VisualizationDemo

@@ -44,6 +45,7 @@ def setup_cfg(args):
     add_dinat_config(cfg)
     add_convnext_config(cfg)
     add_oneformer_config(cfg)
+    add_internimage_config(cfg)
     cfg.merge_from_file(args.config_file)
     cfg.merge_from_list(args.opts)
     cfg.freeze()
6 changes: 4 additions & 2 deletions demo/predictor.py
@@ -52,6 +52,8 @@ def run_on_image(self, image, task):
         # Convert image from OpenCV BGR format to Matplotlib RGB format.
         image = image[:, :, ::-1]
         vis_output = {}
+
+        assert task in ['panoptic', 'semantic', 'instance'], "task should be one of 'panoptic', 'semantic', 'instance'"
 
         if task == 'panoptic':
             visualizer = Visualizer(image, metadata=self.metadata, instance_mode=ColorMode.IMAGE)
@@ -61,14 +63,14 @@ def run_on_image(self, image, task):
                 panoptic_seg.to(self.cpu_device), segments_info, alpha=0.7
             )
 
-        if task == 'panoptic' or task == 'semantic':
+        if task == 'semantic':
             visualizer = Visualizer(image, metadata=self.metadata, instance_mode=ColorMode.IMAGE_BW)
             predictions = self.predictor(image, task)
             vis_output['semantic_inference'] = visualizer.draw_sem_seg(
                 predictions["sem_seg"].argmax(dim=0).to(self.cpu_device), alpha=0.7
             )
 
-        if task == 'panoptic' or task == 'instance':
+        if task == 'instance':
             visualizer = Visualizer(image, metadata=self.metadata, instance_mode=ColorMode.IMAGE_BW)
             predictions = self.predictor(image, task)
             instances = predictions["instances"].to(self.cpu_device)
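
With the stricter gating above, each invocation now renders only the visualization for the requested `task` (previously, `panoptic` also produced the semantic and instance overlays). A usage sketch, assuming the demo's existing `--input`/`--output`/`--task` flags and the ADE20K checkpoint from the README table:

```bash
python demo/demo.py --config-file configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896.yaml \
    --input input.jpg --output outputs/ --task panoptic \
    --opts MODEL.IS_TRAIN False MODEL.WEIGHTS 896x896_250_16_intern_image_h_oneformer_ade20k_160k.pth
```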