Add InternImage-H Results #64

Open · wants to merge 10 commits into main
20 changes: 20 additions & 0 deletions GETTING_STARTED.md
@@ -26,6 +26,26 @@ python train_net.py --dist-url 'tcp://127.0.0.1:50163' \
OUTPUT_DIR outputs/ade20k_swin_large WANDB.NAME ade20k_swin_large
```

### Training on Multiple Nodes

```bash
### Node 1
python train_net.py --dist-url <URL> \
--num-gpus 8 \
--num-machines 2 \
--machine-rank 0 \
--config-file configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896_1024.yaml \
OUTPUT_DIR outputs/ade20k_intern_image_huge WANDB.NAME ade20k_intern_image_huge

### Node 2
python train_net.py --dist-url <URL> \
--num-gpus 8 \
--num-machines 2 \
--machine-rank 1 \
--config-file configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896_1024.yaml \
OUTPUT_DIR outputs/ade20k_intern_image_huge WANDB.NAME ade20k_intern_image_huge
```
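
The `<URL>` value must be identical on every node and must point at the machine launched with `--machine-rank 0`. A minimal sketch, assuming node 1 is reachable at `192.168.1.1` and reusing port `50163` from the single-node example above:

```bash
# Same rendezvous address on both nodes; node 1 (--machine-rank 0) hosts it.
URL='tcp://192.168.1.1:50163'   # assumed IP and port; substitute your node 1 address
```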

## Evaluation

- You need to pass the value of the `task` token; `task` must be one of `[panoptic, semantic, instance]` (see the sketch below).
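
A hedged example of evaluating the ADE20K InternImage-H checkpoint (the `--eval-only`, `MODEL.IS_TRAIN False`, and `MODEL.TEST.TASK` pieces follow the detectron2-style conventions used elsewhere in this repo; substitute your own checkpoint path):

```bash
python train_net.py --dist-url 'tcp://127.0.0.1:50163' \
    --num-gpus 8 \
    --config-file configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896.yaml \
    --eval-only MODEL.IS_TRAIN False \
    MODEL.WEIGHTS 896x896_250_16_intern_image_h_oneformer_ade20k_160k.pth \
    MODEL.TEST.TASK panoptic
```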
10 changes: 9 additions & 1 deletion INSTALL.md
@@ -58,4 +58,12 @@ We use an environment with the following specifications, packages and dependencies:
sh make.sh
cd ../../../..
```


- Set up the CUDA kernel for DCNv3. Requires CUDA to be installed.

```bash
# Setup DCNv3
cd oneformer/modeling/backbone/ops_dcnv3
sh make.sh
cd ../../../..
```
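
To confirm the kernel built and installed correctly, a quick import check can be run afterwards. This is a sketch that assumes the compiled extension is importable as `DCNv3`, as in the upstream InternImage ops; adjust the module name if your build differs:

```bash
python -c "import torch, DCNv3; print('DCNv3 ops loaded; CUDA available:', torch.cuda.is_available())"
```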
8 changes: 7 additions & 1 deletion README.md
@@ -9,7 +9,7 @@

<sup>&dagger;</sup> Equal Contribution

-[[`Project Page`](https://praeclarumjj3.github.io/oneformer/)] [[`arXiv`](https://arxiv.org/abs/2211.06220)] [[`pdf`](https://arxiv.org/pdf/2211.06220.pdf)] [[`BibTeX`](#4citation)]
+[[`Project Page`](https://praeclarumjj3.github.io/oneformer/)] [[`arXiv`](https://arxiv.org/abs/2211.06220)] [[`pdf`](https://openaccess.thecvf.com/content/CVPR2023/papers/Jain_OneFormer_One_Transformer_To_Rule_Universal_Image_Segmentation_CVPR_2023_paper.pdf)] [[`Slides`](https://drive.google.com/file/d/12XhiOXD08_LwzBwosoLVk7i8D45V8YfW/view?usp=sharing)] [[`Poster`](https://drive.google.com/file/d/1-U3hCYVNVht26NM-zbE87p1V4idc5bCt/view?usp=sharing)] [[`BibTeX`](#4citation)]

This repo contains the code for our paper **OneFormer: One Transformer to Rule Universal Image Segmentation**.

@@ -38,6 +38,7 @@

## News

- **[July 6, 2023]**: OneFormer achieves SOTA performance on COCO panoptic segmentation with **60.0 PQ**, on ADE20K panoptic segmentation with **54.5 PQ**, and on Cityscapes instance segmentation with **50.6 AP**. We publicly release the corresponding models with the InternImage-H backbone!
- **[February 27, 2023]**: OneFormer is accepted to CVPR 2023!
- **[January 26, 2023]**: OneFormer sets new SOTA performance on the Mapillary Vistas val (both panoptic & semantic segmentation) and Cityscapes test (panoptic segmentation) sets. We’ve released the checkpoints too!
- **[January 19, 2023]**: OneFormer is now available as a part of the 🤗 **HuggingFace [transformers](https://huggingface.co/docs/transformers/main/en/model_doc/oneformer) library** and **[model hub](https://huggingface.co/models?filter=oneformer)**! 🚀
@@ -97,6 +98,8 @@
| Method | Backbone | Crop Size | PQ | AP | mIoU <br> (s.s) | mIoU <br> (ms+flip) | #params | config | Checkpoint |
| :---:| :---: | :---: | :---: | :---: |:---:| :---:| :---: | :---: | :---: |
| OneFormer | DiNAT-L<sup>&dagger;</sup> | 1280&times;1280 | 51.5 | 37.1 | 58.3 | 58.7 | 223M | [config](configs/ade20k/dinat/oneformer_dinat_large_bs16_160k_1280x1280.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/1280x1280_250_16_dinat_l_oneformer_ade20k_160k.pth) |
| OneFormer (COCO-Pretrained) | DiNAT-L<sup>&dagger;</sup> | 1280&times;1280 | 53.4 | 40.2 | 58.4 | 58.8 | 223M | [config](configs/ade20k/dinat/coco_pretrain_oneformer_dinat_large_bs16_160k_1280x1280_coco_pretrain.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/coco_pretrain_1280x1280_150_16_dinat_l_oneformer_ade20k_160k.pth) &#124; [pretrained](https://shi-labs.com/projects/oneformer/coco/150_16_dinat_l_oneformer_coco_100ep.pth) |
| OneFormer | ConvNeXt-XL<sup>&dagger;</sup> | 640&times;640 | 50.1 | 36.3 | 57.4 | 58.8 | 372M | [config](configs/ade20k/convnext/oneformer_convnext_xlarge_bs16_160k.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/250_16_convnext_xl_oneformer_ade20k_160k.pth) |
| OneFormer (emb_dim=256) | InternImage-H<sup>&dagger;</sup> | 896&times;896 | 54.5 | 40.2 | 60.4 | 60.8 | 1.10B | [config](configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/896x896_250_16_intern_image_h_oneformer_ade20k_160k.pth) |
| OneFormer (emb_dim=1024, COCO-Pretrained) | InternImage-H<sup>&dagger;</sup> | 896&times;896 | 55.5 | 44.2 | 60.7 | 60.7 | 1.35B | [config](configs/ade20k/coco_pretrain_intern_image/oneformer_intern_image_huge_bs16_160k_896x896.yaml) | [model](https://shi-labs.com/projects/oneformer/ade20k/coco_pretrain_896x896_250_16_intern_image_h_oneformer_ade20k_160k.pth) |

### Cityscapes

@@ -108,13 +111,15 @@
| Method | Backbone | PQ | AP | mIoU <br> (s.s) | mIoU <br> (ms+flip) | #params | config | Checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| OneFormer | DiNAT-L<sup>&dagger;</sup> | 67.6 | 45.6 | 83.1 | 84.0 | 223M | [config](configs/cityscapes/dinat/oneformer_dinat_large_bs16_90k.yaml) | [model](https://shi-labs.com/projects/oneformer/cityscapes/250_16_dinat_l_oneformer_cityscapes_90k.pth) |
| OneFormer | ConvNeXt-XL<sup>&dagger;</sup> | 68.4 | 46.7 | 83.6 | 84.6 | 372M | [config](configs/cityscapes/convnext/oneformer_convnext_xlarge_bs16_90k.yaml) | [model](https://shi-labs.com/projects/oneformer/cityscapes/250_16_convnext_xl_oneformer_cityscapes_90k.pth) |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-XL<sup>&dagger;</sup> | 69.7 | 48.9 | 84.5 | 85.8 | 372M | [config](configs/cityscapes/convnext/mapillary_pretrain_oneformer_convnext_xlarge_bs16_90k.yaml) | [model](https://shi-labs.com/projects/oneformer/cityscapes/mapillary_pretrain_250_16_convnext_xl_oneformer_cityscapes_90k.pth) &#124; [pretrained](https://shi-labs.com/projects/oneformer/mapillary/mapillary_pretrain_250_16_convnext_xl_oneformer_mapillary_300k.pth) |
| OneFormer (emb_dim=256) | InternImage-H<sup>&dagger;</sup> | 70.6 | 50.6 | 85.1 | 85.7 | 1.10B | [config](configs/cityscapes/intern_image/oneformer_intern_image_huge_bs16_90k.yaml) | [model](https://shi-labs.com/projects/oneformer/cityscapes/250_16_intern_image_h_oneformer_cityscapes_90k.pth) |

### COCO

| Method | Backbone | PQ | PQ<sup>Th</sup> | PQ<sup>St</sup> | AP | mIoU | #params | config | Checkpoint |
| :---:| :---: | :---: | :---: | :---: |:---:| :---:| :---: | :---: | :---: |
| OneFormer | Swin-L<sup>&dagger;</sup> | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | [config](configs/coco/swin/oneformer_swin_large_bs16_100ep.yaml) | [model](https://shi-labs.com/projects/oneformer/coco/150_16_swin_l_oneformer_coco_100ep.pth) |
| OneFormer | DiNAT-L<sup>&dagger;</sup> | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | [config](configs/coco/dinat/oneformer_dinat_large_bs16_100ep.yaml) | [model](https://shi-labs.com/projects/oneformer/coco/150_16_dinat_l_oneformer_coco_100ep.pth) |
| OneFormer (emb_dim=1024) | InternImage-H<sup>&dagger;</sup> | 60.0 | 67.1 | 49.2 | 52.0 | 68.8 | 1.35B | [config](configs/coco/intern_image/oneformer_intern_image_huge_bs16_100ep_1024.yaml) | [model](https://shi-labs.com/projects/oneformer/coco/250_16_intern_image_h_oneformer_coco_100ep_1024.pth) |

### Mapillary Vistas

@@ -123,6 +128,7 @@
| Method | Backbone | PQ | mIoU <br> (s.s) | mIoU <br> (ms+flip) | #params | config | Checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| OneFormer | Swin-L<sup>&dagger;</sup> | 46.7 | 62.9 | 64.1 | 219M | [config](configs/mapillary_vistas/swin/oneformer_swin_large_bs16_300k.yaml) | [model](https://shi-labs.com/projects/oneformer/mapillary/250_16_swin_l_oneformer_mapillary_300k.pth) |
| OneFormer | ConvNeXt-L<sup>&dagger;</sup> | 47.9 | 63.2 | 63.8 | 220M | [config](configs/mapillary_vistas/convnext/oneformer_convnext_large_bs16_300k.yaml) | [model](https://shi-labs.com/projects/oneformer/mapillary/250_16_convnext_l_oneformer_mapillary_300k.pth) |
| OneFormer | DiNAT-L<sup>&dagger;</sup> | 47.8 | 64.0 | 64.9 | 223M | [config](configs/mapillary_vistas/dinat/oneformer_dinat_large_bs16_300k.yaml) | [model](https://shi-labs.com/projects/oneformer/mapillary/250_16_dinat_l_oneformer_mapillary_300k.pth) |
| OneFormer (emb_dim=1024) | InternImage-H<sup>&dagger;</sup> | 52.9 | 67.3 | 67.5 | 1.35B | [config](configs/mapillary_vistas/intern_image/oneformer_intern_image_huge_bs16_300k_1024.yaml) | [model](https://shi-labs.com/projects/oneformer/mapillary/250_16_intern_image_h_oneformer_mapillary_300k_1024.pth) |


## Citation
@@ -0,0 +1,54 @@
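# ADE20K, 160k-iteration schedule: InternImage-H backbone with emb_dim=1024, initialized from the
# COCO-trained OneFormer checkpoint; matches the "OneFormer (emb_dim=1024, COCO-Pretrained)" ADE20K entry in README.md.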
_BASE_: ../oneformer_R50_bs16_160k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  SEM_SEG_HEAD:
    CONVS_DIM: 1024
    MASK_DIM: 1024
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "250_16_intern_image_h_oneformer_coco_100ep_1024.pth"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    HIDDEN_DIM: 1024
    NUM_OBJECT_QUERIES: 250
    NHEADS: 32
    DIM_FEEDFORWARD: 4096
  TEXT_ENCODER:
    WIDTH: 1024
    CONTEXT_LENGTH: 77
    N_CTX: 16
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 896) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 896
  MAX_SIZE_TRAIN: 3584
  MAX_SIZE_TEST: 3584
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (896, 896)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 896  # used in dataset mapper
  FORMAT: "RGB"
TEST:
  DETECTIONS_PER_IMAGE: 250
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [448, 678, 896, 1120, 1344, 1568]
    MAX_SIZE: 6272
    FLIP: True
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.00004
  AMP:
    ENABLED: False
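
The `!!python/object/apply:eval` entry above is a detectron2-style trick that computes the multi-scale training sizes at config-load time, sampling (`"choice"`) from 0.5x to 2.0x of the 896 base resolution. Expanded, it yields:

```bash
python -c "print([int(x * 0.1 * 896) for x in range(5, 21)])"
# [448, 537, 627, 716, 806, 896, 985, 1075, 1164, 1254, 1344, 1433, 1523, 1612, 1702, 1792]
```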
@@ -0,0 +1,39 @@
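# ADE20K, 160k-iteration schedule: InternImage-H backbone with the default emb_dim=256, initialized from
# ImageNet-22k joint pretraining; matches the "OneFormer (emb_dim=256)" ADE20K entry in README.md.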
_BASE_: ../oneformer_R50_bs16_160k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "internimage_h_jointto22k_384.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    NUM_OBJECT_QUERIES: 250
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 896) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 896
  MAX_SIZE_TRAIN: 3584
  MAX_SIZE_TEST: 3584
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (896, 896)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 896  # used in dataset mapper
  FORMAT: "RGB"
TEST:
  DETECTIONS_PER_IMAGE: 250
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [448, 678, 896, 1120, 1344, 1568]
    MAX_SIZE: 6272
    FLIP: True
@@ -0,0 +1,54 @@
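# Cityscapes, 90k-iteration schedule: InternImage-H backbone with emb_dim=1024, initialized from the
# Mapillary Vistas-trained OneFormer checkpoint.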
_BASE_: ../oneformer_R50_bs16_90k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  SEM_SEG_HEAD:
    CONVS_DIM: 1024
    MASK_DIM: 1024
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "mapillary_pretrain_250_16_intern_image_h_oneformer_mapillary_300k_1024.pth"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    HIDDEN_DIM: 1024
    NUM_OBJECT_QUERIES: 250
    NHEADS: 32
    DIM_FEEDFORWARD: 4096
  TEXT_ENCODER:
    WIDTH: 1024
    CONTEXT_LENGTH: 77
    N_CTX: 16
INPUT:
  MIN_SIZE_TRAIN: !!python/object/apply:eval ["[int(x * 0.1 * 896) for x in range(5, 21)]"]
  MIN_SIZE_TRAIN_SAMPLING: "choice"
  MIN_SIZE_TEST: 896
  MAX_SIZE_TRAIN: 3584
  MAX_SIZE_TEST: 3584
  CROP:
    ENABLED: True
    TYPE: "absolute"
    SIZE: (896, 896)
    SINGLE_CATEGORY_MAX_AREA: 1.0
  COLOR_AUG_SSD: True
  SIZE_DIVISIBILITY: 896  # used in dataset mapper
  FORMAT: "RGB"
TEST:
  DETECTIONS_PER_IMAGE: 250
  EVAL_PERIOD: 5000
  AUG:
    ENABLED: False
    MIN_SIZES: [448, 678, 896, 1120, 1344, 1568]
    MAX_SIZE: 6272
    FLIP: True
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.00004
@@ -0,0 +1,19 @@
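# Cityscapes, 90k-iteration schedule: InternImage-H backbone with the default emb_dim=256; matches the
# "OneFormer (emb_dim=256)" Cityscapes entry in README.md. Everything not set here inherits from the base R50 config.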
_BASE_: ../oneformer_R50_bs16_90k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "internimage_h_jointto22k_384.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    NUM_OBJECT_QUERIES: 250
TEST:
  DETECTIONS_PER_IMAGE: 250
@@ -0,0 +1,37 @@
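# COCO, ~100-epoch schedule: InternImage-H backbone with emb_dim=1024; matches the
# "OneFormer (emb_dim=1024)" COCO entry in README.md.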
_BASE_: ../oneformer_R50_bs16_50ep.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  SEM_SEG_HEAD:
    NAME: "OneFormerHead"
    CONVS_DIM: 1024
    MASK_DIM: 1024
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "internimage_h_jointto22k_384.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    HIDDEN_DIM: 1024
    NUM_OBJECT_QUERIES: 250
    NHEADS: 32
    DIM_FEEDFORWARD: 4096
  TEXT_ENCODER:
    WIDTH: 1024
    CONTEXT_LENGTH: 77
    N_CTX: 16
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.00004
  STEPS: (655556, 735184)
  MAX_ITER: 737500
  AMP:
    ENABLED: False
TEST:
  DETECTIONS_PER_IMAGE: 250
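
The solver schedule above corresponds to roughly 100 epochs: with COCO train2017 rounded to 118k images at 16 images per batch,

```bash
python -c "print(118000 * 100 // 16)"   # 737500 == MAX_ITER (train2017 actually has 118,287 images, so "100 epochs" is approximate)
```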
@@ -0,0 +1,32 @@
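# Mapillary Vistas, 300k-iteration schedule: InternImage-H backbone with emb_dim=1024; matches the
# "OneFormer (emb_dim=1024)" Mapillary Vistas entry in README.md.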
_BASE_: ../oneformer_R50_bs16_300k.yaml
MODEL:
  BACKBONE:
    NAME: "D2InternImage"
  SEM_SEG_HEAD:
    CONVS_DIM: 1024
    MASK_DIM: 1024
  INTERNIMAGE:
    CHANNELS: 320
    DEPTHS: [6, 6, 32, 6]
    GROUPS: [10, 20, 40, 80]
    WITH_CP: True
    MLP_RATIO: 4.0
    DW_KERNEL_SIZE: 5
    LEVEL2_POST_NORM_BLOCK_IDS: [5, 11, 17, 23, 29]
  WEIGHTS: "pretrain/internimage_h_jointto22k_384.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  ONE_FORMER:
    HIDDEN_DIM: 1024
    NUM_OBJECT_QUERIES: 250
    NHEADS: 32
    DIM_FEEDFORWARD: 4096
  TEXT_ENCODER:
    WIDTH: 1024
    CONTEXT_LENGTH: 77
    N_CTX: 16
TEST:
  DETECTIONS_PER_IMAGE: 250
SOLVER:
  IMS_PER_BATCH: 16
  BASE_LR: 0.00002
2 changes: 2 additions & 0 deletions demo/demo.py
@@ -29,6 +29,7 @@
     add_swin_config,
     add_dinat_config,
     add_convnext_config,
+    add_internimage_config
 )
 from predictor import VisualizationDemo

@@ -44,6 +45,7 @@ def setup_cfg(args):
     add_dinat_config(cfg)
     add_convnext_config(cfg)
     add_oneformer_config(cfg)
+    add_internimage_config(cfg)
     cfg.merge_from_file(args.config_file)
     cfg.merge_from_list(args.opts)
     cfg.freeze()
6 changes: 4 additions & 2 deletions demo/predictor.py
@@ -52,6 +52,8 @@ def run_on_image(self, image, task):
         # Convert image from OpenCV BGR format to Matplotlib RGB format.
         image = image[:, :, ::-1]
         vis_output = {}
+
+        assert task in ['panoptic', 'semantic', 'instance'], "task should be one of 'panoptic', 'semantic', 'instance'"
 
         if task == 'panoptic':
             visualizer = Visualizer(image, metadata=self.metadata, instance_mode=ColorMode.IMAGE)
@@ -61,14 +63,14 @@ def run_on_image(self, image, task):
                 panoptic_seg.to(self.cpu_device), segments_info, alpha=0.7
             )
 
-        if task == 'panoptic' or task == 'semantic':
+        if task == 'semantic':
             visualizer = Visualizer(image, metadata=self.metadata, instance_mode=ColorMode.IMAGE_BW)
             predictions = self.predictor(image, task)
             vis_output['semantic_inference'] = visualizer.draw_sem_seg(
                 predictions["sem_seg"].argmax(dim=0).to(self.cpu_device), alpha=0.7
             )
 
-        if task == 'panoptic' or task == 'instance':
+        if task == 'instance':
             visualizer = Visualizer(image, metadata=self.metadata, instance_mode=ColorMode.IMAGE_BW)
             predictions = self.predictor(image, task)
             instances = predictions["instances"].to(self.cpu_device)
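
With the stricter gating above, each invocation now renders only the visualization for the requested `task` (previously, `panoptic` also produced the semantic and instance overlays). A usage sketch, assuming the demo's existing `--input`/`--output`/`--task` flags and the ADE20K checkpoint from the README table:

```bash
python demo/demo.py --config-file configs/ade20k/intern_image/oneformer_intern_image_huge_bs16_160k_896x896.yaml \
    --input input.jpg --output outputs/ --task panoptic \
    --opts MODEL.IS_TRAIN False MODEL.WEIGHTS 896x896_250_16_intern_image_h_oneformer_ade20k_160k.pth
```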