This repository has been archived by the owner on May 2, 2024. It is now read-only.

Implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering


ThalesGroup/ConceptBERT

ConceptBert

This repository is the implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering.

Original paper: François Gardères, Maryam Ziaeefard, Baptiste Abeloos, Freddy Lécué: ConceptBert: Concept-Aware Representation for Visual Question Answering. EMNLP (Findings) 2020: 489-498 https://aclanthology.org/2020.findings-emnlp.44.pdf

For an overview of the pipeline, please refer to the following picture:

Pipeline

License

This work is dual-licensed under the Thales Digital Solutions Canada license and MIT License.

  • The main license is the Thales Digital Solutions Canada one. You can find it in the LICENSE file.
  • This repository is based on and inspired by Facebook Research's vilbert-multi-task. We sincerely thank the authors for sharing their code. The code related to vilbert-multi-task is licensed under the MIT License; for more information, refer to the LICENSE-VILBERT-MULTI-TASK file.

Pre-requisite

  • python 3.6.12
  • docker environment

Recommended

If you want to develop inside Docker, we recommend using VS Code with the container plugin.

Disclaimer

Currently, the project requires a lot of resources to run correctly.

Allow at least 6 days for the first training on a GTX 1080 Ti (11 GB of memory), or 17 hours in a Kubernetes environment with 7 GPUs (7 Titan V, 32 GB). All the pipelines were tested on a GPU server with four GeForce RTX 2080 Ti (12 GB).

🔌 Data

ℹ️ Notes:

Our implementation uses the pretrained features from bottom-up-attention (100 fixed features per image) and the GloVe vectors. The data should be saved in a folder along with pretrained_models and organized as shown below:

vilbert
├── data2
│   ├── coco (visual features)
│   ├── conceptnet (conceptnet facts)
│   ├── conceptual_captions (captions for each image, extracted from (https://github.com/google-research-datasets/conceptual-captions))
│   ├── kilbert_base_model (pre-trained weights for initial conceptBert model)
│   ├── OK-VQA (OK-VQA dataset)
│   ├── save_final (final saved models and outputs)
│   ├── tensorboards (location to save tensorboard files)
│   ├── VQA (VQA dataset)
│   ├── VQA_bert_base_6layer_6conect-pretrained (pre-trained weights for initial vilbert model trained on vqa)

The model checkpoints will be saved in the output folder: ./outputs/
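Before launching a long training run, it can help to check that the data folder matches the layout above. The following is a minimal sketch, not part of the repository: the function name missing_data_dirs is hypothetical, and the folder names are copied from the tree shown above.

```python
from pathlib import Path

# Sub-folder names copied from the layout shown in this README.
EXPECTED_SUBDIRS = [
    "coco",
    "conceptnet",
    "conceptual_captions",
    "kilbert_base_model",
    "OK-VQA",
    "save_final",
    "tensorboards",
    "VQA",
    "VQA_bert_base_6layer_6conect-pretrained",
]


def missing_data_dirs(data_root):
    """Return the expected sub-folders that are absent under data_root.

    Hypothetical helper: it only checks that the folders exist,
    not that their contents are complete.
    """
    root = Path(data_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

For example, missing_data_dirs("/nas-data/vilbert/data2") would list the folders still to be downloaded, assuming the data is mounted at the path used in the commands of this README.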

🐋 Docker installation (recommended)

You can choose to run ConceptBert with Docker or from your environment

Build

  docker build -t conceptbert .

Start the container

  docker run -it -v /path/to/your/nas/:/nas-data/ conceptbert:latest bash

Additional parameters

  docker run -it --shm-size=10g -e CUDA_VISIBLE_DEVICES=0,1,2,3 -v /path/to/your/nas/:/nas-data/ conceptbert:latest bash
  • --shm-size prevents shared memory errors. Here the value is 10 GB (refer to the Docker documentation).
  • -e CUDA_VISIBLE_DEVICES selects which of the available GPUs to use. Here we want to use 4 GPUs.
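To confirm inside the container which devices were requested, the variable can be read back from the environment. This is a minimal sketch with a hypothetical helper name (visible_gpu_ids); it only parses the variable and does not query the GPUs themselves.

```python
import os


def visible_gpu_ids(env=None):
    """Parse CUDA_VISIBLE_DEVICES into a list of device ids (as strings).

    Returns an empty list when the variable is unset or empty. Note that an
    unset variable normally means that all GPUs are visible to CUDA.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]
```

With the docker run command above, visible_gpu_ids() called inside the container should return four ids.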

When the container is up, go to section 1. Train with VQA.

Other installation

You can use the requirements.txt file to install the dependencies of the project.

Pre-requisite:

  • Compile the tools: cd conceptBert/tools/refer && make
  • python 3.6.x

If you have difficulty creating your environment, check the Dockerfile for the necessary dependencies that you might be missing.

🚀 Training and Validation

Note: models and json used in the following examples are the current best results

1. Train with VQA

First we use VQA dataset to train a baseline model. Use the following command:

  python3 -u train_tasks.py --model_version 3 --bert_model=bert-base-uncased --from_pretrained_conceptBert None \
      --from_pretrained=/nas-data/vilbert/data2/kilbert_base_model/pytorch_model_9.bin \
      --config_file config/bert_base_6layer_6conect.json \
      --output_dir=/nas-data/outputs/train1_vqa_trained_model/ \
      --summary_writer /nas-data/tensorboards/ \
      --num_workers 16 \
      --tasks 0

Command description

| Parameter | Description |
| --- | --- |
| u | -u forces stdout and stderr to be unbuffered (in Python 3 it has no effect on stdin) |
| model_version | Which version of the model you want to use |
| bert_model | Pre-trained BERT model, selected from the list: bert-base-uncased, bert-large-uncased, bert-base-cased, bert-base-multilingual, bert-base-chinese |
| from_pretrained_conceptBert | Folder of the previously trained model. This is the first training, so the value is None |
| from_pretrained | Pre-trained BERT model (VQA) |
| config_file | 3 config files are available in conceptBert/config/ |
| output_dir | Folder where the results are saved |
| summary_writer | Folder used to save TensorBoard items. A sub-folder named after the current date will be created |
| num_workers | Tells the data loader instance how many sub-processes to use for data loading. Use a value suited to your environment |
| tasks | tasks = 0: the VQA dataset is used |
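The value 16 used for --num_workers in the command above fits the authors' server and may be too high elsewhere. One simple heuristic, which is an assumption and not something the repository prescribes, is to cap the requested value at the number of CPU cores. The helper name suggest_num_workers is hypothetical.

```python
import os


def suggest_num_workers(requested=16, cpu_count=None):
    """Cap the requested worker count at the number of CPU cores.

    Heuristic only: the best value also depends on memory and disk speed.
    cpu_count can be passed explicitly; otherwise os.cpu_count() is used.
    """
    available = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(0, min(requested, available))
```

The result can then be passed on the command line, for example --num_workers 4 on a 4-core machine.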

2. Train with OK-VQA (fine-tuning)

Next, we fine-tune on the OK-VQA dataset, starting from the model trained in step 1. Use the following command:

  python3 -u train_tasks.py --model_version 3 --bert_model=bert-base-uncased \
      --from_pretrained=/nas-data/vilbert/data2/save_final/VQA_bert_base_6layer_6conect-beta_vilbert_vqa/pytorch_model_11.bin \
      --from_pretrained_conceptBert /nas-data/outputs/train1_vqa_trained_model/VQA_bert_base_6layer_6conect/pytorch_model_19.bin \
      --config_file config/bert_base_6layer_6conect.json \
      --output_dir=/nas-data/outputs/train2_okvqa_trained_model/ \
      --summary_writer /outputs/tensorboards/  \
      --num_workers 16 \
      --tasks 42

Command description

The parameters are the same as above, but these values change:

| Parameter | Description |
| --- | --- |
| from_pretrained_conceptBert | Path of the model trained previously (step 1, VQA). It corresponds to the last pytorch_model_**.bin file generated |
| from_pretrained | Pre-trained BERT model (OK-VQA) |
| tasks | tasks = 42: the OK-VQA dataset is used |
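Picking the last pytorch_model_**.bin file by hand is error-prone because an alphabetical sort places pytorch_model_9.bin after pytorch_model_19.bin. The sketch below compares the numeric suffix instead. The helper name latest_checkpoint is hypothetical, and the file name pattern is inferred from the paths used in this README.

```python
import re
from pathlib import Path

# File name pattern inferred from the checkpoint paths in this README.
CHECKPOINT_RE = re.compile(r"^pytorch_model_(\d+)\.bin$")


def latest_checkpoint(folder):
    """Return the pytorch_model_<N>.bin path with the largest N, or None.

    Raises FileNotFoundError if the folder itself does not exist.
    """
    best_path = None
    best_index = -1
    for path in Path(folder).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match and int(match.group(1)) > best_index:
            best_index = int(match.group(1))
            best_path = path
    return best_path
```

The returned path can be passed to --from_pretrained_conceptBert for the next step.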

3. Validation with OK-VQA

To validate on the held-out validation split, we use the model trained in step 2 with the following command:

  python3 -u eval_tasks.py --model_version 3 --bert_model=bert-base-uncased \
      --from_pretrained=/nas-data/vilbert/data2/save_final/VQA_bert_base_6layer_6conect-beta_vilbert_vqa/pytorch_model_11.bin  \
      --from_pretrained_conceptBert=/nas-data/outputs/train2_okvqa_trained_model/OK-VQA_bert_base_6layer_6conect/pytorch_model_99.bin \
      --config_file config/bert_base_6layer_6conect.json \
      --output_dir=/nas-data/outputs/validation_okvqa_trained_model/ \
      --num_workers 16 \
      --tasks 42 \
      --split val

Two files will be generated:

  • Val_other gives the 8 top answers for each question
  • val_result is used in the evaluation

Command description

The parameters are the same as above, but these values change:

| Parameter | Description |
| --- | --- |
| from_pretrained_conceptBert | Path of the model trained previously (step 2, OK-VQA). It corresponds to the last pytorch_model_**.bin file generated |
| from_pretrained | Same pre-trained BERT model (OK-VQA) as in step 2 |
| tasks | tasks = 42: the OK-VQA dataset is used |

🚀 Evaluation

Run the evaluation with:

  python3 PythonEvaluationTools/vqaEval_okvqa.py \
      --json_dir /nas-data/outputs/validation_okvqa_trained_model/ \
      --output_dir /nas-data/outputs/validation_okvqa_trained_model/

Command description

  • json_dir: folder where val_result.json is located
  • output_dir: folder where the accuracy will be saved
  • /nas-data/outputs/validation_okvqa_trained_model/: location of the final JSON in this example. Replace it with the path of the JSON you want to evaluate.

🐛 Known issues

  • If installing python-prctl fails with the error Command "python setup.py egg_info" failed with error, use this command:
  sudo apt-get install libcap-dev python3-dev

💡 Compare the results

Step 1: Training with VQA

  • 20 checkpoints must have been created (last file name must be pytorch_model_19.bin)

Step 2: Training with OK-VQA

  • 100 checkpoints must have been created (last file name must be pytorch_model_99.bin)

Step 3: Validation with OK-VQA

  • The validation generates two JSON files. val_result.json will be used in the evaluation.
  • Open the logs in the output folder (under /nas-data/) to check the eval score:
08/12/2020 13:09:46 - INFO - utils -   Validation [OK-VQA]: loss 3.681 score 33.040

If you want to optimize your model, the loss and score should match or improve on the values above.
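To compare runs without reading the logs by eye, the loss and score can be extracted from the validation line. This is a minimal sketch that assumes the log format stays exactly as in the line shown above; parse_validation_line is a hypothetical helper, not a function of the repository.

```python
import re

# Pattern written against the sample log line shown in this README.
VALIDATION_RE = re.compile(
    r"Validation \[(?P<task>[^\]]+)\]: loss (?P<loss>[\d.]+) score (?P<score>[\d.]+)"
)


def parse_validation_line(line):
    """Return (task, loss, score) from a validation log line, or None."""
    match = VALIDATION_RE.search(line)
    if match is None:
        return None
    return match.group("task"), float(match.group("loss")), float(match.group("score"))
```

Applied to the line above, the helper returns the task name OK-VQA together with the loss and the score as floats.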

Evaluation

Compare your results in the accuracy.json file (results must be at least as good as the following ones).

{
  "overall": 33.04,
  "perQuestionType": {
    "one": 30.82,
    "eight": 33.6,
    "other": 32.57,
    "seven": 30.61,
    "four": 36.79,
    "five": 33.66,
    "three": 31.73,
    "nine": 31.43,
    "ten": 45.58,
    "two": 30.23,
    "six": 30.07
  },
  "perAnswerType": {
    "other": 33.04
  }
}
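The comparison with the reference values can also be automated. The sketch below is an illustration only: it assumes your accuracy.json has the same structure as the file shown above, and the helper names below_reference and check_accuracy_file are hypothetical.

```python
import json

# Reference values copied from the accuracy.json shown in this README.
REFERENCE = {
    "overall": 33.04,
    "perQuestionType": {
        "one": 30.82, "eight": 33.6, "other": 32.57, "seven": 30.61,
        "four": 36.79, "five": 33.66, "three": 31.73, "nine": 31.43,
        "ten": 45.58, "two": 30.23, "six": 30.07,
    },
}


def below_reference(result, reference=REFERENCE):
    """Return the names of the metrics that are lower than the reference.

    A metric missing from result is treated as 0.0, so it is reported too.
    """
    worse = []
    if result.get("overall", 0.0) < reference["overall"]:
        worse.append("overall")
    per_type = result.get("perQuestionType", {})
    for name, ref_value in reference["perQuestionType"].items():
        if per_type.get(name, 0.0) < ref_value:
            worse.append("perQuestionType." + name)
    return worse


def check_accuracy_file(path):
    """Load an accuracy.json file and compare it with the reference."""
    with open(path) as handle:
        return below_reference(json.load(handle))
```

An empty list means that every metric is at least as good as the reference.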

VQA Training

OK-VQA Training

Troubleshooting

CUDA out of memory

Try the following recommendations to resolve the problem:

  • Change the value of num_workers in your training command (e.g. --num_workers 1)
  • Try one of the proposed improvements below
  • Reduce parameters in vlbert_tasks.yml:
    • max_seq_length
    • batch_size
    • eval_batch_size

Example:

  max_seq_length: 4 # DGX value : 16
  batch_size: 256 # DGX value : 1024
  eval_batch_size: 256 # DGX value : 1024

Improvements

There are several areas for improvement:

  • Review where the code moves data to the device (the to.device() calls) so that each transfer happens at a better position
  • Load only a part of the dataset (create a method to load a batch of the dataset). Dataset management is in vqa_dataset.py, method _load_dataset, variables questions = questions_train + questions_val[:-3000] and answers = answers_train + answers_val[:-3000]
  • Train your own BERT (or find a lighter BERT)
  • Initialise BERT once and reload it afterwards
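The slicing in _load_dataset merges the training set with all of the validation set except its last 3000 entries. The sketch below reproduces that behaviour on plain lists and adds an optional cap to illustrate partial loading. The function build_split and its limit parameter are hypothetical and do not exist in vqa_dataset.py.

```python
def build_split(train_items, val_items, held_out=3000, limit=None):
    """Merge train_items with val_items minus its last held_out entries.

    Equivalent to train_items + val_items[:-held_out] for held_out > 0.
    limit (hypothetical) keeps only the first limit merged items, which is
    one simple way to load a part of the dataset.
    """
    keep = max(0, len(val_items) - held_out)
    merged = train_items + val_items[:keep]
    if limit is not None:
        merged = merged[:limit]
    return merged
```

The same call has to be applied to the questions and to the answers with identical arguments so that both lists stay aligned.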
