This is the PyTorch implementation for inference and training of the LLM-Driver described in:
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton
ICRA 2024
[preprint] [arxiv]
The LLM-Driver utilises object-level vector input from our driving simulator to predict explainable actions using pretrained Language Models, providing a robust and interpretable solution for autonomous driving.
The LLM-Driver running in open-loop prediction using the vector inputs (top-left BEV), with the results of action prediction (steering angles and acceleration/brake pedals), action justification (captions on the rendered video), and Driving Question Answering (table at the bottom).
[2024/01/29] Thrilled to share that our paper has been accepted by ICRA 2024!
[2023/12/21] Please check out our follow-up work LingoQA: [code] [arxiv]
[2023/10/03] The paper is now available on [arxiv]
[2023/07/06] The paper and code have been made available under the paper_code branch for anonymous submission.
- Python 3.x
- pip
- Minimum of 20GB VRAM for running evaluations
- Minimum of 40GB VRAM for training (default setting)
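The VRAM figures above can be checked before launching anything. The following is a minimal sketch and not part of the repository: the helper names `vram_report` and `detect_vram_bytes` are hypothetical, and reading "GB" as 10^9 bytes is an assumption. The GPU query only runs if PyTorch and a CUDA device are present.

```python
from typing import Dict, Optional

# Assumption: "GB" in the prerequisites means 10**9 bytes.
GB = 10 ** 9
# Thresholds taken from the prerequisites list above.
EVAL_MIN_GB = 20
TRAIN_MIN_GB = 40


def vram_report(total_bytes: int) -> Dict[str, bool]:
    """Report which workloads fit in the given amount of GPU memory."""
    total_gb = total_bytes / GB
    return {
        "evaluation": total_gb >= EVAL_MIN_GB,
        "training": total_gb >= TRAIN_MIN_GB,
    }


def detect_vram_bytes() -> Optional[int]:
    """Return total memory of GPU 0, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    detected = detect_vram_bytes()
    if detected is None:
        print("No CUDA device detected; cannot check VRAM.")
    else:
        print(f"{detected / GB:.1f} GB detected:", vram_report(detected))
```

The thresholds apply to the default settings only; a different batch size or model configuration may change the actual memory needed.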
- Set up a virtual environment (tested with Python 3.8-3.11)
  python3 -m venv env
  source env/bin/activate
- Install required dependencies
  pip install -r requirements.txt.lock
  Note: requirements.txt.lock is generated with pip-compile from the original requirements.txt for reproducibility.
- Set up WandB API key
  Set up your WandB API key for training and evaluation logging.
  export WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
- Training/testing data: The datasets have already been checked into the codebase. To unarchive them, use the following commands:
  tar -xzvf data/vqa_train_10k.tar.gz -C data/
  tar -xzvf data/vqa_test_1k.tar.gz -C data/
- Re-collect DrivingQA data: The training and evaluation datasets already include pre-collected DrivingQA data, but we also provide a script that illustrates how to collect DrivingQA data using the OpenAI ChatGPT API. To re-collect the DrivingQA data, simply run the following command with your OpenAI API key:
  python scripts/collect_vqa.py -i data/vqa_test_1k.pkl -o output_folder/ --openai_api xxxxxxxx
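The data and logging steps above can also be scripted. The sketch below is illustrative and not part of the repository: it mirrors the two `tar -xzvf ... -C data/` commands using the standard-library `tarfile` module and checks that `WANDB_API_KEY` is set. The function names `extract_archives` and `wandb_key_is_set` are hypothetical.

```python
import os
import tarfile
from pathlib import Path
from typing import Iterable, List

# Archive names come from the setup steps above.
ARCHIVES = ("data/vqa_train_10k.tar.gz", "data/vqa_test_1k.tar.gz")


def extract_archives(archives: Iterable[str], dest: str = "data/") -> List[str]:
    """Extract each .tar.gz into dest and return the names of all members."""
    extracted: List[str] = []
    for archive in archives:
        # Only extract archives you trust, such as those shipped with the repo.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest)
            extracted.extend(tar.getnames())
    return extracted


def wandb_key_is_set() -> bool:
    """True if WANDB_API_KEY is present and non-empty in the environment."""
    return bool(os.environ.get("WANDB_API_KEY"))


if __name__ == "__main__":
    if not wandb_key_is_set():
        print("WANDB_API_KEY is not set; WandB logging may fail.")
    present = [a for a in ARCHIVES if Path(a).is_file()]
    print("Extracted:", extract_archives(present))
```

Run it from the repository root so that the relative `data/` paths resolve.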
- Evaluate for Perception and Action Prediction
  Run the following command:
  python train.py \
      --mode eval \
      --resume_from_checkpoint models/weights/stage2_with_pretrained/ \
      --data_path data/vqa_train_10k.pkl \
      --val_data_path data/vqa_test_1k.pkl \
      --eval_items caption,action \
      --vqa
- Evaluate for DrivingQA
  Run the following command:
  python train.py \
      --mode eval \
      --resume_from_checkpoint models/weights/stage2_with_pretrained/ \
      --data_path data/vqa_train_10k.pkl \
      --val_data_path data/vqa_test_1k.pkl \
      --eval_items vqa \
      --vqa
- View Results
  The results can be viewed on the WandB project "llm-driver".
- Grade DrivingQA Results with GPT API
  To grade the results with the GPT API, run the following command:
  python scripts/grade_vqa.py \
      -i data/vqa_test_1k.pkl \
      -o results/10k_ft.pkl \
      -r results/10k_ft.json \
      --openai_api xxxxxxxx
  To grade your own results, replace results/10k_ft.json with the val_results.table.json downloaded from WandB.
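The two evaluation commands above differ only in the value passed to `--eval_items`. As a hedged sketch (the helper `build_eval_command` is hypothetical and not part of the repository), both runs can be generated from one function that reuses exactly the flags listed above:

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List


def build_eval_command(eval_items: str) -> List[str]:
    """Build the train.py evaluation command for a given --eval_items value."""
    return [
        sys.executable, "train.py",
        "--mode", "eval",
        "--resume_from_checkpoint", "models/weights/stage2_with_pretrained/",
        "--data_path", "data/vqa_train_10k.pkl",
        "--val_data_path", "data/vqa_test_1k.pkl",
        "--eval_items", eval_items,
        "--vqa",
    ]


if __name__ == "__main__":
    # "caption,action" covers perception and action prediction; "vqa" covers DrivingQA.
    for items in ("caption,action", "vqa"):
        cmd = build_eval_command(items)
        print(shlex.join(cmd))
        # Only launch when run from the repository root, where train.py lives.
        if Path("train.py").is_file():
            subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting issues; `shlex.join` is used only to print a copy-pasteable form.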
- Run LLM-Driver Training
  Execute the following command to start training:
  python train.py \
      --mode train \
      --eval_steps 50 \
      --val_set_size 32 \
      --num_epochs 5 \
      --resume_from_checkpoint models/weights/stage1_pretrained_model/ \
      --data_path data/vqa_train_10k.pkl \
      --val_data_path data/vqa_test_1k.pkl \
      --vqa
- Follow the previous section to evaluate the LLM-Driver
- [optional] Train and evaluate Perceiver-BC
  Execute the following command to start training and evaluation:
  python train_bc.py \
      --num_epochs 25 \
      --data_path data/vqa_train_10k.pkl \
      --val_data_path data/vqa_test_1k.pkl
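Training depends on the pretrained checkpoint and the unarchived datasets being in place. The following is a minimal pre-flight sketch, not part of the repository: the helper `missing_inputs` is hypothetical, and the paths are the ones passed to the LLM-Driver training command above.

```python
from pathlib import Path
from typing import Iterable, List

# Paths referenced by the LLM-Driver training command above.
TRAINING_INPUTS = (
    "models/weights/stage1_pretrained_model/",
    "data/vqa_train_10k.pkl",
    "data/vqa_test_1k.pkl",
)


def missing_inputs(paths: Iterable[str], root: str = ".") -> List[str]:
    """Return the subset of paths that do not exist under root."""
    return [p for p in paths if not (Path(root) / p).exists()]


if __name__ == "__main__":
    missing = missing_inputs(TRAINING_INPUTS)
    if missing:
        print("Missing before training can start:", ", ".join(missing))
    else:
        print("All training inputs found.")
```

A missing `.pkl` file usually means the archives in `data/` have not been extracted yet.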
If you find our work useful in your research, please consider citing:
@inproceedings{chen2024drivingwithllms,
title={Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving},
author={Long Chen and Oleg Sinavski and Jan Hünermann and Alice Karnsund and Andrew James Willmott and Danny Birch and Daniel Maund and Jamie Shotton},
booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
year={2024}
}
@article{marcu2023lingoqa,
title={LingoQA: Video Question Answering for Autonomous Driving},
author={Ana-Maria Marcu and Long Chen and Jan Hünermann and Alice Karnsund and Benoit Hanotte and Prajwal Chidananda and Saurabh Nair and Vijay Badrinarayanan and Alex Kendall and Jamie Shotton and Oleg Sinavski},
journal={arXiv preprint arXiv:2312.14115},
year={2023},
}
This project has drawn inspiration from the Alpaca LoRA repository. We would like to express our appreciation for their contributions to the open-source community.