Authors: Nam V. Nguyen, Thong T. Doan, Luong Tran, Van Nguyen, Quang Pham
Release Notes • Quick Start • About • Setup New MoE Layer • Training • Evaluation • Citation
Mixture of Experts (MoE) plays an important role in the development of more efficient and effective large language models (LLMs). Due to the enormous resource requirements, studying large-scale MoE algorithms remains inaccessible to many researchers. This work develops LibMoE, a comprehensive and modular framework to streamline the research, training, and evaluation of MoE algorithms. Built upon three core principles: (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoE in LLMs more accessible to a wide range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms over three different LLMs and 11 datasets under the zero-shot setting. The results show that, despite their unique characteristics, all MoE algorithms perform roughly similarly when averaged across a wide range of tasks. With its modular design and extensive evaluation, we believe LibMoE will be invaluable for researchers to make meaningful progress towards the next generation of MoE and LLMs.
| Date | Release Notes |
|---|---|
| 2024-11-04 | Additional feature: metric analysis for MoE algorithms in the LibMoE paper |
| 2024-11-01 | Released the LibMoE v1.0 preprint report HERE; LibMoE webpage HERE; publicly available checkpoints |
We are making all of our experiment checkpoints publicly available to support the community's research on Mixture of Experts (MoE). By reusing our checkpoints from the Pre-Training and Pre-FineTuning stages, we hope to help others save time and computational resources in their own experiments.
| Method | Stage | Siglip 224 + Phi3.5 | Siglip 224 + Phi3 | CLIP 336 + Phi3 |
|---|---|---|---|---|
|  | Pre-Training | Link | Link | Link |
|  | Pre-FineTuning | Link | Link | Link |
| SMoE-R | VIT 665K | Link | Link | Link |
| Cosine-R | VIT 665K | Link | Link | Link |
| Sigmoid-R | VIT 665K | Link | Link | Link |
| Hyper-R | VIT 665K | Link | Link | Link |
| Perturbed Cosine-R | VIT 665K | Link | Link | Link |
*VIT stands for Visual Instruction Tuning.
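If you want to reuse one of these checkpoints, one straightforward way to fetch it is via the Hugging Face Hub CLI. This is only a sketch: the repository ID below is a placeholder, and you should substitute the one behind the corresponding Link entry in the table above.

```bash
# Sketch only: <ORG/REPO_ID> is a placeholder -- replace it with the
# repository behind the relevant "Link" entry in the table above.
pip install -U "huggingface_hub[cli]"
huggingface-cli download <ORG/REPO_ID> --local-dir ./checkpoints/libmoe
```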
- Clone this repository:

  ```bash
  git clone https://github.com/Fsoft-AIC/LibMoE.git
  cd LibMoE
  ```

- Install dependencies:

  We used Python 3.9 with `venv` for all experiments; the setup should also be compatible with Python 3.9 or 3.10 under Anaconda if you prefer to use it.

  - Using `venv`:

    ```bash
    python -m venv /path/to/new/virtual/moe
    source /path/to/new/virtual/moe/bin/activate
    ```

  - Using `Anaconda`:

    ```bash
    conda create -n moe python=3.9 -y
    conda activate moe
    ```

  Then, install the required packages:

  ```bash
  pip install --upgrade pip
  pip install -e .
  pip install -r ./requirements.txt
  ```
- Install additional packages:

  Choose the FlashAttention version that matches your Torch version from the FlashAttention Releases. Example:

  ```bash
  pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+cu118torch2.1cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
  ```
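After installation, a quick sanity check (a minimal sketch; it only assumes the standard `torch` and `flash_attn` module names) confirms that the FlashAttention wheel matches your Torch build before you start training:

```bash
# Verify that torch and flash-attn import cleanly and print their versions;
# a mismatched wheel typically fails right here at import time.
python -c "import torch, flash_attn; print('torch', torch.__version__, 'cuda', torch.version.cuda, 'flash_attn', flash_attn.__version__)"
```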
For a detailed, step-by-step guide on setting up the dataset, please refer to the dataset guide.
For a detailed step-by-step guide on setting up a new MoE layer, please refer to the model guide.
After downloading the datasets and the corresponding JSON files, you can proceed to train the model using the following commands. Below is an example using the Phi3 configuration.
Option 1: Run Each Stage Separately

- Pre-train the MLP connector:

  ```bash
  bash scripts/train/phi3mini/clip/pretrain_phi3.sh
  ```

- Pre-finetune the whole model:

  ```bash
  bash scripts/train/phi3mini/clip/pft_phi3mini.sh
  ```

- Visual instruction tuning stage:

  ```bash
  bash scripts/train/phi3mini/clip/sft_phi3mini.sh
  ```

Option 2: Run All Stages

You can run all stages in sequence with the following command:

```bash
bash scripts/train/run_train_all.sh
```
Note:
- These scripts are designed for training the model on a single node with 4x A100 GPUs.
- You must set `batch_size` to the value specified in our scripts (`/scripts/train/phi3mini/clip`) for each stage, where `batch_size = gradient_accumulation_steps * batch_size_current` (see the example below).
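As an illustration of that relationship (the numbers below are made up, not the values from our scripts):

```bash
# Illustrative values only -- read the actual numbers from the stage script
# under /scripts/train/phi3mini/clip before changing anything.
batch_size_current=4            # per-step batch size set in the script
gradient_accumulation_steps=8   # gradients accumulated over 8 steps
batch_size=$((gradient_accumulation_steps * batch_size_current))
echo "effective batch_size = ${batch_size}"   # prints 32
```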
Test Training the Model

We recommend running all stages with `MAX_STEPS=2` to check for issues in each stage. This lets you identify and fix problems quickly before committing to a full run. After testing, set `MAX_STEPS=-1` to train all steps fully. Also, remember to delete the checkpoint folder created during testing (see the cleanup sketch after the script below).
```bash
#!/bin/bash
export TMPDIR=""
export TOOLKIT_DIR=""        # Path to the toolkitmoe directory
export KEY_HF=""             # Hugging Face API key
export ID_GPUS="0,1,2,3"

# Set to -1 to run all steps
export MAX_STEPS=2           # Select a suitable number of steps for testing each stage

echo "Starting pretrain stage"
bash ./scripts/train/phi3mini/pretrain_phi3.sh

echo "Starting pft stage"
bash ./scripts/train/phi3mini/pft_phi3mini.sh

echo "Starting sft stage"
bash ./scripts/train/phi3mini/sft_phi3mini.sh
```
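After the test run succeeds, the sketch below shows one way to clean up the test checkpoints; the path is an assumption, so check the `output_dir` set in your stage scripts before deleting anything.

```bash
# Hypothetical checkpoint location -- verify it against the output_dir in
# your stage scripts, then remove only the folders created by the test run.
ls ./checkpoints
rm -rf ./checkpoints/<test-run-folder>
```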
We evaluate on the following benchmarks:
- AI2D
- ChartQA
- Text VQA
- GQA
- HallusionBenchmark
- MathVista Validation
- MMBenchEN
- MME
- MMMU Validation
- MMStar
- POPE
- SQA IMG Full
To run the evaluation, use the following command:
```bash
bash scripts/eval/run_eval.sh
```
*Note: For MathVista Validation and HallusionBenchmark, GPT-4 is used as the evaluator, so you need to provide an API key to run these evaluations.
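How the key is picked up is an assumption here; a common convention is to export it as an environment variable before launching the evaluation, but check `scripts/eval/run_eval.sh` for the exact variable it reads.

```bash
# Assumption: the GPT-4-based judges read the key from OPENAI_API_KEY;
# confirm the exact variable name in scripts/eval/run_eval.sh.
export OPENAI_API_KEY="sk-..."   # placeholder -- use your own key
bash scripts/eval/run_eval.sh
```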
Evaluation of LLaVA on MME:

```bash
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_mme \
    --output_path ./logs/ \
    --return_id_experts true \
    --layers_expert_selection 1 2 3
```

Here `--return_id_experts true` returns the selected expert IDs, and `--layers_expert_selection 1 2 3` restricts expert-selection logging to specific layers; if no layer IDs are given, experts from all layers are selected by default.
Evaluation of LLaVA on multiple datasets:

```bash
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.5-7b" \
    --tasks mme,mmbench_en \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs/ \
    --return_id_experts true \
    --layers_expert_selection 1 2 3
```

The `--return_id_experts` and `--layers_expert_selection` flags behave as described above.
For other LLaVA variants, change the `conv_template` in `model_args`. `conv_template` is an argument of the `llava` model's init function in `lmms_eval/models/llava.py`; you can find the corresponding value in LLaVA's code, likely in the `conv_templates` dictionary in `moe_model/conversation.py`:
```bash
python3 -m accelerate.commands.launch \
    --num_processes=8 \
    -m lmms_eval \
    --model llava \
    --model_args pretrained="liuhaotian/llava-v1.6-mistral-7b,conv_template=mistral_instruct" \
    --tasks mme,mmbench_en \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix llava_v1.5_mme_mmbenchen \
    --output_path ./logs/ \
    --return_id_experts true \
    --layers_expert_selection 1 2 3
```
If you find this repository useful, please consider citing our paper:
```bibtex
@misc{nguyen2024libmoelibrarycomprehensivebenchmarking,
      title={LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models},
      author={Nam V. Nguyen and Thong T. Doan and Luong Tran and Van Nguyen and Quang Pham},
      year={2024},
      eprint={2411.00918},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.00918},
}
```