Official repository for "IPDPS'24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
QSync explores the potential of removing unnecessary quantized operations to improve training accuracy. It achieves this through the following components:
- Quantization perturbation indicator and replayer for analyzing the memory and latency of the global data-flow graph under mixed precision (Predictor)
- Allocator for selecting the optimal set of quantized operations for training (Allocator / Syncer)
- Support for low-precision backends (CUTLASS, cuDNN) (LP-PyTorch)
In particular, QSync addresses a specific practical scenario: hybrid-cluster training, which mixes inference GPUs with lower capabilities (memory, compute) and training GPUs with higher capabilities.
The provided scripts support both convolution-based and transformer-based models.
NOTE: This project is somewhat old; its kernel implementations may not match the performance of the latest PyTorch.
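To make the division of labor concrete, here is a hedged sketch of how the pieces fit together. Only `QModule(model)` appears in this README; the import path and the `predictor`/`allocator` objects and their methods below are hypothetical placeholders, not the actual QSync API.

```python
# Conceptual flow of the three components described above. Only QModule(model)
# comes from this README; `predictor` and `allocator` are hypothetical
# placeholders, not the actual QSync API.
import torch.nn as nn
from qsync.LpTorch import QModule  # import path assumed from the repo layout

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten())

# LP-PyTorch: wrap the model so supported ops can run on low-precision backends.
qmodel = QModule(model)

# Predictor (hypothetical): replay the global data-flow graph to estimate the
# memory and latency of a candidate mixed-precision plan.
# cost = predictor.replay(qmodel, candidate_plan)

# Allocator / Syncer (hypothetical): pick the bit-width plan that minimizes
# quantization perturbation while keeping all workers synchronous.
# plan = allocator.solve(cost, memory_budget)
```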
Clone the repo:

```bash
git clone --recursive https://github.com/bytedance/QSync.git
```
- Run `build.sh` under `dockerfile`.
- Run `run.sh`, specifying the necessary path mounting inside.
- Run `pip install -e .` in the root folder of QSync; compilation of the kernels will start.
- Some libs may be hard to install without a proxy. Change `<abspath_to_root>` in `m_install.sh` to the absolute path of the root folder, then run `bash m_install.sh`.
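After installation, a quick import check can confirm the build succeeded. This is a minimal sketch assuming the editable install exposes a top-level `qsync` package (matching the folder layout described below); adjust if it differs.

```python
# Sanity check after installation; the package name is assumed from the
# repository layout (a top-level qsync/ folder installed via `pip install -e .`).
import torch
import qsync

print(torch.__version__, torch.cuda.is_available())
print("qsync imported from:", qsync.__file__)
```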
QSync is implemented under the `qsync` folder, composed of `syncer`, `predictor`, and `LpTorch`.
- To use LpTorch and convert your model to a mixed-bit-width model, use `model = QModule(model)` (see the sketch after this list).
- See the corresponding pages for details on using the predictor and syncer.
- See the samples under `benchmark_convs` / `benchmark_transformers`.
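As referenced above, here is a minimal usage sketch for `QModule`. Only the `QModule(model)` call is taken from this README; the import path and the assumption that the wrapper behaves like a standard `nn.Module` (forward, `parameters()`) are mine.

```python
# Minimal QModule usage sketch. Only QModule(model) comes from this README;
# the import path and the nn.Module-compatible behavior of the wrapper are
# assumptions.
import torch
import torchvision
from qsync.LpTorch import QModule  # assumed import path

model = torchvision.models.resnet18().cuda()
model = QModule(model)  # convert to a mixed-bit-width model

# Training proceeds as usual with the wrapped model.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 3, 224, 224, device="cuda")
y = torch.randint(0, 1000, (8,), device="cuda")

loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```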
Note that cross-node cost modeling is not as accurate as single-node modeling; extra effort is required to align the communication start.