Releases: flexflow/flexflow-train
Release 22.07
This is the last stable release of FlexFlow before the Unity merge. Unity enables joint optimization of algebraic transformations and parallelization, and generally achieves better performance and scalability than the original FlexFlow. The Unity merge introduces the following major changes to FlexFlow.
- With Unity, we now use parallel computation graphs (PCGs) to represent a DNN model. A PCG is a unified representation of distributed DNN training that simultaneously expresses computation, parallelism, and data movement. A detailed description of PCGs is available here.
- We add support for Unity's additional forms of parallelism, including reduction parallelism and other operator-specific parallelization strategies.
- We replace FlexFlow's MCMC search with a three-layer hierarchical search algorithm, which jointly optimizes algebraic transformations and parallelization and achieves better performance and scalability than the MCMC search.
Starting from this release, Unity's changes will be available in the master branch of the FlexFlow repository.
Release 22.05
This is a stable release of FlexFlow in preparation for the Unity merge.
Frontend Support:
- FlexFlow now supports training HuggingFace models using the PyTorch fx interface. An example of training HuggingFace MT5 in FlexFlow is available at https://github.com/flexflow/FlexFlow/tree/master/examples/python/pytorch/mt5
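The fx frontend captures a model's forward pass as an operator-level graph, which FlexFlow then translates for distributed execution. As a rough illustration of what that capture looks like, here is plain `torch.fx` on a toy module (FlexFlow's own entry point is not shown; see the MT5 example above for a complete program):

```python
import torch
import torch.fx

# Toy module standing in for a HuggingFace model; torch.fx records its
# forward pass as a graph of operator calls, which is the representation
# an fx-based frontend can translate.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

traced = torch.fx.symbolic_trace(TinyModel())
print(traced.graph)  # the operator-level graph of the forward pass
```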
PyTorch Alignment:
- Added unit tests for aligning FlexFlow's operators with PyTorch's. For each operator, the unit test checks whether FlexFlow and PyTorch return identical activations/gradients when given the same inputs. More details on the PyTorch alignment are available at https://github.com/flexflow/FlexFlow/tree/master/align
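The tests follow a simple pattern: run the same operator on identical inputs in both frameworks and require matching activations and gradients. Below is a minimal sketch of that pattern, with a hand-written NumPy reference standing in for one of the two sides (the real tests compare FlexFlow's operators against PyTorch's):

```python
import numpy as np
import torch

x_np = np.random.randn(4, 8).astype(np.float32)

# Side A: forward and backward through PyTorch's ReLU.
x = torch.tensor(x_np, requires_grad=True)
y = torch.relu(x)
y.backward(torch.ones_like(y))

# Side B: the same math written out by hand as a reference.
y_ref = np.maximum(x_np, 0.0)
grad_ref = (x_np > 0).astype(np.float32)

# The alignment check: identical activations and gradients.
assert np.allclose(y.detach().numpy(), y_ref)
assert np.allclose(x.grad.numpy(), grad_ref)
```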
Documentation:
- Initial documentation support added: https://github.com/flexflow/FlexFlow/tree/master/docs
Operators:
- Multiple bug fixes for FlexFlow operators
Broadcast:
- FlexFlow now supports broadcasting for a subset of operators, including elementwise unary and elementwise binary operators. The broadcasting semantics are identical to NumPy's.
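For reference, these are the rules being matched: dimensions are aligned from the right, and a size-1 (or missing) dimension is stretched to fit the other operand:

```python
import numpy as np

a = np.ones((4, 3))   # shape (4, 3)
b = np.arange(3)      # shape (3,) -> treated as (1, 3) -> stretched to (4, 3)
print((a + b).shape)  # (4, 3)

c = np.ones((4, 1))   # shape (4, 1) -> stretched along the last axis
print((a * c).shape)  # (4, 3)
```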
Release 21.09 (September 30, 2021)
Frontend Support
- PyBind11 is now the default Python frontend in FlexFlow.
Control Replication
- FlexFlow now enables Legion's dynamic control replication by default
Distributed training
- FlexFlow now uses NCCL AllReduce for gradient synchronization by default. To switch to the distributed parameter server, set `FF_USE_NCCL=OFF` in cmake.
Distributed inference
- Passing `comp_node = CompMode::INFERENCE` as an additional argument to `model.compile` will run a DNN model in inference mode.
- Various bug fixes and performance improvements for distributed inference in FlexFlow.
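A minimal sketch of that call from the Python frontend, assuming `model` is an already-built FlexFlow model and that `CompMode` is importable from `flexflow.core` (both assumptions; the bert_proxy example linked under release 21.03 shows a complete program):

```python
from flexflow.core import CompMode  # assumed import path

# Compile for inference rather than training; other compile arguments
# (optimizer, loss, metrics) are omitted for brevity. The note above writes
# the enum in C++ syntax (CompMode::INFERENCE); dot access is the Python
# equivalent.
model.compile(comp_node=CompMode.INFERENCE)
```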
Operators
- Additional operators include AggregateSpec and Multi-Head Attention.
Machine Model
- FlexFlow now supports a new machine model for more precisely modeling network topology and simulating traffic at the granularity of individual packets.
Release 21.03 (March 31, 2021)
- Build
  - FlexFlow now uses CMake builds by default; Makefile support will be deprecated soon.
- Frontend Support
  - In addition to CFFI, FlexFlow now also supports a Python interface via PyBind11. To use PyBind11, set `FF_USE_PYBIND=ON` in cmake.
- Distributed inference
  - FlexFlow supports automated performance tuning for both distributed training and inference. For optimizing and performing distributed inference, simply pass `comp_node = CompMode::INFERENCE` as an additional argument to `model.compile`. An example can be found at https://github.com/flexflow/FlexFlow/blob/master/examples/python/native/bert_proxy_native.py.
- Runtime
  - FlexFlow now supports gradient updates via either a parameter server or NCCL AllReduce. To enable NCCL, set `FF_USE_NCCL=ON` in cmake.
- Operators
  - New operators include Aggregate, Multi-Head Attention, Scalar Multiply, Scalar Add, Scalar Sub, Scalar Divide, and Top-K.
  - Conv2D now supports group convolutions (see the sketch after this list).
- Examples
  - Unit tests for all operators have been added to the `tests/ops` folder.
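Group convolution splits the input and output channels into independent groups, so each filter only sees the channels in its own group. FlexFlow's Conv2D follows these standard semantics; the sketch below uses PyTorch's `nn.Conv2d` purely to illustrate them, not FlexFlow's own API:

```python
import torch

# With groups=4, the 16 input channels are split into 4 groups of 4, each
# convolved by its own 8 filters, so each filter sees 16/4 = 4 channels.
conv = torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3,
                       padding=1, groups=4)
x = torch.randn(1, 16, 28, 28)
print(conv(x).shape)      # torch.Size([1, 32, 28, 28])
print(conv.weight.shape)  # torch.Size([32, 4, 3, 3]), i.e. 4 channels, not 16
```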
Release 20.12 (December 21, 2020)
- Build
  - FlexFlow now supports both Makefile and CMake builds. More details are available in these instructions.
- Frontend Support
  - PyTorch. FlexFlow now supports training existing PyTorch models with minimal changes to the source code. To run PyTorch models in FlexFlow, users can first export a model to the ONNX format using `torch.onnx` and then load the ONNX model in FlexFlow for distributed training (a minimal export sketch appears after this list). More examples: https://github.com/flexflow/FlexFlow/tree/master/examples/python/pytorch
  - ONNX. FlexFlow supports training existing ONNX models through `flexflow.onnx.model`. More examples: https://github.com/flexflow/FlexFlow/tree/master/examples/python/onnx
  - TensorFlow Keras. Similar to the PyTorch support, `flexflow.keras` enables distributed training of existing TensorFlow Keras models. See this bootcamp talk for more details.
- Parallelization Optimizer
  - Integrated the parallelization optimizer into the FlexFlow runtime. Users can now use the `--search-budget` and `--search-alpha` flags to control the FlexFlow parallelization optimizer when searching for optimized strategies. See this post for the usage of the optimizer.
- Examples
  - More PyTorch, ONNX, and TensorFlow Keras examples have been added to the `/examples/python` folder.
  - Updated the C++ examples to use the new runtime interface.
- Mapper
  - Implemented a new mapper with improved runtime performance.
- Legion
  - Updated the Legion version with improved runtime performance.
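A minimal sketch of the export half of the PyTorch path described under Frontend Support above, using the standard `torch.onnx` API (loading the resulting file in FlexFlow is covered by the linked examples):

```python
import torch

# Export a PyTorch model to ONNX; the resulting file can then be loaded in
# FlexFlow for distributed training (see the linked PyTorch examples for
# the FlexFlow side).
model = torch.nn.Sequential(
    torch.nn.Linear(32, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
dummy_input = torch.randn(8, 32)
torch.onnx.export(model, dummy_input, "model.onnx")
```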
FlexFlow v1.1.1 Release for the SysML19 Artifact Evaluation
This is the v1.1.1 pre-release for the SysML19 Artifact Evaluation. Follow the instructions to build FlexFlow and use the script `run_experiments.sh` to run all experiments.
FlexFlow v1.1 Release for the SysML19 Artifact Evaluation
This is the v1.1 pre-release for the SysML19 Artifact Evaluation. Follow the instructions to build FlexFlow and use the script `run_experiments.sh` to run all experiments.
SysML19 Artifact Evaluation
This is a pre-release for the SysML19 Artifact Evaluation. Follow the instructions to build FlexFlow and use the script `run_experiments.sh` to run all experiments.