Implementation of the paper StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis, a pioneering work that explores "stroke tokens", a better visual representation for vector graphics that is rich in visual semantics, naturally compatible with LLMs, and highly compressed.
The VQ-Stroke module encompasses two main stages: a "Code to Matrix" stage that transforms SVG code into a matrix format suitable for model input, and a "Matrix to Token" stage that transforms the matrix data into stroke tokens.
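The snippet below is a minimal sketch of these two stages, not the repository's actual implementation: it flattens simplified SVG path commands into a fixed-width matrix, then maps each row to its nearest codebook entry. The 7-column layout, command ids, and codebook size are all illustrative assumptions, and the codebook here is random rather than learned.

```python
import numpy as np

# --- "Code to Matrix": flatten simplified SVG path commands into rows ---
# Each row: [command id, x1, y1, x2, y2, x3, y3]; unused slots stay zero.
# The 7-column layout and the command ids are illustrative assumptions.
CMD_IDS = {"M": 0, "L": 1, "C": 2}

def code_to_matrix(path_commands):
    rows = []
    for cmd, *args in path_commands:
        row = np.zeros(7, dtype=np.float32)
        row[0] = CMD_IDS[cmd]
        row[1:1 + len(args)] = args
        rows.append(row)
    return np.stack(rows)

# --- "Matrix to Token": nearest-neighbor lookup in a VQ codebook ---
def matrix_to_tokens(matrix, codebook):
    # squared distances between every stroke row and every codebook entry
    d = ((matrix[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)  # one discrete "stroke token" per row

rng = np.random.default_rng(0)
codebook = rng.normal(size=(4096, 7)).astype(np.float32)  # size is an assumption

path = [("M", 0, 0), ("L", 10, 0), ("C", 10, 5, 5, 10, 0, 10)]
print(matrix_to_tokens(code_to_matrix(path), codebook))  # 3 stroke-token ids
```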
(Figure: Overview of VQ-Stroke, with Down-Sample Blocks and Up-Sample Blocks.)

We check reproducibility under the following environment:
- Python 3.10.13
- CUDA 11.1
Prepare your environment with the following commands:

```bash
git clone https://github.com/ProjectNUWA/StrokeNUWA.git
cd StrokeNUWA

conda create -n strokenuwa python=3.9
conda activate strokenuwa

# install PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# install the remaining requirements
pip install -r requirements.txt
```
We utilize Flan-T5 (3B) as our backbone. Download the model and place it under the `./ckpt` directory.
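As one way to fetch the weights, here is a sketch using `huggingface_hub`; the model id `google/flan-t5-xl` (the 3B variant) and the target directory are assumptions, not pinned by this repo.

```python
# a sketch using huggingface_hub; the model id and local path are assumptions
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="google/flan-t5-xl",    # Flan-T5 (3B); assumed checkpoint id
    local_dir="./ckpt/flan-t5-xl",  # place weights under ./ckpt as noted above
)
```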
FIGR-8-SVG Dataset
Download the raw FIGR-8 dataset from [Link] and follow IconShop to further preprocess the datasets. (We thank Ronghuan Wu, author of IconShop, for providing the preprocessing scripts.)
Train VQ-Stroke with the example config:

```bash
python scripts/train_vq.py -cn example
```

Test a trained checkpoint:

```bash
python scripts/test_vq.py -cn config_test CKPT_PATH=/path/to/ckpt TEST_DATA_PATH=/path/to/test_data
```
After training VQ-Stroke, we first create the training data by running inference over the full training set to obtain the "stroke" tokens, and then use these stroke tokens to further train the Flan-T5 model.
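As a sketch of how discrete stroke tokens can be made consumable by the LLM (not the repository's actual training code; the token format and the 4096 codebook size are assumptions), one can register them as extra tokens and resize the embeddings:

```python
# a hedged sketch: register stroke tokens with Flan-T5 (paths are assumptions)
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("./ckpt/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("./ckpt/flan-t5-xl")

# one token per VQ codebook entry; 4096 is an assumed codebook size
stroke_tokens = [f"<stroke_{i}>" for i in range(4096)]
tokenizer.add_tokens(stroke_tokens)
model.resize_token_embeddings(len(tokenizer))

# a VQ-Stroke token sequence can now be serialized as plain text for training
example = "".join(f"<stroke_{i}>" for i in (12, 845, 3071))
print(tokenizer(example)["input_ids"])
```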
We provide an example script `example.sh` and example training data `example_dataset/data_sample_edm.pkl` for reference.
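To get a quick look at the example data, loading the pickle works; its schema is not documented here, so treat the field access below as exploratory:

```python
# exploratory sketch: peek at the provided example data
import pickle

with open("example_dataset/data_sample_edm.pkl", "rb") as f:
    data = pickle.load(f)

print(type(data))
# if it is a list/dict of samples, the first entry reveals the schema
if isinstance(data, (list, tuple)) and data:
    print(data[0])
elif isinstance(data, dict):
    print(list(data.keys())[:10])
```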
We appreciate the open-source contributions of the following projects: