- Check out and compile IREE with a release build, then add its tools to your `PATH`: `export PATH=/path/to/iree/build/release/tools:$PATH`
- Compile the full SDXL model: `./compile-txt2img.sh gfx942` (where `gfx942` is the target for MI300X)
- Run the benchmark: `./benchmark-txt2img.sh N /path/to/weights/irpa` (where `N` is the GPU index)
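The steps above can be sketched as a small pre-flight helper that sets up `PATH` and checks that the IREE tools are visible before the scripts run. This is a minimal sketch, not part of the repository: the function names `prepend_iree_tools` and `require_tool` are hypothetical, and it assumes the compile and benchmark scripts rely on `iree-compile` and `iree-benchmark-module` being on `PATH`.

```shell
# Hypothetical helpers (not part of this repository).

# Prepend a directory to PATH unless it is already listed there.
prepend_iree_tools() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Return non-zero and print a message if a tool cannot be found on PATH.
require_tool() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing tool: $1" >&2
    return 1
  }
}

# Example usage (paths are placeholders; the tool names are an assumption
# about what the scripts call):
#   prepend_iree_tools /path/to/iree/build/release/tools
#   require_tool iree-compile && require_tool iree-benchmark-module
#   ./compile-txt2img.sh gfx942
#   ./benchmark-txt2img.sh 0 /path/to/weights/irpa
```

Checking for the tools up front makes a missing or wrong `PATH` entry fail with a clear message instead of partway through compilation.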
Caution
IRs in the following table might be stale. Use the ones in the `base_ir/` directory instead.
Note
SDXL-turbo differs from SDXL only in its usage and training/weights. The model architecture (and therefore the weights-stripped MLIR) is the same.
| Variant | Submodel | MLIR (No Weights) (Config A) | safetensors | Splat IRPA | MLIR (No Weights) (Config B) |
|---|---|---|---|---|---|
| SDXL1.0 1024x1024 (f16, BS1, len64) | | | | | |
| | UNet + attn | Torch - Linalg | - | - | Azure |
| | UNet + PNDMScheduler | Azure | | | |
| | Clip1 | Azure | - | - | |
| | Clip2 | Azure | - | - | |
| | VAE decode + attn | Azure | - | = | Azure |
| | VAE encode + attn | [GCloud][sdxl-1-1024x1024-f16-stripped-weight-vae-encode] | Same as decode | - | - |
| SDXL1.0 1024x1024 (f32, BS1, len64) | | | | | |
| | UNet + attn | Azure | Azure | Azure | Azure |
| | Clip1 | Azure | Azure | Azure | - |
| | Clip2 | Azure | Azure | Azure | - |
| | VAE decode + attn | Azure | Azure | Azure | Azure |
| SDXL compiled pipeline IRPAs (f16) | | | | | |
| | UNet | scheduled_unet_f16.irpa | | | |
| | Prompt Encoder (CLIP1 + CLIP2) | prompt_encoder_f16.irpa | | | |
| | VAE | vae_decode_f16.irpa | | | |