Release v19.0.0 · halide/Halide

Major improvements

Halide is now available for both C++ and Python usage via Pip. Try pip install halide today!
The Vulkan backend has matured substantially.
The HTML "conceptual statement" output now supports dark mode viewing.
For developers, CMake 3.28 is now required, and the build no longer requires an internet connection.
Thread pool improvements mean that workloads that do a small number of small tasks in parallel (e.g. a cheap operation applied to a small image) are up to 3x faster. If you have schedules that do not use parallelism for small inputs because you found it didn't provide any speedup, you may want to re-benchmark.
You can now query properties of the compiled-for target as Exprs, simplifying helper code that wants to do different things depending on the target architecture. Example: f(x) = select(target_arch_is(Target::ARM), 3, 7). Helpers include target_arch_is, target_os_is, target_has_feature, target_bits, and target_natural_vector_size. These are resolved to constants at compile-time and simplified away. Use with care, as this (intentionally) results in different behavior on different platforms.

Breaking changes

We now distribute libGenGen.a rather than GenGen.cpp.
- Downstream users should link to this library with /WHOLEARCHIVE: (MSVC) or -Wl,--whole-archive (GCC/Clang) rather than building GenGen.cpp themselves.
- Users of the CMake package should be unaffected.
In keeping with our LLVM support policy, support for LLVM 16 has been removed.
We no longer use the le64/le32 generic targets for compiling runtime modules to LLVM. These targets were removed in LLVM upstream.

What's Changed

Apps and tests

Reschedule the matrix multiply performance app by @abadams in #8418
Update lesson_22_jit_performance.cpp by @abadams in #8438
Add threadpool performance test by @abadams in #8447
Don't allow internal_error to pass an error test by @alexreinking in #8458
Get more consistent distributions in parallel scenarios test by @abadams in #8451

Autoschedulers

Consider all Exprs a func uses, not just the RHS, in Li2018 by @abadams in #8326

Build system

Python_bindings-test-as-installed by @LebedevRI in #8355
Bump Halide version to 19 in main branch by @steven-johnson in #8357
Remove warning for unsupported compilers by @alexreinking in #8362
Bump CMake minimum version to 3.28 by @alexreinking in #8363
Quick CMake fixes enabled by 3.28 by @alexreinking in #8365
Distribute GenGen as a static library by @alexreinking in #8367
Clean up serialization build code by @alexreinking in #8369
List headers with target_sources FILE_SETS by @alexreinking in #8370
Clean up autoscheduler dependencies by @alexreinking in #8372
Use a Find module for V8 by @alexreinking in #8373
Use a Find module for NodeJS by @alexreinking in #8374
Move dependencies/wasm to use sites by @alexreinking in #8377
Replace FetchContent with a custom dependency provider by @alexreinking in #8378
Two more build fixes by @LebedevRI in #8371
Rework LLVM into Find module and enact new component policy. by @alexreinking in #8379
Reflow src/CMakeLists.txt in logical groups by @alexreinking in #8383
Introduce HalideFeatures system for optional components by @alexreinking in #8384
Scan generated export files to determine dependencies. by @alexreinking in #8385
Rewrite bundle_static to be much more efficient. by @alexreinking in #8386
Support using vcpkg to build dependencies on all platforms by @alexreinking in #8387
Fix bundling error on buildbots by @alexreinking in #8392
Support CMAKE_OSX_ARCHITECTURES by @alexreinking in #8390
Fix Homebrew LLVM 19 by @alexreinking in #8431
Fix CPack package naming when cross-compiling by @alexreinking in #8492
Fix Apple libtool detection in bundle_static by @alexreinking in #8495

CodeGen

Select condition vector lanes must match the true and false value by @abadams in #8465
Emit vscale_range() fn attribute in correct syntax by @steven-johnson in #8457
Fix #8455 (in combination with #8457) by @steven-johnson in #8456
Fix bonehead mistake in get_md_bool() by @steven-johnson in #8469
Propagate some facts about inequalities with min/max by @shoaibkamil in #8475
- This fixed an issue where predicates in .specialize() directives weren't able to eliminate select() cases. #8443

Debugging

Add LLDB pretty-printing by @alexreinking in #8460
Print constants in scientific precision by @antonysigma in #8506
Adaptive Dark colorscheme for Stmt HTML. Ability to programmatically export conceptual stmt files. by @mcourteaux in #8327

Documentation

Update README.md by @abadams in #8404
Big documentation update by @alexreinking in #8410
- Document how to find Halide from a pip installation by @alexreinking in #8411
- Link to PyPI from Doxygen index.html by @alexreinking in #8415
- Include our Markdown documentation in the Doxygen site. by @alexreinking in #8417
- Add missing backslash by @abadams in #8419

Frontend

Don't let users disguise RVars as Vars by @abadams in #8441
Add helper functions to query properties of the lowered Target (#8192) by @steven-johnson in #8359

Hardware backends

Fix injection of GPU buffers that do not go by a Func name (i.e. alloc groups). by @mcourteaux in #8333
Remove vestigial AMDGPU backend by @alexreinking in #8382
Add ARMv8.x feature flags by @steven-johnson in #4489
[vulkan] Fixes to address outstanding validation failures by @derek-gerstmann in #8448
[vulkan] Reduce descriptor sets, use official headers, improve allocator, remove module destructor by @derek-gerstmann in #8452
[vulkan] Skip async_copy_chain and gpu_allocation_cache correctness tests on Windows by @derek-gerstmann in #8503

LLVM

Don't use le32/le64 by @steven-johnson in #8344
Fix for the removed DataLayout constructor. by @mcourteaux in #8391
Drop support for LLVM 16 in main by @steven-johnson in #8358
Allow LLVM 20 by @steven-johnson in #8352
Fix for top-of-tree LLVM by @steven-johnson in #8421
Fix for top-of-tree LLVM by @steven-johnson in #8425
Fix for top-of-tree LLVM by @steven-johnson in #8442
Fix datalayout for osx-arm-64 by @abadams in #8449
Fix top of LLVM. by @mcourteaux in #8454
Replace all use of getPointerTo() with PointerType::get() by @steven-johnson in #8473

Python

Fix Numpy 2.0 compatibility bug in lesson 10 by @alexreinking in #8381
Pip packaging at last! by @alexreinking in #8405
- Update pip package metadata by @alexreinking in #8412
- Fix classifier spelling by @alexreinking in #8413
- Upgrade LLVM to 19.1.0 in pip package by @alexreinking in #8423
- Update PIP LLVM to 19.1.4 by @alexreinking in #8488
PythonExtensionGen: ~PyHalideBuffer should call device_free() (#8399) by @steven-johnson in #8439

Runtime

Fix profiler to report time spent on GPU kernels again instead of on 'wait for parallel tasks'. by @mcourteaux in #8453
Don't spin on the main mutex while waiting for new work by @abadams in #8433

Minor bugfixes / other cleanup

Remove remaining dregs of tuple_select (oops) by @steven-johnson in #8329
Fix incorrect output in Python tutorial, lesson 5 by @qqaatw in #8331
Make pybind11 minimum version check compatible with pybind11 v3. by @rwgk in #8366
Partially apply clang-tidy fixes we don't enforce yet by @abadams in #8376
Fix incorrect std::array sizes in Target.cpp by @steven-johnson in #8396
Fix _Float16 detection on ARM64 GCC<13 by @alexreinking in #8401
Make run-clang-tidy.sh work on macOS by @alexreinking in #8416
Some minor fixes for C++23 compilation errors. by @zvookin in #8422
Fix two warnings on GCC 14.2.1. by @mcourteaux in #8430
Fix typos by @alexreinking in #8459
Fix two trivial build errors by @steven-johnson in #8467
Fix #8470: fuzz_bounds should use select() rather than Select::make() by @steven-johnson in #8471
Add missing #include by @steven-johnson in #8476
Fix heap-use-after-free error in find_best_fit() by @steven-johnson in #8483
Fix typos in comments by @alexreinking in #8485
Remove unused is_update argument by @alexreinking in #8487
Backport reverse_view to clean up some code by @alexreinking in #8486
Remove type inspection helpers from ApplySplitResult and Split by @alexreinking in #8489
Use std::optional to clean up some code and prevent use-after-free bugs by @abadams in #8484
Fix comment for Buffer::copy() (Fixes #8498) by @steven-johnson in #8500
Make the default constructor for ConstantInterval inlinable by @abadams in #8505
Remove two unused functions by @steven-johnson in #8501
Move some large stack frames off recursive paths. by @abadams in #8507

New Contributors

@qqaatw made their first contribution in #8331
@rwgk made their first contribution in #8366

Full Changelog: v18.0.0...v19.0.0
