Jim Pivarski edited this page Nov 15, 2023 · 89 revisions

Roadmap

Grand view and history

Awkward Array is a project that started in mid-2018 (with Femtocode and OAMap as predecessors) and has undergone two major revisions.

  • Awkward 0.x (another repo) is now deprecated. It was a pure Python project that relied on NumPy for all array manipulation, which was not always possible.
  • Awkward 1.x (this repo, main-v1 branch) is also deprecated, though it's still open to critical bug-fixes. It was a C++ re-write of the original, allowing for major changes in the user interface.
  • Awkward 2.x (this repo, main branch) is the current production version of Awkward Array. It was produced by porting most of Awkward 1.x from C++ to Python, keeping nearly all of the user interface intact. Some features are now handled by external packages, such as dask-awkward and awkward-pandas.

Awkward 0.x and 1.x were released concurrently under two names (that's why there are vestigial awkward0 and awkward1 packages on PyPI). The transition to one consistent "awkward" package was not smooth and will not be repeated.

Awkward 2.x was tested for over a year as a submodule within Awkward 1.x, awkward._v2. Now that it has been released on its own, the _v2 submodule is deprecated.

Major projects and plans

The following major projects are ongoing:

  • numba.cuda: just as Awkward Arrays can be iterated over in numba.jit functions, they will be iterable in Numba's CUDA backend as well.
  • C++ JIT: just as Awkward Arrays can be iterated over in Numba and RDataFrame (which JIT-compiles C++), they will be iterable in plain cppyy as well.
  • Kaitai Struct → Awkward Array extension compiler: an automated procedure to build Awkward Array-generating Python extensions from Kaitai YAML (through C++).

The following are planned:

  • GPU backend: all ak.* functions will be executable on a GPU.
  • LayoutBuilder in Numba: builds arrays in Numba with the same interface as the C++ LayoutBuilder. (Also in pure Python, for testing.)
  • More I/O formats (at least for data input), in particular XML, CSV with JSON in cells, HDF5 vlen, Arrow Flight/IPC/Feather, and Zarr.
  • ak.str.* vectorized string operations (using pyarrow's implementations).
  • map-lookup behaviors: implementation of the "__array__": "sorted_map" type.

Deprecation schedule

API-breaking changes after 1.0

Version number | Release date | Deprecated features removed in this version
1.0.0 | 2020-12-05 | Broadcasting NumPy ufuncs through records (#457), lazy_cache="attach" option in ak.from_parquet (#576).
1.1.0 | 2021-02-09 | Removed ak.to_arrayset/ak.from_arrayset in favor of ak.to_buffers/ak.from_buffers (#592).
1.2.0 | 2021-04-01 | (none)
1.3.0 | 2021-06-01 | (none)
1.4.0 | 2021-07-02 | (none)
1.5.0 | 2021-09-12 | (none)
1.6.0 | (skipped) | A deprecation was scheduled for "1.7.0, Oct 1, 2021," but we were a version number behind when October came, so the number 1.6.0 was skipped.
1.7.0 | 2021-12-02 | ak.fill_none default axis will be -1. Until then, all uses without an explicit axis raise warnings.
1.8.0 | 2022-03-01 | No support for Python 2 or 3.5. Minimum Python version is 3.6.
1.9.0 | 2022-09-02 | Last new features in 1.x.
1.10.0 | 2022-09-19 | Like 1.9.0, but the minimum Python version is 3.7.

API-breaking changes after 2.0

Deprecations are introduced immediately as warnings, and each warning describes a change that will become an error two minor versions later. That is, a deprecation introduced in 2.0 takes effect in 2.2, so even a deprecation introduced right before the 2.0 → 2.1 transition will have a grace period.
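
The rule amounts to simple arithmetic on the minor version number. A minimal sketch follows; the function name removal_version is hypothetical (not part of Awkward Array), and it assumes the deprecation does not cross a major-version boundary.

```python
def removal_version(introduced: str) -> str:
    """Return the first release in which a deprecation becomes an error.

    Applies the rule stated above: a deprecation introduced in any
    release of the X.Y series takes effect in X.(Y + 2).0.
    """
    major, minor = (int(part) for part in introduced.split(".")[:2])
    return f"{major}.{minor + 2}.0"


print(removal_version("2.0.0"))  # 2.2.0
print(removal_version("2.0.6"))  # 2.2.0 (patch releases share the series' deadline)
print(removal_version("2.3.1"))  # 2.5.0
```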

There will be at least one month between consecutive minor version releases, so if you keep your copy of awkward up to date, you will have at least one month of warning, but usually more.
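
One way to make use of the warning period is to turn deprecation warnings into errors in your own test suite, so that code relying on a deprecated feature fails before the feature is removed. The sketch below uses only Python's standard warnings module. Here old_api is a hypothetical stand-in for a deprecated call, and it is an assumption that the deprecations you care about are emitted as DeprecationWarning; check the category of the warning you actually see.

```python
import warnings


def old_api():
    # Hypothetical stand-in for a library call that has been deprecated.
    warnings.warn("old_api is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def uses_deprecated_feature(func) -> bool:
    """Return True if calling func() emits a DeprecationWarning."""
    with warnings.catch_warnings():
        # Inside this block, a DeprecationWarning is raised as an exception.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False


print(uses_deprecated_feature(old_api))       # True
print(uses_deprecated_feature(lambda: None))  # False
```

The same escalation is available without code changes by running Python with -W error::DeprecationWarning.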

(Yes, this is essentially SemVer shifted one decimal place to the right.)

Version number | Release date | Deprecated features removed in this version
2.0.0 | 2022-12-09 | Major release: see release notes.
2.1.0 | 2023-03-07 | NumPy 1.17.0 became the minimum version.
2.2.0 | 2023-05-01 | The flatten_records argument has been removed from reducers (ak.all, ak.any, ..., ak.var). The Content.to_numpy() method has been removed and replaced by Content.to_backend_array(..., backend). EmptyForm and EmptyArray are prohibited from accepting non-None parameters.
2.3.0 | 2023-07-04 | No support for Python 3.7 (end of life: 2023-06-27); minimum Python version is 3.8. ak.forms.length_zero_array's highlevel and behavior arguments will be removed.
2.4.0 | 2023-09-04 | typestr argument will be removed from Form objects. Form.type_from_behavior(...) will be removed in favor of Form.type. ak.typetracer.empty_if_typetracer will be removed in favor of ak.typetracer.length_zero_if_typetracer. UnionArray.simplified's merge argument will be removed. (It will always merge. We assume elsewhere that it is merged, so maybe this is more of a bug-fix than a deprecation.) The built-in string and categorical behaviors will be removed. They're already not used internally in the Awkward codebase anymore.

Version number | Target date | Deprecated features to remove in this version
2.5.0 | 2023-11-01 | The ak.to_categorical function will be removed in favor of the string-only ak.str.to_categorical. This new function requires pyarrow. The forget_length argument to ak.typetracer.typetracer_with_report will be removed, and lengths will always be forgotten.
2.6.0 | 2024-01-17 |