Releases: zarr-developers/VirtualiZarr
v1.2.0
This release brings a stricter internal model for manifest paths, support for appending to existing icechunk stores, an experimental non-kerchunk-based HDF5 reader, handling of nested groups in DMR++ files, as well as many other bugfixes and documentation improvements.
What's Changed
- FAQ updates by @TomNicholas in #266
- Import top-level version of xarray classes by @TomNicholas in #267
- Update index.md by @thodson-usgs in #275
- Remove unused ManifestBackendArray class by @TomNicholas in #282
- Fix bug in RT of parquet detection by @norlandrhagen in #278
- Search for coord_names in separate_coords by @ayushnag in #191
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #283
- Bump minimum Xarray dependency to 2024.10.0 by @TomNicholas in #284
- Don't write _ARRAY_DIMENSIONS to icechunk by @TomNicholas in #286
- Fix release notes for v1.1.0 by @TomNicholas in #288
- dmrpp root and nested group parsing fix by @ayushnag in #265
- Update README.md by @joshmoore in #294
- Remove numcodecs specific install by @mpiannucci in #301
- Update contributors guide by @TomNicholas in #298
- Clarify which features are currently available in FAQ by @TomNicholas in #296
- Fix sphinx warnings by @maxrjones in #300
- Update pkg install in docs contribution guide by @douglatornell in #304
- Support downstream type checking by including py.typed by @maxrjones in #306
- Non-kerchunk backend for HDF5/netcdf4 files. by @sharkinsspatial in #87
- Release note for #87 by @TomNicholas in #307
- Add status badges to README by @douglatornell in #303
- Expand xarray openable type hint with ReadBuffer by @TomNicholas in #316
- Consolidate hdf reader tests into their own tests module by @TomNicholas in #314
- Refactor kerchunk reader tests to call open_virtual_dataset by @TomNicholas in #317
- Add virtual_backend_kwargs argument to open_virtual_dataset by @TomNicholas in #315
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #322
- Refactor dmrpp tests to expose data file path by @TomNicholas in #323
- Paths as URIs by @TomNicholas in #243
- Add list of previous talks to readme by @TomNicholas in #313
- Correct some documentation links to function the public API by @TomNicholas in #325
- Ignore spontaneous mypy errors for h5py classes. ref #324 by @sharkinsspatial in #328
- Append to icechunk stores by @abarciauskas-bgse in #272
- Add new page with links to example notebooks by @TomNicholas in #331
- Release summary for v1.2.0 by @TomNicholas in #332
New Contributors
- @joshmoore made their first contribution in #294
- @douglatornell made their first contribution in #304
- @sharkinsspatial made their first contribution in #87
Full Changelog: v1.1.0...v1.2.0
v1.1.0
This release adds Icechunk support!!
It also brings a complete refactoring of the system of readers and writers internally, which allowed us to make Kerchunk an optional dependency. There are also many other bugfixes and smaller improvements.
What's Changed
- xr.testing with ManifestArray fix (update isnan ufunc) by @ayushnag in #188
- Clarify that virtualizarr is a user-level replacement for kerchunk by @TomNicholas in #192
- Exclude empty paths on ChunkDict creation by @ghidalgo3 in #198
- Extend refspec support to [path] entries (without offset/length) by @maresb in #187
- Conformant ZarrV3 codecs and fill values by @ghidalgo3 in #193
- https access fix by @ayushnag in #196
- Handle scalar dataset variables by @ghidalgo3 in #205
- Set ZArray default fill_value as NaT for datetime64 by @thodson-usgs in #206
- Update .pre-commit-config mypy + bump ruff version by @norlandrhagen in #211
- Update static typing by @TomAugspurger in #213
- Adds concurrency to CI w/ cancel-in-progress=True by @norlandrhagen in #214
- Implement pydantic models as dataclasses by @TomAugspurger in #210
- use the theme options for pydata_sphinx_theme by @keewis in #223
- Removes default storage options by @norlandrhagen in #228
- open_virtual_dataset with dmr++ by @ayushnag in #113
- Internal refactor to separate reading and writing concerns by @TomNicholas in #231
- Let Xarray handle decode_times by @norlandrhagen in #232
- Support specifying single HDF Group in open_virtual_dataset by @scottyhq in #165
- Adds defaults in open_virtual_dataset_from_v3_store by @norlandrhagen in #234
- Virtualizarr + Coiled Serverless Example Notebook by @norlandrhagen in #233
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #236
- Add example to create a virtual dataset using lithops by @thodson-usgs in #203
- Update backend.py (tiny typo) by @mdsumner in #240
- Makes mypy a separate CI job by @norlandrhagen in #254
- Fix mypy errors around numpy functions not being strictly type hinted by @TomNicholas in #252
- Allow open_virtual_dataset to read existing Kerchunk references by @norlandrhagen in #251
- Skip tests that require kerchunk by @TomNicholas in #259
- allow creating references for empty archival datasets by @keewis in #260
- Split kerchunk reader up by @TomNicholas in #261
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #250
- Add CI job for testing upstream versions of dependencies by @TomNicholas in #264
- Add Icechunk Support by @mpiannucci in #256
New Contributors
- @ayushnag made their first contribution in #188
- @ghidalgo3 made their first contribution in #198
- @maresb made their first contribution in #187
- @thodson-usgs made their first contribution in #206
- @keewis made their first contribution in #223
- @mdsumner made their first contribution in #240
- @mpiannucci made their first contribution in #256
Full Changelog: v1.0.0...v1.1.0
v1.0.0
This release marks VirtualiZarr as mostly feature-complete, in the sense of achieving feature parity with kerchunk's logic for combining datasets, providing an easier way to manipulate kerchunk references in memory and generate kerchunk reference files on disk.
Future VirtualiZarr development will focus on generalizing and upstreaming useful concepts into the Zarr specification, the Zarr-Python library, Xarray, and possibly some new packages. See the roadmap in the documentation for details.
What's Changed
- Hypothesis test broadcasting by @TomNicholas in #139
- Empty release notes for v0.2 by @TomNicholas in #145
- Mark tests which require network access by @TomNicholas in #144
- Install dependencies for tests via mamba by @maxrjones in #148
- Use default version scheme for setuptools_scm by @maxrjones in #149
- Use 3 numpy arrays for manifest internally by @TomNicholas in #107
- Rename paths in manifest by @TomNicholas in #152
- Ensure _ARRAY_DIMENSIONS get dropped from attrs by @TomNicholas in #153
- Ensure attributes on coordinate variables are preserved during round-tripping by @TomNicholas in #154
- Identify non dimension coords by @TomNicholas in #156
- Also test exporting references to in-memory kerchunk reference dict by @TomNicholas in #158
- Use magic bytes to identify file formats by @scottyhq in #143
- Decoding cftime_variables by @jsignell in #122
- Fix opening tiff and fits files by @TomNicholas in #162
- Clarify that virtual datasets are not normal xarray datasets by @TomNicholas in #173
- Warn on index creation by @TomNicholas in #170
- Update roadmap for v1.0 by @TomNicholas in #164
- Add example of using cftime_variables to usage docs by @TomNicholas in #174
- Future-proof offset and size records in chunkmanifest by @moradology in #177
- Use a set to avoid duplicate var names from kerchunk by @moradology in #179
- v1.0 release notes by @TomNicholas in #181
New Contributors
- @scottyhq made their first contribution in #143
- @moradology made their first contribution in #177
Full Changelog: v0.1.0...v1.0
v0.1.0
The first release of VirtualiZarr!
This release presents the basic MVP of this library, including the ability to inspect netCDF4/HDF5 files, store the byte ranges in an xarray.Dataset via ManifestArray objects, concatenate those objects, then serialize the result to disk as kerchunk-formatted reference files.
Expect more features and significant optimizations soon.
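To make the workflow above concrete, here is a minimal, self-contained sketch of the underlying idea: a manifest maps chunk keys to byte ranges in existing files, manifests can be concatenated by shifting chunk keys, and the result can be written out as kerchunk-style references. It does not use VirtualiZarr itself; the names ChunkRef, concat_1d and to_kerchunk_refs, the file names, and the byte offsets are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical record of where one chunk lives inside an existing file."""
    path: str
    offset: int  # byte offset of the chunk within the file
    length: int  # chunk size in bytes


def concat_1d(first, second):
    """Concatenate two 1D manifests by shifting the chunk keys of the second."""
    shift = len(first)
    combined = dict(first)
    for key, ref in second.items():
        combined[str(int(key) + shift)] = ref
    return combined


def to_kerchunk_refs(var_name, manifest):
    """Render a manifest as kerchunk-style entries: key -> [path, offset, length]."""
    return {
        f"{var_name}/{key}": [ref.path, ref.offset, ref.length]
        for key, ref in manifest.items()
    }


# Two single-chunk manifests pointing at two (made-up) netCDF files.
a = {"0": ChunkRef("file1.nc", 100, 50)}
b = {"0": ChunkRef("file2.nc", 100, 50)}

combined = concat_1d(a, b)
refs = to_kerchunk_refs("temp", combined)
print(refs)
```

No array data is read or copied at any point: concatenation only rewrites the keys, which is what makes combining many files cheap.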
What's Changed
- Xarray accessor to create kerchunk reference dict by @TomNicholas in #28
- Equality checking by @TomNicholas in #30
- Support xarray concat, including broadcasting by @TomNicholas in #34
- CI for running tests by @TomNicholas in #36
- Roughout of Sphinx Docs by @norlandrhagen in #27
- Updated doc.yml to include pip by @norlandrhagen in #40
- Adds netCDF3 vs netCDF4 distinction to _automatically_determine_filetype. by @norlandrhagen in #43
- Test concat of dimension coordinate not backed by an index by @TomNicholas in #44
- Rename open_dataset_via_kerchunk to open_virtual_dataset by @TomNicholas in #47
- Narrative docs by @TomNicholas in #48
- More narrative docs by @TomNicholas in #50
- Update CI checks with ruff by @norlandrhagen in #54
- open_virtual_dataset with and without indexes by @TomNicholas in #52
- API docs by @TomNicholas in #56
- Updated NetCDF IO path by @norlandrhagen in #55
- Created conftest.py and moved two fixtures into conftest by @norlandrhagen in #57
- Ab/filters dtype by @abarciauskas-bgse in #66
- Switching netcdf3 & netcdf4 filetype detection to file magic 🧙 by @norlandrhagen in #64
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #70
- Load selected variables instead of making them virtual by @TomNicholas in #69
- pin_kerchunk_0.2.2 by @norlandrhagen in #75
- Remove python 3.12 from CI matrix by @TomNicholas in #76
- Convert user defined filetype to FileType by @norlandrhagen in #79
- FAQ page by @TomNicholas in #81
- Try to remove sidebar in docs by @TomNicholas in #82
- main.yml CI installs from pyproject.toml by @norlandrhagen in #90
- Update installation instructions by @jbusecke in #91
- Write manifests to zarr store by @TomNicholas in #45
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #101
- Bump Ruff version and add formatting by @jbusecke in #98
- Opening 0D scalars by @TomNicholas in #102
- Fix bug with expand dims of a scalar array by @TomNicholas in #103
- Install xarray from main by @jsignell in #106
- Unpin kerchunk (set floor) and enable Python 3.12 by @jsignell in #108
- Depend on latest version of xarray by @TomNicholas in #109
- Remove python 3.9 by @jsignell in #112
- Adding reader_options kwargs to open_virtual_dataset. by @norlandrhagen in #67
- Write to parquet by @jsignell in #110
- Test fsspec roundtrip by @TomNicholas in #42
- Inline loaded variables into kerchunk references by @TomNicholas in #73
- Release notes page by @TomNicholas in #120
- requires-python = ">=3.10" by @abarciauskas-bgse in #127
- Pass args and add test by @abarciauskas-bgse in #128
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #129
- Allow other fsspec protocols than local and s3 by @TomAugspurger in #126
- Add dunder version to top-level init.py by @maxrjones in #133
- Add release workflow by @maxrjones in #136
- Only run distribution workflow on releases by @maxrjones in #140
- change default reader_options to None by @TomNicholas in #137
- Replace np.NaN with np.nan in preparation for numpy 2.0 by @TomNicholas in #138
New Contributors
- @TomNicholas made their first contribution in #28
- @abarciauskas-bgse made their first contribution in #66
- @pre-commit-ci made their first contribution in #70
- @jbusecke made their first contribution in #91
- @jsignell made their first contribution in #106
- @TomAugspurger made their first contribution in #126
- @maxrjones made their first contribution in #133
Full Changelog: https://github.com/zarr-developers/VirtualiZarr/commits/v0.1