Releases: NannyML/nannyml
v0.7.0
Changed
- Updated the handling of "leftover" observations when using the `SizeBasedChunker` and `CountBasedChunker`. Renamed the parameter for tweaking that behavior to `incomplete`, which can be set to `keep`, `drop` or `append`. Default behavior for both is now to append leftover observations to the last full chunk.
- Refactored the `nannyml.drift` module. The intermediate structural level (`model_inputs`, `model_outputs`, `targets`) has been removed and turned into a single unified `UnivariateDriftCalculator`. The old built-in statistics have been re-implemented as `Methods`, allowing us to add new methods to detect univariate drift.
- Simplified a lot of the codebase (but also complicated some bits) by storing results internally as multilevel-indexed DataFrames. This means we no longer have to 'convey information' by encoding data column names and method names in the names of result columns. We've introduced a new paradigm to deal with results. Drill down to the data you really need by using the `filter` method, which returns a new `Result` instance with a smaller 'scope'. Then turn this `Result` into a DataFrame using the `to_df` method.
- Changed the structure of the pyproject.toml file due to a Poetry upgrade to version 1.2.1.
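To illustrate the three `incomplete` options, here is a minimal standalone sketch of size-based chunking. It is not the NannyML implementation: the function name `size_based_chunks` is hypothetical, and the behavior when there is no full chunk to append to is an assumption made for the example.

```python
from typing import List, Sequence


def size_based_chunks(rows: Sequence, chunk_size: int, incomplete: str = "append") -> List[list]:
    """Split rows into chunks of chunk_size, handling leftover rows per `incomplete`.

    Hypothetical helper for illustration only, not part of NannyML.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    if incomplete not in ("keep", "drop", "append"):
        raise ValueError("incomplete must be one of 'keep', 'drop' or 'append'")

    n_full = len(rows) // chunk_size
    chunks = [list(rows[i * chunk_size:(i + 1) * chunk_size]) for i in range(n_full)]
    leftover = list(rows[n_full * chunk_size:])

    if not leftover or incomplete == "drop":
        # Nothing left over, or leftovers are discarded.
        return chunks
    if incomplete == "keep" or not chunks:
        # Leftovers become their own, smaller chunk.
        # Assumption: with no full chunk available, 'append' falls back to this too.
        chunks.append(leftover)
    else:
        # 'append': leftovers are added to the last full chunk.
        chunks[-1].extend(leftover)
    return chunks
```

With 11 rows and a chunk size of 5, `keep` yields chunk sizes 5, 5 and 1, `drop` yields 5 and 5, and `append` yields 5 and 6.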
Added
- Expanded the `nannyml.io` module with new `Writer` implementations: `DatabaseWriter` that exports data into multiple tables in a relational database and the `PickleFileWriter` which stores the pickled `Results` on local/remote/cloud disk.
- Added a new univariate drift detection method based on the Jensen-Shannon distance. Used within the `UnivariateDriftCalculator`.
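For readers unfamiliar with the Jensen-Shannon distance, the sketch below computes it for two already-binned distributions using only the standard library. It shows the textbook definition (the square root of the Jensen-Shannon divergence), not NannyML's code. How NannyML bins continuous features is not covered here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def jensen_shannon_distance(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """Jensen-Shannon distance between two histograms defined over the same bins.

    Hypothetical helper for illustration only. Inputs are counts or
    probabilities per bin and are normalized to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("p and q must each have a positive total")

    p_norm = [x / p_total for x in p]
    q_norm = [x / q_total for x in q]
    # Mixture distribution: the average of both inputs.
    m = [(a + b) / 2 for a, b in zip(p_norm, q_norm)]

    def kl_divergence(a: Sequence[float], b: Sequence[float]) -> float:
        # Bins where a is 0 contribute nothing; b (the mixture) is > 0 wherever a is.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    js_divergence = 0.5 * kl_divergence(p_norm, m) + 0.5 * kl_divergence(q_norm, m)
    # Guard against tiny negative values caused by floating point rounding.
    return math.sqrt(max(js_divergence, 0.0))
```

With base 2 the distance ranges from 0 for identical distributions to 1 for distributions with no overlapping bins.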
Fixed
- Added lightgbm installation instructions to our installation guide.
v0.6.3
Changed
- `dependencybot` dependency updates
- `stalebot` setup
Fixed
- CBPE now uses uncalibrated `y_pred_proba` values to calculate realized performance. Fixed for both binary and multiclass use cases (#98)
- Fix an issue where reference data was rendered incorrectly on joy plots
- Updated the 'California Housing' example docs, thanks for the help @NeoKish
- Fix lower confidence bounds and thresholds under zero for regression cases. When the lower limit is set to 0,
the lower threshold will not be plotted. (#127)
v0.6.2
Changed
- Made the `timestamp_column_name` parameter, previously required by all calculators and estimators, optional. The main consequences are that plots now have a chunk-index based x-axis when no timestamp column name is given, and that you cannot chunk by period when the timestamp column name is not specified.
Fixed
- Added missing `s3fs` dependency
- Fixed outdated plotting kind constants in the runner (used by CLI)
- Fixed some missing images and incorrect version numbers in the README, thanks @NeoKish!
Added
- Added a lot of additional tests, mainly concerning plotting and the `Runner` class
v0.6.1
Changed
- Use the `problem_type` parameter to determine the correct graph to output when plotting model output drift
Fixed
- Fixed the wrong plot title being shown for DLE estimation result plots, thanks @NeoKish
- Fixed incorrect plot kinds in some error feedback for the model output drift calculator
- Fixed missing `problem_type` argument in the Quickstart guide
- Fix incorrect visualization of confidence bands on reference data in DLE and CBPE result plots
v0.6.0
Added
- Added support for regression problems across all calculators and estimators. In some cases a `problem_type` parameter is now required during calculator/estimator initialization; this is a breaking change. Read more about using regression in our tutorials and about our new performance estimation for regression using the Direct Loss Estimation (DLE) algorithm.
Changed
- Improved `tox` running speed by skipping some unnecessary package installations. Thanks @baskervilski!
Fixed
- Fixed an issue where some Pandas column datatypes were not recognized as continuous by NannyML, causing them to be dropped in calculations. Thanks for reporting @Dbhasin1!
- Fixed an issue where some helper columns for visualization crept into the stored reference results. Good catch @Dbhasin1!
- Fixed an issue where a `Reader` instance would raise a `WriteException`. Thanks for those eagle eyes @baskervilski!
v0.5.3
Changed
- We've completely overhauled the way we determine the "stability" of our estimations. We've moved on from determining a minimum `Chunk` size to estimating the sampling error for an operation on a `Chunk`.
  - A sampling error value will be provided per metric per `Chunk` in the result data for reconstruction error multivariate drift calculator, all performance calculation metrics and all performance estimation metrics.
  - Confidence bounds are now also based on this sampling error and will display a range around an estimation +/- 3 times the sampling error in CBPE and reconstruction error multivariate drift calculator.

  Be sure to check out our in-depth documentation on how it works or dive right into the implementation.
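As a rough illustration of how a sampling error turns into confidence bounds, the sketch below uses the standard error of a mean as a stand-in. That choice is an assumption made for the example: the actual sampling error calculations are metric-specific and documented separately, and both function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def sampling_error_of_mean(reference_values: Sequence[float], chunk_size: int) -> float:
    """Standard error of a mean computed on a chunk of chunk_size observations.

    Assumption: the spread is estimated from reference data, and the metric is
    a simple mean. Real metrics need their own sampling error formulas.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return statistics.stdev(reference_values) / math.sqrt(chunk_size)


def confidence_bounds(estimate: float, sampling_error: float, multiplier: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) bounds at estimate +/- multiplier * sampling_error."""
    margin = multiplier * sampling_error
    return estimate - margin, estimate + margin
```

Larger chunks give a smaller sampling error and therefore tighter bounds, which is why a per-chunk sampling error can replace a fixed minimum chunk size.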
Fixed
v0.5.2
Changed
- Swapped out ASCII art library from 'art' to 'PyFiglet' because the former was not yet present in conda-forge.
Fixed
- Some leftover parameter was forgotten during cleanup, breaking CLI functionality
- CLI progressbar was broken due to a boolean check with task ID 0.
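The progress bar fix above is an instance of a common Python pitfall: `0` is falsy, so a truthiness check on an ID silently skips the first task. The sketch below illustrates the general pattern. It is not the actual CLI code, and the function names are hypothetical.

```python
from typing import Optional


def should_update_buggy(task_id: Optional[int]) -> bool:
    # Truthiness check: task ID 0 is treated the same as "no task".
    return bool(task_id)


def should_update_fixed(task_id: Optional[int]) -> bool:
    # Explicit None check: task ID 0 is a valid task.
    return task_id is not None
```

Comparing against `None` explicitly keeps ID `0` valid while still rejecting a missing task.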
v0.5.1
Added
- Added simple CLI implementation to support automation and MLOps toolchain use cases. Supports reading/writing to
cloud storage using S3, GCS, ADL, ABFS and AZ protocols. Containerized version available at
dockerhub.
Changed
- `make clean` now also clears `__pycache__`
- Fixed some inconsistencies in docstrings (they still need some additional love though)
v0.5.0
Changed
- Replaced the whole Metadata system by a more intuitive approach.
Fixed
v0.4.1
Added
- Added limited support for `regression` use cases: create or extract `RegressionMetadata` and use it for drift detection. Performance estimation and calculation require more research.
Changed
- `DefaultChunker` splits into 10 chunks of equal size.
- `SizeBasedChunker` no longer drops incomplete last chunk by default, but this is now configurable behavior.