Releases: NannyML/nannyml
v0.10.0
Changed
- Telemetry now detects AKS, EKS and NannyML Cloud runtimes. (#325)
- The runner was refactored so it can be extended with premium NannyML calculators and estimators. (#325)
- Sped up telemetry reporting to ensure it doesn't hinder performance.
- Some love for the docs as @santiviquez tediously standardized variable names. (#338)
- Optimized calculations for the L-infinity method. (#340)
- Refactored the CalibratorFactory to align with our other factory implementations. (#341)
- Updated the Calibrator interface with *args and **kwargs for easier extension.
- Small refactor to the ResultComparisonMixin to allow easier extension.
Added
- Added support for directly estimating the confusion matrix of multiclass classification models using CBPE.
Big thanks to our appreciated alumnus @cartgr for the effort (and sorry it took soooo long). (#287)
- Added DatabaseWriter support for results from MissingValuesCalculator and UnseenValuesCalculator. Some
excellent work by @bgalvao, thanks for being a long-time user and supporter!
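The general idea behind estimating a confusion matrix without labels is that, if the model's class probabilities are calibrated, each cell can be filled with the expected number of rows rather than an observed count. The sketch below shows that idea for the multiclass case. It is not NannyML's implementation: `estimate_confusion_matrix` is a hypothetical helper, it assumes `y_pred_proba` is already calibrated, and it skips chunking, calibration and confidence bands.

```python
import numpy as np


def estimate_confusion_matrix(y_pred, y_pred_proba, classes):
    """Expected confusion matrix: rows are true classes, columns are predicted classes.

    Hypothetical helper. Assumes y_pred_proba[k, i] is a calibrated probability
    that row k belongs to classes[i].
    """
    y_pred = np.asarray(y_pred)
    proba = np.asarray(y_pred_proba, dtype=float)
    n = len(classes)
    cm = np.zeros((n, n))
    for j, predicted in enumerate(classes):
        mask = y_pred == predicted
        # For all rows predicted as `predicted`, add up the probability mass per true class
        cm[:, j] = proba[mask].sum(axis=0)
    return cm


classes = ["a", "b", "c"]
y_pred = ["a", "a", "b", "c"]
y_pred_proba = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]
cm = estimate_confusion_matrix(y_pred, y_pred_proba, classes)
```

Because every probability row sums to 1, the estimated cells always add up to the number of rows, just like an observed confusion matrix.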
Fixed
- Fix issues with calculation and filtering in performance calculation and estimation. (#321)
- Fix multivariate reconstruction error plot labels. (#323)
- Log a warning when performance metrics for a chunk will return a NaN value. (#326)
- Fix issues with the ReadTheDocs build failing.
- Fix erroneous specificity calculation, both realized and estimated. Well spotted @nikml! (#334)
- Fix threshold computation when dealing with NaN values. Major thanks to the eagle-eyed @giodavoli. (#333)
- Fix exports for confusion matrix metrics using the DatabaseWriter. An inspiring commit that led to some other changes.
Great job @shezadkhan137! (#335)
- Fix incorrect normalization for the business value metric in realized and estimated performance. (#337)
- Fix handling of NaN values when fitting univariate drift. (#340)
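For reference, specificity (the true negative rate) is TN / (TN + FP): the share of actual negatives that the model also predicted as negative. The hypothetical helper below computes the realized value for a binary problem with labels 0 and 1. It is a sketch of the standard definition, not NannyML's code, and it does not cover the estimated (CBPE) variant.

```python
import numpy as np


def specificity(y_true, y_pred, negative=0):
    """Realized specificity TN / (TN + FP) for a binary classifier (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    actual_negative = y_true == negative
    tn = np.sum(actual_negative & (y_pred == negative))  # negatives predicted as negative
    fp = np.sum(actual_negative & (y_pred != negative))  # negatives predicted as positive
    if tn + fp == 0:
        return float("nan")  # undefined when the data contains no actual negatives
    return float(tn / (tn + fp))
```

With y_true = [0, 0, 0, 1, 1] and y_pred = [0, 1, 0, 1, 0] there are three actual negatives, two of them predicted correctly, so the specificity is 2/3.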
v0.9.1
Changed
- Updated Mendable client library version to deal with styling overrides in the RTD documentation theme
- Removed superfluous limits for confidence bands in the CBPE class (these are present in the metric classes instead)
- Threshold value limiting behaviour (e.g. overriding a value and emitting a warning) will be triggered not only when
the value crosses the threshold but also when it is equal to the threshold value. This is because we interpret the
threshold as a theoretical maximum.
Added
- Added a new example notebook walking through a full use case using the NYC Green Taxi dataset, based on the blog of @santiviquez
Fixed
- Fixed broken Docker container build due to changes in public Poetry installation procedure
- Fixed broken image source link in the README, thanks @NeoKish!
v0.9.0
Changed
- Updated API docs for the nannyml.io package, thanks @maciejbalawejder (#286)
- Restricted versions of numpy to be <1.25, since a change in that release appears to affect the roc_auc calculation (#301)
Added
- Support for Data Quality calculators in the CLI runner
- Support for Data Quality results in Ranker implementations (#297)
- Support for Mendable in the docs (#295)
- Documentation landing page (#303)
- Support for calculations with delayed targets (#306)
Fixed
- Small changes to quickstart, thanks @NeoKish (#291)
- Fix an issue passing *args and **kwargs in Result.filter() and subclasses (#298)
- Fix double listing of the binary dataset documentation page
- Add missing thresholds to roc_auc in CBPE (#294)
- Fix plotting issue due to introduction of additional values in the 'display names tuple' (#305)
- Fix broken exception handling due to inheriting from BaseException and not Exception (#307)
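The last fix above matters because a generic `except Exception:` handler does not catch classes that derive directly from BaseException, so such errors slip past ordinary error handling. The short sketch below demonstrates that Python behaviour; the class and function names are hypothetical and do not come from NannyML.

```python
class BrokenError(BaseException):
    """Hypothetical error deriving directly from BaseException (the problematic pattern)."""


class FixedError(Exception):
    """Hypothetical error deriving from Exception (the recommended base for library errors)."""


def handled(exc_type):
    """Return True if a generic `except Exception` handler catches exc_type."""
    try:
        raise exc_type("boom")
    except Exception:
        return True
    except BaseException:
        # Only reached for exceptions outside the Exception hierarchy
        return False
```

Here `handled(FixedError)` returns True while `handled(BrokenError)` returns False, which is why library exceptions should derive from Exception.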
v0.8.6
Changed
- Significant QA work on all the documentation, thanks @santiviquez and @maciejbalawejder
- Reworked the nannyml.runner and the accompanying configuration format to improve flexibility (e.g. setting
custom initialization parameters, running a calculator multiple times, excluding a calculator, ...).
- Added support for custom thresholds to the nannyml.runner
- Simplified some of the nannyml.io interfaces, especially the nannyml.io.RawFilesWriter
- Reworked the nannyml.base.Result
- Totally revamped quickstart documentation based on a real-life dataset, thanks @jakubnml
Added
- Added new calculators to support simple data quality metrics such as counting missing or unseen values.
For more information, check out the data quality tutorials.
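Conceptually, these data quality metrics reduce to simple counts per column. The sketch below shows one way to compute them with pandas for a single chunk: missing values are NaN/None entries, and unseen values are non-missing analysis values that never occur in the reference data. The helper names are hypothetical and are not the API of NannyML's data quality calculators, which also handle chunking and thresholds.

```python
import pandas as pd


def count_missing(values: pd.Series) -> int:
    """Number of missing (NaN/None) entries in a column (hypothetical helper)."""
    return int(values.isna().sum())


def count_unseen(reference: pd.Series, analysis: pd.Series) -> int:
    """Number of non-missing analysis entries whose value never occurs in reference."""
    seen = set(reference.dropna().unique())
    present = analysis.dropna()
    return int((~present.isin(seen)).sum())


reference = pd.Series(["a", "b", None])
analysis = pd.Series(["a", "c", None, "c"])
```

For this data, the analysis column has one missing value and two occurrences of the unseen category "c".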
Fixed
- Fixed an issue where x-axis titles would appear on top of plots
- Removed erroneous checks during calculation of realized regression performance metrics. (#279)
- Fixed an issue dealing with az:// URLs in the CLI, thanks @michael-nml (#283)
v0.8.5
Changed
- Applied new rules for visualizations. Estimated values will be shown in indigo with a dashed line;
calculated values will be shown in blue with a solid line. This color coding might be overridden in comparison plots.
Data periods will no longer have different colors; instead, we've added some additional text fields to the plot to indicate the data period.
- Cleaned up legends in plots, since there will no longer be a different entry for reference and analysis periods of metrics.
- Removed the lower threshold for default thresholds of the KS and Wasserstein drift detection methods.
Added
- We've added the business_value metric for both estimated and realized binary classification performance. It allows
you to assign a value (or cost) to true positive, true negative, false positive and false negative occurrences.
This can help you track something like a monetary value or business impact of a model as a metric. Read more in the
business value tutorials (estimated or realized) or the how it works page.
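In its simplest realized form, a business value metric multiplies each confusion matrix count by the value assigned to that outcome and sums the results. The sketch below assumes a value matrix laid out as [[TN, FP], [FN, TP]] (rows are the true class, columns the predicted class) and binary labels 0 and 1. `business_value` is a hypothetical helper, not NannyML's implementation, and it does not apply any normalization.

```python
import numpy as np


def business_value(y_true, y_pred, value_matrix):
    """Total value of a binary classifier's outcomes (hypothetical helper).

    Assumes value_matrix = [[TN value, FP value], [FN value, TP value]].
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    counts = np.zeros((2, 2))
    # Build the confusion matrix: counts[true, predicted] += 1 for every row
    np.add.at(counts, (y_true, y_pred), 1)
    return float(np.sum(counts * np.asarray(value_matrix, dtype=float)))


# A false positive costs 5, a false negative costs 10 and a true positive earns 50
value = business_value([1, 1, 0, 0, 1], [1, 0, 0, 1, 1], [[0, -5], [-10, 50]])
```

The example has 1 TN, 1 FP, 1 FN and 2 TP, so the total is 0 - 5 - 10 + 2 * 50 = 85.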
Fixed
- Synced the README quickstart with the dedicated quickstart page. (#256) Thanks @NeoKish!
- Fixed incorrect code snippet order in the thresholding tutorial. (#258) Thanks once more to the one and only @NeoKish!
- Fixed broken container build that had sneakily been going on for a while
- Fixed incorrect confidence band color in comparison plots (#259)
- Fixed incorrect titles and missing legends in comparison plots (#264)
- Fixed an issue where numerical series marked as category would cause issues during Chi2 calculation
v0.8.4
Changed
- Updated univariate drift methods to no longer store all reference data by default (#182)
- Updated univariate drift methods to deal better with missing data (#202)
- Updated the included example datasets
- Critical security updates for dependencies
- Updated visualization of multi-level table headers in the docs (#242)
- Improved typing support for Result classes using generics
Added
- Support for estimating the confusion matrix for binary classification (#191)
- Added treat_as_categorical parameter to univariate drift calculator (#239)
- Added comparison plots to help visualize two different metrics at once
Fixed
- Fix missing confidence boundaries in some plots (#193)
- Fix incorrect metric names on plot y-axes (#195)
- Fix broken links to external docs (#196)
- Fix missing display name to performance calculation and estimation charts (#200)
- Fix missing confidence boundaries for single metric plots (#203)
- Fix incorrect code in example notebook for ranking
- Fix result corruption when re-using calculators (#206)
- Fix unintentional period filtering (#199)
- Fixed some typing issues (#213)
- Fixed missing data requirements documentation on regression (#215)
- Corrections in the glossary (#214), thanks @sebasmos!
- Fix missing threshold in plotting legend (#219)
- Fix missing annotation in single row & column charts (#221)
- Fix outdated performance estimation and calculation docs (#223)
- Fix categorical encoding of unseen values for DLE (#224)
- Fix incorrect legend for None timeseries (#235)
v0.8.3
Added
- Added some extra semantic methods on results for easy property access. No dealing with multilevel indexes required.
- Added functionality to compare results and plot that comparison. Early release version.
Fixed
- Pinned Sphinx version to 4.5.0 in the documentation requirements.
Version selector, copy toggle buttons and some styling were broken on RTD due to unintended usage of Sphinx 6 which
treats jQuery in a different way.
v0.8.2
Changed
- Added usage logging for the Ranker
- Remove some redundant parameters in plot() function calls for data reconstruction results, univariate drift results,
CBPE results and DLE results.
- Support "single metric/column" arguments in addition to lists in class creation (#165)
- Fix incorrect 'None' checks when dealing with defaults in univariate drift calculator
- Multiple updates and corrections to the docs (thanks @nikml!), including:
  - Updating univariate drift tutorial
  - Updating README
  - Update PCA: How it works
  - Fix incorrect plots
  - Fix quickstart (#171)
- Update chunker docstrings to match parameter names, thanks @mrggementiza!
- Make sequence 'None' checks more readable, thanks @mrggementiza!
- Ensure error handling in usage logging does not cause errors...
- Start using OrdinalEncoder instead of LabelEncoder in DLE. This allows us to deal with "unseen" values in the
analysis period.
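The difference between the two scikit-learn encoders shows up as soon as the analysis period contains a category that was not present during fitting: LabelEncoder raises a ValueError, while OrdinalEncoder can be configured to map unknown categories to a sentinel value. The sketch below illustrates this with scikit-learn directly; the handle_unknown and unknown_value settings shown are an assumption for illustration, not necessarily the exact configuration used inside DLE.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

reference = [["red"], ["green"], ["blue"]]  # categories seen while fitting on reference data
analysis = [["green"], ["purple"]]          # "purple" never appeared in the reference data

# LabelEncoder has no notion of unknown categories and raises on unseen labels
label_encoder = LabelEncoder().fit([row[0] for row in reference])
try:
    label_encoder.transform([row[0] for row in analysis])
    label_encoder_failed = False
except ValueError:
    label_encoder_failed = True

# OrdinalEncoder can map unseen categories to a sentinel value (here -1) instead
ordinal_encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
ordinal_encoder.fit(reference)
# Known categories are sorted, so blue=0, green=1, red=2
encoded = ordinal_encoder.transform(analysis)
```

Here `encoded` holds 1.0 for "green" and -1.0 for the unseen "purple", so downstream models keep working instead of failing on the new category.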
Added
- Added a Store to provide persistence for objects. Main use case for now is storing fitted calculators to be reused
later without needing to fit on reference again. Current store implementation uses a local or remote filesystem as a
persistence layer. Check out the documentation on persisting calculators.
Fixed
- Fix incorrect interpretation of the y_pred column as continuous values for the included sample binary classification data.
For now the column is explicitly converted to the "category" data type; an update of the dataset will follow soon.
(#171)
- Fix broken image link in README, thanks @mrggementiza!
- Fix missing key in the CLI section on raw files output, thanks @CoffiDev!
- Fix upper and lower thresholds for data reconstruction being swapped (#179)
- Fix stacked bar chart plots (missing bars + too many categories shown)
v0.8.1
Changed
- Thorough refactor of the nannyml.drift.ranker module. The abstract base class and factory have been dropped in favor
of a more flexible approach.
- Thorough refactor of our Plotly-based plotting modules. These have been rewritten from scratch to make them more
modular and composable. This will allow us to deliver more powerful and meaningful visualizations faster.
Added
- Added a new univariate drift method: the Hellinger distance, used for continuous variables.
- Added an extensive write-up on when to use which univariate drift method.
- Added a new way to rank the results of univariate drift calculation. The CorrelationRanker ranks columns based on
the correlation between the drift value and the change in realized or estimated performance. Read all about it in the
ranking documentation.
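For continuous variables, the Hellinger distance is typically computed on binned data: both samples are put into the same histogram bins and the resulting discrete distributions P and Q are compared with H(P, Q) = sqrt(1 - sum_i sqrt(p_i * q_i)), which ranges from 0 (identical) to 1 (no overlap). The sketch below follows that textbook definition. The binning strategy (10 equal-width bins over the combined range) is an assumption and may differ from what NannyML does; `hellinger_distance` is a hypothetical helper.

```python
import numpy as np


def hellinger_distance(reference, analysis, bins=10):
    """Hellinger distance between two continuous samples via shared histogram bins.

    Hypothetical helper; the equal-width binning over the combined range is an assumption.
    """
    reference = np.asarray(reference, dtype=float)
    analysis = np.asarray(analysis, dtype=float)
    # Use the same bin edges for both samples so the histograms are comparable
    edges = np.histogram_bin_edges(np.concatenate([reference, analysis]), bins=bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(analysis, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    # The Bhattacharyya coefficient sum(sqrt(p * q)) is 1 for identical distributions;
    # max() guards against tiny negative values caused by floating-point rounding
    return float(np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q)))))
```

Identical samples give a distance of (almost exactly) 0, and samples that land in completely separate bins give 1.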
Fixed
- Disabled usage logging for our GitHub workflows
- Allow passing a single string to the metrics parameter of the result.filter() function, as per special request.
v0.8.0
Changed
- Updated mypy to a new version, immediately resulting in some new checks that failed.
Added
- Added new univariate drift methods: the Wasserstein distance for continuous variables
and the L-Infinity distance for categorical variables.
- Added usage logging to our key functions. Check out the docs to find out more on what, why, how, and how to
disable it if you want to.
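Both distances have compact textbook definitions. The Wasserstein distance between two one-dimensional samples is available directly from SciPy, while the L-Infinity distance between two categorical columns is the largest absolute difference between the relative frequencies of any category. The sketch below illustrates both; `l_infinity_distance` is a hypothetical helper, and NannyML's own implementation may differ in details such as chunking, missing-value handling and thresholds.

```python
import pandas as pd
from scipy.stats import wasserstein_distance


def l_infinity_distance(reference: pd.Series, analysis: pd.Series) -> float:
    """Largest absolute difference between relative category frequencies (hypothetical helper)."""
    ref_freq = reference.value_counts(normalize=True)
    ana_freq = analysis.value_counts(normalize=True)
    # A category absent from one period contributes a frequency of 0 there
    return float(ref_freq.sub(ana_freq, fill_value=0).abs().max())


# Shifting every value by 5 yields a Wasserstein distance of 5
shift = wasserstein_distance([0.0, 1.0, 3.0], [5.0, 6.0, 8.0])

# "a" drops from 50% to 25% and "b" rises from 50% to 75%, so the largest change is 0.25
change = l_infinity_distance(pd.Series(list("aabb")), pd.Series(list("abbb")))
```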
Fixed
- Fixed and updated various parts of the docs, reported at warp speed! Thanks @NeoKish!
- Fixed mypy issues concerning 'implicit optionals'.