Releases: NannyML/nannyml

v0.7.0

07 Nov 15:11

Changed

  • Updated the handling of "leftover" observations when using the SizeBasedChunker and CountBasedChunker.
    The parameter that controls this behavior has been renamed to incomplete and can be set to keep, drop or append.
    Both chunkers now append leftover observations to the last full chunk by default.
  • Refactored the nannyml.drift module. The intermediate structural level (model_inputs, model_outputs, targets)
    has been removed and turned into a single unified UnivariateDriftCalculator. The old built-in statistics have been
    re-implemented as Methods, allowing us to add new methods to detect univariate drift.
  • Simplified a lot of the codebase (but also complicated some bits) by storing results internally as multilevel-indexed
    DataFrames. This means we no longer have to 'convey information' by encoding data column names and method names in
    the names of result columns. We've introduced a new paradigm to deal with results. Drill down to the data you really
    need by using the filter method, which returns a new Result instance, with a smaller 'scope'. Then turn this
    Result into a DataFrame using the to_df method.
  • Changed the structure of the pyproject.toml file due to a Poetry upgrade to version 1.2.1.
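
The three values of the incomplete parameter described above can be illustrated with a small, library-independent sketch. This is not NannyML's implementation: the function name size_based_chunks and its signature are hypothetical, and the handling of input shorter than one chunk is an assumption made for the example.

```python
from typing import List, Sequence


def size_based_chunks(rows: Sequence, chunk_size: int, incomplete: str = "append") -> List[list]:
    """Split rows into chunks of chunk_size (hypothetical helper, not the NannyML API).

    incomplete controls what happens to leftover rows that do not fill a chunk:
      "keep"   -> leftovers become their own, smaller final chunk
      "drop"   -> leftovers are discarded
      "append" -> leftovers are added to the last full chunk
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if incomplete not in ("keep", "drop", "append"):
        raise ValueError("incomplete must be 'keep', 'drop' or 'append'")

    n_full = len(rows) // chunk_size
    chunks = [list(rows[i * chunk_size:(i + 1) * chunk_size]) for i in range(n_full)]
    leftover = list(rows[n_full * chunk_size:])

    if not leftover or incomplete == "drop":
        return chunks
    if incomplete == "keep" or not chunks:
        # Assumption: with no full chunk to append to, leftovers stay a chunk of their own.
        chunks.append(leftover)
    else:
        chunks[-1].extend(leftover)
    return chunks


rows = list(range(11))
print([len(c) for c in size_based_chunks(rows, 5, "keep")])
print([len(c) for c in size_based_chunks(rows, 5, "drop")])
print([len(c) for c in size_based_chunks(rows, 5, "append")])
```

With 11 rows and a chunk size of 5, the single leftover row ends up in a third chunk, is discarded, or is merged into the second chunk, respectively.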

Added

  • Expanded the nannyml.io module with new Writer implementations: a DatabaseWriter, which exports data into
    multiple tables in a relational database, and a PickleFileWriter, which stores the pickled Results on
    local/remote/cloud disk.
  • Added a new univariate drift detection method based on the Jensen-Shannon distance.
    It is available within the UnivariateDriftCalculator.
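
For readers unfamiliar with the Jensen-Shannon distance, the following is a minimal sketch of the textbook definition (the square root of the Jensen-Shannon divergence, here with base-2 logarithms so the result lies between 0 and 1). It is not taken from the NannyML code base: the function name is hypothetical, and how the reference and analysis data are binned into histograms is left out and would be an assumption.

```python
import math
from typing import Sequence


def jensen_shannon_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance between two histograms over the same bins (illustrative)."""
    if len(p) != len(q):
        raise ValueError("both histograms must use the same bins")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("histograms must contain at least one observation")

    # Normalize counts to probabilities and build the mixture distribution.
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Kullback-Leibler divergence in bits; empty bins contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


reference_counts = [10, 40, 50]
analysis_counts = [30, 40, 30]
print(jensen_shannon_distance(reference_counts, analysis_counts))
```

Identical histograms give a distance of 0, and histograms with no overlapping bins give 1, which makes the value easy to compare across features.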

Fixed

  • Added lightgbm installation instructions to our installation guide.

v0.6.3

22 Sep 13:27

Changed

  • dependencybot dependency updates
  • stalebot setup

Fixed

  • CBPE now uses uncalibrated y_pred_proba values to calculate realized performance. Fixed for both binary and
    multiclass use cases (#98)
  • Fixed an issue where reference data was rendered incorrectly on joy plots
  • Updated the 'California Housing' example docs, thanks for the help @NeoKish
  • Fixed lower confidence bounds and thresholds falling below zero for regression cases. When the lower limit is
    set to 0, the lower threshold will not be plotted. (#127)

v0.6.2

16 Sep 17:54

Changed

  • Made the timestamp_column_name parameter, previously required by all calculators and estimators, optional. The
    main consequence is that plots now use a chunk-index based x-axis when no timestamp column name is given. You
    also cannot chunk by period when the timestamp column name is not specified.

Fixed

  • Added missing s3fs dependency
  • Fixed outdated plotting kind constants in the runner (used by CLI)
  • Fixed some missing images and incorrect version numbers in the README, thanks @NeoKish!

Added

  • Added a lot of additional tests, mainly concerning plotting and the Runner class

v0.6.1

09 Sep 15:35

Changed

  • Use the problem_type parameter to determine the correct graph to output when plotting model output drift

Fixed

  • Fixed the wrong plot title being shown for DLE estimation result plots, thanks @NeoKish
  • Fixed incorrect plot kinds in some error feedback for the model output drift calculator
  • Fixed missing problem_type argument in the Quickstart guide
  • Fixed incorrect visualization of confidence bands on reference data in DLE and CBPE result plots

v0.6.0

08 Sep 16:52

Added

  • Added support for regression problems across all calculators and estimators.
    In some cases a problem_type parameter is now required during calculator/estimator initialization; this
    is a breaking change. Read more about using regression in our
    tutorials and about our new performance estimation
    for regression using the Direct Loss Estimation (DLE) algorithm.

Changed

  • Improved tox running speed by skipping some unnecessary package installations.
    Thanks @baskervilski!

Fixed

  • Fixed an issue where some Pandas column datatypes were not recognized as continuous by NannyML, causing them to be
    dropped in calculations. Thanks for reporting @Dbhasin1!
  • Fixed an issue where some helper columns for visualization crept into the stored reference results. Good catch
    @Dbhasin1!
  • Fixed an issue where a Reader instance would raise a WriteException. Thanks for those eagle eyes
    @baskervilski!

v0.5.3

30 Aug 12:27

Changed

  • We've completely overhauled the way we determine the "stability" of our estimations. We've moved on from determining
    a minimum Chunk size to estimating the sampling error for an operation on a Chunk.
    • A sampling error value will be provided per metric per Chunk in the result data for the
      reconstruction error multivariate drift calculator, all performance calculation metrics and
      all performance estimation metrics.
    • Confidence bounds are now also based on this sampling error and will display a range of +/- 3 times the
      sampling error around an estimate in CBPE and the reconstruction error multivariate drift calculator.
      Be sure to check out our in-depth documentation
      on how it works or dive right into the implementation.
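
The "+/- 3 times the sampling error" rule above is simple arithmetic, sketched below. The function names are hypothetical, and the standard error of the mean is used only as a familiar textbook example of a sampling error; it is an assumption for illustration, not necessarily the formula NannyML applies to each metric.

```python
import math
import statistics
from typing import Sequence, Tuple


def standard_error_of_mean(values: Sequence[float]) -> float:
    """Textbook sampling error of a mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def confidence_bounds(estimate: float, sampling_error: float, n_errors: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) as estimate -/+ n_errors times the sampling error."""
    margin = n_errors * sampling_error
    return estimate - margin, estimate + margin


chunk_values = [1.0, 2.0, 3.0, 4.0, 5.0]
error = standard_error_of_mean(chunk_values)
lower, upper = confidence_bounds(statistics.mean(chunk_values), error)
print(lower, upper)
```

Because the sampling error shrinks as a chunk grows, larger chunks yield narrower bounds, which is the intuition behind replacing a fixed minimum chunk size with a per-chunk error estimate.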

Fixed

  • Fixed an issue where an outdated version of Numpy caused Pandas to fail reading string columns in some
    scenarios (#93). Thank you, @Bernhard and @Gabriel for the investigative work!

v0.5.2

17 Aug 07:26

Changed

  • Swapped out ASCII art library from 'art' to 'PyFiglet' because the former was not yet present in conda-forge.

Fixed

  • Fixed a leftover parameter that was missed during cleanup and broke CLI functionality
  • Fixed the CLI progress bar, which was broken due to a boolean check on task ID 0.
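
The progress bar bug is an instance of a common Python pitfall: the integer 0 is falsy, so a plain truthiness check treats a valid task ID of 0 as "no task". The release note does not show the affected code, so the sketch below uses hypothetical function names and only illustrates the general pattern, assuming task IDs are integers that start at 0.

```python
from typing import Optional


def should_update_buggy(task_id: Optional[int]) -> bool:
    # Truthiness check: wrongly skips the first task, because bool(0) is False.
    return bool(task_id)


def should_update_fixed(task_id: Optional[int]) -> bool:
    # Explicit None check: task ID 0 is handled like any other ID.
    return task_id is not None


for task_id in (None, 0, 1):
    print(task_id, should_update_buggy(task_id), should_update_fixed(task_id))
```

Comparing against None explicitly keeps "no task" and "task number zero" distinct.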

v0.5.1

16 Aug 19:32

Added

  • Added a simple CLI implementation to support automation and MLOps toolchain use cases. Supports reading/writing
    to cloud storage using S3, GCS, ADL, ABFS and AZ protocols. A containerized version is available on
    Docker Hub.

Changed

  • make clean now also clears __pycache__
  • Fixed some inconsistencies in docstrings (they still need some additional love though)

v0.5.0

07 Jul 11:41

Changed

  • Replaced the whole Metadata system by a more intuitive approach.

Fixed

v0.4.1

19 May 11:53

Added

  • Added limited support for regression use cases: create or extract RegressionMetadata and use it for drift
    detection. Performance estimation and calculation require more research.

Changed

  • DefaultChunker splits into 10 chunks of equal size.
  • SizeBasedChunker no longer drops an incomplete last chunk by default; this behavior is now configurable.