Releases: google/yggdrasil-decision-forests

Python API 0.9.0

02 Dec 16:02

0.9.0 - 2024-12-02

Breaking

  • Classification label classes are now consistently ordered lexicographically
    (for string labels) or increasingly (for integer labels).
  • Rename the misspelled partial_depepence_plot to partial_dependence_plot on
    model.analyze().
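
The ordering rule above can be illustrated without YDF. The sketch below is
only an illustration of lexicographic versus increasing order; the helper name
ordered_classes is hypothetical and is not part of the YDF API.

```python
def ordered_classes(labels):
    """Return the distinct label values in a deterministic order.

    Hypothetical helper for illustration only (not a YDF function).
    Python's sorted() orders strings lexicographically and integers
    increasingly, which mirrors the rule described in the release note.
    """
    return sorted(set(labels))


string_classes = ordered_classes(["dog", "cat", "bird", "cat"])
integer_classes = ordered_classes([3, 1, 2, 1])
# Numbers stored as strings sort lexicographically, not numerically.
numeric_strings = ordered_classes(["10", "2", "1"])

print(string_classes)   # ['bird', 'cat', 'dog']
print(integer_classes)  # [1, 2, 3]
print(numeric_strings)  # ['1', '10', '2']
```

Code that indexed predictions by a previously observed class order may need to
be checked against the new order.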

Feature

  • Add support for Avro files when training from a path and for distributed
    training, using the "avro:" prefix.
  • Add support for discretized numerical features for in-memory datasets.
  • Expose MRR for ranking models.
  • Add model.predict_class to generate the most likely predicted class of
    classification models.
  • Add support for automatic feature selection with the feature_selector
    learner constructor argument. See the feature selection tutorial for
    more details.
  • Add standalone prediction evaluation ydf.evaluate_predictions().
  • Add new hyperparameter sparse_oblique_max_num_projections.
  • Add options "POWER_OF_TWO" and "INTEGER" for sparse oblique weights.
  • Emit proper errors when using lists for multi-dimensional features.
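
For readers unfamiliar with MRR (mean reciprocal rank), here is a minimal
sketch of the textbook definition. It is not YDF's implementation; the function
name and the input layout (one list of relevance flags per query, already
sorted by decreasing predicted score) are assumptions made for the example.

```python
def mean_reciprocal_rank(ranked_relevances):
    """Textbook MRR over several queries.

    ranked_relevances: one list per query. Each list holds truthy/falsy
    relevance flags, ordered by decreasing predicted score (assumed layout).
    A query without any relevant item contributes 0.
    """
    total = 0.0
    for relevances in ranked_relevances:
        for rank, relevant in enumerate(relevances, start=1):
            if relevant:
                total += 1.0 / rank  # reciprocal rank of the first hit
                break
    return total / len(ranked_relevances)


queries = [
    [0, 1, 0],  # first relevant item at rank 2 -> 1/2
    [1, 0, 0],  # first relevant item at rank 1 -> 1
    [0, 0, 0],  # no relevant item              -> 0
]
mrr = mean_reciprocal_rank(queries)
print(mrr)  # 0.5
```

How YDF decides which items count as relevant, and how it handles ties, may
differ from this sketch.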

Fix

  • Regression and Ranking CEPs scaling corrected.

Release music

The John B. Sails. Traditional

Python API 0.8.0

23 Sep 16:49

0.8.0 - 2024-09-23

Breaking

  • Disallow positional parameters for the learners, except for label and task.
  • Remove the unsupported / invalid hyperparameters from the Isolation Forest
    learner.
  • Remove parameters for distributed training and resuming training from
    learners that do not support these capabilities.
  • By default, model.analyze runs for a maximum of 20 seconds (i.e.
    maximum_duration=20 by default).
  • Convert boolean values in categorical sets to lowercase, matching the
    treatment of categorical features.

Feature

  • Warn if training on a VerticalDataset and fail if attempting to modify the
    columns in a VerticalDataset during training.
  • User can override the model's task, label or group during evaluation.
  • Add num_examples_per_tree() method to Isolation Forest models.
  • Expose the slow engine for debugging predictions and evaluations with
    use_slow_engine=True.
  • Speed-up training of GBT models by ~10%.
  • Support for categorical and boolean features in Isolation Forests.
  • Add ydf.util.read_tf_record and ydf.util.write_tf_record to facilitate
    TF Record datasets usage.
  • Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but
    can still be used.
  • Allow configuring the truncation of NDCG losses.
  • Enable multi-threading when using model.predict and model.evaluate.
  • Default number of threads of model.analyze is equal to the number of
    cores.
  • Add multi-threaded results in model.benchmark.
  • Add argument to control the maximum duration of model.analyze.
  • Add support for Unicode strings, normalize categorical set values in the
    same way as categorical values, and validate their types.
  • Add support for distributed training for ranking gradient boosted tree
    models.
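
The rename of LAMBDA_MART_NDCG5 and the new truncation option both relate to
the cut-off rank used by NDCG. The sketch below shows how a truncation changes
the NDCG metric. It is an illustration, not YDF's loss implementation: the
function names are hypothetical, and the gain 2^relevance - 1 is one common
convention assumed here.

```python
import math


def dcg(relevances, truncation):
    """Discounted cumulative gain over the first `truncation` items.

    Assumes the common gain 2**relevance - 1 and a log2 rank discount.
    """
    return sum(
        (2**rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:truncation], start=1)
    )


def ndcg(relevances_in_predicted_order, truncation):
    """NDCG@truncation: DCG divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances_in_predicted_order, reverse=True), truncation)
    if ideal == 0:
        return 0.0
    return dcg(relevances_in_predicted_order, truncation) / ideal


# The most relevant item (relevance 2) was ranked second by the model.
relevances = [0, 2, 1]
ndcg_at_1 = ndcg(relevances, truncation=1)  # only the first item counts
ndcg_at_3 = ndcg(relevances, truncation=3)  # the whole list counts
print(ndcg_at_1, ndcg_at_3)
```

With a truncation of 1 the misplaced item is invisible to the metric, while a
truncation of 3 gives partial credit for ranking it second.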

Fix

  • Fix labels of regression evaluation plots.
  • Improved errors if Isolation Forest training fails.

Release music

Perpetuum Mobile "Ein musikalischer Scherz", Op. 257. Johann Strauss (Sohn)

v1.10.0

21 Aug 19:51

1.10.0 - 2024-08-21

Features

  • Add support for Isolation Forest models.
  • The default value of num_candidate_attributes in the CART learner is
    changed from 0 (Random Forest style sampling) to -1 (no sampling). This is
    the generally accepted logic of CART.
  • Added support for GCS for file I/O.

Python API 0.7.0

21 Aug 19:47

Python API 0.7.0 - 2024-08-21

Feature

  • Expose validate_hyperparameters() on the learner.
  • Clarify which parameters in the learner are optional.
  • Add support in JAX FeatureEncoder for non-string categorical feature values.
  • Improve performance of Isolation Forests.
  • Models can be serialized/deserialized to/from bytes with model.serialize()
    and ydf.deserialize_model.
  • Models can be pickled safely.
  • Native support for Xarray as a dataset format for all operations (e.g.,
    training, evaluation, predictions).
  • The output of model.to_jax_function can be converted to a TensorFlow Lite
    model.
  • Change the default number of examples to scan when training on files to
    determine the semantics and dictionaries of columns from 10k to 100k.
  • Various improvements of error messages.
  • Evaluation for Anomaly Detection models.
  • Oblique splits for Anomaly Detection models.

Fix

  • Fix parsing of multidimensional ragged inputs.
  • Fix isolation forest hyperparameter defaults.
  • Fix bug causing distributed training to fail on a sharded dataset containing
    an empty shard.
  • Handle unordered categorical sets in training.
  • Fix dataspec ignoring definitions of unrolled columns, such as
    multidimensional categorical integers.
  • Fix error when defining categorical sets for non-ragged multidimensional
    inputs.
  • MacOS: Fix compatibility with other protobuf-using libraries such as
    Tensorflow.

Release music

Rondo Alla ingharese quasi un capriccio "Die Wut über den verlorenen Groschen",
Op. 129. Ludwig van Beethoven

Python API 0.6.0

26 Jul 13:57

Feature

  • model.to_jax_function now always outputs a FeatureEncoder to help feeding
    data to the JAX model.
  • The default value of num_candidate_attributes in the CART learner is
    changed from 0 (Random Forest style sampling) to -1 (no sampling). This is
    the generally accepted logic of CART.
  • model.to_tensorflow_saved_model supports preprocessing functions which have
    a different signature than the YDF model.
  • Improve error messages when feeding wrong size Numpy arrays.
  • Add option for weighted evaluation in model.evaluate.
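
As an illustration of what weighted evaluation means, the sketch below computes
a weighted accuracy by hand. It does not call YDF; the function name and the
list-based inputs are assumptions made for the example.

```python
def weighted_accuracy(labels, predictions, weights):
    """Share of the total example weight that is correctly classified.

    Hypothetical helper for illustration only (not a YDF function).
    """
    correct_weight = sum(
        w for y, p, w in zip(labels, predictions, weights) if y == p
    )
    return correct_weight / sum(weights)


labels = ["a", "b", "a"]
predictions = ["a", "a", "a"]

# The misclassified example carries three times the weight of the others.
weighted = weighted_accuracy(labels, predictions, [1.0, 3.0, 1.0])
# Uniform weights reduce to the usual accuracy.
unweighted = weighted_accuracy(labels, predictions, [1.0, 1.0, 1.0])
print(weighted, unweighted)
```

Here the weighted accuracy (2 / 5) is lower than the unweighted one (2 / 3)
because the single error falls on the heaviest example.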

Fix

  • Fix display of confusion matrix with floating point weights.

Known issues

  • MacOS build is broken.

Python API 0.5.0

18 Jun 07:20

Feature

  • Add support for Isolation Forest models.
  • Add max_depth argument to model.print_tree.
  • Add verbose argument to the train method, which is equivalent to but
    sometimes more convenient than ydf.verbose.
  • Add SKLearn to YDF model converter: ydf.from_sklearn.
  • Improve error messages when calling the model with unsupported data.
  • Add support for numpy 2.0.

Tutorials

  • Add anomaly detection tutorial.
  • Add YDF and JAX model composition tutorial.

Fix

  • Fix error when plotting oblique trees (model.plot_tree) in colab.

Python API 0.4.3

08 May 13:53

Python API - Changelog

Feature

  • Add model.to_jax_function() function to convert a YDF model into a JAX
    function that can be combined with other JAX operations.
  • Print warnings when categorical features look like numbers.
  • Add support for Python 3.12.

Fix

  • Fix cross-validation for non-classification learners.
  • Fix missing ydf/model/tree/plotter.js
  • Solve dependency collision of YDF Proto between PYDF and TF-DF.

Python API 0.4.1

19 Apr 13:21

Python API - Changelog

Fix

  • Solve dependency collision on YDF between PYDF and TF-DF. If TF-DF is
    installed after PYDF, importing YDF fails with a "has no attribute 'DType'"
    error.
  • Allow for training on cached TensorFlow dataset.

Python API 0.4.0

12 Apr 20:41

Python API - 0.4.0 - 2024-04-10

Feature

  • Multi-dimensional features can be selected / configured with the features=
    training argument.
  • Programmatic access to partial dependence plots and variable importances.
  • Add model.to_tensorflow_function() function to convert a YDF model into a
    TensorFlow function that can be combined with other TensorFlow operations.
    This function is compatible with Keras 2 and Keras 3.
  • Add arguments servo_api=False and feed_example_proto=False for
    model.to_tensorflow_function(mode="tf") to export TensorFlow SavedModel
    following respectively the Servo API and consuming serialized TensorFlow
    Example protos.
  • Add pre_processing and post_processing arguments to the
    model.to_tensorflow_function function to pack pre/post processing
    operations in a TensorFlow SavedModel.

Tutorials

Python API 0.3.0

15 Mar 20:15

Python API 0.3.0 - 2024-03-15

Breaking

  • Custom losses must now provide the gradient, instead of the negative of
    the gradient.
  • Clarified that YDF may modify numpy arrays returned by a custom loss
    function.
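
The sign change can be checked on a small example. The sketch below uses the
squared error loss 0.5 * (prediction - label)^2 and plain Python lists; the
function name and signature are hypothetical and do not reproduce YDF's custom
loss interface, which works with NumPy arrays.

```python
def squared_error_gradient_and_hessian(labels, predictions):
    """Gradient and hessian of 0.5 * (prediction - label)**2.

    Hypothetical helper for illustration only. The gradient with respect
    to the prediction is (prediction - label). The negative gradient,
    (label - prediction), is what the release note says is no longer
    expected.
    """
    gradient = [p - y for y, p in zip(labels, predictions)]
    hessian = [1.0 for _ in labels]  # second derivative is constant
    return gradient, hessian


gradient, hessian = squared_error_gradient_and_hessian(
    labels=[1.0, 2.0], predictions=[3.0, 2.0]
)
print(gradient)  # [2.0, 0.0]
print(hessian)   # [1.0, 1.0]
```

A custom loss written for an earlier version that returned [-2.0, 0.0] for
this input would need its sign flipped.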

Features

  • Allow using Jax for custom loss definitions.
  • Allow setting may_trigger_gc on custom losses.
  • Add support for MHLD oblique decision trees.
  • Expose hyperparameter sparse_oblique_max_num_projections.
  • HTML plots for trees with model.plot_tree().
  • Fix protobuf version to 4.24.3 to fix some incompatibilities when using
    conda.
  • Allow listing compatible engines with model.list_compatible_engines().
  • Allow choosing a fast engine with model.force_engine(...).

Fix

  • Fix slow engine creation for some combinations of oblique splits.
  • Improve error message when feeding multi-dimensional labels.

Documentation

  • Clarified documentation of hyperparameters for oblique splits.
  • Fix plots, typos.

Release music

Doctor Gradus ad Parnassum from "Children's Corner" (L. 113). Claude Debussy