Updated documentation
vruusmann committed Mar 11, 2024
1 parent 602d1cb commit db53c56
Showing 2 changed files with 77 additions and 3 deletions.
74 changes: 74 additions & 0 deletions NEWS.md
@@ -1,3 +1,77 @@
# 0.104.0 #

## Breaking changes

* Updated Scikit-Learn version requirement from `0.18+` to `1.0+`.

This change helps the SkLearn2PMML package to better cope with breaking changes in Scikit-Learn APIs.
The underlying [JPMML-SkLearn](https://github.com/jpmml/jpmml-sklearn) library retains the maximum version coverage, because it deals with Scikit-Learn serialized state (Pickle/Joblib or Dill), which is considerably more stable.
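
As a rough illustration of the new lower bound, the following sketch checks a Scikit-Learn version string against `1.0`. The helper names `parse_major_minor` and `meets_minimum` are hypothetical and are not part of the SkLearn2PMML API; the package may enforce its requirement differently.

``` python
import re

def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.4.1.post1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("Unrecognized version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))

def meets_minimum(version, minimum = (1, 0)):
    """Return True if the version is at least the (major, minor) minimum."""
    return parse_major_minor(version) >= minimum

# Usage: check the installed Scikit-Learn, if there is one
try:
    import sklearn
    print(meets_minimum(sklearn.__version__))
except ImportError:
    print("Scikit-Learn is not installed")
```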

## New features

* Added support for Scikit-Learn 1.4.X.

The JPMML-SkLearn library had its integration tests rebuilt with Scikit-Learn `1.4.0` and `1.4.1.post1` versions.
All supported transformers and estimators passed cleanly.

See [SkLearn2PMML-409](https://github.com/jpmml/sklearn2pmml/issues/409) and [JPMML-SkLearn-195](https://github.com/jpmml/jpmml-sklearn/issues/195).

* Added support for `BaseHistGradientBoosting._preprocessor` attribute.

This attribute gets initialized automatically if a `HistGradientBoostingClassifier` or `HistGradientBoostingRegressor` estimator is fitted with categorical features.

In Scikit-Learn 1.0 through 1.3, categorical features must be pre-processed manually.
The indices of the ordinally encoded columns must be tracked and passed to the estimator using the `categorical_features` parameter:

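``` python
from sklearn_pandas import DataFrameMapper
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn2pmml.decoration import CategoricalDomain, ContinuousDomain

# Continuous columns are decorated only; categorical columns are decorated and ordinally encoded
mapper = DataFrameMapper(
    [([cont_col], ContinuousDomain()) for cont_col in cont_cols] +
    [([cat_col], [CategoricalDomain(), OrdinalEncoder()]) for cat_col in cat_cols]
)

# Pass the indices of the encoded categorical columns in the mapper output
regressor = HistGradientBoostingRegressor(categorical_features = [...])

pipeline = Pipeline([
    ("mapper", mapper),
    ("regressor", regressor)
])
pipeline.fit(X, y)
```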
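
One way the index tracking might look is sketched below. It assumes that the mapper emits continuous columns first and categorical columns second, as in the example above, and that every input column yields exactly one output column. The helper `categorical_feature_indices` and the column names are hypothetical, chosen only for illustration.

``` python
def categorical_feature_indices(cont_cols, cat_cols):
    """Positions of the categorical columns in the mapper output.

    Assumes continuous columns come first, categorical columns second,
    and that each input column produces exactly one output column.
    """
    n_cont = len(cont_cols)
    return list(range(n_cont, n_cont + len(cat_cols)))

# Hypothetical column names
cont_cols = ["age", "income"]
cat_cols = ["gender", "region", "segment"]

print(categorical_feature_indices(cont_cols, cat_cols))  # [2, 3, 4]
```

The resulting list is the kind of value that the `categorical_features` parameter expects. If the column order in the mapper changes, the indices must be recomputed.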

In Scikit-Learn 1.4, this workflow simplifies to the following:

``` python
# Activate full Pandas' support by specifying `input_df = True` and `df_out = True`
mapper = DataFrameMapper(
    [([cont_col], ContinuousDomain()) for cont_col in cont_cols] +
    [([cat_col], CategoricalDomain(dtype = "category")) for cat_col in cat_cols]
, input_df = True, df_out = True)

# Auto-detect categorical features by their data type
regressor = HistGradientBoostingRegressor(categorical_features = "from_dtype")

pipeline = Pipeline([
    ("mapper", mapper),
    ("regressor", regressor)
])
pipeline.fit(X, y)

# Print out feature type information
# This list should contain one or more `True` values
print(pipeline._final_estimator.is_categorical_)
```

## Minor improvements and fixes

* Improved support for `ColumnTransformer.transformers` attribute.

Column selection using dense boolean arrays is now supported.
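
For context, a minimal sketch of boolean-mask column selection on the Scikit-Learn side is given below. The data, the step name `scale` and the choice of `StandardScaler` are made up for illustration; whether this is the exact case addressed by this release is an assumption.

``` python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Toy data with three columns
X = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, 30.0, 300.0],
])

# Dense boolean mask: select the first and the third column
mask = np.array([True, False, True])

transformer = ColumnTransformer([
    ("scale", StandardScaler(), mask)
], remainder = "drop")

Xt = transformer.fit_transform(X)
print(Xt.shape)
```

The mask has one entry per input column, so the unselected second column is dropped by `remainder = "drop"`.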


# 0.103.3 #

## Breaking changes
6 changes: 3 additions & 3 deletions README.md
@@ -9,13 +9,13 @@ This package is a thin Python wrapper around the [JPMML-SkLearn](https://github.

# News and Updates #

-The current version is **0.103.3** (3 March, 2024):
+The current version is **0.104.0** (10 March, 2024):

```
-pip install sklearn2pmml==0.103.3
+pip install sklearn2pmml==0.104.0
```

-See the [NEWS.md](https://github.com/jpmml/sklearn2pmml/blob/master/NEWS.md#01033) file.
+See the [NEWS.md](https://github.com/jpmml/sklearn2pmml/blob/master/NEWS.md#01040) file.

# Prerequisites #
