TensorCross


pip install tensorcross

Grid Search and Random Search with optional Cross Validation for tf.data.Datasets in TensorFlow (Keras) 2.8+ and Python 3.9+.

Motivation

TensorFlow used to provide the tf.keras.wrappers.scikit_learn.KerasClassifier/KerasRegressor classes, which wrap a tf.keras model as a scikit-learn estimator.
However, this approach only works if your x and y data are numpy.ndarray objects, and the wrappers have been removed from newer TensorFlow versions.
If you want to use the tf.data.Dataset class instead, you cannot use the scikit-learn wrappers.
This Python package aims to cover this use case.

API

Dataset and TensorFlow Model for the Examples

    import numpy as np
    import tensorflow as tf

    # Toy dataset with a single feature per sample.
    dataset = tf.data.Dataset.from_tensor_slices(
        (np.array([1, 2, 3]).reshape(-1, 1),  # x
         np.array([-1, -2, -3]).reshape(-1, 1))  # y
    )

    def build_model(
        optimizer: type[tf.keras.optimizers.Optimizer],
        learning_rate: float,
    ) -> tf.keras.models.Model:
        # One input feature, one linear output unit.
        x_input = tf.keras.layers.Input(shape=(1,))
        y_pred = tf.keras.layers.Dense(units=1)(x_input)
        model = tf.keras.models.Model(inputs=[x_input], outputs=[y_pred])

        opt = optimizer(learning_rate=learning_rate)

        model.compile(
            loss="mse", optimizer=opt, metrics=["mse"]
        )

        return model

The dataset must be a tf.data.Dataset object, and you have to define a function/callable that returns a compiled tf.keras.models.Model object. This model is then built and trained for every hyperparameter combination, e.g. by the GridSearch.
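Any keyword argument of the build function can become a search dimension, because the keys of the param_grid (see the GridSearch example below) are passed to the build function as keyword arguments. As a minimal sketch, a hypothetical build function with an additional num_units hyperparameter (not part of the example above) could look like this:

    def build_model_with_units(
        optimizer: type[tf.keras.optimizers.Optimizer],
        learning_rate: float,
        num_units: int,
    ) -> tf.keras.models.Model:
        # Same single-feature input as above, but with a tunable hidden layer.
        x_input = tf.keras.layers.Input(shape=(1,))
        x = tf.keras.layers.Dense(units=num_units, activation="relu")(x_input)
        y_pred = tf.keras.layers.Dense(units=1)(x)
        model = tf.keras.models.Model(inputs=[x_input], outputs=[y_pred])
        model.compile(
            loss="mse",
            optimizer=optimizer(learning_rate=learning_rate),
            metrics=["mse"],
        )
        return model

In the param_grid you would then simply add another key, e.g. "num_units": [4, 8].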

GridSearch Example

Assuming you have a tf.data.Dataset object and a build_model function defined as above, you can run a GridSearch as follows:

    from tensorcross.model_selection import GridSearch
    from tensorcross.utils import dataset_split

    # Split the dataset into a training and a validation part.
    train_dataset, val_dataset = dataset_split(
        dataset=dataset,
        split_fraction=(1 / 3)
    )

    param_grid = {
        "optimizer": [
            tf.keras.optimizers.Adam,
            tf.keras.optimizers.RMSprop
        ],
        "learning_rate": [0.001, 0.0001]
    }

    grid_search = GridSearch(
        model_fn=build_model,
        param_grid=param_grid,
        verbose=1,
    )

    grid_search.fit(
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        epochs=1,
        verbose=1
    )

    grid_search.summary()

This would result in the following console output:

    --------------------------------------------------
    Best score: 1.1800532341003418 using params: {
        'learning_rate': 0.001, 'optimizer': 'RMSprop'
    }
    --------------------------------------------------
    Idx: 0 - Score: 0.2754371166229248 with param: {
        'learning_rate': 0.001, 'optimizer': 'Adam'
    }
    Idx: 1 - Score: 1.1800532341003418 with param: {
        'learning_rate': 0.001, 'optimizer': 'RMSprop'
    }
    Idx: 2 - Score: 0.055416107177734375 with param: {
        'learning_rate': 0.0001, 'optimizer': 'Adam'
    }
    Idx: 3 - Score: 0.12417340278625488 with param: {
        'learning_rate': 0.0001, 'optimizer': 'RMSprop'
    }
    --------------------------------------------------

GridSearchCV Example

Assuming you have a tf.data.Dataset object and a build_model function defined as above, you can run a GridSearchCV as follows. Unlike GridSearch, GridSearchCV takes the whole dataset and performs a cross validation with n_folds splits, so no separate validation set is needed:

    from tensorcross.model_selection import GridSearchCV

    param_grid = {
        "optimizer": [
            tf.keras.optimizers.Adam,
            tf.keras.optimizers.RMSprop
        ],
        "learning_rate": [0.001, 0.0001]
    }

    grid_search_cv = GridSearchCV(
        model_fn=build_model,
        param_grid=param_grid,
        n_folds=2,
        verbose=1,
    )

    grid_search_cv.fit(
        dataset=dataset,
        epochs=1,
        verbose=1
    )

    grid_search_cv.summary()

This would result in the following console output:

    --------------------------------------------------
    Best score: 1.1800532341003418 using params: {
        'learning_rate': 0.001, 'optimizer': 'RMSprop'
    }
    --------------------------------------------------
    Idx: 0 - Score: 0.2754371166229248 with param: {
        'learning_rate': 0.001, 'optimizer': 'Adam'
    }
    Idx: 1 - Score: 1.1800532341003418 with param: {
        'learning_rate': 0.001, 'optimizer': 'RMSprop'
    }
    Idx: 2 - Score: 0.055416107177734375 with param: {
        'learning_rate': 0.0001, 'optimizer': 'Adam'
    }
    Idx: 3 - Score: 0.12417340278625488 with param: {
        'learning_rate': 0.0001, 'optimizer': 'RMSprop'
    }
    --------------------------------------------------

Callbacks

Adding callbacks like TensorBoard is easy. Just pass the callbacks list to the .fit method, as you would with a normal Keras model.
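For example, assuming the callbacks argument is forwarded to the underlying Keras model.fit call, a TensorBoard callback could be attached as in this minimal sketch (the log directory is just a placeholder; build_model, param_grid and the datasets are reused from the examples above):

    import tensorflow as tf

    from tensorcross.model_selection import GridSearch

    # Placeholder log directory for the TensorBoard files.
    tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs/grid_search")

    grid_search = GridSearch(
        model_fn=build_model,
        param_grid=param_grid,
        verbose=1,
    )

    grid_search.fit(
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        epochs=1,
        callbacks=[tb_callback],  # passed on to the Keras fit call
        verbose=1,
    )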