Hyperimpute length mismatch #41

preritt · 2023-10-28T00:44:59Z

Question

Length mismatch error

Further Information

I am trying to use hyperimpute on my custom data. I am using the following setup:

from hyperimpute.plugins.imputers import Imputers

method = "hyperimpute"
plugin = Imputers().get(
    method,
    optimizer="hyperband",
    classifier_seed=["logistic_regression", "catboost", "xgboost", "random_forest"],
    regression_seed=[
        "linear_regression",
        "catboost_regressor",
        "xgboost_regressor",
        "random_forest_regressor",
    ],
    # class_threshold: int. Maximum number of unique values a column may have to be treated as categorical.
    class_threshold=5,
    # imputation_order: int. 0 - ascending, 1 - descending, 2 - random
    imputation_order=2,
    # n_inner_iter: int. Number of imputation iterations.
    n_inner_iter=10,
    # select_model_by_column: bool. If true, select a different model for each column. Else, reuse the model chosen for the first column.
    select_model_by_column=True,
    # select_model_by_iteration: bool. If true, select new models for each iteration. Else, reuse the models chosen in the first iteration.
    select_model_by_iteration=True,
    # select_lazy: bool. If false, start the optimizer on every column unless other restrictions apply. Else, if there is a trend in the current iteration (at least two columns of the same type got the same model from the optimizer), reuse the same model class for all the columns without starting the optimizer.
    select_lazy=True,
    # select_patience: int. How many iterations without objective function improvement to wait.
    select_patience=5,
)
# fit it on the data
plugin.fit(traindataSelected.copy())
# predict the missing values
predictedval = plugin.transform(traindataSelected.copy())

My training data has 1000 rows and 372 columns. When I run this, I get the following error:

---> [78] predictedval = plugin.transform(traindataSelected.copy())

ValueError: Length mismatch: Expected axis has 368 elements, new values have 372 elements

Can you please let me know if I am missing something or the reason for the error? Is there a way to manually specify which columns should be considered continuous and which ones should be treated as discrete?

Even when I use the mean imputer, the imputed output has 368 columns while my original data has 372 columns.
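For reference, the wording of the error matches what pandas raises when a list of column names is assigned to a DataFrame with a different number of columns. The sketch below reproduces that message on a toy frame; `try_rename` is a hypothetical helper written for illustration, and where exactly hyperimpute performs such an assignment is not established here.

import pandas as pd

def try_rename(frame, names):
    """Assign new column names; return the error text if pandas rejects them."""
    try:
        frame.columns = names
    except ValueError as exc:
        return str(exc)
    return None

# Three columns in the frame, four names supplied: pandas refuses the assignment.
toy = pd.DataFrame([[1.0, 2.0, 3.0]])
message = try_rename(toy, ["a", "b", "c", "d"])
print(message)

Read this way, the 368 in the error would be the number of columns in the imputed result and 372 the number of original column names.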

method = "mean"
plugin = Imputers().get(method)
# fit it on the data
plugin.fit(X.copy())
# predict the missing values
predictedval = plugin.transform(X.copy())
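
One thing that may be worth checking, as an assumption and not something confirmed in this thread: scikit-learn's SimpleImputer, by default, discards columns that contain no observed values at fit time, and a mean imputer built on it would return fewer columns than it was given. The sketch below lists columns that are entirely missing; `all_missing_columns` is a hypothetical helper, not part of hyperimpute.

import numpy as np
import pandas as pd

def all_missing_columns(frame):
    """Return the names of columns in which every value is missing."""
    return [col for col in frame.columns if frame[col].isna().all()]

# Toy data: column "b" has no observed values at all.
toy = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, np.nan],
        "c": [0.5, 0.25, np.nan],
    }
)
empty_cols = all_missing_columns(toy)
print(empty_cols)

# Dropping such columns before fitting keeps input and output widths aligned.
cleaned = toy.drop(columns=empty_cols)

If this returns exactly four columns on the real data, that would account for the difference between 372 and 368; if it returns none, the cause lies elsewhere.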

Thanks!
