Hyperimpute length mismatch #41

Open
preritt opened this issue Oct 28, 2023 · 0 comments

preritt commented Oct 28, 2023

Question

Length mismatch error

Further Information

I am trying to use hyperimpute on my custom data. I am using the following setup:

from hyperimpute.plugins.imputers import Imputers

method = "hyperimpute"
plugin = Imputers().get(
    method,
    optimizer="hyperband",
    classifier_seed=["logistic_regression", "catboost", "xgboost", "random_forest"],
    regression_seed=[
        "linear_regression",
        "catboost_regressor",
        "xgboost_regressor",
        "random_forest_regressor",
    ],
    # class_threshold: int. Maximum number of unique values a column may have to be treated as categorical.
    class_threshold=5,
    # imputation_order: int. 0 - ascending, 1 - descending, 2 - random
    imputation_order=2,
    # n_inner_iter: int. Number of imputation iterations.
    n_inner_iter=10,
    # select_model_by_column: bool. If true, select a different model for each column. Else, reuse the model chosen for the first column.
    select_model_by_column=True,
    # select_model_by_iteration: bool. If true, select new models for each iteration. Else, reuse the models chosen in the first iteration.
    select_model_by_iteration=True,
    # select_lazy: bool. If false, start the optimizer on every column unless other restrictions apply. Else, if there is a trend in the current iteration (at least two columns of the same type got the same model from the optimizer), reuse the same model class for all the columns without starting the optimizer.
    select_lazy=True,
    # select_patience: int. How many iterations without objective function improvement to wait.
    select_patience=5,
)
# fit it on the data
plugin.fit(traindataSelected.copy())
# impute the missing values
predictedval = plugin.transform(traindataSelected.copy())

My training data has 1000 rows and 372 columns. When I run this, I get the following error:

---> [78] predictedval = plugin.transform(traindataSelected.copy())

ValueError: Length mismatch: Expected axis has 368 elements, new values have 372 elements
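
For context, pandas produces this wording when a list of column labels is assigned to a frame that has a different number of columns. The sketch below reproduces the message under an assumption that has not been checked against the hyperimpute source: that the transform step ends up with 368 columns and then tries to restore the original 372 labels. The frame contents and the `col_…` label names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an imputed result that lost four columns.
reduced = pd.DataFrame(np.zeros((3, 368)))

# Illustrative labels standing in for the 372 original column names.
original_labels = [f"col_{i}" for i in range(372)]

message = ""
try:
    # Assigning 372 labels to a 368-column frame is rejected by pandas.
    reduced.columns = original_labels
except ValueError as err:
    message = str(err)

print(message)
```

If this is what happens internally, the question becomes which four columns are being dropped between `fit` and `transform`.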

Can you please let me know if I am missing something, or what causes this error? Also, is there a way to manually specify which columns should be treated as continuous and which as discrete?
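
To see how a unique-count threshold such as `class_threshold=5` might split the columns before fitting, a rough check like the one below could help. It is only a sketch: `split_by_unique_count` is a hypothetical helper, not part of hyperimpute, and the rule "at most `class_threshold` distinct observed values means discrete" is an assumption based on the parameter comment above, so the library's exact comparison may differ.

```python
import numpy as np
import pandas as pd


def split_by_unique_count(df: pd.DataFrame, class_threshold: int = 5):
    """Hypothetical helper: guess column types from the count of distinct observed values."""
    discrete, continuous = [], []
    for col in df.columns:
        # Missing entries are ignored when counting distinct values.
        n_unique = df[col].nunique(dropna=True)
        # Assumption: few distinct values means the column is categorical.
        if n_unique <= class_threshold:
            discrete.append(col)
        else:
            continuous.append(col)
    return discrete, continuous


# Small made-up frame: "grade" has 3 distinct values, "height" has 7.
demo = pd.DataFrame(
    {
        "grade": [1, 2, 3, 1, 2, np.nan, 3, 1],
        "height": [1.61, 1.72, np.nan, 1.80, 1.55, 1.68, 1.91, 1.77],
    }
)
discrete, continuous = split_by_unique_count(demo, class_threshold=5)
print("discrete:", discrete)
print("continuous:", continuous)
```

Running the same helper on the real training frame would show whether any column lands on the unexpected side of the threshold.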

Even when I use the mean imputer, the imputed data has 368 columns, while my original data has 372 columns.

method = "mean"
plugin = Imputers().get(method)
# fit it on the data
plugin.fit(X.copy())
# predict the missing values
predictedval = plugin.transform(X.copy())
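
One hypothesis worth checking, not confirmed here: scikit-learn's `SimpleImputer` by default discards columns that contained only missing values at fit time, so if the mean plugin relies on it (an assumption), four entirely empty columns would explain the drop from 372 to 368. The sketch below lists such columns with plain pandas; `columns_without_observations` and the demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def columns_without_observations(df: pd.DataFrame) -> list:
    """Return the labels of columns in which every entry is missing."""
    return df.columns[df.isna().all(axis=0)].tolist()


# Made-up frame: column "b" has no observed values at all.
demo = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [np.nan, np.nan, np.nan],
        "c": [0.5, 0.7, np.nan],
    }
)

empty_cols = columns_without_observations(demo)
print(empty_cols)

# Dropping the empty columns up front keeps input and output widths aligned.
cleaned = demo.drop(columns=empty_cols)
print(cleaned.shape)
```

If the real data returns exactly four labels here, removing those columns before calling `fit` and `transform` would be a reasonable first thing to try.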

Thanks!
