Allow option to use DataGeometry objects à la scikit-learn pipelines #227

paxtonfitzpatrick · 2019-12-16T17:20:55Z

Currently, if you want to repeatedly transform text samples with hypertools.tools.format_data() using the same parameters, the function re-fits both the vectorizer and the text model on every call. This is inefficient, and when the operations are expensive or numerous, working directly with the underlying sklearn classes becomes the better option.
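For reference, here is a minimal sketch of the "work directly with sklearn" workaround. It assumes `CountVectorizer` and `LatentDirichletAllocation` stand in for the vectorizer and text model; the example documents and variable names are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical example data (not from hypertools).
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "the market rallied on earnings",
]
new_docs = ["the cat chased the dog", "markets and stocks rallied"]

# Fit the vectorizer and the text model once...
vectorizer = CountVectorizer()
text_model = LatentDirichletAllocation(n_components=2, random_state=0)
train_topics = text_model.fit_transform(vectorizer.fit_transform(train_docs))

# ...then reuse the already-fit objects on new samples without re-fitting.
new_topics = text_model.transform(vectorizer.transform(new_docs))
print(new_topics.shape)
```

The point is that only the `transform` step runs for each new batch, so the fitting cost is paid once.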

We could add an argument to return the fit models for reuse, but a really nice feature would be something like a scikit-learn Pipeline object that you could create, fit, save, and reuse to perform various processing steps with a single call. This would also be a very attractive feature for hypertools, since the object could additionally implement methods like .plot() and .describe().
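To make the create/fit/save/reuse workflow concrete, here is a rough sketch using a plain scikit-learn `Pipeline` and `joblib`. It only illustrates the pattern being proposed, not an existing hypertools API; the step names, file name, and example documents are assumptions.

```python
import os
import tempfile

import joblib
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Hypothetical example data.
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets closed",
    "the market rallied on earnings",
]
new_docs = ["the cat chased the dog", "markets and stocks rallied"]

# Create: chain the vectorizer and text model into one object.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("text_model", LatentDirichletAllocation(n_components=2, random_state=0)),
])

# Fit once.
pipeline.fit(train_docs)

# Save and reload (a temporary directory is used here for illustration).
path = os.path.join(tempfile.mkdtemp(), "text_pipeline.joblib")
joblib.dump(pipeline, path)
restored = joblib.load(path)

# Reuse: a single call runs every fitted step on new samples.
new_topics = restored.transform(new_docs)
print(new_topics.shape)
```

A DataGeometry-style object could wrap the same idea and layer methods like .plot() and .describe() on top of the fitted steps.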

paxtonfitzpatrick added the enhancement and wish list labels on Dec 16, 2019