Cannot cast sklearn.feature_extraction.text.TfidfVectorizer to sklearn.Estimator #125

Closed
NikolaevAS89 opened this issue Jan 14, 2019 · 2 comments

NikolaevAS89 commented Jan 14, 2019

I am using:
scikit-learn 0.20.1
Python 3.6
jpmml-sklearn 1.5.9
sklearn2pmml 0.40.0

I tried the following:

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn2pmml.feature_extraction.text import Splitter
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.externals import joblib  # available in scikit-learn 0.20.x

if __name__ == '__main__':
    # TF-IDF over unigrams and bigrams, tokenized with sklearn2pmml's Splitter
    vect2 = TfidfVectorizer(use_idf=True, ngram_range=(1, 2), min_df=20, tokenizer=Splitter())
    # The vectorizer (a transformer) is the only, and therefore the last, pipeline step
    pipeline = PMMLPipeline([("tfidf", vect2)])
    df = pd.read_pickle('df_lemmas.pickle').head(100)
    pipeline.fit_transform(df.TEXT_lemmas.values)
    joblib.dump(pipeline, "pipeline.pkl.z", compress=9)

Then I tried to convert it into PMML format with:
java -jar jpmml-sklearn-1.5.9.jar --pkl-input pipeline.pkl.z --pmml-output pipeline.pmml

and got the following exception:

Jan 14, 2019 3:12:03 PM org.jpmml.sklearn.Main run
INFO: Parsing PKL..
Jan 14, 2019 3:12:03 PM org.jpmml.sklearn.Main run
INFO: Parsed PKL in 31 ms.
Jan 14, 2019 3:12:03 PM org.jpmml.sklearn.Main run
INFO: Converting..
Jan 14, 2019 3:12:03 PM org.jpmml.sklearn.Main run
SEVERE: Failed to convert
java.lang.IllegalArgumentException: Tuple contains an unsupported value (Python class sklearn.feature_extraction.text.TfidfVectorizer)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:43)
	at org.jpmml.sklearn.TupleUtil.extractElement(TupleUtil.java:48)
	at sklearn2pmml.pipeline.PMMLPipeline.getEstimator(PMMLPipeline.java:535)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:97)
	at org.jpmml.sklearn.Main.run(Main.java:145)
	at org.jpmml.sklearn.Main.main(Main.java:94)
Caused by: java.lang.ClassCastException: Cannot cast sklearn.feature_extraction.text.TfidfVectorizer to sklearn.Estimator
	at java.lang.Class.cast(Class.java:3369)
	at org.jpmml.sklearn.CastFunction.apply(CastFunction.java:41)
	... 5 more

vruusmann (Member) commented:

The exception is thrown in the PMMLPipeline#getEstimator() method, where the converter inspects the last step of the pipeline and expects to find a Scikit-Learn estimator object there.

You have a Scikit-Learn transformer (TfidfVectorizer) as the last step instead. Such transformation-only pipelines are currently not supported:
jpmml/jpmml-sklearn#86
jpmml/jpmml-evaluator#96

I'm waiting for DMG.org to clarify the "meaning" of transformation-only PMML documents:
http://mantis.dmg.org/view.php?id=228
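The last-step inspection described above can be approximated in Python. This is a rough analogy, not jpmml-sklearn's actual logic: it assumes that an object with a `predict()` method counts as an estimator and one with only `transform()` counts as a transformer, whereas the Java converter maps Python classes to Java classes such as `sklearn.Estimator`. The function `final_step_kind` and the stub classes are hypothetical, so the sketch runs without scikit-learn installed.

```python
from typing import Any, List, Tuple


def final_step_kind(steps: List[Tuple[str, Any]]) -> str:
    """Hypothetical helper: classify the object in the last (name, object) step.

    Assumption: predict() marks an estimator and transform() alone marks a
    transformer. The real converter checks Java class mappings, not methods.
    """
    _, obj = steps[-1]  # each pipeline step is a (name, object) tuple
    if callable(getattr(obj, "predict", None)):
        return "estimator"
    if callable(getattr(obj, "transform", None)):
        return "transformer"
    return "other"


# Hypothetical stand-ins for TfidfVectorizer and a classifier.
class TransformerStub:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class ClassifierStub:
    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return [0 for _ in X]


print(final_step_kind([("tfidf", TransformerStub())]))  # prints: transformer
print(final_step_kind([("tfidf", TransformerStub()), ("clf", ClassifierStub())]))  # prints: estimator
```

With scikit-learn installed, `final_step_kind(pipeline.steps)` on the pipeline from the original report should return `"transformer"`, since `TfidfVectorizer` provides `transform()` but no `predict()`.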

vruusmann (Member) commented:

Currently, this is a usability bug: the PMMLPipeline#getEstimator() method should detect the case where the last pipeline step is a transformer object rather than an estimator object, and emit a more informative error message.
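Until the converter reports this case explicitly, a similar check could be run on the Python side before dumping the pipeline. The sketch below is hypothetical (`require_final_estimator` and `TransformerStub` are illustrative names, not part of sklearn2pmml or jpmml-sklearn) and again assumes that "estimator" means "has a `predict()` method".

```python
from typing import Any, List, Tuple


def require_final_estimator(steps: List[Tuple[str, Any]]) -> Any:
    """Hypothetical pre-export check: return the final estimator, or raise
    ValueError with an explicit message (assumption: an estimator is any
    object with a predict() method)."""
    if not steps:
        raise ValueError("The pipeline has no steps")
    name, obj = steps[-1]
    if callable(getattr(obj, "predict", None)):
        return obj
    if callable(getattr(obj, "transform", None)):
        description = "a transformer"
    else:
        description = "an object without predict()"
    raise ValueError(
        "The last pipeline step {!r} ({}) is {}, not an estimator; "
        "transformation-only pipelines are not supported".format(
            name, type(obj).__name__, description
        )
    )


class TransformerStub:
    """Hypothetical stand-in for TfidfVectorizer: transform(), no predict()."""

    def transform(self, X):
        return X


try:
    require_final_estimator([("tfidf", TransformerStub())])
except ValueError as err:
    print(err)
```

Calling a check like this before `joblib.dump(...)` would surface the problem in Python, before the Java converter is invoked.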
