Function 'builtins.int' is not supported? #20

Open
liuhuanshuo opened this issue Oct 28, 2022 · 5 comments


@liuhuanshuo

liuhuanshuo commented Oct 28, 2022

Hello Villu,

I am having problems with the sklearn2pmml conversion:

Standard output is empty
Standard error:
Exception in thread "main" java.lang.IllegalArgumentException: Function 'builtins.int' is not supported
	at org.jpmml.python.FunctionUtil.encodePythonFunction(FunctionUtil.java:103)
	at org.jpmml.python.FunctionUtil.encodeFunction(FunctionUtil.java:72)
	at org.jpmml.python.ExpressionTranslator.translateFunction(ExpressionTranslator.java:186)
	at org.jpmml.python.ExpressionTranslator.FunctionInvocationExpression(ExpressionTranslator.java:849)
	at org.jpmml.python.ExpressionTranslator.PrimaryExpression(ExpressionTranslator.java:646)
	at org.jpmml.python.ExpressionTranslator.UnaryExpression(ExpressionTranslator.java:594)
	at org.jpmml.python.ExpressionTranslator.MultiplicativeExpression(ExpressionTranslator.java:539)
	at org.jpmml.python.ExpressionTranslator.AdditiveExpression(ExpressionTranslator.java:495)
	at org.jpmml.python.ExpressionTranslator.ComparisonExpression(ExpressionTranslator.java:435)
	at org.jpmml.python.ExpressionTranslator.NegationExpression(ExpressionTranslator.java:390)
	at org.jpmml.python.ExpressionTranslator.LogicalAndExpression(ExpressionTranslator.java:373)
	at org.jpmml.python.ExpressionTranslator.LogicalOrExpression(ExpressionTranslator.java:339)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:320)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:313)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:324)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:313)
	at org.jpmml.python.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:307)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:33)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:22)
	at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:73)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn.pipeline.PipelineTransformer.encodeFeatures(PipelineTransformer.java:65)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.pipeline.FeatureUnion.encodeFeatures(FeatureUnion.java:45)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn_pandas.DataFrameMapper.encodeFeatures(DataFrameMapper.java:67)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:153)
	at com.sklearn2pmml.Main.run(Main.java:91)
	at com.sklearn2pmml.Main.main(Main.java:66)

It seems that the int() call in the following code is what fails:

def make_modify_date_pipeline():
    return make_pipeline(ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 and int(X[0][0:8]) < 20221230 else '2022-12-30'"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

We have talked about this before, and you recommended CastTransformer:

you should be using the good old CastTransformer instead.

I have already upgraded to the latest sklearn2pmml version. Do you mean that I should change the scikit-learn version? That is not an option, because I am working in a company notebook where changing the scikit-learn version is not allowed.

Is there any other way to accomplish this? I wrote the expression this way because a str cannot be compared with an int, so int() is needed. In pure Python I would have many ways to solve this, but inside a pipeline I don't know how to handle it.
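For reference, here is a plain-Python sketch of what the expression in the original pipeline is meant to compute for a single value. It is an illustration only: the function name `modify_date_reference` is hypothetical (not part of sklearn2pmml), and the input is assumed to be a string in YYYYMMDD form, with an empty string falling back to the default date.

```python
def modify_date_reference(value: str, cutoff: int = 20221230, default: str = "2022-12-30") -> str:
    """Hypothetical plain-Python equivalent of the ExpressionTransformer expression.

    Reformats a 'YYYYMMDD' string as 'YYYY-MM-DD' when it is non-empty and its
    first eight characters, read as an integer, are below the cutoff; otherwise
    returns the default date string.
    """
    # Python's conditional expression binds loosest, so the whole concatenation
    # is the "true" branch, exactly as in the original expression string.
    if len(value) > 0 and int(value[0:8]) < cutoff:
        return value[:4] + "-" + value[4:6] + "-" + value[6:8]
    return default


for raw in ["20210315", "", "20230101"]:
    print(repr(raw), "->", modify_date_reference(raw))
```

The int() conversion is exactly the step that the JPMML-Python expression translator rejects, which is why this logic cannot be translated to PMML as written.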

@liuhuanshuo
Author

I found a workaround. My original code looked a bit silly: because a str cannot be compared with an int, I used int(str) to convert the string to an integer, which, of course, is not supported.

I then tried the following modification, which uses a str-to-str comparison (X[0][0:8] < '20221230'), and it worked:

def make_modify_date_pipeline():
    return make_pipeline(ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 and X[0][0:8] < '20221230' else '2022-12-30'"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

Still, I'm curious how to gracefully perform the int conversion to accomplish this task.

@vruusmann
Member

Exception in thread "main" java.lang.IllegalArgumentException: Function 'builtins.int' is not supported

The expression translator component of the JPMML-Python library currently does not support inline Python/NumPy/pandas/SciPy functions whose translation would require an external function definition (in the form of one or more DerivedField elements).

For example, it's possible to use len(X[0]), because it can be represented as an inline Apply@function="length" element.

However, there is currently no way to represent type casts using the Apply element, and the PMML specification does not provide any special-purpose elements for changing the data type or the operational type on the fly. A type cast needs a standalone DerivedField element, which declares a new field with a new name and its own type information.
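To make the distinction concrete, the following sketch uses Python's standard `xml.etree.ElementTree` to build the two PMML shapes described above: an inline `Apply` element for `len(X[0])` (using the function name quoted in this comment), and a standalone `DerivedField` that declares a new field with its own name and data type to carry a string-to-integer cast. The field names `x0` and `int(x0)` are hypothetical, and the exact markup JPMML-Python would emit may differ; this only illustrates the structure.

```python
import xml.etree.ElementTree as ET

# Inline translation: len(X[0]) becomes an Apply element nested directly inside
# whatever expression uses it; no new field is declared.
length_apply = ET.Element("Apply", {"function": "length"})
ET.SubElement(length_apply, "FieldRef", {"field": "x0"})  # "x0" is a hypothetical input field

# Type cast: there is no Apply function for it, so a standalone DerivedField
# declares a new field with its own name and data type, wrapping a reference
# to the original (string) field.
transformation_dictionary = ET.Element("TransformationDictionary")
int_field = ET.SubElement(
    transformation_dictionary,
    "DerivedField",
    {"name": "int(x0)", "optype": "continuous", "dataType": "integer"},  # hypothetical name
)
ET.SubElement(int_field, "FieldRef", {"field": "x0"})

print(ET.tostring(length_apply, encoding="unicode"))
print(ET.tostring(transformation_dictionary, encoding="unicode"))
```

The key difference is scope: the Apply element can live anywhere inside an expression, whereas the DerivedField must be registered as a named field before other expressions can refer to it, which the inline translator does not currently do.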

@vruusmann
Member

This issue is about a component that lives inside the JPMML-Python library, so moving it there.

@vruusmann vruusmann transferred this issue from jpmml/sklearn2pmml Oct 28, 2022
@vruusmann
Member

As a quick workaround for the example workflow, I would suggest a two-step approach: first keep the incoming user input as int for data validation and sanitization purposes, and once it is clean, convert it to a string for further string manipulation.

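from sklearn.pipeline import make_pipeline
from sklearn2pmml.preprocessing import CastTransformer, ExpressionTransformer

def make_modify_date_pipeline():
  # Validate while the value is still an int; missing or out-of-range values fall back to 20221230
  int_sanitizer = ExpressionTransformer("X[0] if (pandas.notnull(X[0]) and X[0] > 0 and X[0] < 20221230) else 20221230")
  int2string_caster = CastTransformer(dtype = str)
  # Reformat the clean 'YYYYMMDD' string as 'YYYY-MM-DD'
  str_sanitizer = ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8]")
  return make_pipeline(int_sanitizer, int2string_caster, str_sanitizer)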
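The value flow of this two-step pipeline can be simulated in plain Python to check which output each kind of input produces. This is a hypothetical stand-in, not sklearn2pmml code: all function names are made up, `math.isnan` stands in for `pandas.notnull`, and missing inputs are assumed to arrive as `None` or `float("nan")`.

```python
import math

CUTOFF = 20221230  # upper bound and fallback value from the int_sanitizer expression


def int_sanitize(x):
    """Step 1 (mirrors int_sanitizer): keep values strictly between 0 and CUTOFF, else return CUTOFF."""
    # None or NaN plays the role of a missing value (stand-in for pandas.notnull)
    is_missing = x is None or (isinstance(x, float) and math.isnan(x))
    if not is_missing and 0 < x < CUTOFF:
        return x
    return CUTOFF


def int2string_cast(x):
    """Step 2 (mirrors CastTransformer(dtype = str))."""
    return str(x)


def str_sanitize(s):
    """Step 3 (mirrors str_sanitizer): 'YYYYMMDD' -> 'YYYY-MM-DD'."""
    return s[:4] + "-" + s[4:6] + "-" + s[6:8]


def simulate_two_step_pipeline(x):
    """Hypothetical end-to-end simulation of make_modify_date_pipeline() on one value."""
    return str_sanitize(int2string_cast(int_sanitize(x)))


for value in [20210315, None, float("nan"), 20230101, -5]:
    print(value, "->", simulate_two_step_pipeline(value))
```

Because the comparisons happen before the cast, no str-to-int comparison is ever needed. If the column is stored as float (for example because it contains NaN), the cast yields "20210315.0", but the fixed-position slicing in step 3 still produces "2021-03-15".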

Alternatively, it would be possible to replace the int_sanitizer transformation with an appropriate sklearn2pmml.decoration.ContinuousDomain decorator:

from sklearn2pmml.decoration import ContinuousDomain
int_sanitizer = ContinuousDomain(missing_value_treatment = "as_value", missing_value_replacement = 20221230, invalid_value_treatment = "as_missing", outlier_treatment = "as_missing_values", low_value = 0, high_value = 20221230)

@vruusmann
Member

Now I tried the following modification, using str to str comparison,(X[0][0:8] < '20221230') but it worked

My intuition is that string-to-string comparison should NOT be allowed in this place. The natural operational type of strings is categorical, and the main characteristic of categorical values is that they are unordered. Therefore, numeric-like comparisons should not be allowed.
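The reason a string comparison is fragile here is that Python compares strings lexicographically, character by character, rather than by numeric value. The sketch below (plain Python, with hypothetical sample values) shows that the two orders agree for equal-width, all-digit strings like the eight-digit dates in this issue, which is why the workaround appeared to work, but disagree as soon as the widths differ.

```python
def numeric_less(a: str, b: str) -> bool:
    """Order by integer value."""
    return int(a) < int(b)


def lexicographic_less(a: str, b: str) -> bool:
    """Order character by character, as Python's str comparison does."""
    return a < b


# Equal-width, all-digit strings: both orders agree.
same_width = [("20210315", "20221230"), ("20230101", "20221230")]
# Different widths: '9' and '3' sort after '2', although 9 and 3000 are numerically smaller.
mixed_width = [("9", "20221230"), ("3000", "20221230")]

for a, b in same_width + mixed_width:
    print(f"{a!r} < {b!r}: numeric={numeric_less(a, b)}, lexicographic={lexicographic_less(a, b)}")
```

A string comparison therefore silently encodes an assumption about the input format (fixed width, digits only) that nothing in the pipeline enforces.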

I personally think that string-to-string comparisons (within a Python expression) should raise an error. Even if the Python side allows such a "hack", the (J)PMML side should be much stricter here, to ensure that there are no surprises during model deployment.
