Import the preprocessing process #171

Open
ryanma9629 opened this issue Jun 15, 2023 · 1 comment

@ryanma9629

How can I encapsulate the preprocessing steps in the scoring process as well when registering Python models with pzmm? In the pzmm_binary_classification_model_import.ipynb example, only the decision tree, random forest, and gradient boosting models are encapsulated in the pickle file; preprocessing steps such as missing value imputation and variable encoding are not included.

@ryanma9629 ryanma9629 added the question Further information is requested label Jun 15, 2023
@smlindauer smlindauer self-assigned this Jun 15, 2023
@smlindauer
Collaborator

We do not currently have a functional process built in to pzmm for including a preprocessing process inside a pickle file or as additional code within the generated scoring script (although simple imputation is supported through the missing_values argument of the pzmm.ImportModel.import_model() and pzmm.ScoreCode.write_score_code() functions).

Currently, implementing additional preprocessing requires modifying the score code that is generated by sasctl and uploaded to SAS Model Manager. On SAS Viya 4, you can use a few different sasctl functions to pull this off (example below), but it requires a bit more work on SAS Viya 3.5. This is due to differences in how the DS2 wrapper code is created:

  • In SAS Viya 4, the wrapper code is regenerated by SAS Model Manager each time the model is published or a score test is attempted. This allows for modification of the Python model assets without worrying about modifying DS2 code.
  • In SAS Viya 3.5, the wrapper code is generated by an API call and then further modified by sasctl to allow for scoring in either MAS or CAS. Modifications to the score code or other Python assets would require additional calls to this API as well as rerunning some of the logic found in pzmm.write_score_code.py. (Further example clarification below.)

Assuming you are providing an additional pickle file that encapsulates the data preprocessing, you will need to upload the new pickle object and adjust the score code. For SAS Viya 4, after running the pzmm.ImportModel.import_model() function to register the model in SAS Model Manager, this would look like:

# Assuming the preprocessing pickle file and original score code are on disk

from pathlib import Path

from sasctl import Session
from sasctl._services.model_repository import ModelRepository as mr

# Create a session to a SAS Viya server
sess = Session("demo.sas.com", "username", "password", protocol="http")

# Visualize API calls
sess.add_stderr_logger(level=20)

# Collect the model to be modified
model_name = "preprocess_model"
project_name = "preprocess_project"
model = mr.list_models(filter=f"and(eq(projectName,'{project_name}'),"
                              f"eq(name,'{model_name}'))")[0]

# Read in the score code and modify in Python (or modify the score code manually)
with open(Path.cwd() / "path/to/score_preprocess_model.py") as score_file:
    score_code = score_file.readlines()

# Modify the score code to load the preprocessing pickle and apply it to
# input_array inside the score function
open_stmt = 'with open(Path(settings.pickle_path) / "preprocess.pickle", "rb") as preprocess_file:\n'
load_stmt = "preprocess = pickle.load(preprocess_file)\n"
for index, line in enumerate(score_code):
    if f"{'':8}with open(" in line:
        # Model pickle is opened at an 8-space indent
        score_code[index] = f"{'':8}{open_stmt}{'':12}{load_stmt}" + line
    elif "with open(" in line:
        # Model pickle is opened with no indent
        score_code[index] = f"{open_stmt}{'':4}{load_stmt}" + line
    elif "prediction = " in line:
        # Preprocess the input just before the model is called; this assumes
        # the unpickled preprocess object is callable
        score_code[index] = f"{'':4}input_array = preprocess(input_array)\n" + line

# Return score code file to a single string form for uploading
score_code = "".join(score_code)

with open(Path.cwd() / "path/to/preprocess.pickle", "rb") as preprocess_file:
    files = [
        {
            "name": "preprocess.pickle",
            "file": preprocess_file,
            "role": "scoreResource"
        },
        {
            "name": "score_preprocess_model.py",
            "file": score_code,
            "role": "score"
        }
    ]
    for file in files:
        mr.add_model_content(model, **file)
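
To see what the insertion rules above do to a score script, the sketch below applies the same three rules to a small hypothetical score file. The sample script (the `model.pickle` name, the `score` signature, the `x1` column) is made up for illustration and is not the exact output of pzmm; only the insertion rules are taken from the snippet above.

```python
# Hypothetical, simplified score code; real pzmm output will differ.
sample_score_code = '''import pickle
from pathlib import Path

import pandas as pd
import settings

with open(Path(settings.pickle_path) / "model.pickle", "rb") as pickle_model:
    model = pickle.load(pickle_model)

def score(x1):
    input_array = pd.DataFrame([[x1]], columns=["x1"])
    prediction = model.predict(input_array)
    return prediction
'''


def add_preprocessing(lines):
    """Apply the same insertion rules as the loop above to a list of lines."""
    open_stmt = ('with open(Path(settings.pickle_path) / "preprocess.pickle", '
                 '"rb") as preprocess_file:\n')
    load_stmt = "preprocess = pickle.load(preprocess_file)\n"
    result = []
    for line in lines:
        if f"{'':8}with open(" in line:
            # Model pickle opened at an 8-space indent
            result.append(f"{'':8}{open_stmt}{'':12}{load_stmt}")
        elif "with open(" in line:
            # Model pickle opened with no indent
            result.append(f"{open_stmt}{'':4}{load_stmt}")
        elif "prediction = " in line:
            # Preprocess the input right before the model is called
            result.append(f"{'':4}input_array = preprocess(input_array)\n")
        result.append(line)
    return result


modified = "".join(add_preprocessing(sample_score_code.splitlines(keepends=True)))
print(modified)
```

The printed script loads `preprocess.pickle` immediately before the model pickle and calls `preprocess(input_array)` just before the prediction line. The unpickled object therefore has to be callable; if it is, for example, a scikit-learn transformer, the inserted line would need to call its `transform` method instead.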

For SAS Viya 3.5, you would need to upload the new files as above, then delete the *.sas files present in the model assets on SAS Model Manager, and then convert the model and score code to the appropriate formats. This would look like the following, assuming the model variable is the RestObj representation of the model and the new score code and preprocessing pickle file have already been uploaded:

from sasctl.core import delete
from sasctl._services.model_repository import ModelRepository as mr
# MAS_CODE_NAME and CAS_CODE_NAME are used below; they are assumed here to be
# module-level constants in sasctl.pzmm.write_score_code
from sasctl.pzmm.write_score_code import CAS_CODE_NAME, MAS_CODE_NAME
from sasctl.pzmm.write_score_code import ScoreCode as sc

# Get the file list and delete all *.sas files
file_list = mr.get_model_contents(model)
for file in file_list:
    if ".sas" in file.name:
        delete(mr.get_link(file, "delete")["uri"])

# Convert the model score code to CAS and MAS focused scripts and convert the model type as needed
model["scoreCodeType"] = "Python"
model = mr.update_model(model)
mr.convert_python_to_ds2(model)
model_contents = mr.get_model_contents(model)
for file in model_contents:
    if file.name == "score.sas":
        mas_code = mr.get(f"models/{file.modelId}/contents/{file.id}/content")
        sc.upload_and_copy_score_resources(model, [{"name": MAS_CODE_NAME, "file": mas_code, "role": "score"}])
        cas_code = sc.convert_mas_to_cas(mas_code, model)
        sc.upload_and_copy_score_resources(model, [{"name": CAS_CODE_NAME, "file": cas_code, "role": "score"}])
        model["scoreCodeType"] = "ds2MultiType"
        mr.update_model(model)
        break
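
One caveat in the deletion step: `".sas" in file.name` is a substring test, so it would also match a name such as `training_sample.sas7bdat`. A stricter check on the file extension could look like the sketch below. The file names are hypothetical and stand in for the `name` attribute of the items returned by `mr.get_model_contents()`.

```python
from pathlib import PurePosixPath

# Hypothetical asset names as they might appear in a model's contents
asset_names = [
    "score.sas",
    "dmcas_packagescorecode.sas",
    "dmcas_epscorecode.sas",
    "score_preprocess_model.py",
    "preprocess.pickle",
    "training_sample.sas7bdat",
]


def is_sas_code(name):
    """Return True only when the file extension is exactly .sas."""
    return PurePosixPath(name).suffix.lower() == ".sas"


# Substring test, as used in the snippet above
substring_matches = [name for name in asset_names if ".sas" in name]
# Extension test, which skips the .sas7bdat data set
suffix_matches = [name for name in asset_names if is_sas_code(name)]

print(substring_matches)
print(suffix_matches)
```

If the model assets only ever contain `.sas` code files, both tests select the same files and the original filter is sufficient.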

Feel free to submit code to implement this method in a more defined manner. Otherwise, we can add this as an enhancement request for future releases.
