
RHOAIENG-4963: ModelMesh should support TLS in payload processors #65

Merged
merged 2 commits into opendatahub-io:main
Aug 1, 2024

Conversation

ruivieira
Member

Motivation

See RHOAIENG-4963.

Currently, ModelMesh only supports HTTP payload processors (TLS-enabled endpoints are not supported).
Additionally, the ModelMesh truststore is not available by default when building a TLS-enabled processor.

Modifications

  • This PR adds https as a recognized protocol for payload processors
  • If the payload processor is TLS-enabled (https), the ModelMesh truststore is explicitly loaded so that its certificates can be used
  • If the payload processor is not TLS-enabled (http), ModelMesh behaves exactly as before this PR

Result

A TLS-enabled payload processor can use the certificates in ModelMesh's truststore to perform HTTPS requests.
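
For illustration, here is a minimal sketch of the scheme check and explicit truststore load described above, assuming a java.net.http client; the class name, method names, and truststore path/password parameters are hypothetical, not the actual ModelMesh code:

```java
import java.io.FileInputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public final class PayloadProcessorClients {

    /** Builds an SSLContext that trusts the certificates in the given truststore. */
    static SSLContext sslContextFromTruststore(String path, char[] password) throws Exception {
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (FileInputStream in = new FileInputStream(path)) {
            trustStore.load(in, password);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    /** Returns the HTTP client to use for a payload processor endpoint. */
    static HttpClient clientFor(URI processorUri, String truststorePath, char[] password)
            throws Exception {
        if ("https".equals(processorUri.getScheme())) {
            // TLS-enabled processor: explicitly load the ModelMesh truststore,
            // since it is not part of the JVM's default SSL context
            return HttpClient.newBuilder()
                    .sslContext(sslContextFromTruststore(truststorePath, password))
                    .build();
        }
        // Plain http processor: same behavior as before this PR
        return HttpClient.newHttpClient();
    }
}
```

Keeping the loaded SSLContext scoped to the processor's client, rather than replacing the JVM-wide default, is one possible design; the review discussion below touches on exactly this scoping question.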

@openshift-ci-robot

openshift-ci-robot commented Jul 30, 2024

@ruivieira: This pull request references RHOAIENG-4963 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ruivieira
Member Author

cc @danielezonca @terrytangyuan

@VedantMahabaleshwarkar VedantMahabaleshwarkar requested review from israel-hdez and removed request for Jooho July 30, 2024 13:34
Member

@terrytangyuan terrytangyuan left a comment


@ruivieira This is only for MM, right? Do we also have a PR for KServe?

@ruivieira
Member Author

@ruivieira This is only for MM, right? Do we also have a PR for KServe?

@terrytangyuan only for MM, yes. For KServe we are still investigating a solution.

@mwaykole
Member

Hello @ruivieira,
could you please add testing instructions describing how this can be tested?


@israel-hdez israel-hdez left a comment


I was trying to find out how the truststore gets populated. The only pointer I was able to find is that it is related to the TLS configuration documented here.

If that's the right configuration, and if my understanding is right, then for ModelMesh to support payload processors with TLS it will be necessary to also configure TLS in ModelMesh itself with the same certificates as the payload processors -- i.e. inference endpoints will be protected under HTTPS using the same certificates as the payload processors.

I'd expect the TLS settings of the processors to be independent from ModelMesh's own TLS. Am I missing something?

If this is the intended behavior, I can approve, as upstream CI is passing. But I'd like to get confirmation that this is how it should be.

@ruivieira
Member Author

I was trying to find out how the truststore gets populated. The only pointer I was able to find is that it is related to the TLS configuration documented here.

If that's the right configuration, and if my understanding is right, then for ModelMesh to support payload processors with TLS it will be necessary to also configure TLS in ModelMesh itself with the same certificates as the payload processors -- i.e. inference endpoints will be protected under HTTPS using the same certificates as the payload processors.

I'd expect the TLS settings of the processors to be independent from ModelMesh's own TLS. Am I missing something?

If this is the intended behavior, I can approve, as upstream CI is passing. But I'd like to get confirmation that this is how it should be.

Hi @israel-hdez

For our payload-processor use case, that is indeed the intended behavior.
In our case, TrustyAI creates an internal Kubernetes service with associated OpenShift Serving Certificates.
When the TrustyAI operator configures ModelMesh (adding this service to the MM_PAYLOAD_PROCESSORS env var), it also mounts the certificates on the ModelMesh pod.

The reason for this PR is that the truststore created by ModelMesh is not available under the default SSL context unless it is explicitly loaded.
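
As a hedged illustration of that wiring, this is how the MM_PAYLOAD_PROCESSORS value set by the operator might be parsed on the ModelMesh side — the space-separated format and the class name are assumptions made for the sketch, not confirmed ModelMesh internals:

```java
import java.net.URI;
import java.util.List;
import java.util.stream.Stream;

public final class ProcessorConfig {

    /** Parses MM_PAYLOAD_PROCESSORS into endpoint URIs (list format assumed). */
    static List<URI> payloadProcessorUris() {
        String raw = System.getenv("MM_PAYLOAD_PROCESSORS");
        if (raw == null || raw.isBlank()) {
            return List.of();
        }
        return Stream.of(raw.trim().split("\\s+"))
                .map(URI::create)
                .toList();
    }

    public static void main(String[] args) {
        // With MM_PAYLOAD_PROCESSORS=https://trustyai-service.mm.svc.local this
        // yields a single https URI, which would trigger the explicit truststore load
        payloadProcessorUris().forEach(uri ->
                System.out.println(uri.getScheme() + " -> " + uri));
    }
}
```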


@VedantMahabaleshwarkar VedantMahabaleshwarkar left a comment


The code itself seems fine to me, but I have limited expertise in this area and I have not tested this myself. If @israel-hdez or @terrytangyuan give their lgtm, I can +1 that.

@israel-hdez

In our case, TrustyAI creates an internal Kubernetes service with associated OpenShift Serving Certificates.

@ruivieira I think the implied question is whether you want ModelMesh to protect the models using the same serving certificate as TrustyAI.

I do understand that this will help in trusting the TrustyAI certificate, which is what you need. But unless I'm misunderstanding, the infer endpoints will also be served over HTTPS using the TrustyAI certificate, and that is my confusion.

@ruivieira
Member Author

ruivieira commented Jul 31, 2024

whether you want ModelMesh to protect the models using the same serving certificate as TrustyAI

The purpose would be to use the same certificates as TrustyAI for the outgoing requests made by ModelMesh's remote payload processor, not necessarily for the infer endpoints.

If we have a TrustyAI service listening for payloads at (say) https://trustyai-service.mm.svc.local, the TrustyAI operator sets MM_PAYLOAD_PROCESSORS=https://trustyai-service.mm.svc.local on the mm container, and the mm container forwards inference payloads to that endpoint.

IIUC, simply mounting the secrets is not enough; they must be specified with MM_TLS_KEY_CERT_PATH so that they can be imported in https://github.com/opendatahub-io/modelmesh/blob/main/src/main/scripts/start.sh

The other problem is that the truststore created above is not available in the default SSL context, so it is also loaded explicitly.

As the PR stands, this will make those certs also be used for the infer endpoints, unless we change it.
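
To make that trade-off concrete, here is a hedged sketch of the two scopes the loaded truststore could be applied at — the truststore path and password are placeholders, and neither option is asserted to be what this PR actually implements:

```java
import java.io.FileInputStream;
import java.net.http.HttpClient;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public final class TruststoreScoping {
    public static void main(String[] args) throws Exception {
        // Assumed truststore location and password, for illustration only
        KeyStore ts = KeyStore.getInstance(KeyStore.getDefaultType());
        try (FileInputStream in = new FileInputStream("/tmp/truststore.p12")) {
            ts.load(in, "changeit".toCharArray());
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ts);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);

        // Scoped: only the payload-processor client trusts these certificates
        HttpClient processorClient = HttpClient.newBuilder().sslContext(ctx).build();

        // Process-wide: every TLS client in the JVM (including the ones serving
        // infer endpoints) would pick up the same certificates
        SSLContext.setDefault(ctx);
    }
}
```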

@israel-hdez

the TrustyAI operator sets MM_PAYLOAD_PROCESSORS=https://trustyai-service.mm.svc.local on the mm container, and the mm container forwards inference payloads to that endpoint.

OK, this part is interesting. Maybe it is the context I am missing.
Would this mean that you somehow inject MM_PAYLOAD_PROCESSORS into the container?

I was mentioning the documented TLS setup because that's the usual case. But if you have some other way of injecting MM_PAYLOAD_PROCESSORS into the container, that solves my doubts.

@ruivieira
Member Author

the TrustyAI operator sets MM_PAYLOAD_PROCESSORS=https://trustyai-service.mm.svc.local on the mm container, and the mm container forwards inference payloads to that endpoint.

OK, this part is interesting. Maybe it is the context I am missing. Would this mean that you somehow inject MM_PAYLOAD_PROCESSORS into the container?

I was mentioning the documented TLS setup because that's the usual case. But if you have some other way of injecting MM_PAYLOAD_PROCESSORS into the container, that solves my doubts.

@israel-hdez Just to clarify: the (simplified) TrustyAI/ModelMesh interaction is:

  • The TrustyAI operator (TAO) creates a TrustyAI Service (TAS) deployment and internal service (foo-service) in a namespace via a CR
  • TAO watches for InferenceServices in that namespace and, if a ModelMesh one exists, it will:
    • Register the TAS as a payload processor (add MM_PAYLOAD_PROCESSORS=http://foo-service.svc to the mm container)

What we want to do is:

  • TAO creates a TAS deployment and internal service (foo-service) in a namespace via a CR (the same as before)
  • TAO watches for InferenceServices in that namespace and, if a ModelMesh one exists (and MM is not yet configured), it will:
    • Register the TAS as a TLS payload processor (add MM_PAYLOAD_PROCESSORS=https://foo-service.svc to the mm container)
    • Mount the foo-service serving certificates on ModelMesh

We could do this without changes to ModelMesh, but we need this PR because the generated truststore is not available to the payload processor (we get PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target). So this PR explicitly loads the truststore so that it is available to the payload processor's HTTP client; the sketch below illustrates the failure mode.
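
For context, here is a minimal reproduction of that PKIX failure against the JVM's default SSL context — purely illustrative, only meaningful from inside the cluster, and reusing the service URL from the discussion above:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import javax.net.ssl.SSLHandshakeException;

public final class PkixDemo {
    public static void main(String[] args) throws Exception {
        // Endpoint from the discussion above; resolvable only inside the cluster
        URI processor = URI.create("https://trustyai-service.mm.svc.local");
        HttpClient defaultClient = HttpClient.newHttpClient(); // default SSL context
        try {
            defaultClient.send(HttpRequest.newBuilder(processor).GET().build(),
                    HttpResponse.BodyHandlers.discarding());
        } catch (SSLHandshakeException e) {
            // The serving certificate is not in the JVM's default truststore, so
            // the handshake fails with the PKIX error quoted above
            System.err.println("Handshake failed: " + e.getMessage());
        }
    }
}
```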


@israel-hdez israel-hdez left a comment


/lgtm


openshift-ci bot commented Aug 1, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: israel-hdez, ruivieira

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Aug 1, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 2405dba into opendatahub-io:main Aug 1, 2024
7 checks passed