
RHOAIENG-4963: ModelMesh should support TLS in payload processors #65

Merged
merged 2 commits into opendatahub-io:main
Aug 1, 2024

Conversation

ruivieira
Member

Motivation

See RHOAIENG-4963.

Currently, ModelMesh only supports HTTP payload processors (TLS-enabled endpoints are not supported).
Additionally, the ModelMesh truststore is not available by default when building a TLS-enabled processor.

Modifications

  • This PR adds https as a recognized protocol for payload processors
  • If the payload processor is TLS-enabled (https), the ModelMesh truststore is explicitly loaded so that its certificates can be used
  • If the payload processor is not TLS-enabled (http), ModelMesh behaves exactly as before this PR

Result

A TLS-enabled payload processor can use the certificates in ModelMesh's truststore to perform HTTPS requests.
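
For illustration, here is a minimal sketch of the scheme check and explicit truststore load described above, assuming a java.net.http client; the class name, method names, and truststore path/password parameters are hypothetical, not the actual ModelMesh code:

```java
import java.io.FileInputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public final class PayloadProcessorClients {

    /** Builds an SSLContext that trusts the certificates in the given truststore. */
    static SSLContext sslContextFromTruststore(String path, char[] password) throws Exception {
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (FileInputStream in = new FileInputStream(path)) {
            trustStore.load(in, password);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    /** Returns the HTTP client to use for a payload processor endpoint. */
    static HttpClient clientFor(URI processorUri, String truststorePath, char[] password)
            throws Exception {
        if ("https".equals(processorUri.getScheme())) {
            // TLS-enabled processor: explicitly load the ModelMesh truststore,
            // since it is not part of the JVM's default SSL context
            return HttpClient.newBuilder()
                    .sslContext(sslContextFromTruststore(truststorePath, password))
                    .build();
        }
        // Plain http processor: same behavior as before this PR
        return HttpClient.newHttpClient();
    }
}
```

Keeping the loaded SSLContext scoped to the processor's client, rather than replacing the JVM-wide default, is one possible design; the review discussion below touches on exactly this scoping question.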

@openshift-ci-robot

openshift-ci-robot commented Jul 30, 2024

@ruivieira: This pull request references RHOAIENG-4963 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.17.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ruivieira
Member Author

cc @danielezonca @terrytangyuan

@VedantMahabaleshwarkar VedantMahabaleshwarkar requested review from israel-hdez and removed request for Jooho July 30, 2024 13:34
Member

@terrytangyuan terrytangyuan left a comment


@ruivieira This is only for MM, right? Do we also have a PR for KServe?

@ruivieira
Member Author

@ruivieira This is only for MM, right? Do we also have a PR for KServe?

@terrytangyuan only for MM, yes. For KServe we are still investigating a solution.

@mwaykole
Member

Hello @ruivieira,
could you please add testing instructions describing how this can be tested?


@israel-hdez israel-hdez left a comment


I was trying to find out how the truststore gets populated. The only pointer I was able to find is that it is related to the TLS configuration documented here.

If that's the right configuration, and if my understanding is right, then for ModelMesh to support payload processors with TLS it will be necessary to also configure TLS in ModelMesh itself with the same certificates as the payload processors -- i.e. inference endpoints will be protected under HTTPS using the same certificates as the payload processors.

I'd expect the TLS settings of the processors to be independent from ModelMesh's own TLS. Am I missing something?

If this is the intended behavior, I can approve, as upstream CI is passing. But I'd like to get confirmation that this is how it should be.

@ruivieira
Member Author

I was trying to find out how the truststore gets populated. The only pointer I was able to find is that it is related to the TLS configuration documented here.

If that's the right configuration, and if my understanding is right, then for ModelMesh to support payload processors with TLS it will be necessary to also configure TLS in ModelMesh itself with the same certificates as the payload processors -- i.e. inference endpoints will be protected under HTTPS using the same certificates as the payload processors.

I'd expect the TLS settings of the processors to be independent from ModelMesh's own TLS. Am I missing something?

If this is the intended behavior, I can approve, as upstream CI is passing. But I'd like to get confirmation that this is how it should be.

Hi @israel-hdez

For our payload-processor use case, that is indeed the intended behavior.
In our case, TrustyAI creates an internal Kubernetes service with associated OpenShift Serving Certificates.
When the TrustyAI operator configures ModelMesh (adding this service to the MM_PAYLOAD_PROCESSORS env var), it also mounts the certificates on the ModelMesh pod.

The reason for this PR is that the truststore created by ModelMesh is not available under the default SSL context unless it is explicitly loaded.
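
As a hedged illustration of that wiring, this is how the MM_PAYLOAD_PROCESSORS value set by the operator might be parsed on the ModelMesh side — the space-separated format and the class name are assumptions made for the sketch, not confirmed ModelMesh internals:

```java
import java.net.URI;
import java.util.List;
import java.util.stream.Stream;

public final class ProcessorConfig {

    /** Parses MM_PAYLOAD_PROCESSORS into endpoint URIs (list format assumed). */
    static List<URI> payloadProcessorUris() {
        String raw = System.getenv("MM_PAYLOAD_PROCESSORS");
        if (raw == null || raw.isBlank()) {
            return List.of();
        }
        return Stream.of(raw.trim().split("\\s+"))
                .map(URI::create)
                .toList();
    }

    public static void main(String[] args) {
        // With MM_PAYLOAD_PROCESSORS=https://trustyai-service.mm.svc.local this
        // yields a single https URI, which would trigger the explicit truststore load
        payloadProcessorUris().forEach(uri ->
                System.out.println(uri.getScheme() + " -> " + uri));
    }
}
```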


@VedantMahabaleshwarkar VedantMahabaleshwarkar left a comment


The code itself seems fine to me, but I have limited expertise in this area and I have not tested this myself. If @israel-hdez or @terrytangyuan give their lgtm, I can +1 that.

@israel-hdez

In our case, TrustyAI creates an internal Kubernetes service with associated OpenShift Serving Certificates.

@ruivieira I think the implied question is whether you want ModelMesh to protect the models using the same serving certificate as TrustyAI.

I do understand that this will help in trusting the TrustyAI certificate, which is what you need. But unless I'm misunderstanding, the infer endpoints will also be served over HTTPS using the TrustyAI certificate, and that is my confusion.

@ruivieira
Member Author

ruivieira commented Jul 31, 2024

whether you want ModelMesh to protect the models using the same serving certificate as TrustyAI

The purpose would be to use the same certificates as TrustyAI for the outgoing requests made by ModelMesh's remote payload processor, not necessarily for the infer endpoints.

If we have a TrustyAI service listening for payloads at (say) https://trustyai-service.mm.svc.local, the TrustyAI operator sets MM_PAYLOAD_PROCESSORS=https://trustyai-service.mm.svc.local on the mm container, and the mm container forwards inference payloads to that endpoint.

IIUC, simply mounting the secrets is not enough; they must be specified with MM_TLS_KEY_CERT_PATH so that they can be imported in https://github.com/opendatahub-io/modelmesh/blob/main/src/main/scripts/start.sh

The other problem is that the truststore created above is not available in the default SSL context, so it is also loaded explicitly.

As the PR stands, this will make those certs also be used for the infer endpoints, unless we change it.
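
To make that trade-off concrete, here is a hedged sketch of the two scopes the loaded truststore could be applied at — the truststore path and password are placeholders, and neither option is asserted to be what this PR actually implements:

```java
import java.io.FileInputStream;
import java.net.http.HttpClient;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public final class TruststoreScoping {
    public static void main(String[] args) throws Exception {
        // Assumed truststore location and password, for illustration only
        KeyStore ts = KeyStore.getInstance(KeyStore.getDefaultType());
        try (FileInputStream in = new FileInputStream("/tmp/truststore.p12")) {
            ts.load(in, "changeit".toCharArray());
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ts);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);

        // Scoped: only the payload-processor client trusts these certificates
        HttpClient processorClient = HttpClient.newBuilder().sslContext(ctx).build();

        // Process-wide: every TLS client in the JVM (including the ones serving
        // infer endpoints) would pick up the same certificates
        SSLContext.setDefault(ctx);
    }
}
```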

@israel-hdez

the TrustyAI operator sets MM_PAYLOAD_PROCESSORS=https://trustyai-service.mm.svc.local on the mm container, and the mm container forwards inference payloads to that endpoint.

OK, this part is interesting. Maybe it is the context I am missing.
Would this mean that you somehow inject MM_PAYLOAD_PROCESSORS into the container?

I was mentioning the documented TLS setup because that's the usual case. But if you have some other way of injecting MM_PAYLOAD_PROCESSORS into the container, that solves my doubts.

@ruivieira
Member Author

the TrustyAI operator sets MM_PAYLOAD_PROCESSORS=https://trustyai-service.mm.svc.local on the mm container, and the mm container forwards inference payloads to that endpoint.

OK, this part is interesting. Maybe it is the context I am missing. Would this mean that you somehow inject MM_PAYLOAD_PROCESSORS into the container?

I was mentioning the documented TLS setup because that's the usual case. But if you have some other way of injecting MM_PAYLOAD_PROCESSORS into the container, that solves my doubts.

@israel-hdez Just to clarify: the (simplified) TrustyAI/ModelMesh interaction is:

  • The TrustyAI operator (TAO) creates a TrustyAI Service (TAS) deployment and internal service (foo-service) in a namespace via a CR
  • TAO watches for InferenceServices in that namespace and, if a ModelMesh one exists, it will:
    • Register the TAS as a payload processor (add MM_PAYLOAD_PROCESSORS=http://foo-service.svc to the mm container)

What we want to do is:

  • TAO creates a TAS deployment and internal service (foo-service) in a namespace via a CR (the same as before)
  • TAO watches for InferenceServices in that namespace and, if a ModelMesh one exists (and MM is not yet configured), it will:
    • Register the TAS as a TLS payload processor (add MM_PAYLOAD_PROCESSORS=https://foo-service.svc to the mm container)
    • Mount the foo-service serving certificates on ModelMesh

We could do this without changes to ModelMesh, but we need this PR because the generated truststore is not available to the payload processor (we get PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target). So this PR explicitly loads the truststore so that it is available to the payload processor's HTTP client; the sketch below illustrates the failure mode.
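
For context, here is a minimal reproduction of that PKIX failure against the JVM's default SSL context — purely illustrative, only meaningful from inside the cluster, and reusing the service URL from the discussion above:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import javax.net.ssl.SSLHandshakeException;

public final class PkixDemo {
    public static void main(String[] args) throws Exception {
        // Endpoint from the discussion above; resolvable only inside the cluster
        URI processor = URI.create("https://trustyai-service.mm.svc.local");
        HttpClient defaultClient = HttpClient.newHttpClient(); // default SSL context
        try {
            defaultClient.send(HttpRequest.newBuilder(processor).GET().build(),
                    HttpResponse.BodyHandlers.discarding());
        } catch (SSLHandshakeException e) {
            // The serving certificate is not in the JVM's default truststore, so
            // the handshake fails with the PKIX error quoted above
            System.err.println("Handshake failed: " + e.getMessage());
        }
    }
}
```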


@israel-hdez israel-hdez left a comment


/lgtm


openshift-ci bot commented Aug 1, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: israel-hdez, ruivieira

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Aug 1, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 2405dba into opendatahub-io:main Aug 1, 2024
7 checks passed