feat(backend): Support separate artifact repository for each namespace #7219
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
I also need to use the local namespace instead of the kubeflow namespace to generate the MinIO secret. So it must happen in
Please tell me if I am allowed to move the MinIO secret settings into that function. |
@jacobmalmberg maybe it fails here pipelines/frontend/src/apis/run/api.ts Line 935 in 2768705
These files are also interesting: https://github.com/kubeflow/pipelines/blob/master/frontend/src/components/tabs/MetricsTab.tsx https://github.com/kubeflow/pipelines/blob/master/frontend/src/components/viewers/MetricsVisualizations.tsx Maybe the persistenceagent
|
@juliusvonkohout Entirely possible! I probably won't have time to dig in deeper for the foreseeable future, though. |
@Bobgy @jacobmalmberg @ca-scribner
This means one either has to repair this structural deficit or use a global, scalable MinIO instance with multiple buckets and a different password per bucket. I will test whether the apiserver can get artifacts from different buckets per user. |
@Bobgy @ca-scribner @jacobmalmberg this is my design proposal: We keep the single MinIO instance and use bucket name = profile name = namespace name because of the apiserver limitations explained in #7219 (comment). Detailed explanation: pipelines/backend/src/common/util/workflow.go Line 292 in 6b7adfa
This value e.g. 'artifacts/bash-pipeline-cqkb6/2022/01/27/bash-pipeline-cqkb6-310116140/mlpipeline-ui-metadata.tgz' is returned here #
So no bucket is included, just the suffix (the object key).
The bucket is set by default from the environment variable OBJECTSTORECONFIG_BUCKETNAME ("mlpipeline").
So we have to change it so that the name of the user's profile/namespace of the request is used as the bucket name (not the username, which might contain forbidden characters such as @). This solution does not need changes in the Python SDK, because we can keep the artifact keys without bucket names in the workflow manifest.
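To illustrate the idea, here is a minimal sketch in Python (the apiserver itself is Go, and `bucket_for_namespace` and `object_location` are hypothetical names made up for this example). It assumes the bucket name is simply the namespace name and that the artifact key stays bucket-free. The regular expression encodes the usual S3 bucket naming constraints (3 to 63 characters, lowercase letters, digits and hyphens), which a Kubernetes namespace name satisfies as long as it has at least 3 characters, while a username containing @ does not.

```python
import re
from typing import Tuple

# Assumption: bucket name == namespace name. The pattern approximates the
# S3 bucket naming rules (3-63 chars, lowercase letters, digits, hyphens).
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def bucket_for_namespace(namespace: str) -> str:
    """Return the bucket for a namespace, rejecting names that are not valid buckets."""
    if not _BUCKET_NAME.fullmatch(namespace):
        raise ValueError(f"{namespace!r} cannot be used as a bucket name")
    return namespace


def object_location(namespace: str, artifact_key: str) -> Tuple[str, str]:
    """Combine the namespace-derived bucket with the unchanged, bucket-free key."""
    return bucket_for_namespace(namespace), artifact_key


key = ("artifacts/bash-pipeline-cqkb6/2022/01/27/"
       "bash-pipeline-cqkb6-310116140/mlpipeline-ui-metadata.tgz")
bucket, same_key = object_location("pavol-bauer", key)
```

The key stored in the workflow manifest is passed through untouched, which is why the SDK would not need to change under this assumption.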
This apiserver log shows that the username and the profile/namespace name are present for most requests, but not for reading artifacts. That is the most important API call, and it lacks the namespace specification.
We really have to extend the API server artifacts API with resource_reference_key.type=NAMESPACE and resource_reference_key.id=pavol-bauer. All other APIs already do that properly, so it should mostly be copy and paste. This is needed because namespaces can be shared: for example, Julius could ask for artifacts from namespace pavol-bauer (if it is shared with him) and from julius-vonkohout. Trying to guess the right namespace/bucket would be a nightmare. So we just use the namespace ResourceAttribute from the request as the bucket name, as the other APIs do. If there is no namespace in the request, we use the default bucket from the environment variable (mlpipeline) for backwards compatibility or for "shared" pipeline results. Namespaced pipeline definitions (YAML files) are handled the same way in the API server.
We then need to slightly modify the persistenceagent too. It must send the namespace of the workflow in its request to the API server when it checks for "mlpipeline-metrics" artifacts that it later reports as run_metrics. At some point in the future, those run metrics should also be namespaced the same way we handle namespaced pipeline definitions.
At this point we still have to create the buckets, passwords and bucket policies manually. We can merge the kubeflow-pipelines-profile-controller code in this pull request into https://github.com/kubeflow/kubeflow/tree/master/components/profile-controller to get rid of metacontroller with cluster-admin rights and of two separate controllers, as wanted by @thesuperzapper in #6629 (comment) (the profileresourcetemplate will make this much simpler in the long term). Of course we will ditch the local PVC and MinIO instance in favor of the global one in the kubeflow namespace, but we will keep the namespace-specific passwords and change the bucket name. The ml-pipeline-ui will continue to work properly anyway because it uses the artifact proxy in the user's namespace. Hopefully we can also get rid of that in the future...
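The bucket selection with its fallback could look roughly like this. It is only a Python sketch of the logic, not the actual Go handler: `resolve_bucket` is a hypothetical name, and treating the request as a flat mapping of query parameters is an assumption made for the example. The parameter names and the default "mlpipeline" are the ones mentioned in this thread.

```python
import os
from typing import Mapping

# Environment variable named in this thread; "mlpipeline" is its stated default.
DEFAULT_BUCKET_ENV = "OBJECTSTORECONFIG_BUCKETNAME"


def resolve_bucket(params: Mapping[str, str]) -> str:
    """Pick the bucket for an artifact read request.

    Assumption: `params` is a flat mapping of the request's query parameters.
    """
    ref_type = params.get("resource_reference_key.type")
    ref_id = params.get("resource_reference_key.id")
    if ref_type == "NAMESPACE" and ref_id:
        # Namespaced request: bucket name == namespace name.
        return ref_id
    # No namespace in the request: fall back to the shared default bucket.
    return os.environ.get(DEFAULT_BUCKET_ENV, "mlpipeline")


namespaced = resolve_bucket({
    "resource_reference_key.type": "NAMESPACE",
    "resource_reference_key.id": "pavol-bauer",
})
```

Because the namespace is taken from the request and never guessed, a user with access to several shared namespaces always states explicitly which bucket is meant. The authorization check for that namespace is out of scope for this sketch.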
We just need to include a routine that creates an S3 bucket, user, policy, quota and password on profile creation and removes them again on profile deletion. The password generation can be taken from the kubeflow-pipelines-profile-controller code in this pull request.
I examined the design proposal with a security advisor. If Alice shares a namespace with Bob and then unshares it, Bob might have extracted and saved the minio, default-editor-token-xxxxx, default-viewer-token-xxxxx and other secrets. Bob could then still use them to access Alice's MinIO bucket, or the whole namespace via the service account. So ALL automatically created (non-user) secrets have to be deleted on rolebinding changes. Furthermore, the MinIO password must not be deterministic. |
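As a rough sketch of two pieces of such a routine (not the code from this pull request; `generate_password` and `bucket_policy` are hypothetical names): a non-deterministic password drawn from the operating system's random source, and an IAM-style policy document that limits a user to a single bucket. The chosen set of actions is an assumption, and creating the bucket, user and quota on the MinIO server is deliberately left out.

```python
import json
import secrets


def generate_password(num_bytes: int = 32) -> str:
    """Return a random, URL-safe password; it is not derived from the namespace name."""
    return secrets.token_urlsafe(num_bytes)


def bucket_policy(bucket: str) -> str:
    """Build an IAM-style policy that allows access to one bucket only."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object operations apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }
    return json.dumps(policy, indent=2)


password = generate_password()
policy_json = bucket_policy("pavol-bauer")
```

Since the password is random, it has to be stored in the namespace secret when it is generated and replaced whenever that secret is deleted, for example after a rolebinding change.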
I've only thought about the proposal for a few minutes, but I like it so far. Would be good to have input from someone in the pipelines working group, but this sounds like a good idea to me. |
@kimwnasptd @thesuperzapper @davidspek I will implement this too for this PR. |
this is now handled in #7406 |
Description of your changes:
This fixes #4649
The only limitation is that metrics are not rendered by the UI for whatever reason as shown in #4649 (comment). But MinIO provides them successfully according to the logs and you can download them from the web interface.
@Bobgy @zijianjoy I mainly used the black formatter. If you want a different Python style, please tell me.
Checklist: