Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support separate pipeline for each namespace #4197

Open
yaliqin opened this issue Jul 10, 2020 · 61 comments
Open

support separate pipeline for each namespace #4197

yaliqin opened this issue Jul 10, 2020 · 61 comments
Labels
help wanted The community is welcome to contribute. kind/feature lifecycle/frozen

Comments

@yaliqin
Copy link

yaliqin commented Jul 10, 2020

EDIT from @Bobgy: This issue got 47 upvotes when requesting user feedback: #1223 (comment).

Proposed change

Currently with 1.0.2 version, Kubeflow Deployment with kfctl_k8s_istio shares pipelines for all namespaces defined in user profile. Separate pipeline support for each namespace is needed because the multi-user notebook server separation is already supported. It is natural to support pipeline separation.

Alternative options

NA

Who would use this feature?

A lot of enterprises will benefit from this feature as allocating different namespaces to different teams is a common practice in Kubernetes and resources in existing namespaces can be effectively used.

Suggest a solution

NA

@Bobgy
Copy link
Contributor

Bobgy commented Jul 10, 2020

FYI, in existing to be released KF 1.1, pipeline runs are already in user namespaces.

but the static pipeline yaml files are not separated. Do you want this additionally?

@Bobgy
Copy link
Contributor

Bobgy commented Jul 10, 2020

Related to #1223

@Bobgy
Copy link
Contributor

Bobgy commented Jul 10, 2020

There are more clarification in #1223 (comment).

If you agree with the proposal in #1223 (comment), can you rephrase your description to just focus on pipeline resource for this issue.

@yaliqin
Copy link
Author

yaliqin commented Jul 10, 2020

I think the pipeline yaml files should be supported also for E2E separation. Thanks. Already upvote in #1223 .

@stale
Copy link

stale bot commented Oct 10, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Oct 10, 2020
@Bobgy
Copy link
Contributor

Bobgy commented Oct 12, 2020

/frozen

@stale stale bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Oct 12, 2020
@Bobgy Bobgy self-assigned this Oct 12, 2020
@Bobgy Bobgy removed the priority/p0 label Nov 4, 2020
@Bobgy
Copy link
Contributor

Bobgy commented Nov 4, 2020

Sorry, I'm kind of oversubscribed because of extra Kubeflow duties recently, so I may not be able to take this issue soon.

Leaving my previous thoughts about this:

For use-case context: some users want full separation of pipelines, and some want no separation. The distinction is mostly organization culture and I think both requests are valid.

So I think a MVP UX I can imagine that satisfies both is to:

  1. pipelines in DB should have a namespace column
    • if namespace == '' (empty string), then it's in the shared space
    • if namespace is not empty, it's in that namespace
  2. all requests to get pipelines can specify namespace='' to query only shared pipelines
  3. or specify namespace='XXX' to find pipelines in a namespace + shared pipelines
  4. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines
  5. We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.

Benefits:

  1. The same backend logic seems to be able to support both 'all shared' or 'all separated' use cases.
  2. There's no DB migration needed, this will be a seamless transition from older versions of KFP.

Probably there are cases I haven't fully thought through. I'd suggest anyone who's willing to push this issue forward to do the following:

  1. try to discuss and converge on a set of CUJs (critical user journey) that works for everyone, note we should probably start with MVPs that address immediate concerns, but make sure we can head towards the direction that all valid use-cases can be supported.
  2. after consensus on the UX, discuss with maintainers a rough design of the implementation
  3. contribute PRs to implement this feature in smaller self-contained steps

@Bobgy
Copy link
Contributor

Bobgy commented Nov 4, 2020

/cc @yanniszark
about this

@Bobgy
Copy link
Contributor

Bobgy commented Nov 4, 2020

and
/cc @chensun
who designed multi user backend separation for other KFP resources

@Jeffwan
Copy link
Member

Jeffwan commented Nov 5, 2020

@Bobgy @chensun

We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.

Do we want user to upload their pipeline to share space? Should they use Managed Contributor instead like we did for experiments and runs?

There's no DB migration needed, this will be a seamless transition from older versions of KFP.

Pipeline schema doesn't have namespace concept yet. seamless migration here means implicitly migration, right? for all previous pipeline, we can think they have all empty namespace

@Bobgy
Copy link
Contributor

Bobgy commented Nov 5, 2020

@Bobgy @chensun

We need to add a KFP UI option to switch between uploading a pipeline to shared space or current namespace.

Do we want user to upload their pipeline to share space? Should they use Managed Contributor instead like we did for experiments and runs?

Some customers I worked with prefer this shared space though. A problem with manage contributor is that, they want to take a pipeline from shared space and run it in their own namespace. This will not be possible with adding contributors.

(It might be a good idea, to allow specifying pipelines in other users' namespace and KFP backend checks for that permission like contributors. Is that what you are proposing?)

There's no DB migration needed, this will be a seamless transition from older versions of KFP.

Pipeline schema doesn't have namespace concept yet. seamless migration here means implicitly migration, right? for all previous pipeline, we can think they have all empty namespace

If we do not introduce the shared pipeline concept, then we either need to figure out which pipeline belongs to which namespace or make all past pipelines inaccessible after an upgrade. This is fairly tricky to deal with.

@Jeffwan
Copy link
Member

Jeffwan commented Nov 5, 2020

A problem with manage contributor is that, they want to take a pipeline from shared space and run it in their own namespace. This will not be possible with adding contributors.

Got it, they like to "folk" the pipeline into their own namespace. Managed contributor can only manage pipeline between users but not able to public to all. I think this makes sense.

to allow specifying pipelines in other users' namespace
It might be hard to deal with pipeline discovery (current KFP doesn't have granular control). User may have to provide pipeline name in other user's namespace.

I think what you propose is a simple and better way to support isolated and shared pipelines.

then we either need to figure out which pipeline belongs to which namespace

I notice UI actually determine namespace from centralboard.

window.centraldashboard.CentralDashboardEventHandler.init((cdeh: any) => {
// Binds a callback that gets invoked anytime the Dashboard's
// namespace is changed
cdeh.onNamespaceSelected = (newNamespace: string) => {
namespace = newNamespace;
if (registeredHandler) {
registeredHandler(namespace);
}
};
});
} catch (err) {
.

Do you think it's better to figure out username and corresponding namespace via headers and KFAM?
backend can parse user's email and check kfam to figure out which namespace this profile owns and the namespace other users share with the current user. Then KFP resources can be filtered by these namespaces.

With this we can decouple centraldashboard from KFP. I think multi-user KFP can be used separated with Central dashboard. (probably only need KFAM profile and istio)

@Bobgy
Copy link
Contributor

Bobgy commented Nov 6, 2020

Regarding namespace selector, I think the topic was brought up before too, if we plan to let KFP support multi user mode without central dashboard, we can build a namespace selector similar to the one in centraldashboard. It won't be so much effort on the UI side (it was designed to be very decoupled in UI code from the beginning).

Regarding to KFAM, we have decided to deprecate it, @elikatsis has sent out a PR a few days ago: #4723. Let's move related discussion to that PR, and keep discussion here focused on separating pipeline resources.

@arllanos
Copy link

@Bobgy

  1. pipelines in DB should have a namespace column

Shall we have Namespace in pipeline_versions schema? For example I see Description is in pipelines but not in pipeline_versions. Not sure what is the criteria to decide here, but seems users are not allowed to edit or change pipelines once loaded, right?

  1. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

A config option shall allow organizations to enable/disable the possibility pipelines can be public. Where do you think this config can be set (DB, configMap, config.json)?

@Bobgy
Copy link
Contributor

Bobgy commented Nov 19, 2020

@Bobgy

  1. pipelines in DB should have a namespace column

Shall we have Namespace in pipeline_versions schema? For example I see Description is in pipelines but not in pipeline_versions. Not sure what is the criteria to decide here, but seems users are not allowed to edit or change pipelines once loaded, right?

Versions are sub resources of pipelines, so they should be in the same namespace of the pipeline.

Description was more of an unfinished migration to version API, ideally we should have description field on versions too.

  1. We need to add a config option that disables shared pipelines for organizations that definitely do not want shared pipelines

A config option shall allow organizations to enable/disable the possibility pipelines can be public. Where do you think this config can be set (DB, configMap, config.json)?

You should use viper to get this config like other configs in API server, so that it will be configurable via config map, config.json or env var directly.

@juliusvonkohout
Copy link
Member

The next KFP community meeting is on 4/13 5:30PT. (The meeting you were looking at is probably the Kubeflow community meeting on 4/5). Feel free to add an agenda item here: http://bit.ly/kfp-meeting-notes

I hope you can see that
image

Maybe the one day off is the European time difference.

@StefanoFioravanzo
Copy link
Member

StefanoFioravanzo commented Apr 7, 2022

@chensun @james-jwu @zijianjoy Following up from our discussion about namespaced pipelines during last week's community meeting, here is a design doc that details our proposal: https://docs.google.com/document/d/1fM4y2L1IVqVj-iiNjYFRRktdCh7FQXgU2XpaYLaqt-A/edit?usp=sharing&resourcekey=0-kd5loyP7w3PBD0ug6ECmLQ

Note that this doc is about supporting uploading and viewing namespaced pipeline definitions. Dealing with artifacts isolation is not part of this effort and is something @juliusvonkohout is working on. Please take a look and let me know if you agree with the plan! Happy to discuss and iterate.

@juliusvonkohout
Copy link
Member

Following up from our discussion about namespaced pipelines during last week's community meeting, here is a design doc that details our proposal: https://docs.google.com/document/d/1fM4y2L1IVqVj-iiNjYFRRktdCh7FQXgU2XpaYLaqt-A

I had to request access, are you sure that is is properly public?

@StefanoFioravanzo
Copy link
Member

@juliusvonkohout Indeed I had pasted the wrong link, sorry for that. I amended my comment above. Here is the correct link again https://docs.google.com/document/d/1fM4y2L1IVqVj-iiNjYFRRktdCh7FQXgU2XpaYLaqt-A/edit?usp=sharing&resourcekey=0-kd5loyP7w3PBD0ug6ECmLQ

@juliusvonkohout
Copy link
Member

@juliusvonkohout Indeed I had pasted the wrong link, sorry for that. I amended my comment above. Here is the correct link again https://docs.google.com/document/d/1fM4y2L1IVqVj-iiNjYFRRktdCh7FQXgU2XpaYLaqt-A/edit?usp=sharing&resourcekey=0-kd5loyP7w3PBD0ug6ECmLQ

This looks very good. I hope you can merge it soon. The sooner it is merged, the sooner we can modify it for minio namespace isolation.

@StefanoFioravanzo
Copy link
Member

@juliusvonkohout Thanks for the kind words. Let's wait for feedback from the KFP folks :)

@juliusvonkohout
Copy link
Member

What do you think about the following design proposal https://docs.google.com/document/d/1DVHbT1RIv_VaIzWz77YCBYlBX7WEMW61AAm-ZHdWZFU/edit?usp=sharing ? Should i present it at the community call in 9 hours?

I have tried to incorporate some answers to the questions in the community meeting and enabled commenting. You can also reach me via slack.

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Apr 29, 2022

Minio and pipeline definition per namespace isolation is planned for the future and was the main talking point of the last pipeline community meeting. James Liu @zijianjoy wants to split it into multiple PRs.

  1. Hopefully moving pipeline definitions/templates from "Minio+MySQL" to "MySQL only" (James Liu Google) and the API change proposal https://docs.google.com/document/d/19OfU-hIsY4xBKA6b_F3dYIbSgrLMrVshucUSsyqotQ4/edit (Diana Atanasova Vmware @difince and @annajung ) are the first two important steps. I even found a quite elegant and not-invasive solution with @difince today.
  2. If this is done i will add the Minio multi-user isolation logic for pipeline artifacts as in https://docs.google.com/document/d/1DVHbT1RIv_VaIzWz77YCBYlBX7WEMW61AAm-ZHdWZFU/edit?usp=sharing and https://github.com/kubeflow/pipelines/pull/7406/files#diff-0b2567187b25771c77e1223dd7ccc6068f599bf663b63423ebb33e19f6d56a32 and discard the rest of this pull request. I will also need to edit sync.py to create the policy, individual users and passwords per profile/namespace, add/change the configmap "kfp-launcher". I am now using one shared bucket but private folders (inside this bucket) per user according to and https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_variables.html. This is even simpler and more compatible with other distributions that use AWS S3 or GCS instead of MinIO and upgrading from older installations. Support policy variable replacement minio/minio#7085 with multiple buckets will be discarded. According to https://argoproj.github.io/argo-workflows/workflow-controller-configmap.yaml i was even able to get rid of per-namespace artifactrepositories configmaps by using workflow.namespace in keyFormat. @zijianjoy you seem to be right regarding the minio licence Add solution to handle Chrome's new Private Network Access update minio/minio#13308 (comment)
  3. Arrikto (@StefanoFioravanzo ) might add the UI changes for the shared/namespaced pipeline definitions/templates too as proposed in support separate pipeline for each namespace #4197 because their UI design is way prettier than mine ;-).
  4. I might even add a fix for the cache-server such that deleted artifacts are not considered successfully cached anymore

In total this would yield a rather pleasant multi-user enterprise-level readiness.
It depends on the speed of all participants if we can get enterprise ready namespace isolation in upstream 1.6 or whether 1.7 is more realistic. I have the Proof of concept working locally in my own Kubeflow instances so far.

The next big step would then to repair the insecure namespace sharing ("Manage contributors" in the UI). Currently it can be easily exploited to access other users namespaces even after they have removed you as contributor or removed you completely from the kubeflow instance (e.g. employee leaving the company). The reason is that serviceaccounts and secrets are not recreated when removing contributors.

@juliusvonkohout
Copy link
Member

I have something elegant and mergeable ready #7725
It is also quite easy to try out independently. If this is accepted i can upstream the UI changes in another merge request.

@yuyanj143
Copy link

Hi there, are namespaced pipelines something the community still wants to support? Can we at least support the namespace param when uploading pipelines and pipeline versions in the python SDK? It would be really useful for our use case

Thanks

@juliusvonkohout
Copy link
Member

Hi there, are namespaced pipelines something the community still wants to support? Can we at least support the namespace param when uploading pipelines and pipeline versions in the python SDK? It would be really useful for our use case

Thanks

You need to ask @StefanoFioravanzo and attend the KFP meeting on Wednesday

@StefanoFioravanzo
Copy link
Member

@hanakawamomo @juliusvonkohout Hi! Please see #8196. The PR include the backend and client changes to support namespaced pipelines.

As soon as that one gets merges, we will open a PR that includes the UI changes described in the doc #4197 (comment)

@elikatsis
Copy link
Member

elikatsis commented Oct 18, 2022

During the preparation of the frontend PR, which as mentioned above depends on merging #8196, we realized a better UX on a specific part.

If you take a look at the Frontend part of the design doc you will notice a screenshot corresponding to the "Upload pipeline or pipeline version" page.

When uploading a new pipeline version, the decision on whether it will be private or shared is implicitly taken and its dependent on the parent pipeline. Thus, the toggle Private/Shared doesn't matter and we should be hiding it. Here's how a user can reach this state:

  1. Pipelines list page -> + Upload pipeline button -> Create a new pipeline version under an existing pipeline toggle -> Choose pipeline
  2. Pipeline details page -> + Upload version (the aforementioned toggle is already selected and the pipeline is already chosen)

However, upon choosing a pipeline to upload a new version for, we can improve the UX dramatically by showing whether it's "Private" or "Shared".

To unlock this better UX, we need to be able to tell whether a pipeline belongs in a namespace or not. For this reason, we decided to also augment the API response for a Pipeline to include the namespace as a resource reference, identically to what happens with Experiments (source).

The change is pretty trivial, we have the PR on its way and I'll edit this comment to link to it: #8375

cc @jlyaoyuli @zijianjoy @chensun @Linchin @juliusvonkohout @StefanoFioravanzo

@juliusvonkohout
Copy link
Member

During the preparation of the frontend PR, which as mentioned above depends on merging #8196, we realized a better UX on a specific part.

If you take a look at the Frontend part of the design doc you will notice a screenshot corresponding to the "Upload pipeline or pipeline version" page.

When uploading a new pipeline version, the decision on whether it will be private or shared is implicitly taken and its dependent on the parent pipeline. Thus, the toggle Private/Shared doesn't matter and we should be hiding it. Here's how a user can reach this state:

  1. Pipelines list page -> + Upload pipeline button -> Create a new pipeline version under an existing pipeline toggle -> Choose pipeline
  2. Pipeline details page -> + Upload version (the aforementioned toggle is already selected and the pipeline is already chosen)

However, upon choosing a pipeline to upload a new version for, we can improve the UX dramatically by showing whether it's "Private" or "Shared".

To unlock this better UX, we need to be able to tell whether a pipeline belongs in a namespace or not. For this reason, we decided to also augment the API response for a Pipeline to include the namespace as a resource reference, identically to what happens with Experiments (source).

The change is pretty trivial, we have the PR on its way and I'll edit this comment to link to it: #8375

cc @jlyaoyuli @zijianjoy @chensun @Linchin @juliusvonkohout @StefanoFioravanzo

That sounds very reasonable

elikatsis added a commit to arrikto/kubeflow-pipelines that referenced this issue Nov 30, 2022


Add a namespace field in the relevant Python wrappers for uploading
namespaced pipeline definitions.
google-oss-prow bot pushed a commit that referenced this issue Dec 1, 2022
…8511)

* feat(backend) Fix authentication in upload requests

Fix the way KFP API server authenticates pipeline upload requests.
We leverage 'isAuthenticated()` function which requires proper
initialization of the context object to include user identity.

* feat(backend): Add namespace field in pipeline upload swagger definition

Extend swagger defintion of the pipeline upload API with a namespace
parameter in order to support uploading namespaced pipelines.

* chore(backend): Generate Go & Python clients

Autogenerate the Go and Python clients after extending the swagger
definitions of upload pipeline APIs with a namespace field.
jlyaoyuli pushed a commit to jlyaoyuli/pipelines that referenced this issue Jan 5, 2023
…ow#4197 (kubeflow#8511)

* feat(backend) Fix authentication in upload requests

Fix the way KFP API server authenticates pipeline upload requests.
We leverage 'isAuthenticated()` function which requires proper
initialization of the context object to include user identity.

* feat(backend): Add namespace field in pipeline upload swagger definition

Extend swagger defintion of the pipeline upload API with a namespace
parameter in order to support uploading namespaced pipelines.

* chore(backend): Generate Go & Python clients

Autogenerate the Go and Python clients after extending the swagger
definitions of upload pipeline APIs with a namespace field.
elikatsis added a commit to arrikto/kubeflow-pipelines that referenced this issue Feb 9, 2023


Add a namespace field in the relevant Python wrappers for uploading
namespaced pipeline definitions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted The community is welcome to contribute. kind/feature lifecycle/frozen
Projects
None yet
Development

Successfully merging a pull request may close this issue.