kubeflow pipeline add support of postgresql #7512

Open
yiyuanyu17 opened this issue Apr 2, 2022 · 25 comments
Labels
help wanted The community is welcome to contribute. kind/feature lifecycle/frozen

Comments

@yiyuanyu17

yiyuanyu17 commented Apr 2, 2022

Feature Area

What feature would you like to see?

kubeflow pipeline add support of postgresql

What is the use case or pain point?

In some cases we cannot use MySQL for Kubeflow Pipelines. We hope Kubeflow Pipelines can add support for PostgreSQL.

Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

@zijianjoy
Collaborator

Hello @yiyuanyu17, can you help us understand why you are not able to use MySQL? Do you want to use PostgreSQL inside the cluster or outside it? Or are you looking for a way to configure PostgreSQL as an alternative to Cloud SQL?

@yiyuanyu17
Author

Hello, we use Kubeflow Pipelines for AI model training on our platform. In private (on-premises) deliveries, some customers explicitly prohibit a self-hosted MySQL and require us to use the PostgreSQL instance that they provide. We have therefore moved our own applications to an ORM framework so that they can adapt to different database types. However, the Kubeflow Pipelines server does not yet support PostgreSQL, so we opened this issue in the hope of getting help from the community.

@zijianjoy
Collaborator

Thank you for the info, @yiyuanyu17. I will keep this issue open so people can upvote it if they are also interested in PostgreSQL support. People can create an overlay that connects to PostgreSQL, but such support is not available in this repo yet.

@imiller445

We would also be interested in this feature. We do a lot of on-prem and disconnected/air-gapped deployments. As such, cloud-vendor-hosted databases are not an option. In most scenarios it is easiest to run our own database clusters colocated in the same k8s environment where we run Kubeflow. Crunchy Postgres is the best experience we have found for operating RDBMS clusters on k8s, and we already leverage it for other tooling. It would be nice to leverage it from Kubeflow as well, as operating MySQL clusters on k8s is not as seamless an experience.

@vasireddyvkl

In our case we only have Postgres as an option for a managed on-prem DB, so we are looking for out-of-the-box Postgres support. @zijianjoy Can you please elaborate on what creating an overlay means? If that helps connect Kubeflow to Postgres, I am interested in giving it a try. Thanks!

@RoyerRamirez

Hi @zijianjoy,

We also strictly use PostgreSQL internally, since it's better suited for data warehousing purposes.

@zijianjoy
Collaborator

An overlay is a kustomize concept, as described in https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/. An overlay is a Kubernetes resource package; it is like a variant of the base KFP package.

Here is a list of KFP overlays: https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env

If you look at the platform-agnostic folder, you will find that it depends on MySQL: https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/third-party/mysql

So if you want to introduce postgresql, what you need to do is:

  1. Create a postgresql folder under https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/third-party and define the postgresql resources in this folder.
  2. Define an overlay in https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env which allows people to use postgresql.

I would recommend testing this PostgreSQL integration in your own environment before contributing it to the KFP repo, because there is currently no guarantee and no testing that verifies KFP works with PostgreSQL.
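
A minimal sketch of the two steps listed above, written in Python only to make the resulting overlay file concrete. The `apiVersion`, `kind`, `namespace`, and `resources` fields are standard kustomize fields. The function name, the `kubeflow` namespace default, and both relative paths (including the `third-party/postgresql/base` folder, which step 1 would have to create) are assumptions for illustration, not the actual layout of the KFP repo.

```python
from typing import List


def render_overlay_kustomization(resources: List[str], namespace: str = "kubeflow") -> str:
    """Render a minimal kustomization.yaml for an overlay as plain text."""
    if not resources:
        raise ValueError("an overlay needs at least one resource")
    lines = [
        "apiVersion: kustomize.config.k8s.io/v1beta1",
        "kind: Kustomization",
        f"namespace: {namespace}",
        "resources:",
    ]
    # Each resource is a path relative to the overlay folder.
    lines += [f"  - {path}" for path in resources]
    return "\n".join(lines) + "\n"


# Hypothetical paths: the overlay would live in a new folder under env/
# and pull in a postgresql package instead of the mysql one.
POSTGRES_OVERLAY_RESOURCES = [
    "../../base/installs/generic",        # assumed location of the base KFP package
    "../../third-party/postgresql/base",  # assumed folder created in step 1
]

if __name__ == "__main__":
    print(render_overlay_kustomization(POSTGRES_OVERLAY_RESOURCES))
```

Writing the returned text to a `kustomization.yaml` in a new folder under `env` and running `kubectl apply -k` on that folder would be the usual way to try it in a test cluster.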

@javen218

javen218 commented Jul 29, 2022

It would be great if Kubeflow Pipelines supported Postgres!
For various reasons, our company also cannot use MySQL. We strongly recommend that the community make the database optional. @zijianjoy

@javen218

javen218 commented Aug 1, 2022

It is a time-consuming job for every user to make PostgreSQL work with Pipelines on their own, so we are eagerly waiting for someone to contribute it.

There is already a pull request implementing Postgres support for Kubeflow Katib (kubeflow/katib#1921). I wonder whether there is any plan for KFP to support PostgreSQL?

@Edward-liang

Edward-liang commented Aug 1, 2022

MySQL is without doubt an excellent database; however, Oracle's acquisition brought uncertainty to its future. Like the others above, I sincerely hope kubeflow/pipelines can support PostgreSQL soon. It is license-friendly and has many advanced features.

@zijianjoy zijianjoy added the help wanted The community is welcome to contribute. label Aug 9, 2022
@chensun
Member

chensun commented Aug 11, 2022

Also note that google/mlmd doesn't support Postgres yet: google/ml-metadata#26

@shalberd

shalberd commented Aug 15, 2022

As others have hinted towards here, PostgreSQL, especially with Operator Lifecycle Manager and, if wanted, being a Red-Hat-certified operator, is the way to go in an Enterprise environment that is Kubernetes-based. I wholeheartedly agree with all people who posted here. Database should not come pre-packaged with Kubeflow, as it is not a core component. Let people who really know their stuff handle things like database-ops and deployment, like e.g. Crunchy with PostgreSQL. And then use Postgres as a database for Kubeflow.
Seriously, replication factor of 1, no pgbouncer proxy to improve load handling, no backup strategy .... https://github.com/kubeflow/katib/blob/9fce9dd03bc476b4e1f3d385e9692ac5cef681f4/manifests/v1beta1/components/postgres/postgres.yaml
That cannot seriously be an approach by a project that has its origins with one of the big tech firms.
Same goes for air-gapped functionality support with custom docker registries, HTTP_PROXY support via env variables and custom CA configmap for PKI trust.

@zijianjoy
Collaborator

zijianjoy commented Aug 23, 2022

Currently we would like help from the community to support PostgreSQL integration. For anyone who wants to contribute to making Kubeflow Pipelines runnable with PostgreSQL:

  1. Create a postgresql folder under https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/third-party and define the postgresql resources in this folder. (Done)
  2. Implement PostgreSQL integration in KFP API server and cache server.
  3. Define an overlay in https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env which integrates KFP with postgresql.
  4. Switch MLMD's database to PostgreSQL, tracked in "Support for PostgreSQL?" google/ml-metadata#26. (Done)

@Linchin Linchin added this to KFP v2 May 23, 2023
@Linchin Linchin moved this to Post GA in KFP v2 May 23, 2023
@Linchin Linchin self-assigned this May 23, 2023
@chensun chensun moved this from Post GA to P0 in KFP v2 Jul 13, 2023
@Linchin Linchin removed their assignment Aug 28, 2023
@chensun chensun removed this from KFP v2 Sep 14, 2023
@rimolive
Member

rimolive commented Mar 7, 2024

Since we have #9813 to track this work, I'll close this issue. Please follow updates in that tracker issue.

/close


@rimolive: Closing this issue.

@zijianjoy zijianjoy reopened this Mar 7, 2024
@zijianjoy
Collaborator

@rimolive Sorry, since this work is not finished yet, the feature request is still valid. (Note: We use the upvote count of the original issue to track the community's interest across the org, so I am reopening this issue.)

@tarilabs
Member

tarilabs commented Mar 8, 2024

May I suggest keeping track of MLMD's: google/ml-metadata#194 (comment) for this KFP-with-PostgreSQL scope?

The reason is that when MLMD is backed by PostgreSQL, there is allegedly a practical limit of only ~2K characters in MLMD string properties.

Potential solutions are mentioned (and one presented) with: google/ml-metadata#195
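
To make the concern concrete, here is a small Python sketch of a pre-flight check that flags string properties longer than a threshold before they are written to the metadata store. The helper name and the exact value of 2000 are assumptions. The thread only says "~2K chars", and the real limit should be taken from the linked MLMD issue.

```python
from typing import Dict, List

# "~2K" from the comment above; the exact number is an assumption.
ASSUMED_MAX_STRING_CHARS = 2000


def oversized_string_properties(properties: Dict[str, object],
                                limit: int = ASSUMED_MAX_STRING_CHARS) -> List[str]:
    """Return the names of string properties longer than `limit`, sorted."""
    return sorted(
        name
        for name, value in properties.items()
        if isinstance(value, str) and len(value) > limit
    )
```

A caller could log or truncate the flagged properties, or store the full value elsewhere and keep only a reference in the property.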

hope this helps!

@rimolive
Member

rimolive commented Mar 8, 2024

Thanks @tarilabs for letting us know!

@zijianjoy Can you add this issue as a work item for MLMD integration in #9813? I think it's a good first issue and a good fit for GSoC.

@zijianjoy
Collaborator

Added. However, please note that it is an optional task as far as PostgreSQL integration with KFP is concerned, though it is a good item to contribute to.

@rimolive
Member

Agreed, the idea of adding this issue is for tracking purposes.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label May 11, 2024

github-actions bot commented Jun 2, 2024

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@github-actions github-actions bot closed this as completed Jun 2, 2024
@rimolive
Member

rimolive commented Jun 3, 2024

/reopen
/lifecycle frozen


@rimolive: Reopened this issue.

@google-oss-prow google-oss-prow bot reopened this Jun 3, 2024
@google-oss-prow google-oss-prow bot added lifecycle/frozen and removed lifecycle/stale The issue / pull request is stale, any activities remove this label. labels Jun 3, 2024
@github-project-automation github-project-automation bot moved this to Needs triage in KFP Runtime Triage Aug 29, 2024
@maulik-modi22

For on-premises VM-based deployments, https://www.enterprisedb.com/ offers certified Postgres deployment and services.
For on-premises k8s deployments, Crunchy, Fujitsu and many others have Red Hat certified operators for OpenShift.
For cloud deployments, all three top clouds (Azure, AWS and Google) offer managed Postgres services.
