[bitnami/mlflow] password authentication failed for user "bn_mlflow" when deploying with ArgoCD #28893

Closed
lvijnck opened this issue Aug 15, 2024 · 12 comments
Labels: mlflow, solved, stale (15 days without activity), tech-issues (the user has a technical issue about an application), triage (triage is needed)

Comments

@lvijnck

lvijnck commented Aug 15, 2024

Name and Version

mlflow 1.4.22

What architecture are you using?

amd64

What steps will reproduce the bug?

I've deployed MLflow to my k8s cluster using ArgoCD. After a while, the MLflow tracking container crashes because it is unable to authenticate to PostgreSQL. This is with the default setup.

Are you using any custom parameters or values?

# https://artifacthub.io/packages/helm/bitnami/mlflow
mlflow:

  # Disable tracking server password
  tracking: 
    auth:
      enabled: false

What is the expected behavior?

No crashing?

What do you see instead?

Crash

Tracking logs:

Traceback (most recent call last):
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/cli.py", line 425, in server
    initialize_backend_stores(backend_store_uri, registry_store_uri, default_artifact_root)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 355, in initialize_backend_stores
    _get_tracking_store(backend_store_uri, default_artifact_root)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 332, in _get_tracking_store
    _tracking_store = _tracking_store_registry.get_store(store_uri, artifact_root)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/registry.py", line 42, in get_store
    return self._get_store_with_resolved_uri(resolved_store_uri, artifact_uri)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/registry.py", line 52, in _get_store_with_resolved_uri
    return builder(store_uri=resolved_store_uri, artifact_uri=artifact_uri)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/server/handlers.py", line 153, in _get_sqlalchemy_store
    return SqlAlchemyStore(store_uri, artifact_uri)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/store/tracking/sqlalchemy_store.py", line 164, in __init__
    ] = mlflow.store.db.utils.create_sqlalchemy_engine_with_retry(db_uri)
  File "/opt/bitnami/python/lib/python3.10/site-packages/mlflow/store/db/utils.py", line 238, in create_sqlalchemy_engine_with_retry
    sqlalchemy.inspect(engine)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/inspection.py", line 140, in inspect
    ret = reg(subject)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 303, in _engine_insp
    return Inspector._construct(Inspector._init_engine, bind)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 236, in _construct
    init(self, bind)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 247, in _init_engine
    engine.connect().close()
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3278, in connect
    return self._connection_cls(self)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 148, in __init__
    Connection._handle_dbapi_exception_noconnection(
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2442, in _handle_dbapi_exception_noconnection
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 146, in __init__
    self._dbapi_connection = engine.raw_connection()
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 3302, in raw_connection
    return self.pool.connect()
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 449, in connect
    return _ConnectionFairy._checkout(self)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 1263, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
    rec = pool._do_get()
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 179, in _do_get
    with util.safe_reraise():
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
    return self._create_connection()
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 390, in _create_connection
    return _ConnectionRecord(self)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
    self.__connect()
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 900, in __connect
    with util.safe_reraise():
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 896, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 643, in connect
    return dialect.connect(*cargs, **cparams)
  File "/opt/bitnami/python/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 621, in connect
    return self.loaded_dbapi.connect(*cargs, **cparams)
  File "/opt/bitnami/python/lib/python3.10/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to server at "mlflow-postgresql" (10.130.50.162), port 5432 failed: FATAL:

pgsql error:

2024-08-15 09:21:26.003 GMT [133377] FATAL:  password authentication failed for user "bn_mlflow"
2024-08-15 09:21:26.003 GMT [133377] DETAIL:  Connection matched file "/opt/bitnami/postgresql/conf/pg_hba.conf" line 1: "host     all

Additional information

No response

lvijnck added the "tech-issues" label (the user has a technical issue about an application) on Aug 15, 2024
lvijnck changed the title from 'password authentication failed for user "bn_mlflow"' to '[MLFlow] password authentication failed for user "bn_mlflow"' on Aug 15, 2024
lvijnck changed the title from '[MLFlow] password authentication failed for user "bn_mlflow"' to '[bitnami/mlflow] password authentication failed for user "bn_mlflow"' on Aug 15, 2024
github-actions bot added the "triage" label (triage is needed) on Aug 15, 2024
@javsalgar
Contributor

Hi!

Could you confirm that there are no leftover PVCs from a previous PostgreSQL deployment? It may be using a previous database password.

@lvijnck
Author

lvijnck commented Aug 19, 2024

I cleared the volumes and will get back to you if the error persists.

@ander-db

ander-db commented Aug 22, 2024

Hi! I'm also encountering the authentication error with MLflow and PostgreSQL.
While debugging, I noticed that the initdb script in the MLflow chart doesn't appear to be executing. The logs show the pre-init script running, but there's no trace of the initdb script.

Has anyone else observed similar behavior or found a way to resolve this issue?

Update: I've deployed it in a different namespace where there weren't any resources deployed and it worked (in case it helps someone else).

@lvijnck
Author

lvijnck commented Aug 23, 2024

@javsalgar the issue still persists.

I suspect this happened when we moved the MLFlow tracking pod to another node.

@pascalwhoop

I have no name!@mlflow-postgresql-0:/$ env | grep PASS
POSTGRES_PASSWORD=b19Mdkd3Ne
POSTGRES_POSTGRES_PASSWORD=5kjkxJag7j

This is what I noticed in the Postgres pod.

@lvijnck
Author

lvijnck commented Aug 23, 2024

This seems to be caused by our ArgoCD-based deployment of MLflow.

As a workaround, we're using:

  1. Create a GCP SecretsManager entry directly from our IaC codebase (we provision this using GitCrypt).
  2. Add an ExternalSecrets entry to bring the secret into k8s.
  3. Reference the secret created by ExternalSecrets as postgresql.auth.existingSecret.
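
As a rough illustration of step 3, the values override could be generated like this. This is a minimal sketch, not the exact configuration used in this thread: the secret name `mlflow-postgresql-credentials` and the helper `build_values_override` are hypothetical, and the `mlflow:` parent key simply mirrors the values shown in the issue description. The key names expected inside the secret are not covered here and should be checked against the chart's documentation.

```python
import json

# Hypothetical name of the Secret that ExternalSecrets creates in the cluster.
EXISTING_SECRET_NAME = "mlflow-postgresql-credentials"


def build_values_override(secret_name: str) -> dict:
    """Return a values override that points PostgreSQL at a pre-existing Secret."""
    return {
        # Parent key taken from the values in the issue description
        # (assumption: the chart is consumed under an "mlflow" key).
        "mlflow": {
            "postgresql": {
                "auth": {
                    # With a fixed, externally managed Secret, a re-render
                    # no longer produces a new password.
                    "existingSecret": secret_name,
                }
            }
        }
    }


# Printed as JSON, which can be converted to a YAML values file.
print(json.dumps(build_values_override(EXISTING_SECRET_NAME), indent=2))
```

The point of the design is that the password now lives outside the rendered chart, so every sync sees the same value.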

@pascalwhoop

(working with @lvijnck so adding more context)

ArgoCD renders the chart with helm template. Every time it does, a new secret is generated, which overrides the previous one. Postgres only reads the password from the first secret; after that it no longer looks at that env variable. The mlflow-tracking service, however, picks up the new secrets (and thus the new passwords), which are of course not valid.
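
The failure mode described above can be sketched as a toy model in Python. This is not the chart's or ArgoCD's actual code: `render_chart_secret` and `FakePostgres` are hypothetical names standing in for one helm template render and for a database that keeps the password it was first started with.

```python
import secrets
import string


def render_chart_secret(length: int = 10) -> str:
    """Stand-in for one template render: returns a fresh random password."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


class FakePostgres:
    """Toy database that fixes its password the first time it starts."""

    def __init__(self) -> None:
        self.stored_password = None

    def start(self, env_password: str) -> None:
        # Once a password is stored, later env values are ignored.
        if self.stored_password is None:
            self.stored_password = env_password

    def authenticate(self, password: str) -> bool:
        return password == self.stored_password


db = FakePostgres()

# First sync: database and tracking server see the same rendered secret.
first_secret = render_chart_secret()
db.start(first_secret)
print("login after first sync:", db.authenticate(first_secret))

# Second sync: the secret is re-rendered, but the database keeps the old one.
second_secret = render_chart_secret()
db.start(second_secret)
print("login after second sync:", db.authenticate(second_secret))
```

In this model the client's login succeeds after the first sync and fails after the second, because only the client follows the regenerated secret.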

Unfortunately, there seems to be no easy way to get the old passwords back, unless someone has a good idea how to recover them through ArgoCD's rollback function.

@lvijnck
Author

lvijnck commented Aug 23, 2024

The same is true for Minio, where ArgoCD generates a new password on synchronisation, breaking the Minio integration. We've followed the same steps as above to mitigate this for now.

We're looking forward to seeing #28938 merged.

lvijnck changed the title from '[bitnami/mlflow] password authentication failed for user "bn_mlflow"' to '[bitnami/mlflow] password authentication failed for user "bn_mlflow" when deploying with ArgoCD' on Aug 23, 2024

github-actions bot commented Sep 8, 2024

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions bot added the "stale" label (15 days without activity) on Sep 8, 2024
@lvijnck
Author

lvijnck commented Sep 11, 2024

@javsalgar I think it makes sense to call this out somewhere in the docs, as it's quite a nasty issue.

github-actions bot removed the "stale" label (15 days without activity) on Sep 12, 2024

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions bot added the "stale" label (15 days without activity) on Sep 27, 2024

github-actions bot commented Oct 2, 2024

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.


6 participants