Bug summary
We use a Cloud Run v2 pull work pool to execute deployed flows as Cloud Run V2 jobs. We have noticed that, even after the container and the flow both exit successfully, the Cloud Run job definition is occasionally not deleted. As a result we run up against the 1000 job definition limit in Cloud Run, which prevents other deployed flows from being submitted. For most flow runs, the worker logs show a delete request being submitted to Cloud Run.
Successful Delete logs
Prefect Worker log for deleted job definition
DEFAULT 2024-11-13T18:12:49.216484Z 18:12:49.216 | INFO | prefect.flow_runs.worker - Creating Cloud Run JobV2 warm-scorpion-<FLOW_ID>
DEFAULT 2024-11-13T18:12:59.592780Z 18:12:59.593 | INFO | prefect.flow_runs.worker - Submitting Cloud Run Job V2 warm-scorpion-<FLOW_ID> for execution...
DEFAULT 2024-11-13T18:12:59.784565Z 18:12:59.784 | INFO | prefect.flow_runs.worker - Cloud Run Job V2 warm-scorpion-<FLOW_ID> submitted for execution with command: p r e f e c t f l o w - r u n e x e c u t e
DEFAULT 2024-11-13T18:18:12.420228Z 18:18:12.420 | INFO | prefect.flow_runs.worker - Cloud Run Job V2 warm-scorpion-<FLOW_ID> succeeded
DEFAULT 2024-11-13T18:18:12.421823Z 18:18:12.421 | INFO | prefect.flow_runs.worker - Job run logs can be found on GCP at: https://console.cloud.google.com/logs/viewer?...
DEFAULT 2024-11-13T18:18:12.423120Z 18:18:12.423 | INFO | prefect.flow_runs.worker - Deleting completed Cloud Run Job 'warm-scorpion-<FLOW_ID>' from Google Cloud Run...
Prefect UI Job log for deleted job
Cloud Run Job V2 warm-scorpion-<FLOW_ID> submitted for execution with command: p r e f e c t f l o w - r u n e x e c u t e
10:12:59 AM
prefect.flow_runs.worker
Completed submission of flow run '<FLOW_RUN_ID>'
10:12:59 AM
prefect.flow_runs.worker
Opening process...
10:13:26 AM
prefect.flow_runs.runner
Uploading blob named <BLOB NAME> to the <BUCKET NAME> bucket
10:17:42 AM
prefect.flow_runs
Finished in state Completed()
10:17:42 AM
prefect.flow_runs
Process for flow run 'warm-scorpion' exited cleanly.
10:17:46 AM
prefect.flow_runs.runner
Cloud Run Job V2 warm-scorpion-<JOB_ID> succeeded
10:18:12 AM
prefect.flow_runs.worker
Job run logs can be found on GCP at: https://console.cloud.google.com/logs/viewer?...
10:18:12 AM
prefect.flow_runs.worker
Deleting completed Cloud Run Job 'warm-scorpion-<JOB_ID>' from Google Cloud Run...
Cloud Run API Audit log: Corresponding delete event for job definition
Unsuccessful Delete logs
For flow runs that do not get deleted we observe that the container and flow exit successfully but have no associated delete request in the worker:
Prefect worker logs for non-deleted job definition
DEFAULT 2024-11-07T23:59:13.787708Z 23:59:13.788 | INFO | prefect.flow_runs.worker - Creating Cloud Run JobV2 clay-wolverine-<JOB_ID>
DEFAULT 2024-11-07T23:59:24.232438Z 23:59:24.232 | INFO | prefect.flow_runs.worker - Submitting Cloud Run Job V2 clay-wolverine-<JOB_ID> for execution...
DEFAULT 2024-11-07T23:59:24.478259Z 23:59:24.478 | INFO | prefect.flow_runs.worker - Cloud Run Job V2 clay-wolverine-<JOB_ID> submitted for execution with command: p r e f e c t f l o w - r u n e x e c u t e
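To find affected runs in bulk rather than by eye, the worker log can be scanned for jobs that have a "Creating" line but no matching "Deleting completed" line. This is only a rough sketch: the function name `jobs_without_delete` is made up for illustration, and the two patterns are copied from the log lines quoted in this issue, so other worker versions may word them differently.

```python
import re

# Patterns taken from the worker log lines quoted in this issue (an assumption
# about their stability across prefect-gcp versions).
CREATED = re.compile(r"Creating Cloud Run JobV2 (\S+)")
DELETED = re.compile(r"Deleting completed Cloud Run Job '([^']+)'")


def jobs_without_delete(log_lines):
    """Return job names that were created but never had a delete logged,
    in the order they were created."""
    created = []
    deleted = set()
    for line in log_lines:
        match = CREATED.search(line)
        if match:
            if match.group(1) not in created:
                created.append(match.group(1))
            continue
        match = DELETED.search(line)
        if match:
            deleted.add(match.group(1))
    return [name for name in created if name not in deleted]
```

Feeding it the exported worker log line by line returns the job names that would still be sitting in Cloud Run.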
Prefect UI job logs for non-deleted job definition
Cloud Run Job V2 clay-wolverine-<JOB_ID> submitted for execution with command: p r e f e c t f l o w - r u n e x e c u t e
03:59:24 PM
prefect.flow_runs.worker
Completed submission of flow run '<FLOW_RUN_ID>'
03:59:24 PM
prefect.flow_runs.worker
Opening process...
03:59:51 PM
prefect.flow_runs.runner
Uploading blob named <BLOB_NAME> to the <BUCKET_NAME> bucket
04:03:43 PM
prefect.flow_runs
Finished in state Completed()
04:03:43 PM
prefect.flow_runs
Process for flow run 'clay-wolverine' exited cleanly.
For our deployed flows we use the default job variable setting `keep_job: false`. Our current workaround is a scheduled job that cleans up "stale" job definitions. The same problem also occurs for failed job runs. Any help, or pointers to something potentially misconfigured on our end, would be helpful.
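For anyone hitting the same limit, the cleanup workaround is roughly the shape below. It is a simplified sketch, not our exact script: `select_stale_jobs`, `cleanup` and the one-hour `max_age` are illustrative names and values, and the listing and deletion are passed in rather than hard-coded. With the google-cloud-run client they would likely map to `JobsClient.list_jobs` and `JobsClient.delete_job`, but that wiring is not shown or tested here.

```python
from datetime import datetime, timedelta, timezone


def select_stale_jobs(jobs, now, max_age=timedelta(hours=1)):
    """Pick jobs whose last execution finished more than max_age ago.

    jobs: iterable of (name, finished_at) pairs, where finished_at is a
    timezone-aware datetime, or None if the job is still running.
    """
    return [
        name
        for name, finished_at in jobs
        if finished_at is not None and now - finished_at > max_age
    ]


def cleanup(jobs, delete_job, now=None, max_age=timedelta(hours=1)):
    """Delete every stale job via the supplied callable; return their names."""
    now = now or datetime.now(timezone.utc)
    stale = select_stale_jobs(jobs, now, max_age)
    for name in stale:
        # delete_job is injected so the selection logic can be tested
        # without touching Cloud Run.
        delete_job(name)
    return stale
```

Running jobs (no completion time) are skipped on purpose so the cleanup cannot remove a definition that a worker is still watching.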
Additional context
I did see ticket #14525, which is also open and seems to be related.