Difficulty Debugging the Vertex AI Training Pipeline #49
Comments
Hi @clopezhrimac, thanks for your question! Kubeflow Pipelines offers only limited support for running pipelines locally. Alternatives pointed out by the community are:
In this project, we've optimised components to be in line with option (2). Commands which help you test locally are:
    make setup-all-components
    make test-all-components
or
    make setup-component GROUP=<e.g. vertex-components>
    make test-component GROUP=<e.g. vertex-components>
Further, we've replaced the python-based training component in the pipelines with a
While these don't provide full parity between local pipeline runs and submitting pipelines to Vertex AI, they will help you iterate locally over any changes related to custom Python-based components and your training code. We're currently evaluating the use of
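As a rough illustration of iterating locally on component logic, the sketch below keeps the logic in a plain Python function that reads and writes file paths, so it can be run against a temporary directory without submitting anything to Vertex AI. The names `split_rows` and `run_local_check` and the CSV layout are hypothetical and are not taken from this repository; the only assumption is that a component's body can be written as an ordinary function.

```python
import csv
import tempfile
from pathlib import Path
from typing import Tuple


def split_rows(input_path: str, train_path: str, test_path: str,
               test_fraction: float = 0.2) -> int:
    """Hypothetical component logic: split a CSV into train and test files.

    Returns the number of rows written to the training file.
    """
    with open(input_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    n_test = int(len(rows) * test_fraction)
    n_train = len(rows) - n_test

    # Write the header plus the corresponding slice of rows to each output.
    for path, chunk in ((train_path, rows[:n_train]), (test_path, rows[n_train:])):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(chunk)
    return n_train


def run_local_check() -> Tuple[int, int]:
    """Run the logic against a small synthetic file in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        src = tmp_dir / "data.csv"
        with open(src, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["x", "y"])
            writer.writerows([[i, i * 2] for i in range(10)])

        train, test = tmp_dir / "train.csv", tmp_dir / "test.csv"
        n_train = split_rows(str(src), str(train), str(test))
        # Count data rows in the test file (total lines minus the header).
        n_test = len(test.read_text().splitlines()) - 1
        return n_train, n_test


if __name__ == "__main__":
    print(run_local_check())
```

A check like this runs in well under a second, so errors in the function body show up locally and only the pipeline wiring still needs a remote run.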
What is the difference between CustomTrainingJob and CustomPythonPackageTrainingJob?
Hi @clopezhrimac, thanks for this issue. Please check out the most recent PR and release. We've moved away from You can
I am facing difficulties in debugging the Vertex AI training pipeline. The issue lies in the fact that I cannot run the pipeline locally for testing and debugging purposes. Instead, I have to submit the pipeline to Vertex AI and wait for it to execute in order to obtain debugging information.
The current debugging process involves submitting the pipeline with multiple print statements or logging messages to trace the execution flow and pinpoint the exact location of the error. This is a slow and tedious cycle, because the pipeline has to be resubmitted every time I make an adjustment or need to locate an error.
Steps to Reproduce the Problem:
What would be the best way to handle this training component development cycle?