Improved ways of storing local code in S3 for ProcessingSteps #4879

Open
HFulcher opened this issue Oct 1, 2024 · 0 comments
Labels
component: pipelines Relates to the SageMaker Pipeline Platform type: feature request

HFulcher commented Oct 1, 2024

Describe the feature you'd like
Currently, when a Processor such as SKLearnProcessor is used with a ProcessingStep, there is no way to specify where a local file passed via code= is stored in S3. Among other things, this clutters S3 buckets. The current behaviour uploads the code to the default_bucket of the SageMaker session, like so:

s3://{default_bucket}/auto_generated_hash/input/code/preprocess.py

A better user experience would be to let the user define exactly where the code is uploaded, so that all files for a given run can be grouped together. For example:

s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/code/preprocess.py
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/data/train.csv
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/model/model.pkl
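To make the requested layout concrete, here is a minimal sketch in plain Python that builds the per-run prefixes shown above. The helper name `run_layout` and the sample bucket, project, and execution ID values are hypothetical and not part of the SageMaker SDK; in a real pipeline the execution ID would only be known at run time.

```python
def run_layout(bucket: str, project: str, execution_id: str) -> dict:
    """Return the per-run S3 prefixes for code, data, and model artifacts.

    Hypothetical helper for illustration only; it does not call the
    SageMaker SDK and performs no uploads.
    """
    root = f"s3://{bucket}/{project}/{execution_id}"
    return {kind: f"{root}/{kind}" for kind in ("code", "data", "model")}


# Example with placeholder values (assumptions, not real resources).
layout = run_layout("my-bucket", "my-project", "exec-123")
print(layout["code"] + "/preprocess.py")
print(layout["data"] + "/train.csv")
print(layout["model"] + "/model.pkl")
```

With a layout like this, everything belonging to one execution sits under a single prefix, which makes it straightforward to inspect or delete a run as a unit.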

This should already be possible with FrameworkProcessor via its code_location= parameter, but ProcessingStep seems to ignore it.

@rohangujarathi rohangujarathi added the component: pipelines Relates to the SageMaker Pipeline Platform label Oct 28, 2024