Improved ways of storing local code in S3 for ProcessingSteps #4879

HFulcher · 2024-10-01T09:01:59Z

Describe the feature you'd like
Currently, when a processor such as SKLearnProcessor is used with a ProcessingStep, there is no way to specify where a local code= file is stored in S3. This can clutter S3 buckets. The current behaviour places the code in the default_bucket of the SageMaker session, like so:

s3://{default_bucket}/auto_generated_hash/input/code/preprocess.py

A better user experience would be to allow the user to define exactly where the code should be uploaded. This allows users to group files together for each run. For example:

s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/code/preprocess.py
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/data/train.csv
s3://{specified_bucket}/{project_name}/PIPELINE_EXECUTION_ID/model/model.pkl
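To make the requested layout concrete, here is a minimal sketch of how the per-run URIs could be assembled. It is plain Python and does not call the SageMaker SDK; the helper name `run_artifact_uri` and its arguments are hypothetical, chosen only for illustration, and the bucket, project, and execution ID values are placeholders.

```python
def run_artifact_uri(bucket, project, execution_id, kind, filename):
    """Build s3://{bucket}/{project}/{execution_id}/{kind}/{filename}.

    Hypothetical helper: it only formats a string and does not upload anything.
    """
    # Drop stray leading/trailing slashes so components join cleanly.
    parts = [p.strip("/") for p in (bucket, project, execution_id, kind, filename)]
    if not all(parts):
        raise ValueError("every path component must be non-empty")
    return "s3://" + "/".join(parts)


# Placeholder values; a real pipeline would supply its own.
code_uri = run_artifact_uri("my-bucket", "my-project", "exec-123", "code", "preprocess.py")
data_uri = run_artifact_uri("my-bucket", "my-project", "exec-123", "data", "train.csv")
model_uri = run_artifact_uri("my-bucket", "my-project", "exec-123", "model", "model.pkl")
```

Keeping code, data, and model under one per-run prefix is the grouping this request asks for; the sketch only shows the target paths, not how the SDK would be told to use them.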

This should already be possible by using FrameworkProcessor with the code_location= parameter, but ProcessingStep seems to ignore it.


rohangujarathi added the component: pipelines Relates to the SageMaker Pipeline Platform label Oct 28, 2024

qidewenwhen added the type: feature request label Nov 8, 2024
