Vertexai pipeline #17

jmandivarapu1 · 2022-11-09T16:18:46Z

Expected Behavior

I am using a Vertex AI pipeline that has only two components:

M1: Create datasets (just PyTorch DataLoaders) and send the output to
M2: Training

But the problem is that when I print the dataset in M2, it is not a PyTorch DataLoader: the previous component is sending an artifact_types.Dataset object. It would be great if anyone could explain how to send a PyTorch DataLoader directly. The other examples I have seen mostly pass dataset URLs and paths, so an example of how to send a PyTorch loader from M1 to M2 would help.
You can use this sample as a reference: https://github.com/pytorch/examples/blob/main/mnist/main.py

print(dataloader)
Train Loader <kfp.v2.components.types.artifact_types.Dataset object at 0x7f2dc3dfe6d0>

print(dir(dataloader))

 ['TYPE_NAME', 'VERSION', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_get_path', '_set_path', 'metadata', 'name', 'path', 'uri']

ERROR 'Dataset' object is not iterable
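The error follows from what the dir() output shows: the object M2 receives exposes name, uri, path, and metadata but no __iter__, so a for loop over it fails. The sketch below reproduces this without KFP installed, using a hypothetical stand-in class, DatasetHandle, with the same four attributes (it is not the real kfp class).

```python
from dataclasses import dataclass, field


@dataclass
class DatasetHandle:
    """Hypothetical stand-in for a KFP Dataset artifact: it only points at data."""
    name: str
    uri: str
    path: str
    metadata: dict = field(default_factory=dict)


def try_iterate(obj):
    """Return the TypeError message raised when iterating obj, or None if it works."""
    try:
        for _ in obj:
            break
    except TypeError as exc:
        return str(exc)
    return None


handle = DatasetHandle(
    name="train_loader",
    uri="gs://example-bucket/train_loader",   # hypothetical URI
    path="/gcs/example-bucket/train_loader",  # hypothetical local path
)
print(try_iterate(handle))  # 'DatasetHandle' object is not iterable
```

The handle carries only a location and metadata; the data has to be read from that location explicitly.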

Actual Behavior

I passed the outputs of M1 to M2. M2 received Dataset artifact objects instead of DataLoaders, and iterating over them fails.

Steps to Reproduce the Problem

M1

from kfp.v2.dsl import component, Input, Output, Dataset, Artifact

@component(
    output_component_file="pipeline/create_dataset.yaml",
    base_image=BASE_IMAGE,
)
def create_dataset(
    # An input parameter of type string.
    cfg_url: str,
    # Use Output to get a metadata-rich handle to the output artifact
    # of type `Dataset`.
    train_loader: Output[Dataset],
    test_loader: Output[Dataset],
    # A locally accessible filepath for another output artifact of type
    # `Dataset`.
    # output_dataset_two_path: OutputPath("Dataset"),
    # A locally accessible filepath for an output parameter of type string.
    # output_parameter_path: OutputPath(str),
):
    # Only the function's source is shipped to the container, so imports go inside.
    import torch
    from torchvision import datasets, transforms

    train_kwargs = {'batch_size': 12}
    test_kwargs = {'batch_size': 25}
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
        ])
    dataset1 = datasets.MNIST('../data', train=True, download=True,
                       transform=transform)
    dataset2 = datasets.MNIST('../data', train=False,
                       transform=transform)
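    # Note: these assignments rebind the parameter names to local DataLoaders;
    # nothing is written to train_loader.path or test_loader.path.
    train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)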
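In M1, assigning a new DataLoader to train_loader only rebinds the local name; the Output[Dataset] handle passed in by the pipeline is discarded and nothing is written to its path. The stdlib-only sketch below contrasts rebinding the parameter with writing a file to artifact.path. The names buggy_component, fixed_component, and run_demo are hypothetical, SimpleNamespace stands in for Output[Dataset], and pickle stands in for whatever serialization the real component would use (for example torch.save).

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def buggy_component(train_loader):
    # Rebinding the parameter only changes the local name; the caller's
    # artifact handle is untouched and no file is written.
    train_loader = [(0.1, 7), (0.2, 2)]


def fixed_component(train_loader):
    # Serialize the data to the location assigned to the output artifact.
    records = [(0.1, 7), (0.2, 2)]  # hypothetical (data, target) pairs
    with open(train_loader.path, "wb") as f:
        pickle.dump(records, f)


def run_demo():
    """Return (file exists after buggy call, file exists after fixed call)."""
    with tempfile.TemporaryDirectory() as tmp:
        # SimpleNamespace is a hypothetical stand-in for Output[Dataset].
        artifact = SimpleNamespace(path=os.path.join(tmp, "train_loader.pkl"))
        buggy_component(artifact)
        after_buggy = os.path.exists(artifact.path)
        fixed_component(artifact)
        after_fixed = os.path.exists(artifact.path)
        return after_buggy, after_fixed


print(run_demo())  # (False, True)
```

Only files written under the artifact's path travel to the next component; Python objects held in local variables do not.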

M2

@component(
    output_component_file="pipeline/train.yaml",
    base_image=TRAIN_IMAGE,
)
def train_model(
    dataloader_train: Input[Dataset],
    dataloader_test: Input[Dataset],
    cfg_url: str,
    ds_urls: str,
    models_path: Input[Artifact],
    tb_logs: Input[Artifact],
):
    print(dataloader_train, dataloader_test)
    # Fails with "'Dataset' object is not iterable": dataloader_train is an
    # artifact handle, not a DataLoader.
    for batch_idx, (data, target) in enumerate(dataloader_train):
        data, target = data.to(device), target.to(device)
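On the M2 side, the component has to open dataloader_train.path and rebuild an iterable from the file; with PyTorch that would mean deserializing the data and wrapping it in a new torch.utils.data.DataLoader. The framework-free sketch below assumes M1 wrote a pickled list of (data, target) pairs to the artifact path (an assumption; the code above writes nothing). load_batches and demo are hypothetical helpers, and load_batches only mimics DataLoader batching, with no shuffling or tensor collation.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def load_batches(dataset_artifact, batch_size):
    """Yield lists of (data, target) pairs read from dataset_artifact.path.

    Hypothetical helper: mimics DataLoader batching without shuffling
    or tensor collation.
    """
    with open(dataset_artifact.path, "rb") as f:
        records = pickle.load(f)
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for Input[Dataset]; in KFP the pipeline fills in .path.
        artifact = SimpleNamespace(path=os.path.join(tmp, "train.pkl"))
        with open(artifact.path, "wb") as f:
            # Hypothetical records: 30 (data, target) pairs.
            pickle.dump([(i * 0.1, i % 10) for i in range(30)], f)
        return [len(batch) for batch in load_batches(artifact, batch_size=12)]


print(demo())  # [12, 12, 6]
```

In a real training loop the generator replaces the artifact in the for statement, e.g. enumerate(load_batches(dataloader_train, 12)).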

Specifications

Version:
Platform:


felix-datatonic · 2023-06-02T20:01:32Z

Hi @jmandivarapu1, you'll need to save the MNIST dataset as a file (or multiple files) at the output artifact's location, train_loader.path (or train_loader.uri). Then you'll be able to open it from dataloader_train.path in your second component and build a new DataLoader there.
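The uri and path attributes name the same location in two forms: on Vertex AI the URI is a gs:// object path, and path is the locally mounted view of it, which is why writing a file to path is enough. The sketch below mirrors the convention KFP v2 uses, as we understand it, for Cloud Storage URIs (the gs:// prefix replaced by the /gcs/ mount point). uri_to_local_path is a hypothetical helper, and other URI schemes are deliberately left unhandled.

```python
GCS_PREFIX = "gs://"
GCS_LOCAL_MOUNT = "/gcs/"  # assumed local mount point for Cloud Storage


def uri_to_local_path(uri: str) -> str:
    """Map a gs:// artifact URI to its local mount path (hypothetical helper)."""
    if not uri.startswith(GCS_PREFIX):
        # Other schemes are out of scope for this sketch.
        raise ValueError(f"unsupported URI scheme: {uri!r}")
    return GCS_LOCAL_MOUNT + uri[len(GCS_PREFIX):]


print(uri_to_local_path("gs://my-bucket/pipeline-root/train_loader"))
# /gcs/my-bucket/pipeline-root/train_loader
```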

@Linchin Can you please close this issue?
