Vertexai pipeline #17

Open
jmandivarapu1 opened this issue Nov 9, 2022 · 1 comment

@jmandivarapu1

Expected Behavior

I am using a Vertex AI pipeline that has only two components:

  1. M1: Create the datasets (just PyTorch DataLoaders) and send the output to M2.
  2. M2: Training.

The problem is that when I print the dataset in M2, I do not get the PyTorch DataLoader that M1 created; I get an artifact_types.Dataset object instead. It would be great if anyone could show how to pass a PyTorch DataLoader directly from M1 to M2. The other examples I have seen are mostly about passing dataset URLs and paths.
For reference, my code follows this sample: https://github.com/pytorch/examples/blob/main/mnist/main.py

print(dataloader)
Train Loader <kfp.v2.components.types.artifact_types.Dataset object at 0x7f2dc3dfe6d0>
print(dir(dataloader))

 ['TYPE_NAME', 'VERSION', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_get_path', '_set_path', 'metadata', 'name', 'path', 'uri']
ERROR 'Dataset' object is not iterable

Actual Behavior

  1. I pass the output of M1 to M2, but M2 receives a Dataset artifact rather than the DataLoader.

Steps to Reproduce the Problem

M1

# BASE_IMAGE is defined elsewhere in the pipeline code.
from kfp.v2.dsl import Dataset, Output, component

@component(
    output_component_file="pipeline/create_dataset.yaml",
    base_image=BASE_IMAGE,
)
def create_dataset(
    # An input parameter of type string.
    cfg_url: str,
    # Use Output to get a metadata-rich handle to the output artifact
    # of type `Dataset`.
    train_loader: Output[Dataset],
    test_loader: Output[Dataset],
    # A locally accessible filepath for another output artifact of type
    # `Dataset`.
    # output_dataset_two_path: OutputPath("Dataset"),
    # A locally accessible filepath for an output parameter of type string.
    # output_parameter_path: OutputPath(str),
):
    # Imports go inside the function: the component body runs in its own container.
    import torch
    from torchvision import datasets, transforms

    train_kwargs = {'batch_size': 12}
    test_kwargs = {'batch_size': 25}
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
    ])
    dataset1 = datasets.MNIST('../data', train=True, download=True,
                              transform=transform)
    dataset2 = datasets.MNIST('../data', train=False,
                              transform=transform)
    # These assignments only rebind the local names; nothing is written
    # to the output artifacts.
    train_loader = torch.utils.data.DataLoader(dataset1, **train_kwargs)
    test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)

M2

@component(
    output_component_file="pipeline/train.yaml",
    base_image=TRAIN_IMAGE,
)
def train_model(
    dataloader_train: Input[Dataset],
    dataloader_test: Input[Dataset],
    cfg_url: str,
    ds_urls: str,
    models_path: Input[Artifact],
    tb_logs: Input[Artifact],
):
    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(dataloader_train, dataloader_test)
    # Fails here with: 'Dataset' object is not iterable
    for batch_idx, (data, target) in enumerate(dataloader_train):
        data, target = data.to(device), target.to(device)

Specifications

  • Version:
  • Platform:
@felix-datatonic
Contributor

felix-datatonic commented Jun 2, 2023

Hi @jmandivarapu1, you'll need to save the MNIST dataset as a file (or multiple files) in the destination train_loader.path (or train_loader.uri). Then you'll be able to open it in your second component.
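
A minimal sketch of that suggestion, under a few stated assumptions: `FakeArtifact` is a hypothetical stand-in for the KFP `Dataset` artifact (only its `.path` attribute is used), and random tensors stand in for MNIST so the snippet runs without KFP or a download. The idea is that M1 writes the underlying tensors to the artifact path, and M2 reloads them and builds its own DataLoader, since the DataLoader object itself is not what travels between components.

```python
import os
import tempfile
from dataclasses import dataclass

import torch
from torch.utils.data import DataLoader, TensorDataset


@dataclass
class FakeArtifact:
    """Hypothetical stand-in for a KFP Dataset artifact; only `.path` is used."""
    path: str


def create_dataset(train_data: FakeArtifact) -> None:
    # Stand-in for M1: persist the raw tensors at the artifact path.
    images = torch.randn(48, 1, 28, 28)      # placeholder for MNIST images
    labels = torch.randint(0, 10, (48,))     # placeholder for MNIST labels
    torch.save({"images": images, "labels": labels}, train_data.path)


def train_model(train_data: FakeArtifact, batch_size: int = 12) -> int:
    # Stand-in for M2: reload the tensors and rebuild the DataLoader locally.
    payload = torch.load(train_data.path)
    dataset = TensorDataset(payload["images"], payload["labels"])
    loader = DataLoader(dataset, batch_size=batch_size)
    n_batches = 0
    for data, target in loader:
        # Each batch pairs images with their labels.
        assert data.shape[0] == target.shape[0]
        n_batches += 1
    return n_batches


with tempfile.TemporaryDirectory() as tmp:
    artifact = FakeArtifact(path=os.path.join(tmp, "train.pt"))
    create_dataset(artifact)
    num_batches = train_model(artifact)
print(num_batches)  # 48 samples / batch size 12 -> 4
```

In the real pipeline, the two function bodies would presumably sit inside the `@component` functions, with `train_data` typed as `Output[Dataset]` in M1 and `Input[Dataset]` in M2, and the batch size passed as an ordinary parameter.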

@Linchin Can you please close this issue?
