Pickle Loading Problems (EOFError: Ran out of input) #13

golddohyun · 2023-02-28T12:02:01Z

I am trying to train the model on the Vimeo90K dataset, but I get an "EOFError: Ran out of input" error. I was able to train the flow estimator successfully; the error only occurs when training the whole framework. I am running on a single A6000 GPU with num_workers set to 2 (the default). Any ideas?

  File "/data/projects/chaeyun/VFIformer/models/archs/VFIformer_arch.py", line 346, in __init__
    self.load_networks('flownet', args.resume_flownet)
  File "/data/projects/chaeyun/VFIformer/models/archs/VFIformer_arch.py", line 354, in load_networks
    load_net = torch.load(load_path, map_location=torch.device(self.device))
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 199954) of binary: /home/chaeyun/.conda/envs/vfiformer/bin/python
Traceback (most recent call last):
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/chaeyun/.conda/envs/vfiformer/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Here is the command I ran, with its arguments:

python -m torch.distributed.launch --nproc_per_node=1 --master_port=4178 train.py --launcher pytorch --gpu_ids 0 --loss_l1 --loss_ter --loss_flow --use_tb_logger --batch_size 128 --net_name VFIformer --name train_VFIformer --max_iter 300 --crop_size 192 --save_epoch_freq 5 --resume_flownet ./weights/train_IFNet/snapshot/net_final.pth
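
One check that may help narrow this down: the traceback fails inside `pickle_module.load` on the file passed via `--resume_flownet`, and Python's `pickle.load` raises exactly `EOFError: Ran out of input` when the stream it reads is empty. So a plausible (unconfirmed) cause is that `./weights/train_IFNet/snapshot/net_final.pth` is missing its contents, for example a zero-byte file left by an interrupted save. The sketch below only uses the standard library; the helper name `describe_checkpoint` is hypothetical and not part of VFIformer or PyTorch.

```python
import os
import pickle
import tempfile


def describe_checkpoint(path):
    """Hypothetical helper: report whether a checkpoint file is missing, empty, or non-empty."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "non-empty"


# Create a zero-byte file to show that an empty checkpoint reproduces
# the same error message seen in the traceback.
with tempfile.NamedTemporaryFile(suffix=".pth", delete=False) as tmp:
    empty_path = tmp.name

try:
    with open(empty_path, "rb") as f:
        pickle.load(f)  # an empty stream raises EOFError
except EOFError as exc:
    reproduced_message = str(exc)
finally:
    status = describe_checkpoint(empty_path)
    os.remove(empty_path)

print(status, "->", reproduced_message)
```

Calling `describe_checkpoint("./weights/train_IFNet/snapshot/net_final.pth")` before launching training would show whether the flow estimator checkpoint was actually written. If it reports "non-empty", the file may still be truncated or corrupted, so this is only a first filter.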
