Unable to run the training bash script to reproduce table #1

Open
kaushik333 opened this issue Aug 11, 2022 · 1 comment

Comments

@kaushik333

kaushik333 commented Aug 11, 2022

Hi,

I am trying to run the set of experiments on different algorithms and datasets using the small bash script you provide in the README file, but I get the following error, which is written to the error.txt file in the experiment folder. I created a file run.sh, placed it inside design-baselines/design-baselines/, and ran it from that location. Is this correct?

Failure # 1 (occurred at 2022-08-11_01-17-15)
Traceback (most recent call last):
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 702, in _process_tr$
    results = self.trial_executor.fetch_result(trial)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 686, in fetch$
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 47, in wrap$
    return func(*args, **kwargs)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/worker.py", line 1481, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train_buffered() (pid=2049, ip=130.107.5.38)
  File "python/ray/_raylet.pyx", line 505, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task.function_executor
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/_private/function_manager.py", line 556, in act$
    return method(__ray_actor, *args, **kwargs)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/trainable.py", line 173, in train_buffered
    result = self.train()
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/trainable.py", line 232, in train
    result = self.step()
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/function_runner.py", line 366, in step
    self._report_thread_runner_error(block=True)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/function_runner.py", line 513, in _report_$
    ("Trial raised an exception. Traceback:\n{}".format(err_tb_str)
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train_buffered() (pid=2049, ip=130.107.5.38)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/function_runner.py", line 248, in run
    self._entrypoint()
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/function_runner.py", line 316, in entrypoi$
    self._status_reporter.get_checkpoint())
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/ray/tune/function_runner.py", line 580, in _trainab$
    output = fn()
  File "/homes/kaushik/Projects/xxxxx/design-baselines/design_baselines/bo_qei/__init__.py", line 279, in bo_qei
    score = task.predict(solution)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780,$
    result = self._call(*args, **kwds)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 846,$
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in$
    cancellation_manager=cancellation_manager)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in$
    ctx, args, cancellation_manager=cancellation_manager))
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in $
    ctx=ctx)
  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in qu$
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  NameError: name 'training' is not defined
ray::ImplicitFunc.train_buffered() (pid=2049, ip=130.107.5.38)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 244, in $
    ret = func(*args)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302,$
    return func(*args, **kwargs)

  File "/homes/kaushik/Projects/xxxxx/design-baselines/design_baselines/data.py", line 992, in predict_numpy
    return self.wrapped_task.predict(x_batch)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/task.py", line 832, in predict
    return self.oracle.predict(x_batch, **kwargs)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/oracles/oracle_builder.py", line 505, $
    range(self.internal_measurements)], axis=0)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/oracles/oracle_builder.py", line 504, $
    x_sliced, **kwargs) for _ in

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/oracles/tensorflow/transformer_oracle.$
    elif isinstance(training, DiscreteDataset):

NameError: name 'training' is not defined

[[node PyFunc (defined at /Projects/xxxxx/design-baselines/design_baselines/data.py:1016) ]]
  (1) Unknown:  NameError: name 'training' is not defined
ray::ImplicitFunc.train_buffered() (pid=2049, ip=130.107.5.38)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 244, in $
    ret = func(*args)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 302,$
    return func(*args, **kwargs)

  File "/homes/kaushik/Projects/xxxxx/design-baselines/design_baselines/data.py", line 992, in predict_numpy
    return self.wrapped_task.predict(x_batch)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/task.py", line 832, in predict
    return self.oracle.predict(x_batch, **kwargs)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/oracles/oracle_builder.py", line 505, $
    range(self.internal_measurements)], axis=0)

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/oracles/oracle_builder.py", line 504, $
    x_sliced, **kwargs) for _ in

  File "/homes/kaushik/miniconda3/envs/design-baselines/lib/python3.7/site-packages/design_bench/oracles/tensorflow/transformer_oracle.$
    elif isinstance(training, DiscreteDataset):

NameError: name 'training' is not defined


         [[node PyFunc (defined at /Projects/xxxxx/design-baselines/design_baselines/data.py:1016) ]]
         [[PyFunc/_4]]
0 successful operations.
0 derived errors ignored. [Op:__inference_predict_169205]

Errors may have originated from an input operation.
Input Source operations connected to node PyFunc:
 x (defined at /Projects/xxxxx/design-baselines/design_baselines/bo_qei/__init__.py:279)

Input Source operations connected to node PyFunc:
 x (defined at /Projects/xxxxx/design-baselines/design_baselines/bo_qei/__init__.py:279)

Function call stack:
predict -> predict

This looks like an error in the pip-installed design-bench library: the NameError is raised inside the design_bench/oracles/tensorflow/transformer_oracle code. Could you please take a look? Any suggestions are greatly appreciated. Thanks!
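The traceback shows design_baselines being loaded from the local checkout under /homes/kaushik/Projects/xxxxx/ while design_bench comes from the conda environment's site-packages. A quick way to confirm which copy Python actually picks up (for example, after reinstalling design-bench or switching to an editable install) is to ask the import system where a module would be loaded from. This is a minimal standard-library sketch; the helper name `module_origin` is hypothetical and not part of either project.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if it is not found.

    Hypothetical helper: for a top-level name, importlib.util.find_spec locates
    the module without executing it. For packages, the origin is the package's
    __init__.py file.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Run inside the same conda env that Ray Tune uses.
    for pkg in ("design_bench", "design_baselines"):
        print(pkg, "->", module_origin(pkg))
```

If design_bench resolves to site-packages, edits made to a local clone of design-bench will not affect the trial until that clone is installed into the environment.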

@kaushik333
Author

Adding some more details after digging in:

It seems that protected_fit() expects "training" to be passed, but the oracle's predict() does not seem to pass it (I traced **kwargs back, and "training" does not appear to be passed in).
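One detail worth checking against the traceback: the failure is a NameError, not a TypeError. In Python, omitting a required parameter fails with a TypeError at the call site, whereas a NameError means the function body refers to a bare name that is not bound as a parameter, local, or global in that scope. Values passed through **kwargs land in the kwargs dict, so forwarding training=... alone would not make a bare `training` reference resolve. The sketch below illustrates the difference with hypothetical functions (`missing_param`, `unbound_name`, `error_type`); it does not reproduce design-bench's actual code.

```python
def missing_param(x, training):
    # `training` is a parameter: omitting it fails at the call site with TypeError.
    return x if training else -x


def unbound_name(x, **kwargs):
    # `training` is never bound in this scope; anything passed as
    # training=... ends up in kwargs["training"], not in a local variable.
    return x if training else -x  # noqa: F821 (intentionally undefined)


def error_type(fn, *args, **kwargs) -> str:
    """Call fn and report the name of the exception it raises, or 'ok'."""
    try:
        fn(*args, **kwargs)
    except Exception as exc:  # demo only: report whatever is raised
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    print(error_type(missing_param, 1))                # TypeError
    print(error_type(unbound_name, 1))                 # NameError
    print(error_type(unbound_name, 1, training=True))  # still NameError
```

Under this reading, the fix would likely need to change how transformer_oracle refers to `training` in the predict path, rather than only forwarding an extra keyword argument.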
