
Inconsistent with the reproduced results of the paper #18

Open
metaqiang opened this issue Feb 24, 2022 · 6 comments

Comments

@metaqiang

metaqiang commented Feb 24, 2022

Description

This is what we reproduced:
[image: our reproduced results]

This is the result in the paper:
[image: results reported in the paper]

We do not know why our results for Soft Modularization and Multi-headed SAC are worse than reported.

How to reproduce

The following command-line instructions follow the baseline tutorial at https://mtrl.readthedocs.io/en/latest/pages/tutorials/baseline.html.


cd Code/mtrl-main/
conda activate garage
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yyq/.mujoco/mujoco200/bin
export PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning'
mkdir -p ./trainlogs

mt10_mtsac

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=1 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=False agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True +exp_name=mt10_mtsac_2000000 > trainlogs/mt10_mtsac_sd1.log 2>&1 &

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=2 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=False agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True +exp_name=mt10_mtsac_2000000 > trainlogs/mt10_mtsac_sd2.log 2>&1 &

CUDA_VISIBLE_DEVICES=0 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=3 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=False agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True +exp_name=mt10_mtsac_2000000 > trainlogs/mt10_mtsac_sd3.log 2>&1 &

mt10_mtmhsac

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=1 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=True agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False +exp_name=mt10_mtmhsac_2000000 > trainlogs/mt10_mtmhsac_sd1.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=2 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=True agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False +exp_name=mt10_mtmhsac_2000000 > trainlogs/mt10_mtmhsac_sd2.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=3 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.encoder.type_to_select=identity agent.multitask.should_use_multi_head_policy=True agent.multitask.actor_cfg.should_condition_model_on_task_info=False agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False +exp_name=mt10_mtmhsac_2000000 > trainlogs/mt10_mtmhsac_sd3.log 2>&1 &

mt10_soft_modularization

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=1 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.multitask.should_use_task_encoder=True agent.encoder.type_to_select=feedforward agent.multitask.actor_cfg.should_condition_model_on_task_info=True agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False agent.multitask.actor_cfg.moe_cfg.should_use=True agent.multitask.actor_cfg.moe_cfg.mode=soft_modularization agent.multitask.should_use_multi_head_policy=False agent.encoder.feedforward.hidden_dim=50 agent.encoder.feedforward.num_layers=2 agent.encoder.feedforward.feature_dim=50 agent.actor.num_layers=4 agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=False +exp_name=mt10_soft_modularization_2000000 > trainlogs/mt10_soft_modularization_sd1.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=2 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.multitask.should_use_task_encoder=True agent.encoder.type_to_select=feedforward agent.multitask.actor_cfg.should_condition_model_on_task_info=True agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False agent.multitask.actor_cfg.moe_cfg.should_use=True agent.multitask.actor_cfg.moe_cfg.mode=soft_modularization agent.multitask.should_use_multi_head_policy=False agent.encoder.feedforward.hidden_dim=50 agent.encoder.feedforward.num_layers=2 agent.encoder.feedforward.feature_dim=50 agent.actor.num_layers=4 agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=False +exp_name=mt10_soft_modularization_2000000 > trainlogs/mt10_soft_modularization_sd2.log 2>&1 &

CUDA_VISIBLE_DEVICES=1 nohup python -u main.py setup=metaworld env=metaworld-mt10 agent=state_sac experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 setup.seed=3 replay_buffer.batch_size=1280 agent.multitask.num_envs=10 agent.multitask.should_use_disentangled_alpha=True agent.multitask.should_use_task_encoder=True agent.encoder.type_to_select=feedforward agent.multitask.actor_cfg.should_condition_model_on_task_info=True agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False agent.multitask.actor_cfg.moe_cfg.should_use=True agent.multitask.actor_cfg.moe_cfg.mode=soft_modularization agent.multitask.should_use_multi_head_policy=False agent.encoder.feedforward.hidden_dim=50 agent.encoder.feedforward.num_layers=2 agent.encoder.feedforward.feature_dim=50 agent.actor.num_layers=4 agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=False +exp_name=mt10_soft_modularization_2000000 > trainlogs/mt10_soft_modularization_sd3.log 2>&1 &
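Each command redirects its output to a file under trainlogs/, so a run can be monitored while it trains, for example (the log name below is one of the files created by the commands above):

# Follow the output of the first MT-SAC seed
tail -f trainlogs/mt10_mtsac_sd1.log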

System information

  • MTRL version: latest
  • MTRL environment name: MT10
  • PyTorch version: 1.5.0

Thank you very much!

@aithuuuus

You are using 2 million time steps while the paper uses 100k time steps; you should compare against Table 1.

@metaqiang
Author

You are using 2 million time steps while the paper uses 100k time steps; you should compare against Table 1.

Our experimental results are still inconsistent with Table 1 in the paper:
[image: comparison with Table 1]

@shagunsodhani
Contributor

Hi! Let me see if I understand the issue. The results in the paper are averaged over 10 seeds, and even with 10 seeds the standard error is quite high (for reference, standard error = standard deviation / sqrt(number of seeds)). You ran the experiments with 3 seeds, and the standard error bands you get are quite high as well, especially for multi-headed SAC. Could you please try running with more seeds? Increasing experiment.num_eval_episodes may also help produce more stable results. Could you also share the metaworld version (git commit) that you are using?
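For illustration, a minimal sketch of launching additional seeds with more evaluation episodes, reusing the MT-SAC flags from the reproduction commands above (the seed range of 1-10 and the 10 eval episodes here are arbitrary illustrative choices, not the paper's settings):

# Launch MT-SAC with seeds 1-10 and 10 eval episodes per evaluation;
# same flags as the mt10_mtsac commands above, only setup.seed and
# experiment.num_eval_episodes differ
for seed in $(seq 1 10); do
  CUDA_VISIBLE_DEVICES=0 nohup python -u main.py setup=metaworld \
    env=metaworld-mt10 agent=state_sac \
    experiment.num_eval_episodes=10 \
    experiment.num_train_steps=2000000 \
    setup.seed=$seed \
    replay_buffer.batch_size=1280 \
    agent.multitask.num_envs=10 \
    agent.multitask.should_use_disentangled_alpha=True \
    agent.encoder.type_to_select=identity \
    agent.multitask.should_use_multi_head_policy=False \
    agent.multitask.actor_cfg.should_condition_model_on_task_info=False \
    agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True \
    agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True \
    +exp_name=mt10_mtsac_2000000 > trainlogs/mt10_mtsac_sd$seed.log 2>&1 &
done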

@metaqiang
Author

Hello, my metaworld version is af8417bfc82a3e249b4b02156518d775f29eb289. Do different versions of metaworld greatly affect the experimental results?

@shagunsodhani
Contributor

That is the metaworld version that we tested against. I wanted to check the version because metaworld was under active development at that time.

@metaqiang
Author

We are using the af8417bfc version as the environment, which is also used when we run the MTRL code :)
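For anyone else trying to match this setup exactly, one way to pin metaworld to that commit is a clone-and-checkout install; a minimal sketch (the GitHub URL below is an assumption about where metaworld was hosted at the time; point it at your own metaworld source if it differs):

# Pin metaworld to the commit the MTRL authors tested against
# (repository URL is an assumption; adjust to your metaworld source)
git clone https://github.com/rlworkgroup/metaworld.git
cd metaworld
git checkout af8417bfc82a3e249b4b02156518d775f29eb289
pip install -e .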
