Decision Making-ALFWorld

This is the code for evaluating ALFChat on ALFWorld.

Setup

Option 1: Local Installation

Download the ALFWorld data and install the environment by following the instructions here.

Option 2: Docker (Recommended)

We provide a Docker image with all environments already set up. The code is in the image's /autogen-eval directory.

docker pull leoljl/alfchat:v2
docker run -it leoljl/alfchat:v2 zsh
# you are now inside the Docker container
cd /autogen-eval/application/A3-decision-making-ALFWorld

Evaluation on Benchmark

Fill in your API key in twoagent.py, then run the following command to evaluate ALFChat (2 agent) on ALFWorld. The conversation history will be saved in logs_twoagent/.

python twoagent.py

Fill in your API key in multiagent.py, then run the following command to evaluate ALFChat (3 agent) on ALFWorld. The conversation history will be saved in logs_multiagent/.

python multiagent.py

To calculate the success rate from a saved conversation history, run count.py on the corresponding log directory.

python count.py --dir logs_multiagent/
python count.py --dir logs_twoagent/
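
For readers who want a sense of what such a count involves, here is a minimal sketch of computing a success rate over a log directory. It is not the repository's count.py, whose logic and log format are not shown in this README. The `success_rate` function and the `SUCCESS` marker string are hypothetical, and the sketch assumes one log file per task.

```python
from pathlib import Path

# Hypothetical marker: the real log format used by the evaluation scripts
# is not documented here, so adjust this to whatever the logs contain.
SUCCESS_MARKER = "SUCCESS"


def success_rate(log_dir: str, marker: str = SUCCESS_MARKER) -> float:
    """Return the fraction of files in log_dir whose text contains marker.

    Assumes one log file per task. Returns 0.0 for an empty directory.
    """
    logs = sorted(p for p in Path(log_dir).iterdir() if p.is_file())
    if not logs:
        return 0.0
    # Count the logs in which the marker appears at least once.
    solved = sum(
        marker in p.read_text(encoding="utf-8", errors="replace") for p in logs
    )
    return solved / len(logs)


if __name__ == "__main__":
    for directory in ("logs_twoagent", "logs_multiagent"):
        if Path(directory).is_dir():
            print(f"{directory}: {success_rate(directory):.2%}")
```

Counting per file keeps the sketch independent of how many turns each conversation contains; if the logs record success differently, only the marker check needs to change.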

We compare the task success rate of ReAct, ALFChat (2 agent), and ALFChat (3 agent).