Alfred Evaluation #16

BatmanofZuhandArrgh · 2024-04-15T12:09:14Z

Hi,

How did you evaluate on ALFRED? From a quick skim, it seems to require some .pth deep learning model files. Did you use this codebase? https://github.com/lbaa2022/LLMTaskPlanning

Also, how did LLM-Planner do on the 192 AI2Thor games? I didn't find any information on this in your paper.

Thank you

chanhee-luke · 2024-06-11T07:10:56Z

We used HLSM's low-level controller as our low-level controller (per our paper).
For the 192 ALFWorld games, we don't have separate statistics. They are a subset of the ALFRED evaluation tasks, though, so I assume the performance would be similar.
