ALFRED Speaks: Automatic Instruction Generation for Egocentric Skill Learning
EVAL @ ECCV 2020

Abstract

Embodied agents need to be able not only to follow instructions but also to give them. Previous work has focused on simple navigation instructions and on language generation targeted at embodied agents rather than humans. We take a different approach and target the creation of EXPLAINER modules that eschew low-level instruction in favor of more natural, goal-directed language. We achieve this in part with novel fine-grained state tracking, which minimizes extraneous details without forgetting core properties. In addition to this new model formulation, we propose a new Object Capture Test to evaluate the accuracy and goal-directedness of EXPLAINER-generated instructions. We find that the EXPLAINER generates more fluent, accurate, and goal-directed instructions than a naïve sequence-to-sequence generative model.