SIMMC-VR: A Task-oriented Multimodal Dialog Dataset with Situated and Immersive VR Streams

Te-Lin Wu, Satwik Kottur, Andrea Madotto, Mahmoud Azab, Pedro Rodriguez, Nanyun Peng, Babak Damavandi, and Seungwhan Moon, in Proceedings of the Conference of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.


Abstract

Building an AI assistant that can seamlessly converse with and instruct humans in a user-centric situated scenario requires several essential abilities: (1) spatial and temporal understanding of the situated and real-time user scenes, (2) the ability to ground the actively perceived visuals of users to conversation contexts, and (3) conversational reasoning over past utterances to perform just-in-time assistance. However, we currently lack a large-scale benchmark that captures user–assistant interactions with all of the aforementioned features. To this end, we propose SIMMC-VR, which extends the SIMMC 2.0 dataset, concerned only with static visual scenes, to a video-grounded task-oriented dialog dataset that captures real-world AI-assisted user scenarios in VR. We propose a novel data collection paradigm that involves (1) generating object-centric multimodal dialog flows with egocentric visual streams and visually-grounded templates, and (2) manually paraphrasing the simulated dialogs for naturalness and diversity while preserving multimodal dependencies. To measure meaningful progress in the field, we propose four tasks to address the new challenges in SIMMC-VR, which require complex spatial-temporal dialog reasoning in active egocentric scenes. We benchmark the proposed tasks with strong multimodal models and highlight the key capabilities that current models lack, pointing to directions for future research.


Bib Entry

@inproceedings{wu2023simmcvr,
  title = {SIMMC-VR: A Task-oriented Multimodal Dialog Dataset with Situated and Immersive VR Streams},
  author = {Wu, Te-Lin and Kottur, Satwik and Madotto, Andrea and Azab, Mahmoud and Rodriguez, Pedro and Peng, Nanyun and Damavandi, Babak and Moon, Seungwhan},
  booktitle = {Proceedings of the Conference of the 61st Annual Meeting of the Association for Computational Linguistics (ACL)},
  year = {2023}
}
