"Bad Idea, Right?"

Exploring Anticipatory Human Reactions for Outcome Prediction in HRI

¹Cornell Tech, ²Accenture Labs

Published at the 2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)

For inquiries, please contact Teresa (mb2554 [at] cornell [dot] edu).


Abstract

Humans have the ability to anticipate what will happen in their environment based on perceived information. This anticipation often manifests as an externally observable behavioral reaction, which cues other people in the environment that something bad might happen. As robots become more prevalent in human spaces, they can leverage these visible anticipatory responses to assess whether their own actions might be "a bad idea."

In this study, we investigated the potential of recognizing human anticipatory reactions to predict outcomes. We conducted a user study in which 30 participants watched videos of action scenarios and were asked to predict the outcome of the situation shown in each video ("good" or "bad"). We collected video and audio recordings of the participants' reactions as they watched these videos. We then analyzed the participants' behavioral anticipatory responses and used this data to train machine learning models that predict anticipated outcomes from observable human behavior.

Reactions are multimodal, compound, and diverse, and we find significant differences in facial reactions between anticipated outcomes. Model performance is around 0.5-0.6 test accuracy and increases notably when non-reactive participants are excluded from the dataset. We discuss the implications of these findings and directions for future work. This research offers insights into improving the safety and efficiency of human-robot interactions, contributing to the evolving field of robotics and human-robot collaboration.

Study Protocol

Overview

We conducted an online crowd-sourced study to collect webcam reactions to stimulus videos from a global sample of participants recruited through Prolific.

  • Participants: 30 participants (ages 20-39, diverse backgrounds)
  • Stimulus Dataset: 30 short videos (9.62 ± 2.77 seconds) featuring humans and robots
  • Study Duration: approximately 30 minutes

Data Collection

  • Webcam recordings at 30 fps captured participants' facial reactions
  • Video order was randomized for each participant
  • Participants could not see themselves during video playback
  • Two-stage video viewing: a shortened clip followed by an outcome prediction, then the full video revealing the actual outcome (see the protocol sketch below)
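To make the viewing protocol concrete, below is a minimal sketch of how a randomized two-stage session could be assembled per participant. The stimulus file names, prompt text, and the build_session helper are illustrative assumptions, not the study's actual implementation.

```python
import random

# Hypothetical stimulus list; the actual study used 30 short clips (9.62 ± 2.77 s each).
STIMULI = [f"video_{i:02d}" for i in range(1, 31)]

def build_session(participant_id: str, seed: int = 0) -> list[dict]:
    """Assemble a randomized two-stage viewing session for one participant:
    each stimulus appears first as a shortened clip with an outcome-prediction
    prompt, then as the full clip revealing the actual outcome."""
    rng = random.Random(f"{participant_id}-{seed}")  # per-participant randomized order
    order = STIMULI[:]
    rng.shuffle(order)
    session = []
    for clip in order:
        session.append({"clip": f"{clip}_short.mp4", "prompt": "Will the outcome be good or bad?"})
        session.append({"clip": f"{clip}_full.mp4", "prompt": None})  # outcome reveal
    return session

if __name__ == "__main__":
    for step in build_session("P01")[:4]:
        print(step)
```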

Analysis

  • Facial feature extraction using OpenFace 2.0
  • 35 facial action units (AUs) analyzed
  • Machine learning models tested: RNNs, LSTMs, GRUs, BiLSTMs, and Deep Neural Networks (a minimal pipeline sketch follows this list)
  • 12 action units with significantly different activation intensities identified
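As a rough illustration of this pipeline, the sketch below loads per-frame AU intensities from OpenFace 2.0 FeatureExtraction CSVs and trains a small LSTM classifier on them. The file layout, fixed sequence length, label encoding, and hyperparameters are assumptions made for illustration; the released code may differ.

```python
# Minimal sketch: AU features from OpenFace 2.0 CSVs -> small LSTM classifier.
import pandas as pd
import torch
import torch.nn as nn

AU_INTENSITY_COLS = None  # filled in from the OpenFace CSV header (columns ending in "_r")
SEQ_LEN = 150             # e.g. ~5 s at 30 fps; sequences padded/truncated to this length

def load_openface_csv(path: str) -> torch.Tensor:
    """Load one OpenFace 2.0 FeatureExtraction CSV and return a (SEQ_LEN, n_aus) tensor
    of AU intensities. Raw OpenFace column names may carry leading spaces, so strip them."""
    global AU_INTENSITY_COLS
    df = pd.read_csv(path)
    df.columns = [c.strip() for c in df.columns]
    if AU_INTENSITY_COLS is None:
        AU_INTENSITY_COLS = [c for c in df.columns if c.startswith("AU") and c.endswith("_r")]
    x = torch.tensor(df[AU_INTENSITY_COLS].values, dtype=torch.float32)
    if x.shape[0] < SEQ_LEN:                                  # pad short recordings
        x = torch.cat([x, torch.zeros(SEQ_LEN - x.shape[0], x.shape[1])])
    return x[:SEQ_LEN]                                        # truncate long ones

class AULstmClassifier(nn.Module):
    """LSTM over per-frame AU intensities, predicting anticipated outcome (good vs. bad)."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):                 # x: (batch, SEQ_LEN, n_features)
        _, (h, _) = self.lstm(x)          # h: (num_layers, batch, hidden)
        return self.head(h[-1])           # logits for {good, bad}

def train(csv_label_pairs):
    """csv_label_pairs: list of (openface_csv_path, label) with label 0=good, 1=bad."""
    xs = torch.stack([load_openface_csv(p) for p, _ in csv_label_pairs])
    ys = torch.tensor([y for _, y in csv_label_pairs])
    model = AULstmClassifier(n_features=xs.shape[-1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(20):                   # illustrative training loop
        opt.zero_grad()
        loss = loss_fn(model(xs), ys)
        loss.backward()
        opt.step()
    return model
```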

Example Reactions

Here are some examples of anticipatory reactions captured during the study:

Responses to a good (left) vs. a bad (right) outcome

Reactive Response (Good Outcome) · Reactive Response (Bad Outcome)

Diverse Anticipatory Behaviors

Reaction 1 Reaction 2 Reaction 3
Reaction 4 Reaction 5 Reaction 6

Example Stimulus

Example Stimulus Video

Comparison: Reactive vs Non-Reactive

Our study found that some participants displayed very visible reactions (especially when anticipating bad outcomes), while others showed only subtle reactions or none at all.

Non-Reactive Example

Key Findings

  • Reactions to bad outcomes are more salient: Participants displayed more diverse and visible reactions when anticipating bad outcomes
  • Multimodal and evolving responses: Anticipatory behaviors include facial expressions, head motion, body pose changes, and vocalizations that compound and evolve over time
  • Person-dependent variability: Different participants showed varying degrees of reactivity
  • Significant facial features: 12 facial action units showed significantly different activation patterns between good and bad anticipated outcomes (a sketch of such a per-AU comparison follows this list), including:
    • Inner Brow Raiser
    • Brow Lowerer
    • Cheek Raiser
    • Nose Wrinkler
    • Lip Corner Puller
    • Jaw Drop
    • Blink
  • Model performance: Best models achieved ~60% accuracy on the curated dataset, with notable improvements when non-reactive participants were excluded
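For illustration, here is a minimal sketch of how a per-AU comparison between good and bad anticipated outcomes could be run. It applies a Mann-Whitney U test per AU with Bonferroni correction to synthetic data; the statistical procedure actually used in the paper may differ, and the AU names and values shown are placeholders.

```python
# Sketch of a per-AU significance test between "good" and "bad" anticipated outcomes.
import numpy as np
from scipy.stats import mannwhitneyu

def compare_aus(good: np.ndarray, bad: np.ndarray, au_names: list[str], alpha: float = 0.05):
    """good, bad: arrays of shape (n_trials, n_aus), e.g. mean AU intensity per trial.
    Returns the AUs whose activation differs significantly after Bonferroni correction."""
    corrected_alpha = alpha / len(au_names)
    significant = []
    for i, name in enumerate(au_names):
        _, p = mannwhitneyu(good[:, i], bad[:, i], alternative="two-sided")
        if p < corrected_alpha:
            significant.append((name, p))
    return significant

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    aus = ["AU01_r", "AU04_r", "AU06_r", "AU09_r", "AU12_r", "AU26_r", "AU45_r"]
    good = rng.normal(0.5, 0.2, size=(100, len(aus)))
    bad = rng.normal(0.8, 0.2, size=(100, len(aus)))  # synthetic data: stronger "bad" reactions
    print(compare_aus(good, bad, aus))
```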

Applications

This research has implications for:

Robot Error Prevention

Systems that can detect and prevent errors before they happen

Human-Robot Collaboration Safety

Enhanced safety through anticipatory social cue detection

Adaptive Robot Behavior

Robots that respond to human social signals in real-time

Human-AI Collaboration

Proactive failure detection in collaborative systems

Citation

@INPROCEEDINGS{10731310,
  author={Parreira, Maria Teresa and Lingaraju, Sukruth Gowdru and Ramirez-Artistizabal, Adolfo and Bremers, Alexandra and Saha, Manaswi and Kuniavsky, Michael and Ju, Wendy},
  booktitle={2024 33rd IEEE International Conference on Robot and Human Interactive Communication (ROMAN)},
  title={"Bad Idea, Right?" Exploring Anticipatory Human Reactions for Outcome Prediction in HRI},
  year={2024},
  pages={2072-2078},
  doi={10.1109/RO-MAN60168.2024.10731310}
}

Resources

  • 📄 Paper: IEEE Xplore
  • 📋 Supplementary Material: PDF
  • 📊 Presentation Slides: Google Slides
  • 🎥 Stimulus Dataset: Dataset Info
  • 💻 Code: Implementation
  • 🔗 GitHub: Repository