Abstract
Humans have the ability to anticipate what will happen in their environment based on perceived information. This anticipation often manifests as an externally observable behavioral reaction, which cues other people in the environment that something bad might happen. As robots become more prevalent in human spaces, they can leverage these visible anticipatory responses to assess whether their own actions might be a "bad idea."
In this study, we investigated the potential of recognizing human anticipatory reactions to predict outcomes. We conducted a user study in which 30 participants watched videos of action scenarios and were asked about the anticipated outcome of the situation shown in each video ("good" or "bad"). We collected video and audio recordings of the participants' reactions as they watched these videos. We then analyzed the participants' anticipatory behavioral responses and used these data to train machine learning models that predict the anticipated outcome from observable human behavior.
Reactions are multimodal, compounded, and diverse, and we find significant differences in facial reactions between anticipated good and bad outcomes. Model performance is around 0.5-0.6 test accuracy and increases notably when non-reactive participants are excluded from the dataset. We discuss the implications of these findings and future work. This research offers insights into improving the safety and efficiency of human-robot interactions, contributing to the evolving field of robotics and human-robot collaboration.
Study Protocol
Overview
We conducted an online crowd-sourced study to collect webcam reactions to stimulus videos from a global sample recruited through Prolific.
- Participants: 30 participants (ages 20-39, diverse backgrounds)
- Stimulus Dataset: 30 short videos (9.62 ± 2.77 seconds) featuring humans and robots
- Study Duration: approximately 30 minutes
Data Collection
- Webcam recordings at 30 fps captured participants' facial reactions
- Video order was randomized for each participant
- Participants could not see themselves during video playback
- Two-stage video viewing: shortened version followed by prediction, then full outcome reveal (see the sketch below)
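To make the trial flow concrete, the sketch below shows per-participant randomization and the two-stage viewing of each clip. The `Clip` record and the `play_clip` / `ask_prediction` helpers are hypothetical stand-ins for the actual web front end; this is an illustration of the protocol, not the study's implementation.

```python
"""Minimal sketch of the per-participant trial flow (hypothetical helpers)."""
import random
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    shortened_path: str   # clip cut just before the outcome
    full_path: str        # clip including the full outcome


def play_clip(path: str) -> None:
    # Placeholder for real video playback while the webcam records the viewer.
    print(f"[playing {path}]")


def ask_prediction(clip_id: str) -> str:
    # Placeholder for the prediction prompt shown after the shortened clip.
    return input(f"{clip_id}: anticipated outcome (good/bad)? ").strip().lower()


def run_session(stimuli: list[Clip]) -> dict[str, str]:
    order = list(stimuli)
    random.shuffle(order)                    # fresh order for every participant
    responses = {}
    for clip in order:
        play_clip(clip.shortened_path)       # stage 1: shortened clip
        responses[clip.clip_id] = ask_prediction(clip.clip_id)
        play_clip(clip.full_path)            # stage 2: full outcome reveal
    return responses
```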
Analysis
- Facial feature extraction using OpenFace 2.0
- 35 facial action unit (AU) features analyzed
- Machine learning models tested: RNNs, LSTMs, GRUs, BiLSTMs, and Deep Neural Networks (see the sketch after this list)
- 12 action units with significantly different activation intensities identified
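As an illustration of this pipeline, here is a minimal sketch that loads OpenFace AU intensity time series and trains a small LSTM classifier on them. The file layout (one OpenFace CSV per reaction clip plus a labels.csv), the use of Keras, and the hyperparameters are assumptions for illustration, not the exact configuration used in the study.

```python
"""
Minimal sketch: classify anticipated outcome ("good" vs. "bad") from OpenFace 2.0
action-unit intensity time series with a small LSTM. File names, label encoding,
and hyperparameters are illustrative assumptions.
"""
import numpy as np
import pandas as pd
import tensorflow as tf

MAX_FRAMES = 300  # ~10 s of reaction at 30 fps


def load_au_sequence(csv_path: str) -> np.ndarray:
    """Return a (frames, n_AUs) matrix of AU intensities from one OpenFace CSV."""
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()  # OpenFace headers often carry spaces
    au_cols = [c for c in df.columns if c.startswith("AU") and c.endswith("_r")]
    return df[au_cols].to_numpy(dtype=np.float32)


def pad_or_truncate(seq: np.ndarray, n_frames: int) -> np.ndarray:
    """Zero-pad or cut a sequence to a fixed number of frames."""
    out = np.zeros((n_frames, seq.shape[1]), dtype=np.float32)
    length = min(len(seq), n_frames)
    out[:length] = seq[:length]
    return out


# labels.csv (hypothetical): one row per reaction clip with columns
# "file" (path to the OpenFace CSV) and "label" (0 = good, 1 = bad).
labels = pd.read_csv("labels.csv")
X = np.stack([pad_or_truncate(load_au_sequence(f), MAX_FRAMES) for f in labels["file"]])
y = labels["label"].to_numpy(dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_FRAMES, X.shape[-1])),
    tf.keras.layers.Masking(mask_value=0.0),         # ignore zero-padded frames
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(anticipated outcome = bad)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, validation_split=0.2, epochs=20, batch_size=8)
```

The same loading code can feed the other recurrent architectures tested (GRUs, BiLSTMs) by swapping out the recurrent layer.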
Example Reactions
Here are some examples of anticipatory reactions captured during the study:
Responses to good (left) vs. bad (right) outcomes
Diverse Anticipatory Behaviors
Example Stimulus
Comparison: Reactive vs Non-Reactive
Our study found that some participants displayed very visible reactions (especially for anticipated bad outcomes), while others showed only subtle reactions or none at all.
Key Findings
- Reactions to bad outcomes are more salient: Participants displayed more diverse and visible reactions when anticipating bad outcomes
- Multimodal and evolving responses: Anticipatory behaviors include facial expressions, head motion, body pose changes, and vocalizations that compound and evolve over time
- Person-dependent variability: Different participants showed varying degrees of reactivity
- Significant facial features: 12 facial action units showed significantly different activation patterns between good and bad anticipated outcomes (see the sketch after this list), including:
- Inner Brow Raiser (AU01)
- Brow Lowerer (AU04)
- Cheek Raiser (AU06)
- Nose Wrinkler (AU09)
- Lip Corner Puller (AU12)
- Jaw Drop (AU26)
- Blink (AU45)
- Model performance: Best models achieved ~60% accuracy on the curated dataset, with notable improvements when non-reactive participants were excluded
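One straightforward way to surface such differences is a per-AU statistical comparison. The sketch below runs a Mann-Whitney U test with Benjamini-Hochberg correction on per-clip mean AU intensities; the input file name (au_means.csv), the aggregation to per-clip means, and the choice of test are illustrative assumptions, not necessarily the analysis reported in the study.

```python
"""
Minimal sketch: test which AUs differ between anticipated good and bad outcomes.
Assumes a table (hypothetical au_means.csv) with one row per reaction clip,
mean-intensity columns (AU01_r, ...), and an "outcome" column ("good"/"bad").
"""
import pandas as pd
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

df = pd.read_csv("au_means.csv")
au_cols = [c for c in df.columns if c.startswith("AU") and c.endswith("_r")]

good = df[df["outcome"] == "good"]
bad = df[df["outcome"] == "bad"]

# Two-sided Mann-Whitney U test per AU, corrected for multiple comparisons.
pvals = [mannwhitneyu(good[au], bad[au], alternative="two-sided").pvalue
         for au in au_cols]
reject, p_corr, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

for au, p, sig in sorted(zip(au_cols, p_corr, reject), key=lambda t: t[1]):
    print(f"{au}: corrected p = {p:.4f} {'*' if sig else ''}")
```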
Applications
This research has implications for:
- Robot Error Prevention: Systems that can detect and prevent errors before they happen
- Human-Robot Collaboration Safety: Enhanced safety through anticipatory social-cue detection
- Adaptive Robot Behavior: Robots that respond to human social signals in real time
- Human-AI Collaboration: Proactive failure detection in collaborative systems