Recognizing Actions from Robotic View for Natural Human-Robot Interaction

Anonymous ICCV submission
Anonymous Code | Point Cloud Data

ACTIVE is a large-scale human behavior understanding dataset designed for natural human-robot interaction (N-HRI) scenarios. It features 46,868 video instances with RGB and LiDAR point cloud data, and supports both action recognition and human attribute recognition tasks.
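As a minimal, hypothetical sketch of how a point-cloud action clip from a dataset like this might be loaded and sampled to a fixed size for a recognition model (the directory layout, file naming, and frame/point counts below are illustrative assumptions, not the actual ACTIVE release format):

```python
import numpy as np
from pathlib import Path

def load_clip(clip_dir, num_frames=24, num_points=1024):
    """Load a LiDAR clip stored as frame_*.npy files, each an (N, 3) array.

    Hypothetical layout for illustration only; not the ACTIVE release format.
    Returns an array of shape (num_frames, num_points, 3).
    """
    frame_files = sorted(Path(clip_dir).glob("frame_*.npy"))[:num_frames]
    frames = []
    for f in frame_files:
        pts = np.load(f)  # (N, 3) LiDAR points for one frame
        # Sample a fixed number of points (with replacement if the frame is sparse).
        idx = np.random.choice(len(pts), num_points, replace=len(pts) < num_points)
        frames.append(pts[idx])
    # Pad short clips by repeating the last frame.
    while len(frames) < num_frames:
        frames.append(frames[-1])
    return np.stack(frames)
```

Fixing the temporal length and the per-frame point count keeps batching simple; distant subjects return far fewer raw LiDAR points per frame, which is exactly the regime this benchmark targets.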

The Impact of Robotic Movement on Point Cloud Video

Example clips: Looking Right, Thumbs Down, Touching Chin, Touching, Turning Clockwise, Waving, Clapping, Grasping, Phone Call.

More examples of ACTIVE

Example clips: Arms Crossed, Calling Over, Turning Clockwise, Drinking, Grasping, Hands on Waist, Leftward, Nodding, Phone Call, Pointing Down, Pointing Up, Raising Arms, Rightward, Scratching Head, Shooing Away, Shrugging Shoulders, Stretching, Texting, Thumbs Down, Thumbs Up, Touching Chin, Touching, Turning Counterclockwise, Waving.

Abstract

Natural Human-Robot Interaction (N-HRI) requires robots to recognize human actions at varying distances and states, whether the robot itself is in motion or stationary. This setting is more flexible and practical than traditional human action recognition tasks. However, existing benchmarks were designed for conventional human action recognition and fail to address the complexities of understanding human actions in N-HRI, given their limited data scale, modalities, task categories, and diversity of subjects and environments. To understand human behavior in N-HRI, we introduce ACTIVE (Action in Robotic View), a large-scale human action dataset for N-HRI. ACTIVE includes 30 labeled composite action categories, 80 participants, and 46,868 video instances, covering both point cloud and RGB modalities. During data capture, participants perform various actions in diverse environments at different distances (from 3 m to 50 m), with the camera platform also in motion to simulate varying robot states. This comprehensive and challenging benchmark aims to advance research on human action understanding in N-HRI, such as action recognition and attribute recognition. For recognizing actions from the robotic view, we propose ACTIVE-PC, which achieves accurate perception of human actions at long distances through Multilevel Neighborhood Sampling, Layered Recognizers, and Elastic Ellipse Query, along with precise decoupling of kinematic interference from human actions. Experiments demonstrate the effectiveness of this method on the ACTIVE dataset.
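The abstract names Elastic Ellipse Query as one component of ACTIVE-PC. As a rough illustration of the general idea of an anisotropic (ellipsoidal) neighborhood query, and not the authors' implementation, the sketch below replaces the spherical ball query common in point-based networks with per-axis radii, so the grouping region can be stretched (e.g. vertically) to collect enough neighbors on sparse, distant subjects; the radii, fallback rule, and function name are assumptions:

```python
import numpy as np

def ellipse_query(points, centers, radii=(0.2, 0.2, 0.6), max_neighbors=32):
    """Group points inside an axis-aligned ellipsoid around each center.

    Simplified brute-force illustration only; not the ACTIVE-PC code.
    points: (N, 3), centers: (M, 3) -> (M, max_neighbors) index array into points.
    """
    radii = np.asarray(radii, dtype=np.float64)
    # Scale offsets per axis: a point lies inside the ellipsoid if the scaled norm <= 1.
    diff = (points[None, :, :] - centers[:, None, :]) / radii   # (M, N, 3)
    dist2 = (diff ** 2).sum(-1)                                 # (M, N)
    groups = np.zeros((len(centers), max_neighbors), dtype=np.int64)
    for m in range(len(centers)):
        idx = np.flatnonzero(dist2[m] <= 1.0)
        if idx.size == 0:
            idx = np.array([np.argmin(dist2[m])])               # fall back to the nearest point
        groups[m] = np.resize(idx, max_neighbors)               # repeat indices to fill the group
    return groups
```

In a PointNet++-style set abstraction, a query of this form would take the place of the isotropic ball query when grouping neighbors around sampled centroids.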

Samples for attribute recognition
