[Figure: Example action categories in ACTIVE, including Arms Crossed, Calling Over, Clapping, Drinking, Grasping, Hands on Waist, Leftward, Looking Right, Nodding, Phone Call, Pointing Down, Pointing Up, Raising Arms, Rightward, Scratching Head, Shooing Away, Shrugging Shoulders, Stretching, Texting, Thumbs Down, Thumbs Up, Touching, Touching Chin, Turning Clockwise, Turning Counterclockwise, and Waving.]
Natural Human-Robot Interaction (N-HRI) requires robots to recognize human actions at varying distances and in varying states, while the robot itself may be moving or stationary. This setting is more flexible and practical than traditional human action recognition tasks. However, existing benchmarks were designed for conventional human action recognition and do not address the complexities of understanding human actions in N-HRI, given their limited data volume, modalities, task categories, and diversity of subjects and environments. To understand human behavior in N-HRI, we introduce ACTIVE (Action in Robotic View), a large-scale human action dataset for N-HRI. ACTIVE comprises 30 labeled composite action categories, 80 participants, and 46,868 video instances, covering both point cloud and RGB modalities. During data capture, participants perform a variety of actions in diverse environments at distances from 3 m to 50 m, while the camera platform is also in motion to simulate different robot states. This comprehensive and challenging benchmark aims to advance research on human action understanding in N-HRI, including action recognition and attribute recognition. For recognizing actions in robotic view, we propose ACTIVE-PC, which achieves accurate perception of human actions at long distances through Multilevel Neighborhood Sampling, Layered Recognizers, and Elastic Ellipse Query, along with precise decoupling of kinematic interference from human actions. Experiments demonstrate the effectiveness of the proposed method on the ACTIVE dataset.
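The components above are named only at a high level here. As a purely illustrative sketch, and not the paper's actual implementation, the snippet below shows one plausible reading of an ellipse-style neighborhood query for long-range point clouds: the search region is stretched along the depth axis in proportion to a centroid's range from the sensor, so that sparse, distant points still gather enough neighbors. The function name, the parameters (base_radius, elastic_scale, max_neighbors), and the specific stretching rule are all assumptions made for illustration.

```python
import numpy as np

def ellipse_neighborhood_query(points, centroids, base_radius,
                               depth_axis=2, elastic_scale=0.02,
                               max_neighbors=32):
    """Hypothetical sketch: group neighbors inside an axis-aligned ellipsoid.

    The ellipsoid's radius along `depth_axis` grows with the centroid's
    range from the sensor origin, compensating for sparser sampling of
    far-away points.
    """
    groups = []
    for c in centroids:
        # Stretch the search region along the depth axis in proportion
        # to the centroid's distance from the sensor origin.
        depth = np.linalg.norm(c)
        radii = np.full(3, float(base_radius))
        radii[depth_axis] = base_radius * (1.0 + elastic_scale * depth)

        # A point p lies inside the ellipsoid if sum(((p - c) / radii)^2) <= 1.
        normalized = (points - c) / radii
        inside = np.where((normalized ** 2).sum(axis=1) <= 1.0)[0]
        groups.append(inside[:max_neighbors])
    return groups

# Toy usage: sample 64 centroids from a random cloud and query neighbors.
points = np.random.randn(2048, 3) * 5.0
centroids = points[np.random.choice(2048, 64)]
groups = ellipse_neighborhood_query(points, centroids, base_radius=0.5)
```

Compared with a fixed-radius ball query, a range-dependent search region of this kind is one way to keep neighborhood sizes usable as subjects move from a few meters to tens of meters away; how ACTIVE-PC actually realizes its Elastic Ellipse Query is detailed in the method section, not here.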