PHUMA
Physically-Grounded Humanoid Locomotion Dataset

Under Review

Kyungmin Lee$\dagger$,  Sibeen Kim$\dagger$,  Minho Park,  Hyunseung Kim,  Dongyoon Hwang,  Hojoon Lee,  Jaegul Choo

KAIST

$\dagger$: Equal contribution

PHUMA Examples

Abstract

Motion imitation is a promising approach for humanoid locomotion, enabling agents to acquire human-like behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but these are scarce and expensive, limiting scalability and diversity. Recent studies attempt to scale data collection by converting large-scale internet videos, exemplified by Humanoid-X. However, they often introduce physical artifacts such as floating, penetration, and foot skating, which hinder stable imitation. In response, we introduce PHUMA, a Physically-grounded HUMAnoid locomotion dataset that leverages human video at scale while addressing physical artifacts through careful data curation and physics-constrained retargeting. PHUMA enforces joint limits, ensures ground contact, and eliminates foot skating, producing motions that are both large-scale and physically reliable. We evaluate PHUMA in two settings: (i) imitation of unseen motions from self-recorded test videos and (ii) path following with pelvis-only guidance. In both cases, PHUMA-trained policies outperform Humanoid-X and AMASS, achieving significant gains in imitating diverse motions.

Overview

Our four-stage pipeline for motion imitation learning consists of: (1) Motion Curation, where we filter out problematic motions from a diverse dataset; (2) Motion Retargeting, where the filtered motions are retargeted to the humanoid using PhySINK, incorporating a series of losses; (3) Policy Learning, where a policy is trained to imitate the retargeted motions; and (4) Inference, where the trained policy controls the humanoid, enabling it to imitate motions from unseen videos processed by a video-to-motion model.
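For orientation, the sketch below composes the four stages as plain Python functions. All names, data structures, and return values are hypothetical placeholders for illustration; they are not the released PHUMA code or API.

# Minimal, self-contained sketch of the four-stage pipeline; illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class Motion:
    name: str
    frames: List[list]           # per-frame joint positions (placeholder)
    has_artifacts: bool = False  # e.g., floating, penetration, foot skating

def curate(motions: List[Motion]) -> List[Motion]:
    """Stage 1: filter out motions with physical artifacts."""
    return [m for m in motions if not m.has_artifacts]

def retarget(motions: List[Motion]) -> List[Motion]:
    """Stage 2: map each human motion onto the humanoid (PhySINK-style,
    physics-constrained). Here the motions are simply passed through."""
    return [Motion(m.name + "_robot", m.frames) for m in motions]

def train_policy(dataset: List[Motion]):
    """Stage 3: train an imitation policy on the retargeted dataset."""
    return lambda reference: f"tracking {reference.name}"  # stand-in policy

def infer(policy, video_motion: Motion) -> str:
    """Stage 4: track a reference motion recovered from an unseen video."""
    return policy(video_motion)

if __name__ == "__main__":
    raw = [Motion("walk", [[0.0]]), Motion("jump", [[0.0]], has_artifacts=True)]
    policy = train_policy(retarget(curate(raw)))
    print(infer(policy, Motion("test_video_walk", [[0.0]])))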

Evaluation of Retargeting Methods

The video below shows example retargeting results from different methods. Mink produces unnatural locomotion in which the humanoid appears to walk on a tightrope; SINK yields more human-like results but still exhibits floating and penetration. Our PhySINK produces motions that remain human-like while also being physically reliable.
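To make the artifact types concrete, the sketch below shows the kind of per-frame checks that physics-grounded retargeting is described as enforcing (ground contact, no penetration, no foot skating). The thresholds, array shapes, and contact labels are assumptions chosen for illustration, not the paper's exact criteria.

# Hedged sketch of per-frame physical-artifact checks; thresholds are illustrative.
import numpy as np

def physical_artifact_flags(foot_height, foot_xy_vel, in_contact,
                            float_tol=0.03, penetration_tol=0.01,
                            skate_tol=0.10):
    """
    foot_height : (T, F) foot heights above the ground plane [m]
    foot_xy_vel : (T, F) horizontal foot speeds [m/s]
    in_contact  : (T, F) boolean, whether the foot is labeled as in contact
    Returns boolean (T, F) masks flagging each artifact type.
    """
    floating    = in_contact & (foot_height > float_tol)   # hovering while "in contact"
    penetration = foot_height < -penetration_tol           # foot below the ground
    skating     = in_contact & (foot_xy_vel > skate_tol)   # sliding while in contact
    return {"floating": floating, "penetration": penetration, "skating": skating}

# Example usage with random data (T=100 frames, F=2 feet).
T, F = 100, 2
flags = physical_artifact_flags(
    foot_height=np.random.uniform(-0.02, 0.06, (T, F)),
    foot_xy_vel=np.random.uniform(0.0, 0.3, (T, F)),
    in_contact=np.random.rand(T, F) > 0.5,
)
print({k: int(v.sum()) for k, v in flags.items()})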

Imitation Performance

In motion imitation tasks, we train policies on PHUMA and AMASS and evaluate them on unseen motions. We use MaskedMimic for training, and all policies are trained in IsaacGym. The videos below show qualitative motion imitation results from the student policy on unseen motions; the ghost represents the reference motion to be imitated. As shown, the policy trained on PHUMA imitates motions more faithfully than the policy trained on AMASS.
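As a rough guide to how such a comparison can be scored, the sketch below computes a simple joint-tracking error between a policy rollout and the reference ("ghost") motion. The metric and threshold are illustrative assumptions, not the paper's exact evaluation protocol.

# Hedged sketch of a tracking-error metric for comparing imitation policies.
import numpy as np

def mean_joint_tracking_error(rollout, reference):
    """Mean per-frame, per-joint Euclidean distance between the policy rollout
    and the reference motion. Both arrays have shape (T, J, 3)."""
    return float(np.linalg.norm(rollout - reference, axis=-1).mean())

def imitation_success(rollout, reference, threshold=0.5):
    """Declare failure if any joint deviates by more than `threshold` meters."""
    max_dev = np.linalg.norm(rollout - reference, axis=-1).max()
    return bool(max_dev < threshold)

# Example with synthetic data (T=200 frames, J=23 joints, both illustrative).
T, J = 200, 23
ref = np.zeros((T, J, 3))
roll = ref + np.random.normal(0.0, 0.02, ref.shape)
print(mean_joint_tracking_error(roll, ref), imitation_success(roll, ref))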

Path Following Performance

In path following tasks, we distill the teacher policy by providing only the pelvis information (position, shown in blue, and rotation, shown in green) from the reference motions. We train path following policies on PHUMA and AMASS and evaluate them on unseen paths. The videos below show qualitative path following results from the student policy on unseen paths.
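The sketch below illustrates one way the pelvis-only goal for the student policy could be assembled: only the pelvis position and rotation are extracted from the full-body reference, and all other body targets are withheld during distillation. The array shapes, quaternion convention, and function name are assumptions for illustration.

# Hedged sketch of building a pelvis-only goal from a full-body reference motion.
import numpy as np

def pelvis_only_goal(ref_body_pos, ref_body_rot, pelvis_index=0):
    """
    ref_body_pos : (T, B, 3) reference positions for all B bodies
    ref_body_rot : (T, B, 4) reference rotations (quaternions) for all bodies
    Returns a (T, 7) goal containing only the pelvis pose; every other body
    target is hidden from the student policy.
    """
    pos = ref_body_pos[:, pelvis_index, :]  # (T, 3) pelvis position
    rot = ref_body_rot[:, pelvis_index, :]  # (T, 4) pelvis quaternion
    return np.concatenate([pos, rot], axis=-1)

# Example with placeholder data (T=150 frames, B=24 bodies, illustrative).
T, B = 150, 24
goal = pelvis_only_goal(np.zeros((T, B, 3)), np.tile([0.0, 0.0, 0.0, 1.0], (T, B, 1)))
print(goal.shape)  # (150, 7)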

Citation

If you find our work useful, please consider citing the paper as follows:

@article{lee2025phuma,
	title={PHUMA: Physically-Grounded Humanoid Locomotion Dataset},
	author={Kyungmin Lee and Sibeen Kim and Minho Park and Hyunseung Kim and Dongyoon Hwang and Hojoon Lee and Jaegul Choo},
	journal={arXiv preprint arXiv:2510.26236},
	year={2025},
}