Capture of dynamic events is an active research area today. Capturing the
3D geometric structure and photometric appearance of
dynamic scenes finds applications in 3D tele-conferencing systems, 3DTV, etc.
The captured ``depth movies'' contain aligned sequences of depth maps and
textures and are often streamed to a distant location for immersive viewing. The
depth maps are bulky and require efficient compression schemes. In this
paper, we present a scheme to compress depth movies of human actors
using a parametric proxy model for the underlying action. We use a
generic articulated human model as the proxy and the various joint
angles as its parameters for each time instant to represent a common
prediction of the scene structure. The difference or residue between the
captured depth and the depth of the proxy represents the scene compactly,
exploiting spatial coherence. Differences in residues across time are used
to exploit temporal coherence. Intra-frame coded frames and
difference-coded frames provide random access and high compression. We
show results on several synthetic and real actions to demonstrate the
compression ratio and resulting quality using a depth-based rendering of
the decoded scene.
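
As an illustration of the residue-based coding outlined above, the following is a minimal sketch, assuming the captured depth maps and the proxy depths (rendered from the articulated model at the estimated joint angles) are available as per-frame NumPy arrays; the frame layout, the GOP parameter, and the function names are assumptions for illustration, not the paper's implementation.

\begin{verbatim}
import numpy as np

def encode_depth_movie(depth_frames, proxy_depths, gop_size=8):
    """Sketch of proxy-based residue coding.

    depth_frames : list of (H, W) captured depth maps
    proxy_depths : list of (H, W) depth maps rendered from the
                   articulated proxy at each frame's joint angles
    gop_size     : every gop_size-th frame is intra-coded for
                   random access; the rest store temporal
                   differences of residues.
    """
    coded = []
    prev_residue = None
    for t, (captured, proxy) in enumerate(zip(depth_frames, proxy_depths)):
        # Spatial prediction: the proxy depth predicts the captured
        # depth, so only the residue needs to be stored.
        residue = captured - proxy
        if t % gop_size == 0 or prev_residue is None:
            coded.append(("I", residue))                 # intra-coded residue
        else:
            coded.append(("D", residue - prev_residue))  # temporal difference
        prev_residue = residue
    return coded

def decode_depth_movie(coded, proxy_depths):
    """Invert the encoder: rebuild residues, then add the proxy back."""
    frames, residue = [], None
    for (kind, data), proxy in zip(coded, proxy_depths):
        residue = data if kind == "I" else residue + data
        frames.append(proxy + residue)
    return frames
\end{verbatim}

In a full scheme the residues and their temporal differences would additionally be quantized and entropy coded; the sketch only shows the prediction structure that separates intra-coded and difference-coded frames.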