
Related Work

The geometric structure of real-life scenes can be captured using multicamera setups, range scanners, and similar devices. Several teleimmersion systems have been built for this purpose over the past decade [11,15,10,2,16,4,17]. They attempt to capture the dense or sparse 3D structure of the scene from cameras, representing it as time-varying depth and texture maps, or depth movies.

Disparity-compensated compression of multiview images exploits the spatial redundancy in a lightfield. Levkovich-Maslyuk et al. compute disparity maps and use them to predict the appearance of other views, exploiting the spatial redundancy of the lightfield [8]. Magnor et al. use block-based disparity compensation to predict other views of a multiview set [9]; they compute disparity or depth maps for encoding but never store them. Wu et al. extended disparity compensation to dynamic lightfields to compress video objects [14]. They used an MPEG-like algorithm that exploits temporal correlation within each video stream and spatial correlation across views using depth values. They treated the depth maps as incompressible, however, and encoded them losslessly in a separate layer.
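The core idea of block-based disparity compensation can be summarized with a short sketch. The Python snippet below is only an illustration under simplifying assumptions (rectified views, one integer disparity per block); names such as predict_view are hypothetical and this is not the encoder of [9] or [14].

    import numpy as np

    def predict_view(reference, disparities, block=8):
        # Predict a neighbouring rectified view by shifting each block of the
        # reference view horizontally by its estimated per-block disparity.
        # `disparities` holds one integer disparity per block.
        h, w = reference.shape
        predicted = np.zeros_like(reference)
        for by in range(0, h, block):
            for bx in range(0, w, block):
                bh, bw = min(block, h - by), min(block, w - bx)
                d = int(disparities[by // block, bx // block])
                sx = int(np.clip(bx + d, 0, w - bw))  # keep the shifted block inside the image
                predicted[by:by + bh, bx:bx + bw] = reference[by:by + bh, sx:sx + bw]
        return predicted

    # The encoder then stores the per-block disparities and the prediction
    # residual (target view minus predicted view) instead of the raw view.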

The compression of depth maps has not attracted much attention so far. Standard image compression methods such as JPEG emphasize perceived visual quality and do not suit depth image compression [6]. The authors of [6] advocate the use of a region of interest (RoI) and a reshaping of the depth values to preserve depth edges or occlusion boundaries when dealing with depth maps. Kum and Mayer-Patel compress multiple depth streams of a scene using motion vectors derived from color and depth [7]. They conclude that encoding color and depth with separate motion vectors performs better than using a common one. Penta and Narayanan introduced proxy-based compression of depth images of static scenes [12]. A geometric proxy model is used as the common structure of the scene, and each depth map is represented as its difference from the proxy. Simple triangulated proxies and parametric proxies such as ellipsoids gave them good compression and quality on static 3D scenes.
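The essence of this proxy-based representation is that only the difference between a captured depth map and the depth obtained by projecting the common proxy into that view needs to be stored. A minimal sketch of the idea follows, assuming registered, equal-resolution integer depth maps; the function names are illustrative and not part of [12].

    import numpy as np

    def encode_depth(depth_map, proxy_depth):
        # Residue between the captured depth map and the depth of the proxy
        # rendered into the same view; values concentrate near zero.
        return depth_map.astype(np.int32) - proxy_depth.astype(np.int32)

    def decode_depth(residue, proxy_depth):
        # Lossless reconstruction: add the residue back to the proxy depth.
        return residue + proxy_depth.astype(np.int32)

Because the proxy already explains most of the scene geometry, the residue has low entropy and compresses much better than the raw depth map.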

In this paper, we extend proxy-based compression to depth movies of human actors. The input to the system is a set of depth movies of a scene captured by cameras placed all around it, as shown in Figure 2.

Figure 2: Setup of 20 cameras around the scene. (Image: figures/camera_setup.eps)

We first fit an articulated proxy model to the point cloud of each frame. The output of this process is the set of joint angles of the model, which serve as the parameters of the proxy. This parametrization keeps the memory requirements of the proxy low. Several methods exist to fit an articulated model to real human data [13,3,1]. The fitted proxy for each frame is projected to each view to give a prediction of the corresponding input depth map. The residue, or prediction error, together with the proxy parameters, is sufficient to recover the input depth maps losslessly. Temporal correlation is exploited by encoding the differences between the residues of successive time instants.
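The temporal step can be sketched as follows. This is only an illustration of residue differencing, assuming the per-frame residues of one view are available as integer arrays; names such as encode_residue_stream are hypothetical and do not denote the actual codec.

    def encode_residue_stream(residues):
        # First frame: store the residue itself; later frames: store only the
        # change from the previous residue, which is mostly zero for slow motion.
        deltas = [residues[0]]
        for prev, curr in zip(residues, residues[1:]):
            deltas.append(curr - prev)
        return deltas  # each delta is then passed to a lossless entropy coder

    def decode_residue_stream(deltas):
        # Invert the differencing to recover every frame's residue exactly.
        residues = [deltas[0]]
        for delta in deltas[1:]:
            residues.append(residues[-1] + delta)
        return residues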

