Tuesday, February 28, 2012


Learning shape models for monocular human pose estimation from the Microsoft Xbox Kinect
James Charles and Mark Everingham
School of Computing
University of Leeds

The problem the authors are trying to address in this paper is that variation in human body shape makes 'pose estimation' (figuring out what pose a person is in) difficult.  Most existing work doesn't accurately model human limbs because it uses cylindrical or conical representations, which lack the flexibility to fit the wide variety of real body shapes.  The paper proposes a method to capture variation in limb shape by inferring the pose from a binary silhouette using a 'generative model of shape'.  Probabilistic shape templates for each limb are learnt from Kinect output by inferring a segmentation of silhouettes into limbs.

The algorithm is based on a Pictorial Structure Model (PSM) for the human body.  Ten nodes represent limbs, with edges connecting adjacent limbs.  The shape of each limb is modeled independently, and each limb is parameterized by its location and orientation.  The overall probability of a pose given a silhouette image is then calculated from these terms.  This alone won't give an entirely accurate pose, though, because there are still outcomes where multiple limbs land in the same position or are missed entirely.
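The tree-structured scoring idea can be sketched as follows. This is a hypothetical, minimal illustration of a PSM-style log-posterior (unary silhouette-match terms per limb plus pairwise terms over tree edges), not the paper's exact formulation; the limb names, the stubbed unary scores, and the Gaussian pairwise prior are all assumptions for illustration.

```python
import math

# Tree of limbs: (child, parent) edges rooted at the torso (illustrative subset).
EDGES = [("head", "torso"), ("upper_arm_l", "torso"), ("upper_arm_r", "torso")]

def unary_log_likelihood(limb, silhouette_score):
    """Log-likelihood of a limb configuration given the silhouette.
    Stubbed here as a supplied per-limb match score in (0, 1]."""
    return math.log(silhouette_score[limb])

def pairwise_log_prior(child_cfg, parent_cfg, sigma=1.0):
    """Gaussian prior on the spatial offset between connected limbs
    (an assumed stand-in for the kinematic prior)."""
    dx = child_cfg[0] - parent_cfg[0]
    dy = child_cfg[1] - parent_cfg[1]
    return -(dx * dx + dy * dy) / (2.0 * sigma * sigma)

def pose_log_posterior(configs, silhouette_score):
    """Sum unary terms over limbs and pairwise terms over tree edges."""
    total = sum(unary_log_likelihood(limb, silhouette_score) for limb in configs)
    total += sum(pairwise_log_prior(configs[c], configs[p]) for c, p in EDGES)
    return total

# Toy pose: each limb is (x, y, theta); all silhouette match scores set to 0.9.
configs = {"torso": (0.0, 0.0, 0.0), "head": (0.0, 1.0, 0.0),
           "upper_arm_l": (-1.0, 0.0, 0.0), "upper_arm_r": (1.0, 0.0, 0.0)}
scores = {limb: 0.9 for limb in configs}
print(round(pose_log_posterior(configs, scores), 3))  # prints -1.921
```

Because the model is a tree, the maximum of this score over all limb configurations can be found efficiently with dynamic programming, which is what makes PSM inference tractable.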

By combining this PSM with sampling from templates learned from the Kinect, the shortcomings of relying on the PSM alone can be overcome.  The Kinect gives joint, depth, and silhouette data.  The learning process segments this data into limbs.  By cross-referencing the learned templates with the PSM, higher accuracy can be achieved in pose estimation.
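One simple way to picture a probabilistic shape template is as a per-pixel foreground probability obtained by averaging many aligned binary limb masks. The sketch below assumes the masks have already been segmented and aligned into a limb-centric frame (the alignment and segmentation details here are assumptions, not the paper's method):

```python
import numpy as np

def learn_template(aligned_masks):
    """Average aligned binary limb masks into a probabilistic shape template.

    aligned_masks: list of HxW binary arrays (0 = background, 1 = limb),
    all registered to a common limb-centric coordinate frame.
    Returns an HxW array where each pixel holds the fraction of training
    masks in which that pixel belonged to the limb.
    """
    stack = np.stack(aligned_masks).astype(float)
    return stack.mean(axis=0)

# Toy example: three 2x2 "masks" of the same limb from different subjects.
masks = [np.array([[1, 0], [1, 1]]),
         np.array([[1, 0], [1, 0]]),
         np.array([[1, 1], [1, 1]])]
template = learn_template(masks)
print(template)  # left column is limb in all masks -> probability 1.0 there
```

A template like this can then score how well a hypothesized limb placement explains the observed silhouette, replacing the rigid cylinder or cone models the post criticizes above.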




