Thursday, February 9, 2012


Full DOF tracking of a hand interacting with an object by
modeling occlusions and physical constraints

Iason Oikonomidis, Nikolaos Kyriazis, Antonis A. Argyros

In this paper the authors devise a method for estimating the full pose of a hand that is partially occluded by an object it is interacting with.  They treat this as an optimization problem, and exploit the fact that the hand and the occluding object cannot occupy the same space.  The position of the occluding object therefore tells you a great deal about how the hand is positioned.

To get an idea of the position of the hand, the scene is broken down into a sequence of multiframes, where each multiframe is a set of images taken from different cameras at the same point in time.  A joint hand-object model is used to represent the hand and the object occluding it.  The hand model uses 27 parameters, giving it 26 degrees of freedom (DOF).  The authors then attempt to "estimate the parameters that give rise to the hand-object configuration that (a) is most compatible to the image features present in multiframe M (Sec. 2.1) and (b) is physically plausible in the sense that two different rigid bodies cannot share the same physical space (interpenetration constraints)."  They use an edge map and a skin color map to differentiate between the hand and the object, with the assumption that the object will NOT be skin colored.  A fair amount of math ensues.
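To make the two criteria above a bit more concrete, here is a rough sketch of how an objective like that could be put together.  This is NOT the authors' code or their actual formulation: approximating the hand and the object with spheres, the `weight` value, and every function name here are my own assumptions for illustration.  The idea is just that a candidate pose is scored by how badly it matches the image features plus a penalty that grows whenever the hand and object overlap.

```python
import math

def sphere_overlap(center_a, radius_a, center_b, radius_b):
    """Penetration depth between two spheres (0.0 if they do not intersect)."""
    distance = math.dist(center_a, center_b)
    return max(0.0, radius_a + radius_b - distance)

def penetration_penalty(hand_spheres, object_spheres):
    """Sum of penetration depths over every hand/object sphere pair.

    Each sphere is a (center, radius) tuple. Spheres are an assumed
    stand-in for the real hand and object geometry.
    """
    return sum(
        sphere_overlap(hand_c, hand_r, obj_c, obj_r)
        for hand_c, hand_r in hand_spheres
        for obj_c, obj_r in object_spheres
    )

def objective(image_discrepancy, hand_spheres, object_spheres, weight=10.0):
    """Score for one candidate hand-object configuration (lower is better).

    image_discrepancy: how poorly the candidate matches the observed
        edge/skin-color features (computed elsewhere; hypothetical input).
    weight: assumed trade-off between image fit and physical plausibility.
    """
    return image_discrepancy + weight * penetration_penalty(hand_spheres, object_spheres)

# Toy usage: one fingertip sphere and one object sphere.
fingertip = [((0.0, 0.0, 0.0), 1.0)]
touching_object = [((1.5, 0.0, 0.0), 1.0)]   # overlaps the fingertip by 0.5
distant_object = [((3.0, 0.0, 0.0), 1.0)]    # no overlap

score_overlapping = objective(2.0, fingertip, touching_object)
score_separate = objective(2.0, fingertip, distant_object)
```

With the same image fit, the overlapping configuration scores worse than the separated one, so an optimizer searching over the pose parameters would be pushed toward poses where the hand wraps around the object instead of passing through it.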

In experiments with real image data, their system appears to recover the location and pose of the hand quite well.  While there is no quantitative data for these experiments, "visual inspection" shows that the accuracy is better than that of the previous systems it was compared against.

Their camera system actually consisted of 8 normal cameras mounted in a circle around the hand/object, rather than a Kinect, which I found out in the middle of reading the article / writing this entry.  However, it would be interesting to see how this might be adapted for use with a single Kinect rather than their 8 camera setup.  There would probably be more limitations, because from a single viewpoint it's possible to not see the hand at all.  Likely you would need a certain amount of the hand showing before you could make an accurate guess, but it might be doable.



