Ph.D. received on: 8/7/1997
E-mail: dibe@dei.unipd.it
Tutor: Prof. Ruggero Frezza, Università di Padova
___________________________________________________________________________________________________________
Methods for estimating the motion of articulated objects via remote sensors
___________________________________________________________________________________________________________
Advisor:
Prof. Ruggero Frezza, Università di Padova
Summary:
Visual estimation and tracking of the motion and gestures of the human body is an interesting and exciting computational problem for two reasons: (a) from the engineering standpoint, a non-invasive machine that could track body motion would be invaluable in facilitating most human-machine interactions; (b) it is an important scientific problem in its own right.
Observing the human body in motion is key to a large number of activities and applications:
- Security. In museums, factories, and other locations that are either dangerous or sensitive, it is crucial to detect the presence of humans and to monitor and classify their behavior based on their gait and gestures.
- Animation. The entertainment industry makes increasing use of actor-to-cartoon animations where the motion of cartoon figures and rendered models is obtained by tracking the motion of a real person.
- Virtual reality. The motion of the user of a virtual reality system must be known in order to adjust display parameters and animations.
- Human-machine interfaces. The motion of the human body may be used as a convenient interface between man and machine. For example, the hand could be used as a 3D mouse.
- Biomechanics. Reconstructing the 3D motion of human limbs is used for clinical evaluation of orthopedic patients and for training of both professional and amateur athletes.
- Signaling. In airports, at sea, and in other high-noise environments the arms and torso are used for signaling.
- Camera control. Active camera control based on the motion of humans can be used for sporting events, conferences, and shows, thus replacing human operators. It may also be used to make human operators more effective in security monitoring.
- Traffic monitoring. Pedestrians are often a component of street traffic. They need to be detected and their behavior understood (e.g., intention to cross at a traffic light, gesture signaling for emergency help) in order to help avoid collisions and dangerous situations, and in order to detect accidents immediately.
- Customer monitoring. Data on the exploration and purchasing behavior of store customers are extremely valuable to advertising companies, producers, and sales management.
Current techniques for tracking the human body involve a large variety of methods. Security, traffic monitoring, signaling, and customer monitoring are typically implemented using human observers who survey the scene either directly or via a multiple-camera closed-circuit TV system. For animation and biomechanics, multiple-camera systems and manual tracking of features across image sequences are used. For virtual reality, an assortment of gloves, suits, joysticks, and inductive coils is used. Human-machine interfaces rely on joysticks, mice, and keyboards.
All of these methods require either employing dedicated human operators or using ad-hoc sensors. This results in a number of limitations:
- Practicality -- the user needs to wear markers or other ad-hoc equipment, which may be impractical or uncomfortable, may constrain the user to a limited work space, and may be difficult to transport.
- Cost -- computational and sensory hardware and human operator time.
- Timeliness -- the data may not be available in real time, but only after the lag required to process a batch of images, allow communication between human operators, etc.
If tracking the human body could be made automatic and non-invasive, and therefore cheaper, more practical, and faster, not only could the applications listed above be performed better, but a number of new applications would also become feasible.
Several factors contribute to the formation of a sequence of images on the CCD sensor of the camera:
- the geometry of the objects present in the camera field of view;
- the subject motion;
- the reflectance of the surfaces;
- the geometry and spectral properties of the illumination source;
- the camera position in the scene, the optics and the CCD sensor properties.
Computer graphics studies these complex interactions in order to generate realistic synthetic images.
We address the opposite problem: given a sequence of images, we would like to estimate the motion of the subject in three-dimensional space. Reconstructing 3D quantities from their perspective projections is an ill-conditioned problem unless prior knowledge of the 3D geometry of the observed object is available.
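The role of prior geometric knowledge can be illustrated with a small numerical example. The sketch below is not taken from the thesis: the function names `project` and `depth_from_known_length`, the unit focal length, and the assumption of a segment lying parallel to the image plane are all hypothetical choices made for illustration. It shows that a point and a scaled copy of it project to the same image location, and that knowing the true length of a segment is enough to recover its depth in this simplified setting.

```python
import math


def project(point, focal=1.0):
    """Pinhole perspective projection of a 3D point (x, y, z) with z > 0."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def depth_from_known_length(image_length, true_length, focal=1.0):
    """Depth of a segment parallel to the image plane, given its true length.

    Illustrative assumption: the segment is parallel to the image plane,
    so its projected length is focal * true_length / depth.
    """
    return focal * true_length / image_length


# Scale ambiguity: a point and a copy three times larger and three times
# farther away project to the same image location.
near = (0.5, 0.25, 2.0)
far = tuple(3.0 * c for c in near)
same_image = all(math.isclose(a, b) for a, b in zip(project(near), project(far)))

# Prior knowledge resolves the ambiguity: a segment of known length 0.4
# placed at depth 4.0 is observed, and its depth is recovered from the image.
left, right = project((-0.2, 0.0, 4.0)), project((0.2, 0.0, 4.0))
recovered_depth = depth_from_known_length(right[0] - left[0], true_length=0.4)
```

Here `same_image` is true because the projection depends only on the ratios x/z and y/z, while `recovered_depth` is close to the depth at which the segment was placed.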
In this thesis we propose a recursive approach to the problem of motion estimation. The state of the dynamical system is given by the position and velocity of the links of the kinematic chain approximating the subject's body. In the prediction step, the new state value is computed from a simplified dynamical model of the human body. This value is then updated so as to minimize the difference between the acquired image and an “expected image”, computed from the state prediction and the prior knowledge of the 3D body geometry and camera parameters.
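The predict/update recursion can be sketched on a toy problem. The code below is a minimal illustration under strong assumptions and is not the algorithm implemented in the thesis: the body is reduced to a single link swinging in a plane parallel to the image plane, the camera produces a 1D image, the dynamical model is constant velocity, and the update is a single Gauss-Newton step on the image residual with a fixed velocity gain. All names and constants (`render`, `predict`, `update`, `track`, `LINK_LENGTH`, `BLOB_WIDTH`, and so on) are hypothetical.

```python
import math

# Hypothetical setup: one rigid link swinging in a plane parallel to the image
# plane, seen by a 1D "camera". All constants are illustrative assumptions.
LINK_LENGTH = 1.0                              # prior knowledge of body geometry
FOCAL, DEPTH = 1.0, 4.0                        # prior knowledge of the camera
PIXELS = [0.01 * i - 0.4 for i in range(81)]   # image-plane sample positions
BLOB_WIDTH = 0.05                              # width of the rendered link tip
DT = 1.0 / 30.0                                # frame period at 30 frames/sec


def render(theta):
    """Expected image: a Gaussian blob at the perspective projection of the tip."""
    u = FOCAL * LINK_LENGTH * math.sin(theta) / DEPTH
    return [math.exp(-((p - u) / BLOB_WIDTH) ** 2) for p in PIXELS]


def predict(theta, omega):
    """Prediction step with a simplified (constant-velocity) dynamical model."""
    return theta + omega * DT, omega


def update(theta, omega, image, velocity_gain=0.5, eps=1e-4):
    """Correct the prediction with one Gauss-Newton step on the image residual."""
    expected = render(theta)
    residual = [a - e for a, e in zip(image, expected)]
    # Central-difference derivative of the expected image w.r.t. the angle.
    plus, minus = render(theta + eps), render(theta - eps)
    jac = [(p - m) / (2 * eps) for p, m in zip(plus, minus)]
    denom = sum(j * j for j in jac)
    if denom < 1e-12:          # the image carries no information about the angle
        return theta, omega
    delta = sum(j * r for j, r in zip(jac, residual)) / denom
    return theta + delta, omega + velocity_gain * delta / DT


def track(n_frames):
    """Run the predict/update recursion on synthetic images of a swinging link."""
    theta, omega = 0.0, 0.0
    history = []
    for k in range(1, n_frames + 1):
        true_theta = 0.5 * math.sin(k * DT)
        image = render(true_theta)     # stands in for the acquired camera frame
        theta, omega = predict(theta, omega)
        theta, omega = update(theta, omega, image)
        history.append((true_theta, theta))
    return history
```

Calling `track(90)` returns, for three seconds of synthetic video, pairs of true and estimated link angles. A real system would replace `render` with a model of the full kinematic chain and of image formation, and the scalar Gauss-Newton step with a multivariate update over all joint parameters.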
The algorithm has been implemented on dedicated hardware that makes it possible to process the image stream at frame rate (30 frames/sec). A detailed description of the complete system is included.
_______________________________________