Summary

People detection and tracking in video are fundamental computer vision capabilities that remain a research challenge. The main difficulties stem from partial occlusion of the objects of interest (people), dynamic background (possibly caused by the motion of the observer), and foreground clutter (caused by non-person objects in motion). Traditional methods ignore one or more of these aspects, which prevents them from tracking multiple people from a moving platform, even in slightly crowded or cluttered conditions. In (Tu et al., 2008), these issues are addressed all at once by exploiting both the appearance and the shape cues of people in an online optimization framework based on Expectation Maximization. Given initial hypotheses of people positions nominated by a discriminative head-and-shoulder detector operating at a high false alarm rate, images are analyzed by optimally assigning each image patch to the most likely person hypothesis. This amounts to automatically rejecting the false hypotheses, determining how many people are present in the scene, localizing them, and describing how they occlude each other.
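
To make the patch-to-hypothesis assignment idea concrete, here is a minimal toy sketch in Python. It is not the method of (Tu et al., 2008): the paper combines appearance and shape cues, whereas this sketch assumes a single isotropic Gaussian spatial affinity, 2D patch centers, and an extra "background" label for unexplained patches. The names `gaussian_affinity` and `em_assign` and the parameters `sigma`, `background`, and `min_weight` are all hypothetical, introduced only for illustration.

```python
import math


def gaussian_affinity(patch, center, sigma):
    """Unnormalized Gaussian affinity between a patch and a hypothesis center."""
    dx, dy = patch[0] - center[0], patch[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def em_assign(patches, hypotheses, sigma=1.0, background=0.05,
              iters=50, min_weight=0.1):
    """Toy EM: soft-assign patches to person hypotheses, then prune weak ones.

    Assumption (not from the paper): each hypothesis is a 2D point with an
    isotropic Gaussian footprint, and index len(hypotheses) is a background
    label with a constant likelihood.
    """
    k = len(hypotheses)
    weights = [1.0 / (k + 1)] * (k + 1)  # uniform initial mixing weights
    resp = []
    for _ in range(iters):
        # E-step: responsibility of every label for every patch.
        resp = []
        for p in patches:
            scores = [weights[j] * gaussian_affinity(p, hypotheses[j], sigma)
                      for j in range(k)]
            scores.append(weights[k] * background)
            total = sum(scores)
            resp.append([s / total for s in scores])
        # M-step: each weight becomes the mean responsibility of its label.
        weights = [sum(r[j] for r in resp) / len(patches)
                   for j in range(k + 1)]
    # Hypotheses that explain too few patches are treated as false alarms.
    kept = [j for j in range(k) if weights[j] >= min_weight]
    # Hard label per patch: the label with the highest responsibility.
    labels = [max(range(k + 1), key=lambda j: r[j]) for r in resp]
    return kept, labels, weights


if __name__ == "__main__":
    # Two small clusters of patches and three hypotheses, one far from any patch.
    patches = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3),
               (5.0, 0.0), (5.1, -0.2), (4.9, 0.1)]
    hypotheses = [(0.0, 0.0), (5.0, 0.0), (20.0, 20.0)]
    kept, labels, weights = em_assign(patches, hypotheses)
    print("kept hypotheses:", kept)
    print("patch labels:", labels)
```

In this toy run the hypothesis far from every patch should end up with negligible weight and be dropped, which mirrors, in a much simplified form, how a detector run at a high false alarm rate can be followed by an assignment step that discards unsupported hypotheses. Occlusion reasoning is not modeled here.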

References

  1. Tu, P., Sebastian, T., Doretto, G., Krahnstoever, N., Rittscher, J., and Yu, T. Unified crowd segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), 2008.