People detection and tracking in video are fundamental computer vision capabilities that remain a research challenge. The main difficulties stem from partial occlusion of the objects of interest (people), dynamic backgrounds (possibly caused by the motion of the observer), and foreground clutter (caused by non-person objects in motion). Traditional methods ignore one or more of these aspects, which prevents them from tracking multiple people from a moving platform, even in slightly crowded or cluttered conditions. In (Tu et al., 2008) these issues are addressed all at once by exploiting both appearance and shape cues of people in an online optimization framework based on Expectation Maximization. Given initial hypotheses of people positions nominated by a discriminative head and shoulder detector operating at a high false alarm rate, images are analyzed by optimally assigning each image patch to the most likely person hypothesis. This amounts to automatically rejecting the false hypotheses, finding how many people are present in the scene, localizing them, and describing how they occlude each other.
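The patch-to-hypothesis assignment idea can be illustrated with a small toy sketch. This is not the formulation of Tu et al.: it swaps in a generic mean-field-style soft assignment with mixture weights, and every name in it (`assign_patches`, `unary`, `pairwise`, `null_score`) is hypothetical. It assumes the per-patch scores and the patch-pair affinities have already been computed by some detector and some low-level feature comparison.

```python
import math

def assign_patches(unary, pairwise, null_score=0.1, n_iters=20):
    """Toy soft assignment of image patches to person hypotheses.

    unary[i][h]    : assumed score that patch i belongs to hypothesis h
    pairwise[i][j] : assumed affinity in [0, 1] between patches i and j
    null_score     : hypothetical constant score for "no person" (clutter)
    Returns (labels, kept): a label per patch (None = clutter) and the
    sorted list of hypotheses that ended up owning at least one patch.
    """
    n, k = len(unary), len(unary[0])
    # Column k is an extra "no person" label for clutter patches.
    scores = [list(row) + [null_score] for row in unary]
    q = [[s / sum(row) for s in row] for row in scores]
    w = [1.0 / (k + 1)] * (k + 1)  # per-hypothesis weights

    for _ in range(n_iters):
        # Assignment step: re-estimate soft assignments from the
        # previous ones (synchronous update).
        new_q = []
        for i in range(n):
            row = []
            for h in range(k + 1):
                support = 0.0
                if h < k:  # the null label gets no pairwise support
                    support = sum(pairwise[i][j] * q[j][h]
                                  for j in range(n) if j != i)
                row.append(w[h] * scores[i][h] * math.exp(support))
            total = sum(row)
            new_q.append([r / total for r in row])
        q = new_q
        # Weight step: a hypothesis that explains few patches loses weight.
        w = [sum(q[i][h] for i in range(n)) / n for h in range(k + 1)]

    labels = []
    for row in q:
        best = max(range(k + 1), key=lambda h: row[h])
        labels.append(best if best < k else None)
    kept = sorted({lab for lab in labels if lab is not None})
    return labels, kept


# Six patches, three hypotheses; hypothesis 2 plays the role of a false alarm
# that weakly overlaps patches 2 and 3.
unary = [
    [0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.7, 0.05, 0.3],
    [0.05, 0.7, 0.3], [0.05, 0.9, 0.05], [0.05, 0.9, 0.05],
]
groups = [0, 0, 0, 1, 1, 1]  # patches in the same group look alike
pairwise = [[1.0 if i != j and groups[i] == groups[j] else 0.0
             for j in range(6)] for i in range(6)]

labels, kept = assign_patches(unary, pairwise)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(kept)    # [0, 1]
```

In this toy setting the number of surviving hypotheses gives the people count, and a hypothesis that owns no patches is treated as a rejected false alarm. Occlusion reasoning and the actual likelihood models of the paper are deliberately left out.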
References
Tu, P., Sebastian, T., Doretto, G., Krahnstoever, N., Rittscher, J., and Yu, T. Unified crowd segmentation. In Proceedings of European Conference on Computer Vision (ECCV), 2008.
This paper presents a unified approach to crowd segmentation. A global
solution is generated using an Expectation Maximization framework.
Initially, a head and shoulder detector is used to nominate an exhaustive
set of person locations and these form the person hypotheses. The
image is then partitioned into a grid of small patches which are
each assigned to one of the person hypotheses. A key idea of this
paper is that while whole body monolithic person detectors can fail
due to occlusion, a partial response to such a detector can be used
to evaluate the likelihood of a single patch being assigned to a
hypothesis. This captures local appearance information without having
to learn specific appearance models. The likelihood of a pair of
patches being assigned to a person hypothesis is evaluated based
on low level image features such as uniform motion fields and color
constancy. During the E-step, the single and pairwise likelihoods
are used to compute a globally optimal set of assignments of patches
to hypotheses. In the M-step, parameters which enforce global consistency
of assignments are estimated. This can be viewed as a form of occlusion
reasoning. The final assignment of patches to hypotheses constitutes
a segmentation of the crowd. The resulting system provides a global
solution that does not require background modeling and is robust
with respect to clutter and partial occlusion.
@inproceedings{tuSDKRY08eccv,
abbr = {ECCV},
author = {Tu, P. and Sebastian, T. and Doretto, G. and Krahnstoever, N. and Rittscher, J. and Yu, T.},
title = {Unified crowd segmentation},
booktitle = {Proceedings of European Conference on Computer Vision},
year = {2008},
pages = {691--704},
bib2html_pubtype = {Conferences},
bib2html_rescat = {Video Analysis, People Detection, Integral Image Computations,
People Tracking},
file = {tuSDKRY08eccv.pdf:doretto\\conference\\tuSDKRY08eccv.pdf:PDF},
owner = {doretto},
timestamp = {2008.01.16}
}