Long-duration tracking of individuals across large sites is a challenge. Tracks of individuals from disjoint fields of view need to be linked, despite the same person appearing in a different pose, from a different viewpoint, and under different illumination conditions. This is an identity-matching problem, which could be approached with traditional biometric cues, such as the face. However, practical scenarios prevent relying on good-quality acquisition of face images at standoff distances. In the absence of stable biometric data, one can revert to whole-body appearance information, provided that a person does not change clothes between sightings (Doretto et al., 2011; Wang et al., 2007). This person re-identification problem can be approached by designing methods for learning whole-body appearance representations that are invariant to the high intra-class variance (Sabri et al., 2022; Siyahjani et al., 2015) induced by the unrestricted nuisance factors of variation, i.e., pose, illumination, viewpoint, background, and sensor noise.
References
ISVC
Joint Discriminative and Metric Embedding Learning for Person Re-Identification
Sabri, S. I., Randhawa, Z. A., and Doretto, G.
In Advances in Visual Computing, 2022.
Person re-identification is a challenging task because of the high intra-class variance induced by the unrestricted nuisance factors of variations such as pose, illumination, viewpoint, background, and sensor noise. Recent approaches postulate that powerful architectures have the capacity to learn feature representations invariant to nuisance factors, by training them with losses that minimize intra-class variance and maximize inter-class separation, without modeling nuisance factors explicitly. The dominant approaches use either a discriminative loss with margin, like the softmax loss with the additive angular margin, or a metric learning loss, like the triplet loss with batch hard mining of triplets. Since the softmax imposes feature normalization, it limits the gradient flow supervising the feature embedding. We address this by joining the losses and leveraging the triplet loss as a proxy for the missing gradients. We further improve invariance to nuisance factors by adding the discriminative task of predicting attributes. Our extensive evaluation highlights that when only a holistic representation is learned, we consistently outperform the state-of-the-art on the three most challenging datasets. Such representations are easier to deploy in practical systems. Finally, we found that joining the losses removes the requirement for having a margin in the softmax loss while increasing performance.
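To make the loss combination concrete, here is a minimal sketch assuming PyTorch: a softmax cross-entropy identity loss joined with a batch-hard triplet loss acting on the same embedding. The function names, margin value, and weighting are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of joining a discriminative (softmax) loss with a
# metric (triplet) loss on the same embedding, assuming PyTorch.
# Names, margin, and the mining routine are illustrative assumptions.
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Triplet loss with batch-hard mining: for each anchor take the
    farthest positive and the closest negative within the batch."""
    dist = torch.cdist(embeddings, embeddings)          # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    hardest_pos = (dist * (same & ~eye).float()).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()

def joint_loss(logits, embeddings, labels, w_triplet=1.0):
    """Joint objective: cross-entropy over the identity logits, plus the
    triplet loss acting directly on the embedding, which supplies the
    gradient flow that a normalized softmax alone would limit."""
    ce = F.cross_entropy(logits, labels)
    tri = batch_hard_triplet_loss(embeddings, labels)
    return ce + w_triplet * tri
```

In a training loop, `logits` would come from an identity classifier head and `embeddings` from the backbone feature layer, so both losses supervise the same representation.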
@inproceedings{sabriRD22isvc,
abbr = {ISVC},
author = {Sabri, S. I. and Randhawa, Z. A. and Doretto, G.},
booktitle = {Advances in Visual Computing},
title = {Joint Discriminative and Metric Embedding Learning for Person Re-Identification},
year = {2022},
address = {Cham},
editor = {Bebis, G. and Li, B. and Yao, A. and Liu, Y. and Duan, Y. and Lau, M. and Khadka, R. and Crisan, A. and Chang, R.},
pages = {165--178},
month = oct,
publisher = {Springer International Publishing},
bib2html_pubtype = {Conferences},
doi = {10.1007/978-3-031-20716-7_13},
arxiv = {2212.14107},
wwwnote = {Oral}
}
ICCV
A Supervised Low-rank Method for Learning Invariant Subspaces
Siyahjani, F., Almohsen, R., Sabri, S., and Doretto, G.
In Proceedings of IEEE International Conference on Computer Vision, 2015.
Sparse representation and low-rank matrix decomposition approaches have been successfully applied to several computer vision problems. They build a generative representation of the data, which often requires complex training as well as testing to be robust against data variations induced by nuisance factors. We introduce the invariant components, a discriminative representation invariant to nuisance factors, because it spans subspaces orthogonal to the space where nuisance factors are defined. This allows developing a framework based on geometry that ensures a uniform inter-class separation, and a very efficient and robust classification based on simple nearest neighbor. In addition, we show how the approach is equivalent to a local metric learning, where the local metrics (one for each class) are learned jointly, rather than independently, thus avoiding the risk of overfitting without the need for additional regularization. We evaluated the approach for face recognition with highly corrupted training and testing data, obtaining very promising results.
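The geometric idea can be illustrated with a small NumPy sketch: assuming the nuisance variations span a known subspace, features are projected onto its orthogonal complement and matched by nearest neighbor. The basis, dimensions, and data below are made up for illustration and do not reproduce the paper's learning procedure.

```python
# Illustrative sketch (NumPy) of classifying with components that are
# invariant to an assumed nuisance subspace: project onto its orthogonal
# complement, then use simple nearest neighbor.
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 8                        # feature dim, nuisance subspace dim (assumed)
N = rng.standard_normal((d, k))     # columns: assumed basis of the nuisance subspace
Q, _ = np.linalg.qr(N)              # orthonormalize the basis
P = np.eye(d) - Q @ Q.T             # projector onto the orthogonal complement

gallery = rng.standard_normal((5, d))              # one raw feature per identity
probe = gallery[2] + N @ rng.standard_normal(k)    # identity 2 plus nuisance variation

g_inv = gallery @ P                 # invariant components of the gallery (P is symmetric)
p_inv = P @ probe                   # invariant component of the probe
pred = np.argmin(np.linalg.norm(g_inv - p_inv, axis=1))
print("predicted identity:", pred)  # expected: 2, the nuisance part is projected away
```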
@inproceedings{siyahjaniASD15iccv,
abbr = {ICCV},
author = {Siyahjani, F. and Almohsen, R. and Sabri, S. and Doretto, G.},
title = {{A Supervised Low-rank Method for Learning Invariant Subspaces}},
booktitle = {Proceedings of IEEE International Conference on Computer Vision},
year = {2015},
pages = {4220--4228},
bib2html_pubtype = {Conferences}
}
JAIHC
Appearance-based person reidentification in camera networks: problem overview and current approaches
Doretto, G., Sebastian, T., Tu, P., and Rittscher, J.
Journal of Ambient Intelligence and Humanized Computing, 2011.
Recent advances in visual tracking methods allow following a given
object or individual in presence of significant clutter or partial
occlusions in a single or a set of overlapping camera views. The
question of when person detections in different views or at different
time instants can be linked to the same individual is of fundamental
importance to video analysis in a large-scale network of cameras.
This is the person reidentification problem. The paper focuses on
algorithms that use the overall appearance of an individual as opposed
to passive biometrics such as face and gait. Methods that effectively
address the challenges associated with changes in illumination, pose,
and clothing appearance variation are discussed. More specifically,
the development of a set of models that capture the overall appearance
of an individual and can effectively be used for information retrieval
are reviewed. Some of them provide a holistic description of a person,
and some others require an intermediate step where specific body
parts need to be identified. Some are designed to extract appearance
features over time, and some others can operate reliably also on
single images. The paper discusses algorithms for speeding up the
computation of signatures. In particular it describes very fast procedures
for computing co-occurrence matrices by leveraging a generalization
of the integral representation of images. The algorithms are deployed
and tested in a camera network comprising three cameras with non-overlapping
fields of view, where a multi-camera multi-target tracker links the
tracks in different cameras by reidentifying the same people appearing
in different views.
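The integral-representation trick mentioned above can be sketched in a few lines of NumPy: after one cumulative-sum pass, the sum over any rectangular region is obtained with four lookups, which is what makes occurrence and co-occurrence counts fast. The array and rectangle below are illustrative only.

```python
# Minimal sketch of the integral-image idea underlying the fast
# co-occurrence computations: one cumulative-sum pass, then any
# rectangle sum costs four lookups.
import numpy as np

img = np.arange(25, dtype=np.float64).reshape(5, 5)

# Integral image padded with a zero row/column to simplify indexing.
ii = np.zeros((6, 6))
ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using four lookups in the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

assert rect_sum(ii, 1, 1, 4, 4) == img[1:4, 1:4].sum()
```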
@article{dorettoSTR11jaihc,
abbr = {JAIHC},
author = {Doretto, G. and Sebastian, T. and Tu, P. and Rittscher, J.},
title = {Appearance-based person reidentification in camera networks: problem
overview and current approaches},
journal = {Journal of Ambient Intelligence and Humanized Computing},
year = {2011},
volume = {2},
pages = {127-151},
affiliation = {West Virginia University, P.O. Box 6901, Morgantown, WV 26506, USA},
bib2html_pubtype = {Journals},
bib2html_rescat = {Human Reidentification, Identity Management, Video Analysis, Appearance
Modeling, Shape and Appearance Modeling, Integral Image Computations,
Track Matching},
issn = {1868-5137},
issue = {2},
keyword = {Engineering},
publisher = {Springer Berlin / Heidelberg},
url = {http://dx.doi.org/10.1007/s12652-010-0034-y}
}
ICCV
Shape and appearance context modeling
Wang, X., Doretto, G., Sebastian, T. B., Rittscher, J., and Tu, P. H.
In Proceedings of IEEE International Conference on Computer Vision, 2007.
In this work we develop appearance models for computing the similarity
between image regions containing deformable objects of a given class
in realtime. We introduce the concept of shape and appearance context.
The main idea is to model the spatial distribution of the appearance
relative to each of the object parts. Estimating the model entails
computing occurrence matrices. We introduce a generalization of the
integral image and integral histogram frameworks, and prove that
it can be used to dramatically speed up occurrence computation. We
demonstrate the ability of this framework to recognize an individual
walking across a network of cameras. Finally, we show that the proposed
approach outperforms several other methods.
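A toy NumPy sketch of the occurrence-matrix idea named in the abstract: counting how quantized appearance labels distribute over spatial bins defined relative to a reference location. The label map, bin layout, and sizes are assumptions for illustration, not the descriptor used in the paper.

```python
# Toy sketch (NumPy) of an occurrence matrix: for a reference location,
# count how quantized appearance labels distribute over coarse spatial
# bins defined relative to it. All sizes and bins are illustrative.
import numpy as np

rng = np.random.default_rng(1)
H, W, n_labels = 32, 16, 4
app_labels = rng.integers(0, n_labels, size=(H, W))   # quantized appearance per pixel
ref_r = H // 2                                        # reference row (e.g., a detected part)

dr = np.arange(H)[:, None] - ref_r                    # vertical offset from the reference
spatial_bin = np.digitize(dr, bins=[-4, 4])           # 0: above, 1: near, 2: below

occurrence = np.zeros((3, n_labels), dtype=int)       # spatial bins x appearance labels
for b in range(3):
    for l in range(n_labels):
        occurrence[b, l] = np.count_nonzero((spatial_bin == b) & (app_labels == l))

print(occurrence)   # rows: relative spatial bins, columns: appearance labels
```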
@inproceedings{wangDSRT07iccv,
abbr = {ICCV},
author = {Wang, X. and Doretto, G. and Sebastian, T. B. and Rittscher, J. and Tu, P. H.},
title = {Shape and appearance context modeling},
booktitle = {Proceedings of IEEE International Conference on Computer Vision},
year = {2007},
pages = {1--8},
bib2html_pubtype = {Conferences},
bib2html_rescat = {Human Reidentification, Video Analysis, Appearance Modeling, Shape
and Appearance Modeling, Integral Image Computations, Track Matching},
}