Providing a semantic interpretation of the actions and interactions between the actors in a scene enables behavior analysis for automated decision-making. This remains a challenging problem, especially when performed online. The problem is aggravated when the track of each actor is fragmented and not linked, due to occlusions, traffic, and tracking errors (Chan et al., 2006). In online settings, behavior may depend on the interactions between pairs of actors, which ultimately constitute the building blocks of more complex group behavior (Motiian et al., 2013). A major challenge is to detect and recognize such pairwise interactions online, in a causal fashion (Siyahjani et al., 2014), leveraging multiple camera sensors whenever available (Motiian et al., 2017). Correctly interpreted behavioral traits can also be used to characterize identity, and can be an important discriminator at large standoff distances.
Recognizing human interactions from video is an important step towards the long-term goal of performing scene understanding fully automatically.
In aerial video, moving objects of interest are typically very small, and being able to detect them is key to enabling tracking.
References
TCSVT
Online Human Interaction Detection and Recognition with Multiple Cameras
Motiian, S.,
Siyahjani, F.,
Almohsen, R.,
and Doretto, G.
IEEE Transactions on Circuits and Systems for Video Technology,
2017.
We address the problem of detecting and recognizing, online, the occurrence of human interactions as seen by a network of multiple cameras. We represent interactions by forming temporal trajectories that couple together the body motion of each individual, their proximity relationships with others, and also sound whenever available. Such trajectories are modeled with kernel state-space (KSS) models, whose advantage is being suitable for online interaction detection and recognition, as well as for fusing information from multiple cameras, while enabling a fast implementation based on online recursive updates. For recognition, in order to compare interaction trajectories in the space of KSS models, we design so-called pairwise kernels with a special symmetry. For detection, we exploit the geometry of linear operators in Hilbert space, and extend to KSS models the concept of parity space, originally defined for linear models. For fusion, we combine KSS models with kernel construction and multiview learning techniques. We extensively evaluate the approach on four publicly available single-view data sets, and we also introduce, and will make public, a new challenging human interaction data set that we have collected using a network of three cameras. The results show that the approach holds promise to become an effective building block for the real-time analysis of human behavior from multiple cameras.
@article{motiianSAD2015tcsvt,
abbr = {TCSVT},
author = {Motiian, S. and Siyahjani, F. and Almohsen, R. and Doretto, G.},
title = {{Online Human Interaction Detection and Recognition with Multiple Cameras}},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
year = {2017},
volume = {27},
number = {3},
pages = {649--663}
}
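To make the fusion step above concrete: one simple way to combine evidence from multiple cameras in a kernel framework is a convex combination of per-view kernel matrices. The sketch below is a minimal illustration of that general idea, not the paper's actual KSS-based construction; the RBF base kernel and the uniform weights are assumptions made for the example.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel matrix between row-wise feature sets X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fused_kernel(views_a, views_b, weights=None, gamma=1.0):
    # Convex combination of per-camera kernels: a basic multiview fusion.
    # views_a/views_b hold one feature matrix per camera (hypothetical
    # per-interaction features; the paper instead compares KSS models).
    m = len(views_a)
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights)
    return sum(wi * rbf_kernel(Xa, Xb, gamma)
               for wi, Xa, Xb in zip(w, views_a, views_b))

A convex combination of positive-definite kernels is again positive definite, so the fused matrix can be passed directly to any kernel classifier.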
ICME
Online Geometric Human Interaction Segmentation and Recognition
Siyahjani, F.,
Motiian, S.,
Bharthavarapu, H.,
Sharlemin, S.,
and Doretto, G.
In Proceedings of IEEE International Conference on Multimedia and Expo,
2014.
We address the problem of online temporal segmentation and recognition of human interactions in video sequences. The complexity of the high-dimensional data variability representing interactions is handled by combining kernel methods with linear models, giving rise to kernel regression and kernel state-space models. By exploiting the geometry of linear operators in Hilbert space, we show how the concept of parity space, defined for linear models, generalizes to the kernelized extensions. This provides a powerful and flexible framework for online temporal segmentation and recognition. We extensively evaluate the approach on a publicly available dataset, and on a new challenging human interaction dataset that we have collected. The results show that the approach holds promise to become an effective building block for the real-time analysis of human behavior.
@inproceedings{siyahjaniMBSD14icme,
abbr = {ICME},
author = {Siyahjani, F. and Motiian, S. and Bharthavarapu, H. and Sharlemin, S. and Doretto, G.},
title = {Online Geometric Human Interaction Segmentation and Recognition},
booktitle = {Proceedings of IEEE International Conference on Multimedia and Expo},
year = {2014},
month = jul
}
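The parity-space idea behind this segmentation criterion can be previewed with a plain linear model: when incoming samples stop being consistent with the model fitted on the recent past, the prediction residual spikes, signaling a segment boundary. The sketch below is that linear analogue, a stand-in for the paper's kernelized state-space construction; the window length and AR order are illustrative assumptions.

import numpy as np

def ar_residuals(y, order=2, window=30):
    # Online-style boundary score for a 1-D numpy signal y: fit an AR model
    # on a sliding window, then measure how badly it predicts the next
    # sample. Peaks in the residual suggest a change of regime.
    scores = np.zeros(len(y))
    for t in range(window + order, len(y)):
        past = y[t - window:t]
        # Least-squares system: past[k] ~ sum_i a[i] * past[k - i - 1].
        A = np.column_stack([past[order - i - 1:len(past) - i - 1]
                             for i in range(order)])
        b = past[order:]
        a, *_ = np.linalg.lstsq(A, b, rcond=None)
        pred = a @ y[t - 1:t - order - 1:-1]  # most recent `order` samples
        scores[t] = abs(y[t] - pred)
    return scores

Thresholding the scores online yields candidate boundaries; the paper's contribution is carrying this out in the kernel-induced Hilbert space rather than on raw linear predictions.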
ISVC
Pairwise Kernels for Human Interaction Recognition
Motiian, S.,
Feng, K.,
Bharthavarapu, H.,
Sharlemin, S.,
and Doretto, G.
In Advances in Visual Computing,
2013.
Oral
In this paper we model binary people interactions by forming temporal interaction trajectories, under the form of a time series, coupling together the body motion of each individual as well as their proximity relationships. Such trajectories are modeled with a non-linear dynamical system (NLDS). We develop a framework that entails the use of so-called pairwise kernels, able to compare interaction trajectories in the space of NLDS. To do so, we address the problem of modeling the Riemannian structure of the trajectory space, and we also prove that kernels have to satisfy certain symmetry properties, which are peculiar to this interaction modeling framework. Experimental results show that this approach is quite promising, as it is able to match and improve state-of-the-art classification and retrieval accuracies on two human interaction datasets.
@incollection{motiianFBSD13isvc,
abbr = {ISVC},
author = {Motiian, S. and Feng, K. and Bharthavarapu, H. and Sharlemin, S. and Doretto, G.},
title = {Pairwise Kernels for Human Interaction Recognition},
booktitle = {Advances in Visual Computing},
publisher = {Springer Berlin Heidelberg},
year = {2013},
volume = {8034},
series = {Lecture Notes in Computer Science},
pages = {210--221},
doi = {10.1007/978-3-642-41939-3_21},
isbn = {978-3-642-41938-6},
wwwnote = {Oral}
}
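The symmetry the abstract alludes to can be illustrated with a standard pairwise-kernel construction: given a base kernel k on single-actor trajectories, K((a,b),(c,d)) = k(a,c)k(b,d) + k(a,d)k(b,c) is invariant to swapping the two actors within either interaction. The sketch below shows that general construction, with an RBF base kernel on plain feature vectors standing in for the paper's kernels on NLDS models.

import numpy as np

def base_kernel(x, y, gamma=0.5):
    # Illustrative RBF base kernel on single-actor feature vectors; the
    # paper compares full trajectories in the space of NLDS models instead.
    return float(np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

def pairwise_kernel(pair1, pair2, k=base_kernel):
    # Symmetric pairwise kernel: K((a,b),(c,d)) == K((b,a),(c,d)).
    (a, b), (c, d) = pair1, pair2
    return k(a, c) * k(b, d) + k(a, d) * k(b, c)

Being a sum of products of positive-definite kernels, K is itself positive definite, so it drops into any kernel-based classifier or retrieval scheme.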
CVPR
Joint recognition of complex events and track matching
Chan, M. T.,
Hoogs, A.,
Bhotika, R.,
Perera, A.,
Schmiederer, J.,
and Doretto, G.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
2006.
We present a novel method for jointly performing recognition of complex events and linking fragmented tracks into coherent, long-duration tracks. Many event recognition methods require highly accurate tracking, and may fail when tracks corresponding to event actors are fragmented or partially missing. However, these conditions occur frequently due to occlusions, traffic, and tracking errors. Recently, methods have been proposed for linking track fragments from multiple objects under these difficult conditions. Here, we develop a method for solving these two problems jointly. A hypothesized event model, represented as a Dynamic Bayes Net, supplies data-driven constraints on the likelihood of proposed track fragment matches. These event-guided constraints are combined with the appearance and kinematic constraints used in the previous track linking formulation. The result is the most likely track linking solution given the event model, and the highest event score given all of the track fragments. The event model with the highest score is determined to have occurred if the score exceeds a threshold. Results demonstrated on a busy scene of airplane servicing activities, where many non-event movers and long fragmented tracks are present, show the promise of the approach to solving the joint problem.
@inproceedings{chanHBPSD06cvpr,
abbr = {CVPR},
author = {Chan, M. T. and Hoogs, A. and Bhotika, R. and Perera, A. and Schmiederer, J. and Doretto, G.},
title = {Joint recognition of complex events and track matching},
booktitle = {Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
year = {2006},
volume = {2},
pages = {1615--1622},
address = {New York City, NY, USA},
month = jun,
doi = {10.1109/CVPR.2006.160},
issn = {1063-6919}
}
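The joint formulation can be sketched as follows: each candidate link between a fragment ending and a fragment starting gets a cost mixing appearance, kinematic, and event-model terms, and a one-to-one matching is solved over all candidates. The code below is only an illustration of that combination; the field names, the weights, and the scalar event_loglik standing in for the Dynamic Bayes Net likelihood are all assumptions made for the example.

import numpy as np
from scipy.optimize import linear_sum_assignment

def link_cost(end_frag, start_frag, event_loglik, w=(1.0, 1.0, 1.0)):
    # Cost of linking two track fragments; lower is better. The fragments
    # are hypothetical dicts with 'appearance' (feature vector), 'pos' and
    # 'vel' (state at the gap), and 'gap' (frames between the fragments).
    appearance = np.sum((end_frag['appearance'] - start_frag['appearance']) ** 2)
    # Kinematic term: constant-velocity extrapolation across the gap.
    predicted = end_frag['pos'] + end_frag['vel'] * start_frag['gap']
    kinematic = np.sum((predicted - start_frag['pos']) ** 2)
    # The event model rewards links it finds plausible.
    return w[0] * appearance + w[1] * kinematic - w[2] * event_loglik

def best_linking(cost_matrix):
    # One-to-one fragment matching minimizing the total linking cost.
    rows, cols = linear_sum_assignment(np.asarray(cost_matrix))
    return list(zip(rows.tolist(), cols.tolist()))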
ICPR
Event recognition with fragmented object tracks
Chan, M. T.,
Hoogs, A.,
Sun, Z.,
Schmiederer, J.,
Bhotika, R.,
and Doretto, G.
In Proceedings of the International Conference on Pattern Recognition,
2006.
Complete and accurate video tracking is very difficult to achieve in practice due to long occlusions, traffic clutter, shadows, and appearance changes. In this paper, we study the feasibility of event recognition when object tracks are fragmented. By changing the lock score threshold controlling track termination, different levels of track fragmentation are generated. The effect on event recognition is revealed by examining the event model match score as a function of the lock score threshold. Using a Dynamic Bayesian Network to model events, it is shown that event recognition actually improves with greater track fragmentation, assuming fragmented tracks for the same object are linked together. The improvement continues up to the point where it is offset by other errors, such as those caused by frequent object reinitialization. The study is conducted on busy scenes of airplane servicing activities where long tracking gaps occur intermittently.
@inproceedings{chanHSSBD06icpr,
abbr = {ICPR},
author = {Chan, M. T. and Hoogs, A. and Sun, Z. and Schmiederer, J. and Bhotika, R. and Doretto, G.},
title = {Event recognition with fragmented object tracks},
booktitle = {Proceedings of the International Conference on Pattern Recognition},
year = {2006},
volume = {1},
pages = {412--416},
month = aug,
doi = {10.1109/ICPR.2006.513},
issn = {1051-4651}
}
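The experimental knob in this study, the lock score threshold, can be mimicked on synthetic data: raising the threshold terminates tracks sooner and yields more fragments, which is the quantity swept in the paper. In the minimal sketch below the per-frame lock scores are random noise, purely to show the fragmentation mechanics.

import numpy as np

def fragment_track(length, lock_scores, threshold):
    # Split a track of `length` frames into fragments wherever the
    # per-frame lock score drops below `threshold` (track termination).
    fragments, start = [], 0
    for t in range(length):
        if lock_scores[t] < threshold:
            if t > start:
                fragments.append((start, t))
            start = t + 1  # tracker re-initializes on the next frame
    if start < length:
        fragments.append((start, length))
    return fragments

rng = np.random.default_rng(0)
scores = rng.random(500)  # stand-in for real per-frame lock scores
for thr in (0.01, 0.05, 0.1):
    frags = fragment_track(500, scores, thr)
    print(f"threshold={thr}: {len(frags)} fragments")

Each fragment list would then feed the linking stage before the event model match score is computed.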