Summary
Detecting and recognizing continuous activities in video is a core problem for building intelligent systems that can extract and manage content fully automatically. Recent years have seen a concentration of work on recognizing single-person actions, as well as group activities. In contrast, modeling the interactions between two people remains relatively unexplored. While there are many large datasets for single-person action recognition, the computer vision community still lacks a large and challenging dataset of binary human interactions. We therefore introduce a new video dataset recording several binary human interactions from four different views.
Some components of the dataset are available in this repository.
Related publications include Motiian et al. (2013), Siyahjani et al. (2014), and Motiian et al. (2017).
References
Online Human Interaction Detection and Recognition with Multiple Cameras
Motiian, S., Siyahjani, F., Almohsen, R., and Doretto, G.
IEEE Transactions on Circuits and Systems for Video Technology,
2017.
We address the problem of detecting and recognizing online the occurrence of human interactions as seen by a network of multiple cameras. We represent interactions by forming temporal trajectories, coupling together the body motion of each individual and their proximity relationships with others, as well as sound whenever available. Such trajectories are modeled with kernel state-space (KSS) models. Their advantage is being suitable for online interaction detection and recognition, as well as for fusing information from multiple cameras, while enabling a fast implementation based on online recursive updates. For recognition, in order to compare interaction trajectories in the space of KSS models, we design so-called pairwise kernels with a special symmetry. For detection, we exploit the geometry of linear operators in Hilbert space, and extend to KSS models the concept of parity space, originally defined for linear models. For fusion, we combine KSS models with kernel construction and multiview learning techniques. We extensively evaluate the approach on four publicly available single-view datasets, and we also introduce, and will make public, a new challenging human interactions dataset that we have collected using a network of three cameras. The results show that the approach holds promise to become an effective building block for the analysis of real-time human behavior from multiple cameras.
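As a rough illustration of the symmetry property mentioned above: a pairwise kernel over two-person interactions should be invariant to swapping the participants within either interaction. The sketch below is only a minimal example of that symmetrization, assuming a simple RBF base kernel over fixed-length feature vectors; the papers' actual kernels operate on KSS/NLDS trajectory models, not raw vectors.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Hypothetical base kernel comparing two individual descriptors
    # (stand-ins for the per-person trajectory models in the papers).
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def pairwise_kernel(pair1, pair2, base=rbf):
    # Symmetrized pairwise kernel: by summing over both matchings of
    # participants, K((a,b),(c,d)) == K((b,a),(c,d)) == K((a,b),(d,c)).
    a, b = pair1
    c, d = pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)
```

The symmetrization matters because the labeling of the two people in an interaction is arbitrary; a classifier using this kernel cannot depend on which person is listed first.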
@article{motiianSAD2015tcsvt,
abbr = {TCSVT},
author = {Motiian, S. and Siyahjani, F. and Almohsen, R. and Doretto, G.},
title = {{Online Human Interaction Detection and Recognition with Multiple Cameras}},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
year = {2017},
volume = {27},
number = {3},
pages = {649--663},
bib2html_pubtype = {Journals}
}
Online Geometric Human Interaction Segmentation and Recognition
Siyahjani, F., Motiian, S., Bharthavarapu, H., Sharlemin, S., and Doretto, G.
In Proceedings of IEEE International Conference on Multimedia and Expo,
2014.
We address the problem of online temporal segmentation and recognition of human interactions in video sequences. The complexity of the high-dimensional data variability representing interactions is handled by combining kernel methods with linear models, giving rise to kernel regression and kernel state space models. By exploiting the geometry of linear operators in Hilbert space, we show how the concept of parity space, defined for linear models, generalizes to the kernelized extensions. This provides a powerful and flexible framework for online temporal segmentation and recognition. We extensively evaluate the approach on a publicly available dataset, and on a new challenging human interactions dataset that we have collected. The results show that the approach holds the promise to become an effective building block for the analysis of human behavior in real time.
@inproceedings{siyahjaniMBSD14icme,
abbr = {ICME},
author = {Siyahjani, F. and Motiian, S. and Bharthavarapu, H. and Sharlemin, S. and Doretto, G.},
title = {Online Geometric Human Interaction Segmentation and Recognition},
booktitle = {Proceedings of IEEE International Conference on Multimedia and Expo},
year = {2014},
month = jul,
bib2html_pubtype = {Conferences},
bib2html_rescat = {Interaction Recognition, Video Analysis},
}
Pairwise Kernels for Human Interaction Recognition
Motiian, S., Feng, K., Bharthavarapu, H., Sharlemin, S., and Doretto, G.
In Advances in Visual Computing,
2013.
Oral
In this paper we model binary people interactions by forming temporal interaction trajectories, in the form of a time series, coupling together the body motion of each individual as well as their proximity relationships. Such trajectories are modeled with a non-linear dynamical system (NLDS). We develop a framework that entails the use of so-called pairwise kernels, able to compare interaction trajectories in the space of NLDS. To do so we address the problem of modeling the Riemannian structure of the trajectory space, and we also prove that kernels have to satisfy certain symmetry properties, which are peculiar to this interaction modeling framework. Experimental results show that this approach is quite promising, as it is able to match and improve state-of-the-art classification and retrieval accuracies on two human interaction datasets.
@incollection{motiianFBSD13isvc,
abbr = {ISVC},
author = {Motiian, S. and Feng, K. and Bharthavarapu, H. and Sharlemin, S. and Doretto, G.},
title = {Pairwise Kernels for Human Interaction Recognition},
booktitle = {Advances in Visual Computing},
publisher = {Springer Berlin Heidelberg},
year = {2013},
volume = {8034},
series = {Lecture Notes in Computer Science},
pages = {210--221},
bib2html_pubtype = {Conferences},
bib2html_rescat = {Interaction Recognition, Video Analysis},
doi = {10.1007/978-3-642-41939-3_21},
isbn = {978-3-642-41938-6},
url = {http://dx.doi.org/10.1007/978-3-642-41939-3_21},
wwwnote = {Oral}
}