Problem:
Despite a significant amount of research on recognizing actions and activities, the problem of localizing them in time has received considerably less attention. This is even more true when the localization has to be performed online and has to allow video to be processed in real time.
Another main challenge is handling the complex variability of the data that represent activities, which are inherently multidimensional.
Solution:
In this work we propose an online approach that copes with both the high dimensionality of the data and the complexity of their variability by combining notions from two well-understood theories and formalisms. The first is the theory of reproducing kernel Hilbert spaces (RKHS), and the second is the theory of state space models.
Method:
We introduce two methods, KSS and KR, based on RKHS and state space models, to detect change points in a video. Detecting when an activity starts and ends in a video can be treated as a change point detection problem. The figure below shows how our methods detect the beginning and end of an interaction in a video, compared with the Maximum Mean Discrepancy (MMD) method:
The video below shows an example of detecting and recognizing an interaction.
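To make the change point view of activity localization concrete, here is a minimal sketch of the MMD baseline mentioned above. It is not the KSS or KR method, and it runs offline over a whole sequence. It compares a window of frames before each time step with a window after it, using a Gaussian kernel. The function names, the kernel width `gamma`, the window length, and the synthetic 2-D "frame features" are all assumptions made for illustration.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=0.5):
    """Biased estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, gamma) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def mmd_scores(frames, window):
    """Score each time t by comparing the window before t with the window after t."""
    scores = {}
    for t in range(window, len(frames) - window + 1):
        scores[t] = mmd_squared(frames[t - window:t], frames[t:t + window])
    return scores


# Synthetic stand-in for per-frame features (assumption):
# the distribution shifts at frame 40.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(40)]
frames += [[rng.gauss(2.0, 0.3), rng.gauss(2.0, 0.3)] for _ in range(40)]

scores = mmd_scores(frames, window=10)
detected = max(scores, key=scores.get)
print("highest MMD score at frame", detected)
```

The score peaks where the two windows straddle the shift. In practice a threshold on the score, rather than a global maximum, would be needed to report several change points or to run online.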
In this work, we also present our newly collected person interaction dataset, the Human Activities Under Surveillance – Person Interaction (HAUS-PI) dataset. HAUS-PI has 16 person interaction classes: handshaking, hugging, high-fiving, kicking, punching, pushing, slapping, bowing, waving, staring, getting up, contraband exchange, shooting, stabbing, talking, and patting. Since the participants were allowed to enter the scene from any direction, the interactions are recorded with very high viewpoint variation. This variation, together with the number of classes and the number of samples per class (around 45), makes this a very challenging dataset.
The video below shows some samples from the HAUS-PI dataset:
Results:
We evaluated our methods using three criteria: AMOC curves, the F1 score, and the Rand index. We also computed the recognition accuracy obtained with the detected segments and compared it with the accuracy obtained with the ground-truth segments. The results show the effectiveness of the proposed methods.
For example, the figure below shows AMOC curves for the HAUS-PI dataset. Left and center: sensitivity of the normalized time to detection with respect to the length τ of the test time window, for the KSS model (left) and the MMD model (center). Right: comparison between the KSS, KR, and MMD models.
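Two of the criteria named above, the F1 score and the Rand index, can be sketched for change point detection as follows. This is an illustrative version under stated assumptions, not the evaluation protocol actually used here. The tolerance-based greedy matching, the function names, and the example change points are all hypothetical.

```python
def f1_score(true_cps, detected_cps, tolerance):
    """F1 for change points: a detection is a hit if it lies within
    `tolerance` frames of a true change point not matched yet."""
    unmatched = list(true_cps)
    hits = 0
    for d in detected_cps:
        match = next((t for t in unmatched if abs(t - d) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(detected_cps)
    recall = hits / len(true_cps)
    return 2 * precision * recall / (precision + recall)


def segment_labels(change_points, n_frames):
    """Label each frame with the index of the segment it falls in."""
    cps = set(change_points)
    labels, segment = [], 0
    for frame in range(n_frames):
        if frame in cps:
            segment += 1
        labels.append(segment)
    return labels


def rand_index(true_cps, detected_cps, n_frames):
    """Fraction of frame pairs on which the two segmentations agree."""
    a = segment_labels(true_cps, n_frames)
    b = segment_labels(detected_cps, n_frames)
    agree, total = 0, 0
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            agree += (a[i] == a[j]) == (b[i] == b[j])
            total += 1
    return agree / total


# Hypothetical example: two true change points, three detections.
true_cps = [30, 70]
detected_cps = [32, 69, 90]
f1 = f1_score(true_cps, detected_cps, tolerance=5)
ri = rand_index(true_cps, detected_cps, n_frames=100)
print("F1:", round(f1, 3), "Rand index:", round(ri, 3))
```

The F1 score rewards detections that land close to a true boundary, while the Rand index compares the two segmentations frame pair by frame pair, so a spurious detection lowers both scores in different ways.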