In aerial video, moving objects of interest are typically very small, and detecting them is a prerequisite for tracking. Some detection methods learn the background and distinguish when a foreground object is present, but they require the image sensor to be fixed and a large number of frames for learning the background. To avoid these constraints, one could use motion segmentation algorithms (which need as few as two consecutive frames), but these expect the foreground objects to be fairly large. For small objects, (Perera et al., 2006) proposes to learn how to classify image regions into categories such as road, tree, grass, building, vehicle, and shadow, and to integrate this information with a motion segmentation algorithm to extract the moving objects. The method dramatically boosts the detection rate of small objects, enabling reliable tracking, and it is general in the sense that it is not bound to a particular motion segmentation approach.
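As a concrete illustration of how scene understanding can gate a generic motion segmenter, the following is a minimal sketch, not the implementation of (Perera et al., 2006): it assumes a hypothetical per-pixel semantic label map and a simple rule that keeps motion detections only in regions where small movers are plausible. The label set, the masking rule, and the helper names in the usage comment are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the method of Perera et al., 2006): gate the output of any
# motion segmentation algorithm with a per-pixel semantic label map.
# The label set and the "plausible mover" rule are illustrative assumptions.
ROAD, TREE, GRASS, BUILDING, VEHICLE, SHADOW = range(6)
MOVER_FRIENDLY = [ROAD, GRASS, VEHICLE, SHADOW]  # regions where small movers may appear

def filter_motion_mask(motion_mask: np.ndarray, label_map: np.ndarray) -> np.ndarray:
    """Keep motion detections only in semantically plausible regions.

    motion_mask: boolean HxW mask from any motion segmenter
                 (frame differencing, tensor voting, GPCA, ...).
    label_map:   integer HxW map of per-pixel scene labels.
    """
    plausible = np.isin(label_map, MOVER_FRIENDLY)
    return motion_mask & plausible

# Hypothetical usage (frame_difference and classify_scene are assumed helpers):
#   raw_mask = frame_difference(prev_frame, curr_frame) > threshold
#   detections = filter_motion_mask(raw_mask, classify_scene(curr_frame))
```

In the actual method the semantic context is also used to improve frame-to-frame homography computation, which this sketch does not cover.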
Complex Event Analysis
A fundamental goal in high-level vision is the ability to analyze a large field of view (such as the one observed by an aerial sensor) and give a semantic interpretation of the interactions between the actors in the scene. Almost all existing approaches have been developed for the ideal scenario in which very accurate tracking data of the actors is available and can be used to infer the status of the site. A more realistic setting is one in which the track of each actor is fragmented and the fragments are not linked, which accounts for occlusions, traffic, and tracking errors. In (Chan et al., CVPR 2006) and (Chan et al., ICPR 2006) a framework is developed in which a dynamic Bayesian network represents the interactions between actors, and inference jointly estimates the most likely linking between fragments given that a certain event is occurring, and the most likely event given the current linking. In this way the approach recovers the long-duration tracks while the events are being recognized, despite the high fragmentation, even in scenes with many non-involved movers and under different scene viewpoints and/or configurations.
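To make the joint estimation concrete, here is a minimal sketch under stated assumptions; it is not the inference procedure of the papers. It assumes a set of candidate fragment linkings, a link_score function standing in for the appearance and kinematic consistency terms, and per-event scoring functions standing in for the dynamic Bayesian network likelihoods; the event with the highest combined score is reported if it exceeds a threshold, mirroring the decision rule described in the CVPR 2006 abstract.

```python
from typing import Callable, Mapping, Sequence, Tuple

# Sketch only: all names and scoring functions are illustrative placeholders,
# not the papers' actual models or inference code.

def best_linking_for_event(candidate_linkings: Sequence,
                           link_score: Callable[[object], float],
                           event_score: Callable[[object], float]) -> Tuple[float, object]:
    """Most likely fragment linking given one hypothesized event model."""
    return max(((link_score(l) + event_score(l), l) for l in candidate_linkings),
               key=lambda pair: pair[0])

def recognize_event(candidate_linkings: Sequence,
                    link_score: Callable[[object], float],
                    event_models: Mapping[str, Callable[[object], float]],
                    threshold: float = 0.0):
    """Score every event hypothesis with its best linking; report the winner
    (and its linking) only if the score exceeds the threshold."""
    best = {name: best_linking_for_event(candidate_linkings, link_score, score)
            for name, score in event_models.items()}
    name, (score, linking) = max(best.items(), key=lambda kv: kv[1][0])
    return (name, linking) if score > threshold else (None, None)
```

The design choice mirrored here is that the event hypothesis constrains which linking is preferred, while the chosen linking in turn determines the event score, so both are resolved together rather than sequentially.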
References
CVPR
Joint recognition of complex events and track matching
Chan, M. T., Hoogs, A., Bhotika, R., Perera, A., Schmiederer, J., and Doretto, G.
In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006.
We present a novel method for jointly performing recognition of complex
events and linking fragmented tracks into coherent, long-duration
tracks. Many event recognition methods require highly accurate tracking,
and may fail when tracks corresponding to event actors are fragmented
or partially missing. However, these conditions occur frequently
from occlusions, traffic and tracking errors. Recently, methods have
been proposed for linking track fragments from multiple objects under
these difficult conditions. Here, we develop a method for solving
these two problems jointly. A hypothesized event model, represented
as a Dynamic Bayes Net, supplies data-driven constraints on the likelihood
of proposed track fragment matches. These event-guided constraints
are combined with appearance and kinematic constraints used in the
previous track linking formulation. The result is the most likely
track linking solution given the event model, and the highest event
score given all of the track fragments. The event model with the
highest score is determined to have occurred, if the score exceeds
a threshold. Results demonstrated on a busy scene of airplane servicing
activities, where many non-event movers and long fragmented tracks
are present, show the promise of the approach to solving the joint
problem.
@inproceedings{chanHBPSD06cvpr,
abbr = {CVPR},
author = {Chan, M. T. and Hoogs, A. and Bhotika, R. and Perera, A. and Schmiederer, J. and Doretto, G.},
title = {Joint recognition of complex events and track matching},
booktitle = {Proceedings of the IEEE Computer Society Conference on Computer Vision
and Pattern Recognition},
year = {2006},
volume = {2},
pages = {1615--1622},
address = {New York City, NY, USA},
month = jun,
bib2html_pubtype = {Conferences},
bib2html_rescat = {Video Analysis, Event Recognition, Track Matching},
doi = {10.1109/CVPR.2006.160},
issn = {1063-6919},
}
ICPR
Event recognition with fragmented object tracks
Chan, M. T., Hoogs, A., Sun, Z., Schmiederer, J., Bhotika, R., and Doretto, G.
In Proceedings of the International Conference on Pattern Recognition, 2006.
Complete and accurate video tracking is very difficult to achieve
in practice due to long occlusions, traffic clutter, shadows and
appearance changes. In this paper, we study the feasibility of event
recognition when object tracks are fragmented. By changing the lock
score threshold controlling track termination, different levels of
track fragmentation are generated. The effect on event recognition
is revealed by examining the event model match score as a function
of lock score threshold. Using a Dynamic Bayesian Network to model
events, it is shown that event recognition actually improves with
greater track fragmentation, assuming fragmented tracks for the same
object are linked together. The improvement continues up to a point
when it is more likely to be offset by other errors such as those
caused by frequent object reinitialization. The study is conducted
on busy scenes of airplane servicing activities where long tracking
gaps occur intermittently.
@inproceedings{chanHSSBD06icpr,
abbr = {ICPR},
author = {Chan, M. T. and Hoogs, A. and Sun, Z. and Schmiederer, J. and Bhotika, R. and Doretto, G.},
title = {Event recognition with fragmented object tracks},
booktitle = {Proceedings of the International Conference on Pattern Recognition},
year = {2006},
volume = {1},
pages = {412--416},
month = aug,
bib2html_pubtype = {Conferences},
bib2html_rescat = {Video Analysis, Event Recognition},
doi = {10.1109/ICPR.2006.513},
issn = {1051-4651},
}
CVPRW
Moving object segmentation using scene understanding
Perera, A. G. A., Brooksby, G., Hoogs, A., and Doretto, G.
In Proceedings of IEEE Computer Society Workshop on Perceptual Organization in Computer Vision, 2006.
We present a novel approach to moving object detection in video taken
from a translating, rotating and zooming sensor, with a focus on
detecting very small objects in as few frames as possible. The primary
innovation is to incorporate automatically computed scene understanding
of the video directly into the motion segmentation process. Scene
understanding provides spatial and semantic context that is used
to improve frame-to-frame homography computation, as well as direct
reduction of false alarms. The method can be applied to virtually
any motion segmentation algorithm, and we explore its utility for
three: frame differencing, tensor voting, and generalized PCA. The
approach is especially effective on sequences with large scene depth
and much parallax, as often occurs when the sensor is close to the
scene. In one difficult sequence, our results show an 8-fold reduction
of false positives on average, with essentially no impact on the
true positive rate. We also show how scene understanding can be used
to increase the accuracy of frame-to-frame homography estimates.
@inproceedings{pereraBHD06pocv,
abbr = {CVPRW},
author = {Perera, A. G. A. and Brooksby, G. and Hoogs, A. and Doretto, G.},
title = {Moving object segmentation using scene understanding},
booktitle = {Proceedings of IEEE Computer Society Workshop on Perceptual Organization
in Computer Vision},
year = {2006},
pages = {201--208},
address = {New York City, NY, USA},
month = jun,
bib2html_pubtype = {Conferences},
bib2html_rescat = {Video Analysis, Visual Motion Segmentation},
doi = {10.1109/CVPRW.2006.132},
}