Aerial Video Analysis


Moving Object Segmentation

In aerial video, moving objects of interest are typically very small, and detecting them is key to enabling tracking. Some detection methods learn the background and detect when a foreground object is present. These approaches require a fixed image sensor and a large number of frames to learn the background. To avoid these constraints, one could use motion segmentation algorithms, which need as few as two consecutive frames, but these expect the foreground objects to be considerably large. For small objects, Perera et al. (2006) propose learning to classify image regions into categories such as road, tree, grass, building, vehicle, and shadow, and integrating this information with a motion segmentation algorithm to extract the moving objects. The method dramatically boosts the detection rate of small objects, enabling reliable tracking. Moreover, it is general in the sense that it is not bound to a particular motion segmentation approach.
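The idea of combining scene labels with motion cues can be illustrated with a minimal sketch. It is not the method of Perera et al. (2006): it assumes the two frames are already registered to each other, uses plain frame differencing as a stand-in for the motion segmentation step, and takes the per-pixel class labels as given. The names `motion_mask`, `gate_by_scene`, and `PLAUSIBLE_CLASSES`, as well as the threshold value, are hypothetical.

```python
from typing import List, Set

Frame = List[List[int]]      # grayscale intensities, row-major
LabelMap = List[List[str]]   # per-pixel scene class, e.g. "road" or "tree"
Mask = List[List[bool]]

# Assumption: classes on which a moving object of interest is plausible.
PLAUSIBLE_CLASSES: Set[str] = {"road", "vehicle"}


def motion_mask(prev: Frame, curr: Frame, threshold: int = 20) -> Mask:
    """Mark pixels whose intensity changed by more than `threshold`
    between two consecutive, already registered frames."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def gate_by_scene(mask: Mask, labels: LabelMap,
                  allowed: Set[str] = PLAUSIBLE_CLASSES) -> Mask:
    """Keep a motion detection only where the scene label makes it plausible."""
    return [
        [m and (lab in allowed) for m, lab in zip(mask_row, label_row)]
        for mask_row, label_row in zip(mask, labels)
    ]


# Toy 2x3 example: two pixels change, but one of them lies on a tree.
prev_frame = [[10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 90, 10], [60, 10, 10]]
scene_labels = [["grass", "road", "road"], ["tree", "road", "road"]]

raw = motion_mask(prev_frame, curr_frame)
gated = gate_by_scene(raw, scene_labels)
print(gated)  # [[False, True, False], [False, False, False]]
```

Because the gating step only consumes a boolean mask, any motion segmentation routine could replace `motion_mask` here, which mirrors the generality claimed above.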

Complex Event Analysis

A fundamental goal in high-level vision is to analyze a large field of view (such as one observed by an aerial sensor) and give a semantic interpretation of the interactions between the actors in the scene. Almost all approaches have been developed for the ideal scenario in which very accurate tracking data for the actors is available and can be used to infer the status of the site. A more realistic setting is one where the tracks of each actor are fragmented and the fragments are not linked. This assumption accounts for occlusions, traffic, and tracking errors. Two papers by Chan et al. (2006), one at CVPR and one at ICPR, develop a framework in which a dynamic Bayesian network represents the interactions between actors. Inference jointly estimates the most likely linking between fragments, given that a certain event is occurring, and the most likely event, given the current linking. The approach therefore recovers the long-duration tracks while the events are being recognized, despite the high fragmentation. This is possible even in scenes with many uninvolved movers, and under different scene viewpoints and/or configurations.
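The mutual dependence between linking and event recognition can be read as an alternation, sketched below as a toy coordinate ascent over a table of joint scores. This is only an illustration of that reading: it is not the dynamic Bayesian network inference of Chan et al. (2006), and the event names, linking labels, and score values are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical joint log-scores for (event, linking) pairs.
# "A-B" means fragment A is linked to fragment B, and so on.
SCORES: Dict[str, Dict[str, float]] = {
    "meet":    {"A-B": -1.0, "A-C": -4.0},
    "pass_by": {"A-B": -3.0, "A-C": -2.0},
}


def alternate(scores: Dict[str, Dict[str, float]],
              event: str, linking: str,
              max_iters: int = 10) -> Tuple[str, str]:
    """Alternate between the best linking given the current event and the
    best event given that linking, until neither estimate changes."""
    for _ in range(max_iters):
        # Step 1: most likely linking, given the current event hypothesis.
        best_linking = max(scores[event], key=scores[event].get)
        # Step 2: most likely event, given the linking just chosen.
        best_event = max(scores, key=lambda e: scores[e][best_linking])
        if (best_event, best_linking) == (event, linking):
            break
        event, linking = best_event, best_linking
    return event, linking


print(alternate(SCORES, "meet", "A-C"))     # ('meet', 'A-B')
print(alternate(SCORES, "pass_by", "A-B"))  # ('pass_by', 'A-C')
```

The second call settles on a pair that scores lower than the first, which shows that a plain alternation of this kind depends on its starting hypothesis and can stop at a local optimum.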


  1. CVPR
    Joint recognition of complex events and track matching. Chan, M. T., Hoogs, A., Bhotika, R., Perera, A., Schmiederer, J., and Doretto, G. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006.
  2. ICPR
    Event recognition with fragmented object tracks. Chan, M. T., Hoogs, A., Sun, Z., Schmiederer, J., Bhotika, R., and Doretto, G. In Proceedings of the International Conference on Pattern Recognition, 2006.
  3. CVPRW
    Moving object segmentation using scene understanding. Perera, A. G. A., Brooksby, G., Hoogs, A., and Doretto, G. In Proceedings of IEEE Computer Society Workshop on Perceptual Organization in Computer Vision, 2006.