Video Understanding


Video understanding is concerned with parsing a video stream to obtain a semantic understanding of the objects in a scene, as well as of the actions and interactions that define their behavior. When the objects of interest are people, we need to detect them (Tu et al., 2008), recognize them (Wu et al., 2008), track their position, and re-identify them when they reappear (Doretto et al., 2011). By detecting people's actions and interactions (Motiian et al., 2017), we can also attempt to predict their future behavior and intent. These techniques can answer queries that require mining a large corpus of video data for safety and security applications. Variations of the same techniques could also be used in other domains, for example to analyze and quantify the behavior of a heart in an echocardiogram.


  1. TCSVT
    Online Human Interaction Detection and Recognition with Multiple Cameras. Motiian, S., Siyahjani, F., Almohsen, R., and Doretto, G. IEEE Transactions on Circuits and Systems for Video Technology, 2017.
  2. JAIHC
    Appearance-based person reidentification in camera networks: problem overview and current approaches. Doretto, G., Sebastian, T., Tu, P., and Rittscher, J. Journal of Ambient Intelligence and Humanized Computing, 2011.
  3. ECCV
    Unified crowd segmentation. Tu, P., Sebastian, T., Doretto, G., Krahnstoever, N., Rittscher, J., and Yu, T. In Proceedings of European Conference on Computer Vision, 2008.
  4. CVPR
    Face alignment using boosted ranking models. Wu, H., Liu, X., and Doretto, G. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008. (Oral)