One of the most important elements of modern Computer Vision is the concept of image texture, or simply texture. Depending on the task at hand (e.g. image-based rendering, recognition, or segmentation, to mention just a few broad areas), several texture models have been proposed in the literature. An image texture originates through an image formation process that is typically very complex and not invertible. However, for image analysis purposes, it is usually not necessary to recover all the unknowns of a scene, and one can be content with a statistical analysis of the data. It is in this spirit that textures are seen as a spatial statistical repetition of image patterns. More formally, image textures can be seen as realizations of stochastic processes defined on a surface space, and the "repetition" property can be associated with the "stationarity" of the processes. What happens when these concepts are applied to video?
Dynamic Textures: Modeling the Temporal Statistics
In nature there are plenty of scenes that originate video sequences showing temporal "repetition," intended in a statistical sense. One could think of a flow of water, a fire, or a flow of car traffic or people walking. These kinds of visual processes are now referred to as dynamic textures. (Doretto et al., 2003; Soatto et al., 2001) propose to study dynamic textures as stochastic processes that exhibit temporal stationarity, and introduce the use of linear dynamic systems for modeling their second-order statistical properties. They derive procedures for learning and simulating a dynamic texture model, and demonstrate its effectiveness in several cases using prediction error methods. The formalization is technically sound, and the model has been used in the literature to tackle many other problems by several other authors.
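A minimal NumPy sketch, in the spirit of the closed-form (suboptimal) learning procedure of Doretto et al. (2003), may help make the linear-dynamic-system idea concrete: vectorized frames are factored by SVD into an appearance basis and a state trajectory, and the state transition is fit by least squares. Function names are my own, and refinements such as mean subtraction and input-noise modeling are omitted.

```python
import numpy as np

def learn_dynamic_texture(Y, n):
    """Fit x_{t+1} = A x_t + v_t, y_t = C x_t to frames Y (pixels x frames)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                        # appearance basis (observation matrix)
    X = np.diag(s[:n]) @ Vt[:n, :]      # state trajectory, one column per frame
    # Least-squares fit of the state transition x_{t+1} ~ A x_t
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    # Driving-noise covariance from the one-step prediction residuals
    E = X[:, 1:] - A @ X[:, :-1]
    Q = (E @ E.T) / (X.shape[1] - 1)
    return A, C, Q, X[:, 0]

def synthesize(A, C, Q, x0, n_frames, seed=None):
    """Simulate the learned model forward to render new frames."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Q + 1e-8 * np.eye(Q.shape[0]))  # noise factor
    x, frames = x0.copy(), []
    for _ in range(n_frames):
        frames.append(C @ x)
        x = A @ x + L @ rng.standard_normal(x.shape)
    return np.column_stack(frames)
```

For example, given a sequence of vectorized grayscale frames stacked as columns of `Y`, `learn_dynamic_texture(Y, 20)` identifies a 20-dimensional model, and `synthesize` then extrapolates the texture to an arbitrary number of new frames.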
Dynamic Textures: Joint Modeling of Spatial and Temporal Statistics
In analyzing visual processes there may be portions of videos that can be modeled as dynamic textures, which means that they exhibit temporal stationarity. In addition, within a single frame they may also exhibit repetitions of the same patterns, as in image textures, which means that the visual process is spatially stationary as well. Therefore, it makes sense to design models that can capture the structure of the joint spatial and temporal statistics, for the purpose of enabling recognition and segmentation. (Doretto et al., 2004) introduces a model for this kind of dynamic texture, which combines a tree representation of Markov random fields, for capturing the spatial stationarity, with linear dynamic systems, for capturing the temporal stationarity of the visual process. The effectiveness of the model is demonstrated by showing extrapolation of video in both the space and time domains. The framework sets the stage for simultaneous segmentation and recognition of spatio-temporal events.
Dynamic Shape and Appearance: Joint Shape, Appearance, and Dynamics Modeling
Rather than attempting to model the temporal image variability of dynamic textures by capturing only how image intensities (appearance) vary over time, one could try to describe it by modeling how the shape of the scene varies. Both representations have advantages and limitations. For instance, the temporal variations of sharp edges are better captured by shape variation; however, shape variation cannot be used when a directional motion component is present, and appearance is the alternative. Therefore, exploiting the benefits of jointly modeling shape and appearance is very important, as has been demonstrated for single images, but the extension to dynamic scenes (motion) was missing. (Doretto & Soatto, 2006; Doretto, 2005) address this issue, and propose to explain stationary image variability by means of the joint variability of shape and appearance, akin to a temporal generalization of the well-known Active Appearance Models (AAMs). The issues of how much image variability should be modeled by shape, how much by appearance, how they vary over time (motion), and how appearance, shape, and motion merge together are addressed. The approach is capable of learning the temporal variation of higher-order image statistics, typical of videos containing sharp edge variation.
- TPAMI: Dynamic shape and appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.
- CVPR: Modeling dynamic scenes with active appearance. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. Oral.
- ECCV: Spatially homogeneous dynamic textures. In Proceedings of the European Conference on Computer Vision, 2004. Oral.
- IJCV: Dynamic textures. International Journal of Computer Vision, 2003.
- ICCV: Dynamic textures. In Proceedings of the IEEE International Conference on Computer Vision, 2001. Oral.