One of the most important concepts in modern Computer Vision is that of image texture, or simply texture. Depending on the task at hand (e.g., image-based rendering, recognition, or segmentation, to mention a few broad areas), several texture models have been proposed in the literature. An image texture originates through an image formation process that is typically very complex and not invertible. For image analysis purposes, however, it is most often unnecessary to recover all the unknowns of a scene, and one can instead resort to a statistical analysis of the data. It is in this spirit that textures are viewed as a spatial statistical repetition of image patterns. More formally, image textures can be seen as realizations of stochastic processes defined over the image domain, where the “repetition” property corresponds to the “stationarity” of the processes. What happens when these concepts are applied to video?
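To make “stationarity” concrete: in its standard second-order (wide-sense) form, stated here in generic notation rather than that of any one paper, it requires the first two moments of the random field I(x) to be invariant to translation,

E[I(x)] = μ for all x,    E[(I(x) − μ)(I(x + τ) − μ)] = r(τ) for all x and τ,

so the correlation between two pixels depends only on their relative offset τ, not on where they sit in the image. Temporal stationarity, used below, is the same requirement with shifts in time in place of shifts in space.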
Dynamic Textures: Modeling the Temporal Statistics
In nature there are plenty of scenes that produce video sequences exhibiting temporal “repetition,” intended in a statistical sense; one could think of flowing water, a fire, car traffic, or people walking. Visual processes of this kind are now referred to as dynamic textures. Doretto et al. (2003) and Soatto et al. (2001) propose to study dynamic textures as stochastic processes that exhibit temporal stationarity, and introduce the use of linear dynamic systems for modeling their second-order statistical properties. They derive procedures for learning and simulating a dynamic texture model, and demonstrate its effectiveness in several cases using prediction error methods. The formalization is technically sound, and the model has since been used by several other authors to tackle many other problems.
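At the core of this line of work is a linear dynamic system: a hidden state x_t drives the observed (vectorized) frame y_t through

x_{t+1} = A x_t + v_t,    y_t = C x_t + w_t,

with v_t and w_t white Gaussian noises. The sketch below, in Python/NumPy, follows the closed-form, suboptimal identification described in the IJCV paper (SVD of the frame matrix for C and the states, least squares for A); the function names, the regularizing jitter, and the fixed model order n are illustrative choices rather than the papers' notation.

import numpy as np

def learn_dynamic_texture(Y, n):
    """Suboptimal closed-form identification of a dynamic texture.
    Y: (p, tau) matrix whose columns are vectorized frames; n: model order."""
    Ymean = Y.mean(axis=1, keepdims=True)
    Y0 = Y - Ymean
    # Best rank-n factorization of the centered frame matrix: Y0 ~ C X.
    U, s, Vt = np.linalg.svd(Y0, full_matrices=False)
    C = U[:, :n]                                   # observation matrix
    X = np.diag(s[:n]) @ Vt[:n, :]                 # state trajectory, (n, tau)
    # State transition by least squares: X[:, 1:] ~ A X[:, :-1].
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
    # Driving-noise covariance from the one-step prediction residuals.
    E = X[:, 1:] - A @ X[:, :-1]
    Q = (E @ E.T) / (X.shape[1] - 1)
    B = np.linalg.cholesky(Q + 1e-8 * np.eye(n))   # jitter keeps Q factorizable
    return C, A, B, X, Ymean

def synthesize(C, A, B, Ymean, x0, T, rng=np.random.default_rng(0)):
    """Extrapolate T synthetic frames by simulating the learned model."""
    x, frames = x0.copy(), []
    for _ in range(T):
        frames.append(C @ x + Ymean[:, 0])
        x = A @ x + B @ rng.standard_normal(x.shape)
    return np.stack(frames, axis=1)                # (p, T), one column per frame

Once identified, the model extrapolates the sequence to arbitrary length at negligible cost by simply iterating the state recursion, which is the predictive use emphasized in the abstracts below.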
Dynamic Textures: Joint Modeling of Spatial and Temporal Statistics
In analyzing visual processes, there may be portions of video that can be modeled as dynamic textures, meaning that they exhibit temporal stationarity. In addition, within a single frame they may also exhibit repetitions of the same patterns, as in image textures, meaning that the visual process is spatially stationary as well. It therefore makes sense to design models that capture the structure of the joint spatial and temporal statistics, for the purpose of enabling recognition and segmentation. Doretto et al. (2004) introduce a model for this kind of dynamic texture, which combines a tree representation of Markov random fields, for capturing the spatial stationarity, with linear dynamic systems, for capturing the temporal stationarity of the visual process. The effectiveness of the model is demonstrated by extrapolating video in both the space and time domains. The framework sets the stage for simultaneous segmentation and recognition of spatio-temporal events.
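As a toy illustration of how spatial stationarity can be exploited, the NumPy sketch below (continuing the one above) ties a single observation basis and transition matrix across all patches of the frame, so the parameters are learned from, and shared by, every spatial location. This parameter sharing is only a crude stand-in for the actual model in the ECCV paper, which is a dynamic multiscale autoregressive process defined on a tree; all names here are hypothetical.

import numpy as np

def learn_shared_patch_model(video, patch, n):
    """video: (T, H, W) grayscale sequence; patch: side of the square patches;
    n: model order. Returns one (C, A) pair shared by all patch locations."""
    T, H, W = video.shape
    blocks = []
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            # Trajectory of one patch as a (patch*patch, T) matrix.
            blocks.append(video[:, i:i + patch, j:j + patch].reshape(T, -1).T)
    Y = np.hstack(blocks)                  # pool all patches as training data
    mean = Y.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    C = U[:, :n]                           # one appearance basis for all patches
    # Per-patch state trajectories under the shared basis.
    Xs = [C.T @ (b - mean) for b in blocks]
    # One transition A fit to all trajectories, never crossing patch boundaries.
    Xpast = np.hstack([X[:, :-1] for X in Xs])
    Xnext = np.hstack([X[:, 1:] for X in Xs])
    A = Xnext @ np.linalg.pinv(Xpast)
    return C, A, mean

Sharing the same parameters across patches is what makes extrapolation in space possible: a patch at a new location, even outside the original field of view, is synthesized from the very same (C, A).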
Dynamic Shape and Appearance: Joint Shape, Appearance, and Dynamics Modeling
Rather than attempting to model the temporal image variability of dynamic textures by capturing only how image intensities (appearance) vary over time, one could try to describe it by modeling how the shape of the scene varies. Both representations have advantages and limitations. For instance, the temporal variations of sharp edges are better captured by shape variation; shape alone, however, cannot account for a directional motion component, and appearance is the alternative there. Exploiting the benefits of jointly modeling shape and appearance is therefore very important, as has been demonstrated for single images, but the extension to dynamic scenes (motion) was missing. Doretto & Soatto (2006) and Doretto (2005) address this issue, and propose to explain stationary image variability by means of the joint variability of shape and appearance, akin to a temporal generalization of the well-known Active Appearance Models (AAMs). They address how much image variability should be modeled by shape, how much by appearance, how the two vary over time (motion), and how appearance, shape, and motion merge together. The approach is capable of learning the temporal variation of higher-order image statistics, typical of videos containing sharp edge variation.
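Schematically, and paraphrasing rather than reproducing the papers' exact notation, each frame is explained as an appearance image seen through a time-varying warp, with the low-dimensional shape and appearance coefficients evolving jointly as a linear dynamic system:

I_t(w(x; s_t)) = ρ(x; a_t) + n_t(x),    z_{t+1} = A z_t + v_t,  z_t = [s_t; a_t],

where w(·; s_t) is the warp encoding shape, ρ(·; a_t) the appearance, n_t the residual, and the transition A couples shape and appearance over time (motion). The model-complexity cost mentioned in the CVPR abstract below is what decides how the observed image variability is split between s_t and a_t.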
References
[TPAMI] Doretto, G., and Soatto, S. Dynamic shape and appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2006–2019, 2006.
We propose a model of the joint variation of shape and appearance
of portions of an image sequence. The model is conditionally linear,
and can be thought of as an extension of active appearance models
to exploit the temporal correlation of adjacent image frames. Inference
of the model parameters can be performed efficiently using established
numerical optimization techniques borrowed from finite-element analysis
and system identification.
@article{dorettoS06IEEEtpami,
  author    = {Doretto, G. and Soatto, S.},
  title     = {Dynamic shape and appearance models},
  journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year      = {2006},
  volume    = {28},
  number    = {12},
  pages     = {2006--2019},
  issn      = {0162-8828},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
  doi       = {10.1109/TPAMI.2006.243}
}
[CVPR] Doretto, G. Modeling dynamic scenes with active appearance. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 66–73, San Diego, CA, USA, June 2005. (Oral)
In this work we propose a model for video scenes that contain temporal
variability in shape and appearance. We propose a conditionally linear
model akin to a dynamic extension of active appearance models. We
formulate the problem variationally, and propose a framework where
a model complexity cost dictates the “modeling responsibility” of
each of the factors: appearance, shape and motion. We render the
learning problem well-posed by reverting to a physical and a dynamic
prior, and use the finite element method to compute a numerical solution.
We illustrate our model to learn and simulate the shape, appearance,
and motion of scenes that exhibit some form of temporal regularity,
intended in a statistical sense.
@inproceedings{doretto05cvpr,
  author    = {Doretto, G.},
  title     = {Modeling dynamic scenes with active appearance},
  booktitle = {Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
  year      = {2005},
  volume    = {1},
  pages     = {66--73},
  address   = {San Diego, CA, USA},
  month     = jun
}
[ECCV] Doretto, G., Jones, E., and Soatto, S. Spatially homogeneous dynamic textures. In Proceedings of the European Conference on Computer Vision, vol. 2, pp. 591–602, Prague, Czech Republic, May 2004. (Oral)
We address the problem of modeling the spatial and temporal second-order
statistics of video sequences that exhibit both spatial and temporal
regularity, intended in a statistical sense. We model such sequences
as dynamic multiscale autoregressive models, and introduce an efficient
algorithm to learn the model parameters. We then show how the model
can be used to synthesize novel sequences that extend the original
ones in both space and time, and illustrate the power, and limitations,
of the models we propose with a number of real image sequences.
@inproceedings{dorettoJS04eccv,
  author    = {Doretto, G. and Jones, E. and Soatto, S.},
  title     = {Spatially homogeneous dynamic textures},
  booktitle = {Proceedings of the European Conference on Computer Vision},
  year      = {2004},
  volume    = {2},
  pages     = {591--602},
  address   = {Prague, Czech Republic},
  month     = may
}
[IJCV] Doretto, G., Chiuso, A., Wu, Y. N., and Soatto, S. Dynamic textures. International Journal of Computer Vision, 51(2):91–109, 2003.
Dynamic textures are sequences of images of moving scenes that exhibit certain stationarity properties
in time; these include sea-waves, smoke, foliage, whirlwind etc. We present a characterization of dynamic
textures that poses the problems of modeling, learning, recognizing and synthesizing dynamic textures on a
firm analytical footing. We borrow tools from system identification to capture the “essence” of dynamic textures; we do so by learning (i.e. identifying) models that are optimal in the sense of maximum likelihood
or minimum prediction error variance. For the special case of second-order stationary processes, we identify the model sub-optimally in closed-form. Once learned, a model has predictive power and can be used
for extrapolating synthetic sequences to infinite length with negligible computational cost. We present experimental evidence that, within our framework, even low-dimensional models can capture very complex visual
phenomena.
@article{dorettoCWS03ijcv,
  author  = {Doretto, G. and Chiuso, A. and Wu, Y. N. and Soatto, S.},
  title   = {Dynamic textures},
  journal = {International Journal of Computer Vision},
  year    = {2003},
  volume  = {51},
  number  = {2},
  pages   = {91--109}
}
[ICCV] Soatto, S., Doretto, G., and Wu, Y. N. Dynamic textures. In Proceedings of the IEEE International Conference on Computer Vision, vol. 2, pp. 439–446, Vancouver, BC, Canada, July 2001. (Oral)
Dynamic textures are sequences of images of moving scenes that exhibit
certain stationarity properties in time; these include sea-waves,
smoke, foliage, whirlwind but also talking faces, traffic scenes
etc. We present a novel characterization of dynamic textures that
poses the problems of modelling, learning, recognizing and synthesizing
dynamic textures on a firm analytical footing. We borrow tools from
system identification to capture the “essence” of dynamic textures;
we do so by learning (i.e. identifying) models that are optimal in
the sense of maximum likelihood or minimum prediction error variance.
For the special case of second-order stationary processes we identify
the model in closed form. Once learned, a model has predictive power
and can be used for extrapolating synthetic sequences to infinite
length with negligible computational cost. We present experimental
evidence that, within our framework, even low dimensional models
can capture very complex visual phenomena.
@inproceedings{soattoDW01iccv,
  author    = {Soatto, S. and Doretto, G. and Wu, Y. N.},
  title     = {Dynamic textures},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
  year      = {2001},
  volume    = {2},
  pages     = {439--446},
  address   = {Vancouver, BC, Canada},
  month     = jul
}