Interpreting camera operations in the context of content-based video indexing and retrieval
Abstract
Video is a medium of increasing popularity, widely used in many areas. However, classifying or retrieving video data by keywords or filenames is insufficient in many situations and can yield unpredictable results. For example, a user searching with the keyword "kitten" to find video sequences containing a kitten may also retrieve sequences of the pop band "Atomic Kitten". Hence, the concept of content-based video indexing and retrieval (CBVR) was introduced and remains a challenging problem. Existing CBVR techniques can be classified into three classes according to the features they use: low level (e.g. pixel level: colors, edges, etc.), mid level, and high level (e.g. concepts). Most works in the literature deal with low-level features; only a few try to bridge the gap between the low and high levels. In this work, we intend to go one step further in that direction. More specifically, we propose a new technique to index and retrieve videos based on both apparent motion and defocus blur. From these low-level features, we estimate some of the extrinsic and intrinsic camera parameter changes, and then deduce camera operations such as panning/tracking, tilting/booming, zooming/dollying and rolling, as well as focus changes. Finally, the camera operations are recorded into an index which is then used for video retrieval. Experiments confirm that the proposed mid-level features can be accurately deduced from low-level features and that they are usable for indexing and retrieval purposes.
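To illustrate the general idea (not the thesis's actual implementation), the sketch below labels the dominant camera operation between two consecutive frames from a global similarity motion model fitted to sparse optical flow, using OpenCV. Translation maps to panning/tracking or tilting/booming, scale change to zooming/dollying, and in-plane rotation to rolling; the defocus-blur analysis that the work uses to disambiguate zoom vs. dolly and to detect focus changes is omitted here. The function name `classify_camera_operation` and the thresholds `PAN_T`, `ZOOM_T`, `ROLL_T` are illustrative assumptions, not values from the thesis.

```python
import cv2
import numpy as np

# Illustrative thresholds (assumed, not from the thesis):
# pixels of translation, scale-ratio deviation, degrees of rotation.
PAN_T, ZOOM_T, ROLL_T = 2.0, 0.01, 0.5

def classify_camera_operation(prev_gray, curr_gray):
    """Label the dominant camera operation between two grayscale frames."""
    # Track sparse corners between the frames (the apparent motion).
    pts0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=400,
                                   qualityLevel=0.01, minDistance=8)
    if pts0 is None:
        return "static"
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts0, None)
    good = status.ravel() == 1
    if good.sum() < 10:
        return "static"
    # Fit a robust global similarity transform: rotation, scale, translation.
    M, _ = cv2.estimateAffinePartial2D(pts0[good], pts1[good],
                                       method=cv2.RANSAC)
    if M is None:
        return "static"
    tx, ty = M[0, 2], M[1, 2]
    scale = np.hypot(M[0, 0], M[1, 0])
    angle = np.degrees(np.arctan2(M[1, 0], M[0, 0]))
    # Map model parameters to camera operations. Without the blur cue,
    # zoom vs. dolly and pan vs. track remain ambiguous here.
    if abs(scale - 1.0) > ZOOM_T:
        return "zoom-in/dolly-forward" if scale > 1.0 else "zoom-out/dolly-back"
    if abs(angle) > ROLL_T:
        return "roll"
    if abs(tx) > PAN_T or abs(ty) > PAN_T:
        return "pan/track" if abs(tx) >= abs(ty) else "tilt/boom"
    return "static"
```

Running this classifier over every consecutive frame pair yields a per-frame sequence of operation labels, which can then be aggregated into the kind of camera-operation index the abstract describes and matched against query operations at retrieval time.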
Collection
- Sciences – Mémoires