Open Access Repository

Video genre classification using compressed domain visual features

Gillespie, Warwick James 2006, 'Video genre classification using compressed domain visual features', PhD thesis, University of Tasmania.

PDF (Whole thesis)
whole_Gillespie...pdf | Download (18MB)
Available under University of Tasmania Standard License.

Abstract

With the rapid growth in the prevalence of digital video in the world comes the need for
efficient and effective management of such information. The field of content-based
video indexing and retrieval aims to achieve this through the automatic recognition of
the structure and content of video data and the indexing of both low-level formative
features and high-level semantic features. Two of the main problems facing this field
are which low-level features can be used, and how to 'bridge the semantic gap' between
low-level features and high-level understanding of a video sequence. In this thesis we
propose a new method that automatically classifies video shots into
broadly defined video genres. The classification serves as the first step in the indexing
of a video sequence and its consequent retrieval from a large database by partitioning the
database into more manageable sub-units according to genre, e.g. sport, drama, scenery,
news reading.

As digital video is commonly transmitted and stored in a compressed format
(e.g. the MPEG-1, -2, and -4 standards), it is efficient for any processing to
occur in the compressed domain. In this thesis video files compressed in the MPEG-1
format are considered, although the majority of methods presented can be easily adapted
for use with MPEG-2 and MPEG-4 formats. For indexing purposes, an MPEG-1 file
contains spatial information in its DCT coefficients and temporal information in its motion
vectors. The reliability of the MPEG motion vectors is evaluated using a spatial block
activity factor estimated from DCT coefficients, to discard the vectors which do not
represent the true motion within a video sequence. The thesis also presents a robust
camera motion estimation technique, based on Least Median-of-Squares regression, to
minimise the influence of the outliers due to object motion and wrongly predicted
motion vectors. The results produced by the proposed technique show a significant
improvement in the sensitivity to object motion when compared to those produced by an
M-estimator technique. Robust motion intensity metrics are also presented for camera
and object motion, calculated from the estimated camera model and the MPEG motion
vector field after the filtering of unreliable vectors. A novel metric called activity power
flow, based on the activity factor used in the motion vector field filtering, is introduced to
effectively capture the spatio-temporal evolution of scenes through a video shot. These
shot-based, low-level, global features represent both the spatial content of a shot, and the
motion in a shot, both due to movement of the camera, and also of objects.
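
As an illustration only, the vector filtering and Least Median-of-Squares (LMedS) steps described above might be sketched as follows. The abstract does not define the activity factor or the camera model, so both are assumptions here: the activity of a block is taken as the sum of its absolute AC coefficients, and camera motion is modelled as d = t + c * p, where the complex numbers p and d hold a block position and its motion vector, t is pan/tilt, and c combines zoom (real part) and rotation (imaginary part). All function names, the threshold, and the synthetic data are hypothetical.

```python
import random
from statistics import median

def block_activity(dct_block):
    """Assumed activity factor: sum of absolute AC coefficients of one DCT block."""
    total = sum(abs(v) for row in dct_block for v in row)
    return total - abs(dct_block[0][0])  # exclude the DC term at [0][0]

def filter_vectors(positions, vectors, activities, threshold):
    """Keep only motion vectors whose block activity reaches the threshold."""
    kept = [(p, d) for p, d, a in zip(positions, vectors, activities) if a >= threshold]
    return [p for p, _ in kept], [d for _, d in kept]

def fit_pair(p1, d1, p2, d2):
    """Solve d = t + c * p exactly from two blocks (complex numbers hold x + iy)."""
    c = (d1 - d2) / (p1 - p2)
    return d1 - c * p1, c

def lmeds_camera_motion(positions, vectors, trials=200, seed=0):
    """Return (t, c, median squared residual) of the best randomly sampled model."""
    rng = random.Random(seed)
    best = (float("inf"), 0j, 0j)
    for _ in range(trials):
        # Fit a candidate model to a random minimal subset of two blocks.
        i, j = rng.sample(range(len(positions)), 2)
        t, c = fit_pair(positions[i], vectors[i], positions[j], vectors[j])
        # Score it by the median (not the sum) of squared residuals over all blocks.
        med = median(abs(d - (t + c * p)) ** 2 for p, d in zip(positions, vectors))
        if med < best[0]:
            best = (med, t, c)
    return best[1], best[2], best[0]

# Synthetic field (made-up values): pan/tilt t = 2+1j, zoom c = 0.02.
positions = [complex(16 * x, 16 * y) for x in range(-5, 6) for y in range(-4, 5)]
vectors = [(2 + 1j) + 0.02 * p for p in positions]
activities = [50.0] * len(positions)
for k, p in enumerate(positions):
    if p.real == -80:                 # flat blocks: unreliable vectors
        vectors[k], activities[k] = 9 - 9j, 0.0
    elif 32 <= p.real <= 80 and 0 <= p.imag <= 48:
        vectors[k] += 6 - 3j          # a moving object
pos, vec = filter_vectors(positions, vectors, activities, threshold=10.0)
t, c, med = lmeds_camera_motion(pos, vec)
```

Because each candidate is scored by the median residual, the estimate is unaffected by the moving-object vectors as long as they remain a minority of the retained field, which is the property that motivates LMedS over a least-squares fit.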

The thesis also compares several machine learning techniques to transform low-level
visual features into high-level semantics, in particular Radial Basis Function (RBF)
networks with a focus on a tree-based RBF network. In this network, the result of a
binary classification tree is used to configure and to initialise the structure of the RBF
network. Video shots in a database are classified into four video genres: Sport, News,
Scenery, and Drama. This is believed to be the first shot-based video classification
algorithm and the first method that uses only compressed domain features.
Experimental results show that this method is both efficient, as processing is undertaken
in the compressed domain, and effective, providing a classification accuracy which is
comparable, where possible, to previous techniques. For the genre set {sport, news,
cartoon, commercial, music} the best classification accuracies seen in previous works
are 83.1% [131] using just visual features and 87% [133] using combined audio and
visual features, compared with a classification accuracy of 83.6% presented in this thesis.
For the genre set {sport, news, cartoon, drama, music}, previous work [134] reported
classification accuracies of 72.0% for visual features and 88.8% using combined audio
and visual features, compared with a classification accuracy of 86.2% in this thesis.
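
The idea of configuring an RBF network from a binary classification tree can be illustrated with a minimal sketch, which should not be read as the thesis's implementation. It assumes that each tree leaf becomes one Gaussian unit, with its centre and per-feature width taken from the training samples in that leaf and its output tied to the leaf's majority class; any subsequent training of the network is omitted. The two features, their values, and all function names are made up for the example.

```python
import math
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_split(X, y):
    """Return (feature, threshold) that lowers weighted Gini impurity most, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score - 1e-12:
                best, best_score = (f, thr), score
    return best

def tree_leaves(X, y, depth=0, max_depth=3):
    """Grow a binary classification tree and return the samples held by each leaf."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return [(X, y)]
    f, thr = split
    leaves = []
    for keep in (lambda r: r[f] <= thr, lambda r: r[f] > thr):
        rows = [r for r in X if keep(r)]
        labs = [lab for r, lab in zip(X, y) if keep(r)]
        leaves += tree_leaves(rows, labs, depth + 1, max_depth)
    return leaves

def leaves_to_rbf(leaves, min_width=0.1):
    """One Gaussian unit per leaf: (centre, per-feature width, majority class)."""
    units = []
    for rows, labs in leaves:
        dims = range(len(rows[0]))
        centre = [sum(r[d] for r in rows) / len(rows) for d in dims]
        width = [max(min_width, math.sqrt(sum((r[d] - centre[d]) ** 2 for r in rows) / len(rows)))
                 for d in dims]
        units.append((centre, width, Counter(labs).most_common(1)[0][0]))
    return units

def classify(units, x):
    """Sum the Gaussian activations per class and return the strongest class."""
    scores = Counter()
    for centre, width, label in units:
        z = sum(((xi - c) / w) ** 2 for xi, c, w in zip(x, centre, width))
        scores[label] += math.exp(-0.5 * z)
    return scores.most_common(1)[0][0]

# Toy shots: (camera motion intensity, object motion intensity), invented values.
X = [(0.8, 0.9), (0.9, 0.8), (0.85, 0.7), (0.1, 0.1), (0.05, 0.2), (0.15, 0.15),
     (0.8, 0.1), (0.7, 0.2), (0.9, 0.15), (0.2, 0.6), (0.1, 0.7), (0.25, 0.8)]
y = ["sport"] * 3 + ["news"] * 3 + ["scenery"] * 3 + ["drama"] * 3
units = leaves_to_rbf(tree_leaves(X, y))
genre = classify(units, (0.82, 0.85))
```

Letting the tree decide the number, position, and width of the units avoids choosing the network size by hand, which is one plausible reading of using the tree to configure and initialise the network's structure.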

Item Type: Thesis - PhD
Authors/Creators: Gillespie, Warwick James
Keywords: Information storage and retrieval systems, Electronic information research searching
Copyright Holders: The Author
Copyright Information:

Copyright 2006 the Author - The University is continuing to endeavour to trace the copyright
owner(s) and in the meantime this item has been reproduced here in good faith. We
would be pleased to hear from the copyright owner(s).

Additional Information:

Thesis (PhD)--University of Tasmania, 2006. Includes bibliographical references. Ch. 1. Introduction -- Ch. 2. Content-based video indexing and retrieval -- Ch. 3. Semantic video processing -- Ch. 4. Compressed domain video analysis -- Ch. 5. Low-level visual features -- Ch. 6. High-level semantic classification -- Ch. 7. Video genre classification results -- Ch. 8. Conclusions and further research
