Applications of Content-Based Retrieval

In this section we provide a broad overview of areas of application of content-based retrieval of a variety of media in multimedia databases. Later sections will discuss how such retrieval may be facilitated.

Images

-- Image retrieval, navigation, and browsing in collections of images have a variety of applications. More and more application areas, such as medicine, maintain large collections of digital images. Efficient mechanisms for browsing and navigating through these collections, however, are still lacking. Semantic content-based image browsing and navigation are needed instead of searching and viewing directory trees for image files. An important issue is to retrieve images according to the user's (semantic) association with and impression of an image, e.g. a sunset at the sea, and not only according to mere (syntactic or structural) image features such as color or texture. For the retrieval of images, a suitable definition of semantic equality and similarity of images is also needed. To support the mapping from the users' ideas down to the raw image data, a model describing the association between users' concepts and image characteristics and semantics is needed.

Advances in techniques for obtaining images of the body's interior have greatly improved medical diagnosis. New imaging methods include various X-ray systems, computerized tomography, and magnetic resonance. The introduction of computerized tomography (CT) was a major advance in visualizing almost all parts of the body and is particularly useful in diagnosing tumors and other space-occupying lesions. These new techniques lead to the accumulation of large numbers of digital medical images stored in medical image archives.

For a satisfactory diagnosis, however, it is not sufficient to store a patient's CT images and access them by the patient's record-id. Rather, suitable querying mechanisms are needed to make effective use of the images in medical diagnosis. A surgeon's questions to a medical image archive might be: How does my patient's tumor look compared to similar cases of brain tumors? What is the normal growth rate of a particular type of brain tumor? Does the spatial growth of a brain tumor decrease with a certain drug therapy?

The images themselves do not give hints about whether they show a brain tumor or where it is located in the body. Therefore the knowledge of the spatial content of the images and the evolutionary behavior of the spatial content for a medical image (e.g., for a brain tumor) must be used or made available when processing a surgeon's queries. The result of a query should then be a collection of images that have similar spatial characteristics compared to a given image or a sequence of images showing the growth of a brain tumor over a year's time.
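A query of this kind can be sketched as a nearest-neighbor search over feature vectors that describe the spatial content of each image. The following is only a minimal illustration under stated assumptions: the descriptor layout (volume, compactness, relative position), the image identifiers, and the function names are hypothetical, and a real archive would need normalized features and a similarity measure suited to the medical domain.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query, archive, k=3):
    """Return the ids of the k archived images closest to the query vector."""
    scored = sorted(archive.items(), key=lambda item: euclidean(query, item[1]))
    return [image_id for image_id, _ in scored[:k]]

# Hypothetical spatial descriptors: (volume, compactness, relative position).
archive = {
    "case_a": (12.0, 0.80, 0.30),
    "case_b": (30.0, 0.40, 0.70),
    "case_c": (14.0, 0.70, 0.40),
}
query = (12.5, 0.78, 0.32)

print(rank_by_similarity(query, archive, k=2))
```

The same ranking could be applied to a sequence of images of one patient taken over time, provided each image in the sequence is described by comparable features.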

Searching image collections can be employed to find a starting point in a web of images from which the user may want to start a navigation through images and related information.

After a particular image of interest has been selected, the user can navigate through the image collection, e.g. by choosing a particular part of the image that is currently being displayed. This selection can lead to associated data such as a set of related images or some related textual information. One may also navigate in the (hyper-)textual information and come back to the image collection via special links/hooks in the text to find an image associated with the respective textual information. For example, an image of a person comprises various regions having semantic content, e.g. the subregions that correspond to the eyes, lips, and nose. When viewing a media object, the user can explore the related information to learn, e.g., which person can be seen in the image. Additionally, location information associated with a person's image can lead to the associated building and room and finally to an image of the person's office.

Navigating image collections might also involve navigating three-dimensional (3-D) representations, e.g. of the body's interior. A sequence of CT images can be the basis of a computed 3-D graphics representation of the brain. A surgeon may navigate through this representation of the brain. She/he may select a particular volume of interest, e.g. the thalamus, and enter it, viewing it at a higher resolution to see whether there is a growth inside. The surgeon may also select a part of the 3-D representation inside the thalamus that allows him/her to view photographs of patients with similar growth, etc. This kind of support for image retrieval, navigation, and browsing requires the underlying algorithms to draw on a great deal of semantic knowledge. Such matters are still very much research issues.

Video

-- In many domains video clips are archived digitally, e.g. in news agencies. Besides the archiving of the digitized video clips, an important issue is to browse through a collection of videos and select them either entirely or partially. A difference between searching, browsing, and navigating in videos in contrast to images is the temporal aspect. The abstract information that is added to video to support retrieval can change within the video depending on the part of the video. Furthermore, querying against a video database results in a sequence of video clips, each of which is time-dependent.

For example, nowadays we have to watch a provider's news as broadcast and cannot skip the news items we are not interested in. Personalized news, tailored to individual interests, will make a news watcher independent of the fixed news program and of the time the news is actually on air. According to a user profile, videos are searched, and those parts of the current news items are selected that fit the viewer's needs. With semantic knowledge about the structure of news, the selected news items can be reassembled and temporally arranged to form a personalized news extract. The interesting issues are how to define such a user profile and how to attach metadata to the thousands of news items of a news provider so that, in combination with a user profile, the metadata allow for a satisfying mapping between the two and a successful reassembly of the personalized news.

Similar application scenarios can be derived from other demands. A critic may want to watch only those parts of a film that suffice to write a quick review. A sports enthusiast may only have time to see a sequence of all the goals a certain team scored in a certain football game, in order to be able to talk about the game the next day. Football teams may want post-game analysis to support the planning of strategies and to analyze performance.
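The mapping between a user profile and the metadata attached to news items, as described for personalized news above, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the profile is modeled as topic weights, each item carries a set of topic labels and an air time in minutes since midnight, and the scoring rule (sum of matching weights against a threshold) is just one simple choice among many.

```python
def personalized_extract(profile, items, threshold=1.0):
    """Select items whose topics match the profile, ordered by air time."""
    selected = []
    for item in items:
        # Score an item by summing the profile weights of its topics.
        score = sum(profile.get(topic, 0.0) for topic in item["topics"])
        if score >= threshold:
            selected.append(item)
    # Temporal arrangement: keep the original broadcast order.
    return sorted(selected, key=lambda item: item["aired_at"])

# Hypothetical profile and metadata.
profile = {"football": 1.0, "politics": 0.2}
items = [
    {"id": "n3", "aired_at": 1170, "topics": {"football", "transfer"}},
    {"id": "n1", "aired_at": 1140, "topics": {"politics"}},
    {"id": "n2", "aired_at": 1155, "topics": {"football", "politics"}},
]

extract = personalized_extract(profile, items)
print([item["id"] for item in extract])
```

In this sketch the quality of the extract depends entirely on the topic labels, which reflects the point made above that attaching suitable metadata to the news items is the hard part.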

Audio and speech

-- Radio stations collect many if not all of their important and informative programs, such as radio news, in archives. Often it is of interest to reuse or to refer to parts of such programs in other radio broadcasts. However, to retrieve parts of radio programs efficiently it is necessary to have the right metadata generated from and associated with the audio recordings. This calls for retrieval of audio that contains spoken text. One important issue here is the detection of text in the audio, i.e. speech recognition. Problems that arise in speech recognition because of different pronunciations of words by different speakers and because of language peculiarities must be overcome. Another important issue is the mapping from a high-level, vague query, such as a textual query or a query containing spoken text, to the metadata attached to the audio recordings. This calls for an organization of the metadata that supports efficient query evaluation and for a query evaluation model that determines those recordings in an archive that are relevant to a user's query.

Structured document management

-- As the publishing paradigm is shifting from popular desktop publishing to database-driven publishing, processing of structured documents becomes more and more important. Interesting issues are the description of document structure and layout, structure- and content-oriented retrieval of components of documents, full-text retrieval, and the presentation of document content on various output channels such as print media, CD-ROMs, and the WWW. Particular document information models like SGML (Standard Generalized Markup Language) and HyTime (Hypermedia/Time-Based Structuring Language) introduce a lot of descriptive information, i.e., metadata, on the structure and content of documents. Such metadata can be used during processing for improving system performance, e.g. database configuration and document type-specific query optimization, or for providing new functionality, e.g. higher expressiveness of query statements, integration of information retrieval techniques with database functionality, and query templates based on document structure or layout. Metadata can be used at various system layers, e.g. at the specification layer by means of document type definitions, for the internal representation of documents and their components including storage models and indexing, for maintaining histories of processing a document, to support declarative access and query processing, etc.

Knowledge about structure can be used by the author of a news article to retrieve interesting parts of documents in a huge document archive. For a well-targeted query the document structure available via metadata can be exploited. The author can retrieve not only all documents that contain the name "Clint Eastwood" but also, more specifically, all documents that contain the name in their heading, as the heading is a known structural element of the documents. Exploiting document structure also makes retrieval efficient, because the metadata can be used for indexing, which is essential for short query response times. A typesetter of a newspaper's title page can make use of the metadata to lay out the article properly, that is, to process the document according to an instruction like "Place the title in 18pt Helvetica at the top of the page, align the first two paragraphs beneath the headline, and let the remaining paragraphs follow on the next page."
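The difference between retrieving a name anywhere in a document and retrieving it only in the heading can be sketched with an inverted index keyed by structural element. This is a minimal illustration, not a description of how SGML or HyTime systems are implemented: the document model (a dictionary with "heading" and "body" elements), the sample documents, and the function names are hypothetical, and the search matches terms per element without checking that they are adjacent.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map (element, term) pairs to the set of documents containing them."""
    index = defaultdict(set)
    for doc_id, elements in documents.items():
        for element, text in elements.items():
            for term in re.findall(r"\w+", text.lower()):
                index[(element, term)].add(doc_id)
    return index

def search(index, terms, element=None):
    """Find documents with all terms in one element (any element if None)."""
    elements = [element] if element else sorted({e for e, _ in index})
    result = set()
    for e in elements:
        postings = [index.get((e, t.lower()), set()) for t in terms]
        if postings:
            result |= set.intersection(*postings)
    return result

# Hypothetical structured documents.
documents = {
    "doc1": {"heading": "Clint Eastwood returns", "body": "A new film was shown."},
    "doc2": {"heading": "Festival opens", "body": "Guests included Clint Eastwood."},
    "doc3": {"heading": "Weather", "body": "Rain is expected."},
}
index = build_index(documents)

print(sorted(search(index, ["Clint", "Eastwood"])))
print(sorted(search(index, ["Clint", "Eastwood"], element="heading")))
```

Restricting the lookup to one element touches only that element's index entries, which is one way the structural metadata can shorten query response time.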

Geographic and environmental information systems

-- Geographic and environmental information systems are used by various parties who have very specific information needs. Such systems have to provide an integrated view of individual geographic and environmental data sets. One key problem is the provision of descriptive information on the content, both for the end users and for the information system itself, in order to facilitate transparent, integrated access to different information sources. The problems faced in this application domain are related to some extent to the integration of heterogeneous databases. Approaches taken in this field deal with a significant amount of metadata for global query decomposition, global transaction management, schema integration, and management of federated information systems. Another important issue is control over the degree of uncertainty and accuracy of the data. An important milestone achieved in this application domain is the availability of national and international standards for metadata frameworks.

Dave Marshall
10/4/2001