1.1 Is Pattern Recognition Important? 3
in a manufacturing environment, to control machines in hazardous environments
remotely, and to help people with disabilities control machines by talking to them.
A major effort, which has already had considerable success, is to enter data into
a computer via a microphone. Software, built around a pattern (spoken sounds
in this case) recognition system, recognizes the spoken text and translates it into
ASCII characters, which are shown on the screen and can be stored in the memory.
Entering information by “talking” to a computer is twice as fast as entry by a skilled
typist. Furthermore, this can enhance our ability to communicate with people who
are deaf or unable to speak.
Data mining and knowledge discovery in databases is another key application
area of pattern recognition. Data mining is of intense interest in a wide range of
applications such as medicine and biology, market and financial analysis, business
management, science exploration, image and music retrieval. Its popularity stems
from the fact that in the age of information and knowledge society there is an
ever-increasing demand for retrieving information and turning it into knowledge.
Moreover, this information exists in huge amounts of data in various forms, including text,
images, audio and video, stored in different places distributed all over the world.
The traditional way of searching for information in databases was the description-based
model where object retrieval was based on keyword description and subsequent
word matching. However, this type of searching presupposes that a manual anno-
tation of the stored information has previously been performed by a human. This
is a very time-consuming job and, although feasible when the size of the stored
information is limited, it is not possible when the amount of the available informa-
tion becomes large. Moreover, the task of manual annotation becomes problematic
when the stored information is widely distributed and shared by a heterogeneous
“mixture” of sites and users. Content-based retrieval systems, in which information
is sought based on “similarity” between an object presented to the system and
objects stored in sites all over the world, are becoming more and more popular.
In a content-based image retrieval (CBIR) system, an image is presented to an input
device (e.g., a scanner). The system returns “similar” images based on a measured
“signature,” which can encode, for example, information related to color, texture, and
shape. In a music content-based retrieval system, an example (i.e., an extract from
a music piece) is presented to a microphone input device and the system returns
“similar” music pieces. In this case, similarity is based on certain (automatically)
measured cues that characterize a music piece, such as the music meter, the music
tempo, and the location of certain repeated patterns.
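
As a rough illustration of the content-based idea described above, the following Python sketch ranks stored images by the similarity of a color "signature." It is only a minimal example under stated assumptions, not the method of any particular CBIR system: the signature is taken to be a coarse normalized RGB histogram, similarity is measured by Euclidean distance, and the function names (color_signature, distance, retrieve) as well as the toy database are hypothetical and introduced here purely for illustration.

```python
from math import sqrt


def color_signature(pixels, bins=4):
    """Return a normalized joint RGB histogram for a list of (r, g, b) pixels.

    Each channel value (0..255) is quantized into `bins` levels, so the
    signature has bins**3 entries that sum to 1.
    """
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Quantize each channel to an integer level in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = len(pixels)
    return [h / total for h in hist]


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures (smaller means more similar)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


def retrieve(query_pixels, database, top_k=2):
    """Return the names of the top_k stored images closest to the query."""
    query_sig = color_signature(query_pixels)
    ranked = sorted(
        database,
        key=lambda name: distance(query_sig, color_signature(database[name])),
    )
    return ranked[:top_k]


# Hypothetical toy "database": each image is just a short list of pixels.
database = {
    "sunset": [(250, 120, 30)] * 8 + [(200, 60, 20)] * 2,
    "forest": [(20, 160, 40)] * 9 + [(60, 90, 30)],
    "ocean": [(10, 60, 220)] * 10,
}

# A query image dominated by orange tones.
query = [(240, 110, 25)] * 7 + [(210, 70, 10)] * 3

print(retrieve(query, database, top_k=2))
```

No manual annotation is involved in this sketch: the ranking depends only on quantities measured from the image content itself. A practical system would typically combine several cues (such as texture and shape) and use an index rather than an exhaustive comparison against every stored object.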
Mining for biomedical and DNA data analysis has enjoyed explosive growth
since the mid-1990s. All DNA sequences comprise four basic building elements,
the nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T). Like the
letters in our alphabets and the seven notes in music, these four nucleotides are
combined to form long sequences in a twisted ladder form. Genes usually consist of
hundreds of nucleotides arranged in a particular order. Specific gene-sequence
patterns are related to particular diseases and play an important role in medicine.
To this end, pattern recognition is a key area that offers a wealth of developed tools
for similarity search and comparison between DNA sequences. Such comparisons