links to resources based on
audience or topic. Such lists can be
built as static webpages, with the
names and locations of the
resources “hardcoded” in the
HTML. However, it is more efficient
and increasingly more common to
build these pages dynamically from
metadata stored in databases.
Various software tools can be used
to automatically extract and
reformat the information for Web
applications.
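As a minimal sketch of the dynamic approach, the following Python builds a list of links from metadata held in a database instead of hardcoding them in HTML. The table name `resources`, its columns, and the example.org URLs are assumptions made for illustration only.

```python
import sqlite3
from html import escape


def build_link_list(conn, topic):
    """Return an HTML list of the resources whose metadata matches a topic."""
    rows = conn.execute(
        "SELECT title, url FROM resources WHERE topic = ? ORDER BY title",
        (topic,),
    ).fetchall()
    items = [
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def demo_connection():
    """Create an in-memory database with a few hypothetical metadata records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resources (title TEXT, url TEXT, topic TEXT)")
    conn.executemany(
        "INSERT INTO resources VALUES (?, ?, ?)",
        [
            ("Metadata Basics", "https://example.org/basics", "metadata"),
            ("Cataloging & Indexing", "https://example.org/cataloging", "metadata"),
            ("Map Collections", "https://example.org/maps", "geography"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(build_link_list(demo_connection(), "metadata"))
```

Adding or moving a resource then means updating one database row; every page built from the query picks up the change.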
Interoperability
Describing a resource with
metadata allows it to be understood
by both humans and machines in
ways that promote interoperability.
Interoperability is the ability of
multiple systems with different
hardware and software platforms,
data structures, and interfaces to
exchange data with minimal loss of
content and functionality. Using
defined metadata schemes, shared
transfer protocols, and crosswalks
between schemes, resources
across the network can be
searched more seamlessly.
Two approaches to interoperability
are cross-system search
and metadata harvesting. The
Z39.50 protocol is commonly used
for cross-system search. Z39.50
implementers do not share
metadata but map their own search
capabilities to a common set of
search attributes. A contrasting
approach taken by the Open
Archives Initiative is for all data
providers to translate their native
metadata to a common core set of
elements and expose this for
harvesting. A search service
provider then gathers the metadata
into a consistent central index to
allow cross-repository searching
regardless of the metadata formats
used by participating repositories.
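The harvesting approach can be illustrated with a small Python sketch: each repository's native records are translated through a crosswalk to a common element set, then gathered into one central index. This is not the Open Archives Initiative protocol itself. The scheme names, field names, and sample records are hypothetical.

```python
# Hypothetical crosswalks: native field name -> common element name.
CROSSWALKS = {
    "library": {"main_title": "title", "author": "creator", "pub_year": "date"},
    "museum": {"object_name": "title", "maker": "creator", "made": "date"},
}

# Hypothetical repositories: name -> (native scheme, native records).
REPOSITORIES = {
    "city_library": ("library", [
        {"main_title": "Field Notes", "author": "A. Rivera",
         "pub_year": "1998", "shelf": "QH31"},
    ]),
    "art_museum": ("museum", [
        {"object_name": "Sketchbook", "maker": "A. Rivera", "made": "1995"},
    ]),
}


def to_common(record, scheme):
    """Translate a native record to the common set; unmapped fields are dropped."""
    mapping = CROSSWALKS[scheme]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def harvest(repositories):
    """Gather translated records into a central index keyed by creator."""
    index = {}
    for repo_name, (scheme, records) in repositories.items():
        for record in records:
            common = to_common(record, scheme)
            common["source"] = repo_name
            key = common.get("creator", "").lower()
            index.setdefault(key, []).append(common)
    return index


def search_by_creator(index, name):
    """Search the central index, regardless of each repository's native format."""
    return index.get(name.lower(), [])


if __name__ == "__main__":
    for hit in search_by_creator(harvest(REPOSITORIES), "A. Rivera"):
        print(hit)
```

The `shelf` field has no counterpart in the common set and is lost in translation, which is the kind of content loss a crosswalk tries to keep to a minimum.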
Digital Identification
Most metadata schemes include
elements such as standard
numbers to uniquely identify the
work or object to which the
metadata refers. The location of a
digital object may also be given
using a file name, URL (Uniform
Resource Locator), or some more
persistent identifier such as a PURL
(Persistent URL) or DOI (Digital
Object Identifier). Persistent
identifiers are preferred because
object locations often change,
making the standard URL (and
therefore the metadata record)
invalid. In addition to the actual
elements that point to the object, the
metadata can be combined to act
as a set of identifying data,
differentiating one object from
another for validation purposes.
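The advantage of a persistent identifier can be sketched in Python as a level of indirection: the metadata record stores only the identifier, and a resolver maps it to the object's current location. The `Resolver` class, the identifier format, and the URLs below are hypothetical. They do not reproduce how PURL or DOI services are actually implemented.

```python
class Resolver:
    """Toy stand-in for a persistent-identifier resolution service."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        """Record the current location of a newly identified object."""
        self._locations[identifier] = url

    def move(self, identifier, new_url):
        """Update the location of an already registered object."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        """Return the object's current location."""
        return self._locations[identifier]


if __name__ == "__main__":
    resolver = Resolver()
    record = {"title": "Annual Report", "identifier": "pid:example/1234"}
    resolver.register(record["identifier"], "https://old.example.org/report.pdf")
    resolver.move(record["identifier"], "https://new.example.org/docs/report.pdf")
    # The metadata record is untouched, yet it still leads to the object.
    print(resolver.resolve(record["identifier"]))
```

When the object moves, only the resolver entry changes. A record holding the raw URL would have gone stale.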
Archiving and Preservation
Most current metadata efforts
center on the discovery of
recently created resources.
However, there is a growing
concern that digital resources will
not survive in usable form into the
future. Digital information is fragile;
it can be corrupted or altered,
intentionally or unintentionally. It
may become unusable as storage
media and hardware and software
technologies change. Format
migration and perhaps emulation of
current hardware and software
behavior in future hardware and
software platforms are strategies for
overcoming these challenges.
Metadata is key to ensuring that
resources will survive and continue
to be accessible into the future.
Archiving and preservation require
special elements to track the
lineage of a digital object (where it
came from and how it has changed
over time), to detail its physical
characteristics, and to document its
behavior in order to emulate it on
future technologies.
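A rough Python sketch of such elements follows: a record that stores physical characteristics (format, size, checksum) and a lineage of events, so that corruption can be detected and a format migration is documented. The field names are assumptions for illustration and are not taken from any published preservation scheme.

```python
import hashlib
from datetime import datetime, timezone


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def checksum(data: bytes) -> str:
    """SHA-256 digest used as a fixity value."""
    return hashlib.sha256(data).hexdigest()


def new_preservation_record(data, file_format, source):
    """Describe an object at ingest: its characteristics and where it came from."""
    return {
        "format": file_format,
        "size_bytes": len(data),
        "sha256": checksum(data),
        "lineage": [{"event": "ingest", "detail": source, "when": _now()}],
    }


def record_migration(record, new_data, new_format):
    """Log a format migration, then update the stored characteristics."""
    record["lineage"].append({
        "event": "migration",
        "detail": f"{record['format']} -> {new_format}",
        "when": _now(),
    })
    record.update(
        format=new_format, size_bytes=len(new_data), sha256=checksum(new_data)
    )


def is_intact(record, data):
    """True if the bytes still match the recorded size and checksum."""
    return len(data) == record["size_bytes"] and checksum(data) == record["sha256"]
```

A single changed byte makes `is_intact` return False, while the lineage list preserves the history of where the object came from and how it has changed.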
Many organizations internationally
have worked on defining
metadata schemes for digital
preservation, including the National
Library of Australia, the British
Cedars Project (CURL Exemplars
in Digital Archives), and a joint
Working Group of OCLC and the
Research Libraries Group (RLG).
The latter group developed a
framework outlining types of
preservation metadata. A follow-up
group, PREMIS (PREservation
Metadata: Implementation Strategies),
also sponsored by OCLC
and RLG, is developing a set of
core elements and strategies for the
encoding, storage, and management
of preservation metadata
within a digital preservation system.
Many of these initiatives are based
on or compatible with the ISO
Reference Model for an Open
Archival Information System
(OAIS).
Structuring Metadata
Metadata schemes (also called
schemas) are sets of metadata
elements designed for a specific
purpose, such as describing a
particular type of information
resource. The definition or meaning
of the elements themselves is
known as the semantics of the
scheme. The values given to
metadata elements are the content.
Metadata schemes generally
specify names of elements and their
semantics. Optionally, they may
specify content rules for how
content must be formulated (for
example, how to identify the main
title), representation rules for
content (for example, capitalization
rules), and allowable content values
(for example, terms must be used
from a specified controlled
vocabulary).
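To make these parts concrete, here is a small Python sketch of a scheme that names its elements, states their semantics, and optionally adds a representation rule and a controlled vocabulary, together with a checker for records. The scheme, its element names, and its rules are invented for illustration and do not correspond to any published standard.

```python
import re

# Hypothetical scheme: element name -> semantics plus optional rules.
SCHEME = {
    "title": {
        "semantics": "The main name given to the resource",
        "required": True,
    },
    "subject": {
        "semantics": "Topic of the resource",
        "allowed": {"history", "science", "art"},  # controlled vocabulary
    },
    "date": {
        "semantics": "Year of creation",
        "pattern": r"\d{4}",  # representation rule: four-digit year
    },
}


def validate(record, scheme=SCHEME):
    """Return a list of problems found; an empty list means the record conforms."""
    problems = []
    for name, rules in scheme.items():
        if rules.get("required") and name not in record:
            problems.append(f"missing required element: {name}")
    for name, value in record.items():
        rules = scheme.get(name)
        if rules is None:
            problems.append(f"unknown element: {name}")
            continue
        if "allowed" in rules and value not in rules["allowed"]:
            problems.append(f"{name}: '{value}' is not in the controlled vocabulary")
        if "pattern" in rules and not re.fullmatch(rules["pattern"], value):
            problems.append(f"{name}: '{value}' breaks the representation rule")
    return problems


if __name__ == "__main__":
    print(validate({"title": "Tides", "subject": "science", "date": "2001"}))
    print(validate({"subject": "cooking", "date": "01/2001"}))
```

The values in a record are the content. The dictionary `SCHEME` carries the element names, semantics, and rules, and says nothing about how a record is encoded.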
There may also be syntax rules
for how the elements and their
content should be encoded. A
metadata scheme with no
prescribed syntax rules is called
syntax independent. Metadata can
be encoded in any definable syntax.
Many current metadata schemes
use SGML (Standard Generalized
Markup Language) or XML
(Extensible Markup Language).
XML, developed by the World Wide
Web Consortium (W3C), is a
simplified subset of SGML that allows
for locally defined tag sets and the
easy exchange of structured