DSpace System Documentation: Functional Overview
Data Model Diagram
DSpace organizes data to reflect the structure of the organization that uses the DSpace system. Each
DSpace site is divided into communities, which can be further divided into sub-communities reflecting the typical university
structure of college, department, research center, or laboratory.
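The community hierarchy above is a simple tree. The following minimal sketch models it in Python purely for illustration; the Community class, its methods, and the example community names are hypothetical and are not part of the DSpace API. Each sub-community keeps a link to its parent, so the path from a top-level community down to a laboratory can be rebuilt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Community:
    """Hypothetical model of a DSpace community (not the real DSpace API)."""
    name: str
    # Parent link is excluded from repr/eq to avoid recursing back up the tree.
    parent: Optional["Community"] = field(default=None, repr=False, compare=False)
    sub_communities: List["Community"] = field(default_factory=list)

    def add_sub_community(self, name: str) -> "Community":
        # A sub-community records its parent so the path upward can be rebuilt.
        child = Community(name, parent=self)
        self.sub_communities.append(child)
        return child

    def path(self) -> List[str]:
        # Walk up the parent links, then reverse to get top-down order.
        names = []
        node: Optional[Community] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Hypothetical example: a college containing a department containing a laboratory.
college = Community("College of Science")
department = college.add_sub_community("Department of Physics")
lab = department.add_sub_community("Optics Laboratory")
print(" > ".join(lab.path()))
# prints: College of Science > Department of Physics > Optics Laboratory
```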
Communities contain collections, which are groupings of related content. A collection may appear in more than one community.
Each collection is composed of items, which are the basic archival elements of the archive. Every item has one and only one
owning collection, although it may also appear in additional collections.
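The ownership rule for items can be expressed as an invariant: an item always has exactly one owning collection, and appearing in further collections never changes that owner. The sketch below is an assumption-laden illustration only; the Collection and Item classes, the map_to method, and the example names are hypothetical and do not reflect how DSpace stores these relationships internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False: collections are compared by identity
class Collection:
    """Hypothetical collection; it may be listed under several communities."""
    name: str
    communities: List[str] = field(default_factory=list)


@dataclass(eq=False)
class Item:
    """Hypothetical item: one owning collection plus optional extra collections."""
    title: str
    owning_collection: Collection
    other_collections: List[Collection] = field(default_factory=list)

    def map_to(self, collection: Collection) -> None:
        # Appearing in another collection never changes ownership,
        # and mapping to the owner or an existing collection is a no-op.
        if collection is self.owning_collection:
            return
        if collection not in self.other_collections:
            self.other_collections.append(collection)

    def all_collections(self) -> List[Collection]:
        # The owning collection is always listed first.
        return [self.owning_collection] + self.other_collections


# Hypothetical example: one collection shared by two communities.
theses = Collection("Theses", communities=["Physics", "Mathematics"])
preprints = Collection("Physics Preprints", communities=["Physics"])
thesis = Item("A thesis on optics", owning_collection=theses)
thesis.map_to(preprints)
thesis.map_to(theses)  # no effect: already the owner
print([c.name for c in thesis.all_collections()])
# prints: ['Theses', 'Physics Preprints']
```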
Items are further subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams of bits, usually
ordinary computer files. Bitstreams that are closely related, for example the HTML files and images that make up a
single HTML document, are organized into bundles.
In practice, most items tend to have these named bundles:
• ORIGINAL -- the bundle with the original, deposited bitstreams
• THUMBNAILS -- thumbnails of any image bitstreams
• TEXT -- extracted full-text from bitstreams in ORIGINAL, for indexing
• LICENSE -- contains the deposit license that the submitter granted the host organization; in other words, specifies the rights
that the hosting organization has
• CC_LICENSE -- contains the distribution license, if any (a Creative Commons [http://www.creativecommons.org] license)
associated with the item. This license specifies what end users who download the content may do with it
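The item, bundle, and bitstream layering described above can be pictured as a mapping from bundle names to lists of bitstreams. The following is a minimal sketch under that assumption; the classes, the add_bitstream method, and the file names are hypothetical illustrations rather than the DSpace API or its actual file-naming rules. It shows an HTML page and its image sharing the ORIGINAL bundle, alongside TEXT and LICENSE bundles.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    """Hypothetical bitstream: a named sequence of bytes, usually a file."""
    name: str
    data: bytes


@dataclass
class Item:
    """Hypothetical item whose bitstreams are grouped into named bundles."""
    title: str
    bundles: Dict[str, List[Bitstream]] = field(default_factory=dict)

    def add_bitstream(self, bundle_name: str, bitstream: Bitstream) -> None:
        # Create the bundle on first use, then append the bitstream to it.
        self.bundles.setdefault(bundle_name, []).append(bitstream)

    def bundle_names(self) -> List[str]:
        # Sorted for a stable, predictable listing.
        return sorted(self.bundles)


item = Item("Conference paper")
# The HTML page and its image form one document, so they share a bundle.
item.add_bitstream("ORIGINAL", Bitstream("paper.html", b"<html>example</html>"))
item.add_bitstream("ORIGINAL", Bitstream("figure1.png", b"\x89PNG\r\n\x1a\n"))
item.add_bitstream("TEXT", Bitstream("paper.txt", b"extracted full text"))
item.add_bitstream("LICENSE", Bitstream("license.txt", b"deposit license"))
print(item.bundle_names())
# prints: ['LICENSE', 'ORIGINAL', 'TEXT']
```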
Each bitstream is associated with one Bitstream Format. Because preservation services may be an important aspect of the
DSpace service, it is important to capture the specific formats of files that users submit. In DSpace, a bitstream format is a unique
and consistent way to refer to a particular file format. Integral to each bitstream format is a notion, either implicit or explicit,
of how material in that format can be interpreted. For example, the interpretation of bitstreams encoded in the JPEG standard
for still image compression is defined explicitly in the standard ISO/IEC 10918-1. The interpretation of bitstreams in Microsoft
Word 2000 format is defined implicitly, through reference to the Microsoft Word 2000 application. Bitstream formats can be
more specific than MIME types or file suffixes. For example, application/msword and .doc span multiple versions
of the Microsoft Word application, each of which produces bitstreams with presumably different characteristics.
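The point that bitstream formats can be more specific than MIME types or file suffixes can be made concrete with a small lookup table. The registry below is a hypothetical sketch: the BitstreamFormat class, the helper functions, and the entries (including the "Microsoft Word 97" and "Microsoft Word 2000" names) are illustrative assumptions, not DSpace's actual format registry. It shows that one MIME type (application/msword) or one suffix (.doc) resolves to several distinct formats.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BitstreamFormat:
    """Hypothetical registry entry; short_description is the unique format name."""
    short_description: str
    mime_type: str
    extensions: Tuple[str, ...]


# Hypothetical registry: two Word versions share a MIME type and a suffix.
REGISTRY = [
    BitstreamFormat("JPEG", "image/jpeg", ("jpg", "jpeg")),
    BitstreamFormat("Microsoft Word 97", "application/msword", ("doc",)),
    BitstreamFormat("Microsoft Word 2000", "application/msword", ("doc",)),
]


def formats_for_mime(mime_type: str) -> List[str]:
    # A single MIME type may correspond to several distinct bitstream formats.
    return [f.short_description for f in REGISTRY if f.mime_type == mime_type]


def formats_for_extension(ext: str) -> List[str]:
    # Suffix matching is case-insensitive and equally ambiguous.
    return [f.short_description for f in REGISTRY if ext.lower() in f.extensions]


print(formats_for_mime("application/msword"))
# prints: ['Microsoft Word 97', 'Microsoft Word 2000']
```

Because the MIME type alone cannot distinguish the two Word entries, a system following this model would need additional evidence (or the submitter's input) to pick the specific format.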
Each bitstream format additionally has a support level, indicating how well the hosting institution is likely to be able to preserve
content in the format in the future. The hosting institution may assign each bitstream format one of three support levels.
The host institution should determine the exact meaning of each support level after careful consideration of costs
and requirements. MIT Libraries' interpretation is shown below:
Table 2.1. MIT Libraries' Definitions of Bitstream Format Support Levels
Supported    The format is recognized, and the hosting institution is confident it can make bitstreams of this format usable in the
             future, using whatever combination of techniques (such as migration, emulation, etc.) is appropriate given the context
             of need.
Known        The format is recognized, and the hosting institution will promise to preserve the bitstream as-is and allow it to be
             retrieved. The hosting institution will attempt to obtain enough information to enable the format to be upgraded to the
             'supported' level.