System R: Relational Approach to Database Management
M. M. ASTRAHAN, M. W. BLASGEN, D. D. CHAMBERLIN,
K. P. ESWARAN, J. N. GRAY, P. P. GRIFFITHS,
W. F. KING, R. A. LORIE, P. R. McJONES, J. W. MEHL,
G. R. PUTZOLU, I. L. TRAIGER, B. W. WADE, AND V. WATSON
IBM Research Laboratory
System R is a database management system which
provides a high level relational data interface.
The system provides a high level of data independence by isolating the end user as much as
possible from underlying storage structures. The system permits definition of a variety of relational
views on common underlying data. Data control features are provided, including authorization,
integrity assertions, triggered transactions, a logging and recovery subsystem, and facilities for
maintaining data consistency in a shared-update environment.
This paper contains a description of the overall architecture and design of the system. At the
present time the system is being implemented and the design evaluated. We emphasize that
System R is a vehicle for research in database architecture, and is not planned as a product.
Key Words and Phrases: database, relational model, nonprocedural language, authorization,
locking, recovery, data structures, index structures
CR categories: 3.74, 4.22, 4.33, 4.35
1. INTRODUCTION
The relational model of data was introduced by Codd [7] in 1970 as an approach
toward providing solutions to various problems in database management. In par-
ticular, Codd addressed the problems of providing a data model or view which is
divorced from various implementation considerations (the data independence
problem) and also the problem of providing the database user with a very high
level, nonprocedural data sublanguage for accessing data.
To a large extent, the acceptance and value of the relational approach hinges on
the demonstration that a system can be built which can be used in a real environ-
ment to solve real problems and has performance at least comparable to today’s
existing systems. The purpose of this paper is to describe the overall architecture
and design aspects of an experimental prototype database management system
called System R, which is currently being implemented and evaluated at the IBM
San Jose Research Laboratory. At the time of this writing, the design has been
completed and major portions of the system are implemented and running. However,
the overall system is not completed. We plan a complete performance evaluation
of the system which will be available in later papers.

Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish,
but not for profit, all or part of this material is granted provided that ACM’s copyright notice is
given and that reference is made to the publication, to its date of issue, and to the fact that
reprinting privileges were granted by permission of the Association for Computing Machinery.
Authors’ address: IBM Research Laboratory, San Jose, CA 95193.
ACM Transactions on Database Systems, Vol. 1, No. 2, June 1976, Pages 97-137.

CONTENTS
1. INTRODUCTION
Architecture and System Structure
2. THE RELATIONAL DATA SYSTEM
Host Language Interface
Query Facilities
Data Manipulation Facilities
Data Definition Facilities
Data Control Facilities
The Optimizer
Modifying Cursors
Simulation of Nonrelational Data Models
3. THE RELATIONAL STORAGE SYSTEM
Segments
Relations
Images
Links
Transaction Management
Concurrency Control
System Checkpoint and Restart
4. SUMMARY AND CONCLUSION
APPENDIX I. RDI Operators
APPENDIX II. SEQUEL Syntax
APPENDIX III. RSI Operators
ACKNOWLEDGMENTS
REFERENCES

The System R project is not the first implementation of the relational approach
[12, 30]. On the other hand, we know of no other relational system which provides
a complete database management capability, including application programming
as well as query capability, concurrent access support, system recovery, etc. Other
relational systems have focused on, and demonstrated, the feasibility of techniques
for solving various specific problems. For example, the IS/1 system [22] demonstrated
the feasibility of supporting the relational algebra [8] and also developed
optimization techniques for evaluating algebraic expressions [29]. Techniques for
optimization of the relational algebra have also been developed by Smith and
Chang at the University of Utah [27]. The extended relational memory (XRM)
system [19] developed at the IBM Cambridge Scientific Center has been used as
a single user access method by other relational systems [2]. The SEQUEL prototype
[1] was originally developed as a single-user system to demonstrate the feasibility
of supporting the SEQUEL [5] language. However, this system has been extended
by the IBM Cambridge Scientific Center and the MIT Sloan School Energy Laboratory
to allow a simple type of concurrency and is being used as a component of
the Generalized Management Information System (GMIS) [9] being developed
at MIT for energy related applications. The INGRES project [lS] being developed
at the University of California, Berkeley, has demonstrated techniques for the
decomposition of relational expressions in the QUEL language into “one-variable
queries.” Also, this system has investigated the use of query modification [28] for
enforcing integrity constraints and authorization constraints on users. The problem
of translating a high level user language into lower level access primitives has also
been studied at the University of Toronto [21,26].
Architecture and System Structure
We will describe the overall architecture of System R from two viewpoints. First,
we will describe the system as seen by a single transaction, i.e. a monolithic de-
scription. Second, we will investigate its multiuser dimensions. Figure 1 gives a
functional view of the system including its major interfaces and components.
The Relational Storage Interface (RSI) is an internal interface which handles
access to single tuples of base relations. This interface and its supporting system,
the Relational Storage System (RSS), is actually a complete storage subsystem in
that it manages devices, space allocation, storage buffers, transaction consistency
and locking, deadlock detection, backout, transaction recovery, and system recovery.
Furthermore, it maintains indexes on selected fields of base relations, and
pointer chains across relations.
The Relational Data Interface (RDI) is the external interface which can be
called directly from a programming language, or used to support various emulators
and other interfaces. The Relational Data System (RDS), which supports the
RDI, provides authorization, integrity enforcement, and support for alternative
views of data. The high level SEQUEL language is embedded within the RDI, and
is used as the basis for all data definition and manipulation. In addition, the RDS
maintains the catalogs of external names, since the RSS uses only system generated
internal names. The RDS contains an optimizer which chooses an appropriate
access path for any given request from among the paths supported by the RSS.
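
The division of labor described above can be illustrated with a small sketch. It is not System R code: the class names `ToyRSS` and `ToyRDS`, their methods, and the sample tuples are all hypothetical, and the "optimizer" is reduced to a single rule (use an index if one exists, otherwise scan). The sketch only shows the idea that the lower layer offers tuple-at-a-time access paths while the upper layer chooses among them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRSS:
    """Hypothetical stand-in for the storage layer: tuple-at-a-time access only."""
    rows: list                                   # tuples of one base relation
    indexes: dict = field(default_factory=dict)  # field position -> {value: [row ids]}

    def create_index(self, pos):
        """Build an index on the field at position `pos`."""
        index = {}
        for rid, row in enumerate(self.rows):
            index.setdefault(row[pos], []).append(rid)
        self.indexes[pos] = index

    def scan(self):
        """Access path 1: visit every tuple of the relation in turn."""
        yield from self.rows

    def index_lookup(self, pos, value):
        """Access path 2: visit only the tuples the index lists for `value`."""
        for rid in self.indexes[pos].get(value, []):
            yield self.rows[rid]


class ToyRDS:
    """Hypothetical stand-in for the data layer: picks an access path per request."""

    def __init__(self, rss):
        self.rss = rss

    def select_equal(self, pos, value):
        # Toy "optimizer": use an index on the field when one exists, else scan.
        if pos in self.rss.indexes:
            return "index", list(self.rss.index_lookup(pos, value))
        return "scan", [row for row in self.rss.scan() if row[pos] == value]


# Made-up sample tuples (NAME, JOB), for illustration only.
emp = [("JONES", "PROGRAMMER"), ("SMITH", "CLERK"), ("ADAMS", "PROGRAMMER")]
rds = ToyRDS(ToyRSS(emp))
path_before, result_before = rds.select_equal(1, "PROGRAMMER")
rds.rss.create_index(1)
path_after, result_after = rds.select_equal(1, "PROGRAMMER")
```

In this sketch the same request returns the same tuples before and after the index is created; only the chosen path changes, which is the sense in which the caller is isolated from the storage structures.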
FIG. 1. Architecture of System R. [Figure not reproduced. Labels: programs to support
various interfaces (stand-alone SEQUEL, Query By Example, etc.); Relational Data
Interface (RDI); Relational Data System (RDS); Relational Storage Interface (RSI).]
FIG. 2. Use of virtual machines in System R. [Figure not reproduced. Labels: RDS, RSS, Monitor.]
The current operating system environment for this experimental system is
VM/370 [lS]. Several extensions to this virtual machine facility have been made
[14] in order to support the multiuser environment of System R. In particular,
we have implemented a technique for the selective sharing of read/write virtual
memory across any number of virtual machines and for efficient communication
among virtual machines through processor interrupts. Figure 2 illustrates the use
of many virtual machines to support concurrent transactions on shared data. For
each logged-on user there is a dedicated database machine. Each of these database
machines contains all code and tables needed to execute all data management
functions; that is, services are not reserved to a centralized machine.
The provision for many database machines, each executing shared, reentrant
code and sharing control information, means that the database system need not
provide its own multitasking to handle concurrent transactions. Rather, one can
use the host operating system to multithread at the level of virtual machines.
Furthermore, the operating system can take advantage of multiprocessors allo-
cated to several virtual machines, since each machine is capable of providing all
data management services. A single-server approach would eliminate this advan-
tage, since most processing activity would then be focused on only one machine.
In addition to the database machines, Figure 2 also illustrates the Monitor
Machine, which contains many system administrator facilities. For example, the
Monitor Machine controls logon authorization and initializes the database machine
for each user. The Monitor also schedules periodic checkpoints and maintains
usage and performance statistics for reorganization and accounting purposes.
In Sections 2 and 3 we describe the main components of System R: the Relational
Data System and the Relational Storage System.
2. THE RELATIONAL DATA SYSTEM
The Relational Data Interface (RDI) is the principal external interface of System
R. It provides high level, data independent facilities for data retrieval, manipula-
tion, definition, and control. The data definition facilities of the RDI allow a variety
of alternative relational views to be defined on common underlying data. The
Relational Data System (RDS) is the subsystem which implements the RDI. The
RDS contains an optimizer which plans the execution of each RDI command,
choosing a low cost access path to data from among those provided by the Relational
Storage System (RSS).
The RDI consists of a set of operators which may be called from PL/I or other
host programming languages. (See Appendix I for a list of these operators.) All
the facilities of the SEQUEL data sublanguage [5] are available at the RDI by
means of the RDI operator called SEQUEL. (A Backus-Naur Form (BNF) syntax
for SEQUEL is given in Appendix II.) The SEQUEL language can be supported as a
stand-alone interface by a simple program, written on top of the RDI, which
handles terminal communications. (Such a stand-alone SEQUEL interface, called
the User-Friendly Interface, or UFI, is provided as a part of System R.) In addition,
programs may be written on top of the RDI to support other relational interfaces,
such as Query by Example [31], or to simulate nonrelational interfaces.
Host Language Interface
The facilities of the RDI are basically those of the SEQUEL data sublanguage, which
is described in [5] and in Appendix II. Several changes have been made to SEQUEL
since the earlier publication of the language; they are described below.
The illustrative examples used in this section are based on the following database
of employees and their departments:
EMP(EMPNO, NAME, DNO, JOB, SAL, MGR)
DEPT(DNO, DNAME, LOC, NEMPS)
The RDI interfaces SEQUEL to a host programming language by means of a
concept called a cursor. A cursor is a name which is used at the RDI to identify a
set of tuples called its active set (e.g. the result of a query) and furthermore to maintain
a position on one tuple of the set. The cursor is associated with a set of tuples
by means of the RDI operator SEQUEL; the tuples may then be retrieved, one at a
time, by the RDI operator FETCH.
Some host programs may know in advance exactly the degree and data types
of the tuples they wish to retrieve. Such a program may specify, in its SEQUEL call,
the program variables into which the resulting tuples are to be delivered. The program
must first give the system the addresses of the program variables to be used
by means of the RDI operator BIND. In the following example, the host program
identifies variables X and Y to the system and then issues a query whose results
are to be placed in these variables:
CALL BIND('X', ADDR(X));
CALL BIND('Y', ADDR(Y));
CALL SEQUEL(C1, 'SELECT NAME:X, SAL:Y
                 FROM EMP
                 WHERE JOB = ''PROGRAMMER''');
The SEQUEL call has the effect of associating the cursor C1 with the set of tuples
which satisfy the query and positioning it just before the first such tuple. The
optimizer is invoked to choose an access path whereby the tuples may be materialized.
However, no tuples are actually materialized in response to the SEQUEL call.
The materialization of tuples is done as they are called for, one at a time, by the
FETCH operator. Each call to FETCH delivers the next tuple of the active set
into program variables X and Y, i.e. NAME to X and SAL to Y:
CALL FETCH(C1);
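
For readers more familiar with present-day call interfaces, the cursor-and-fetch pattern above can be approximated with Python's standard `sqlite3` module. This is only a rough analogue under stated assumptions: SQLite is not System R, the column types and sample rows are made up for illustration, and the Python variables `x` and `y` receive values by ordinary assignment instead of through addresses registered with BIND.

```python
import sqlite3

# Illustrative data only; the column types and rows are assumptions of this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMP (EMPNO INTEGER, NAME TEXT, DNO INTEGER, "
    "JOB TEXT, SAL INTEGER, MGR INTEGER)"
)
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "JONES", 50, "PROGRAMMER", 12000, 3),
        (2, "SMITH", 50, "CLERK", 8000, 3),
        (3, "ADAMS", 51, "PROGRAMMER", 15000, None),
    ],
)

# Rough analogue of CALL SEQUEL(C1, ...): associate a cursor with a query.
c1 = conn.cursor()
c1.execute("SELECT NAME, SAL FROM EMP WHERE JOB = 'PROGRAMMER' ORDER BY EMPNO")

# Rough analogue of repeated CALL FETCH(C1): each call delivers one tuple,
# NAME into x and SAL into y, until the active set is exhausted.
fetched = []
row = c1.fetchone()
while row is not None:
    x, y = row
    fetched.append((x, y))
    row = c1.fetchone()
```

The ORDER BY clause is added only so that the sketch has a deterministic fetch order; the example in the text does not specify one.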
A program may wish to write a SEQUEL predicate based on the contents of a
program variable, for example, to find the programmers whose department number
matches the contents of program variable Z. This facility is also provided by
the RDI BIND operator, as follows:
CALL BIND('X', ADDR(X));
CALL BIND('Y', ADDR(Y));
CALL BIND('Z', ADDR(Z));
CALL SEQUEL(C1, 'SELECT NAME:X, SAL:Y
                 FROM EMP
                 WHERE JOB = ''PROGRAMMER''
                 AND DNO = Z');
CALL FETCH(C1);