6 1 Introduction to the SAS Language
statement ②, and a group of statements used for analyzing a SAS data set
(called a SAS proc step) can be recognized because it begins with a proc
statement ③. There may be several steps of each kind in a SAS program
that logically defines a data analysis task.
SAS interprets and executes these steps in their order of appearance in a
program. Therefore, the user must make sure that there is a logical progression
in the operations carried out. Thus, a proc step must follow the data step
that creates the SAS data set to be analyzed by that proc step. Statements
within a data step are likewise executed sequentially, so for computations to
be carried out on the data values as expected, the statements within the step
must, in general, also appear in a logical order; the exceptions are certain
declarative or nonexecutable statements. For example, an input statement that
defines variables must precede executable SAS statements, such as SAS
programming statements, that reference those variable names.
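The ordering requirement can be sketched with a small program. The data set name rates, the variable TaxRate, and the data values are hypothetical and are used only for illustration; they do not come from SAS Example A1.

```sas
/* Hypothetical data set and values, for illustration only */
data rates;
   input Income Tax;          /* defines Income and Tax first            */
   TaxRate = Tax / Income;    /* programming statement that uses them    */
   datalines;
500 75
800 160
;
run;

/* The proc step follows the data step that creates rates */
proc print data=rates;
run;
```

If the assignment statement were placed before the input statement, Income and Tax would not yet hold the values from the current data line when TaxRate is computed, so the result would be a missing value rather than the intended ratio.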
One very important characteristic of the execution of a SAS data step is
that the statements in the data step are executed, and an observation is
written to the output SAS data set, once for every line of input data, in
cyclic fashion, until every data line has been processed. A detailed discussion of data step
processing is given in Sect. 1.6.
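A minimal sketch of this cyclic behavior is given below. The data set name cycle, the variable x, and the data values are assumptions made for illustration. The put statement writes to the SAS Log, and _n_ is the automatic variable that counts iterations of the data step.

```sas
/* Hypothetical example: one iteration of the data step per data line */
data cycle;
   input x;
   put _n_= x=;     /* writes the iteration number and x to the SAS Log */
   datalines;
10
20
30
;
run;
```

With three data lines, the statements are executed once for each line, and the data set cycle contains three observations.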
The first statement following the data statement ② in the data step usually
(but not always) is an input statement, especially when raw data are being
accessed. The input statement used here is a moderately complex example
of a formatted input statement, described in detail in Sect. 1.4. The symbols
and informats used to read the data values for the variables Income, Tax,
Age, and State from the data lines in SAS Example A1 and their effects are
itemized as follows:
• @4 causes SAS to begin reading each data line at column 4.
• 2*5.2 reads data values for Income and Tax from columns 4–8 and 9–13,
respectively, using the informat 5.2 twice, that is, two decimal places are
assumed for each value.
• 2. reads the data value for Age from columns 14 and 15 as a whole number
(i.e., a number without a fraction portion) using the informat 2.
• $2. reads the data value for State from columns 16 and 17 as a character
string of length 2, using the informat $2.
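The items above can be combined into a data step along the following lines. This is a sketch only: the input statement shown is one way of writing such a specification and may be grouped differently in SAS Example A1, and the two data lines are hypothetical values laid out to match the column positions described.

```sas
/* Sketch under stated assumptions; data lines are hypothetical */
data first;
   /* @4 moves to column 4; 2*5.2 applies the informat 5.2 to Income and Tax; */
   /* Age is then read from columns 14-15 and State from columns 16-17        */
   input @4 (Income Tax) (2*5.2) Age 2. State $2.;
   datalines;
001546500895545IA
002321000450032NE
;
run;
```

Because the data values contain no decimal points, the informat 5.2 places the decimal point two digits from the right. For the first line, columns 4 to 8 hold 54650, so Income is read as 546.50, and columns 9 to 13 hold 08955, so Tax is read as 89.55.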
A semicolon symbol “;” appearing by itself in the first column in a data line
signals the end of the lines of raw data supplied instream in the current data
step. On encountering it, SAS completes the creation of the SAS data
set named first by closing the file. The proc print; statement ③ that follows the data
step signals the beginning of a proc step. The SAS data set processed in this
proc step is, by default, the data set created immediately preceding it (in this
program the SAS data set first was the only one created). Again, by default,
all variables and observations in the SAS data set will be processed in this
proc step.
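These defaults can also be written out explicitly. The sketch below assumes that first is the most recently created SAS data set and that it contains only the four variables Income, Tax, Age, and State; under those assumptions it requests the same listing as proc print; alone.

```sas
/* Explicit form of the defaults used by a bare "proc print;" */
proc print data=first;          /* names the data set instead of relying on the default */
   var Income Tax Age State;    /* lists the variables instead of printing all of them  */
run;
```

Naming the data set explicitly is useful once a program creates more than one SAS data set, because the default then depends on which one was created last.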
The output from execution of the SAS program consists of two parts: the
SAS Log (see Fig. 1.5), which is a running commentary on the results of ex-