Using SAS Functions in Data Steps
Yue Ye, The R.W. Johnson Pharmaceutical Research Institute, Raritan, NJ
Yong Lin, The Cancer Institute of New Jersey, New Brunswick, NJ
ABSTRACT
Base SAS software has many built-in functions, and proper use of
these functions can save considerable programming time and effort.
In this paper, we illustrate how to use some of the new functions
introduced in version 6.12 and later. These new functions include
data set functions, variable functions, external file functions,
library and catalog functions, and some other special functions.
We concentrate on using these functions in data steps. Quite
often, a data step needs information about SAS data sets,
libraries, or external files. The conventional approach is to run
several SAS procedures first and then merge their results with
the SAS data set. Calling SAS functions directly in the data step
provides a simpler solution in many situations.
INTRODUCTION
Many new SAS data step functions have been introduced since
version 6.12. These new functions include data set functions,
variable functions, external file functions, library and catalog
functions, and some other special functions. In this paper, we
show how to use these functions through several specific
examples. We use as many functions as possible so that the
reader can see a wide range of them in action. We hope this
paper can serve as a tutorial on the use of SAS data step
functions.
Although all the functions discussed in this paper can be used
both in the data step and in the macro facility, we demonstrate
them only in the data step. Their use in the macro facility is
similar; the main differences are discussed in the FINAL NOTES
section.
The data set generated by the following SAS code is used for
demonstration throughout this paper:
proc format;
  value $gender 'F'='Female' 'M'='Male';
run;
/* Fields on each data line are separated by two blanks: the &   */
/* modifier ends a value at two consecutive blanks, and +1 steps */
/* over the second blank that follows income.                    */
data income(label='Annual Income');
  input name &$20. street &$20. income +1 gender $1.;
  format income dollar11.2 gender $gender.;
  label name='Last Name, First Name'
        street='Address'
        income='Annual Income'
        gender='Gender';
  cards;
Leverling, Janet  55 Hazel Way  54789  F
Peacock, Margaret  101 Broadway  4565  F
Smith, John  23 Mars Hill Dr.  86685  M
Buchanan, Steven  45 Gray Drive  23567  M
Suyama, Michael  213 Hillside Ave.  65778  M
King, Robert  345 Main St.  45654  M
Callahan, Laura  455 8th St.  134656  F
Dodsworth, Anne  57 Pleasant Blvd.  5433  F
Davolio, Nancy  65 Peanut Circle  57654  F
;
Most data set access functions take a data set identifier obtained
from the OPEN function. The syntax of the OPEN function is
open('data-set-name', 'mode')
where mode may be 'i' (the default), meaning that observations
can be read but not modified; 'in', meaning that observations are
read sequentially and an observation may be revisited; or 'is',
meaning that observations are read sequentially and may not be
revisited. The OPEN function opens a SAS data set and returns a
unique numeric data set identifier, or 0 if the data set cannot be
opened.
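Because OPEN returns 0 when a data set cannot be opened, it is worth checking the identifier before passing it to other functions. The following is a minimal sketch, not part of the original examples: the output data set open_check and the variables opened and msg are hypothetical names, and the step is written to run whether or not income exists in the WORK library. SYSMSG returns the text of the most recent error or warning message produced by a data set or external file function.

```sas
data open_check;
  length msg $200;
  dsid=open('income','i');         /* 'i' is the default mode        */
  opened=(dsid>0);                 /* OPEN returns 0 on failure      */
  if not opened then msg=sysmsg(); /* reason the open did not work   */
  put opened= dsid= msg=;          /* write the result to the log    */
run;
```

When income has been created as shown above, opened should be 1 and msg should be blank.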
For example, the following data step finds the number of
observations, the number of variables, and the label assigned to
the data set income:
Example 1:
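data ex1;
  dsid=open('income','i');      /* open income in read-only mode */
  n_obs=attrn(dsid,'nobs');     /* number of observations        */
  n_vars=attrn(dsid,'nvars');   /* number of variables           */
  dslabel=attrc(dsid,'label');  /* label of the data set         */
run;
proc print;run;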
Here we open the SAS data set income in read-only mode. The
return value of the OPEN function is assigned to the variable
dsid. This value is passed to the ATTRN function to get the number
of observations and the number of variables in the data set, and
to the ATTRC function to obtain the label assigned to the data
set. The following is the output from the program:
OBS    DSID    N_OBS    N_VARS    DSLABEL
  1       1        9         4    Annual Income
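Other attributes can be requested in the same way. The sketch below is an illustration added to Example 1 rather than part of it. It assumes the income data set created earlier exists in the WORK library, and the output data set name ex1_more and its variable names are hypothetical. It asks ATTRN for the NLOBS and ANY attributes and ATTRC for the LIB and MEM attributes.

```sas
data ex1_more;
  dsid=open('income','i');
  if dsid>0 then do;
    n_lobs=attrn(dsid,'nlobs');  /* observations not marked for deletion */
    has_any=attrn(dsid,'any');   /* 1: observations and variables,       */
                                 /* 0: variables but no observations,    */
                                 /* -1: neither                          */
    libref=attrc(dsid,'lib');    /* libref of the opened data set        */
    member=attrc(dsid,'mem');    /* member name of the opened data set   */
  end;
run;
proc print;run;
```

The check on dsid keeps ATTRN and ATTRC from being called with an invalid identifier if income is not available.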
All data sets opened within a DATA step are closed automatically
at the end of the DATA step. Using the CLOSE function, you can
close any opened SAS data sets as soon as they are no longer
needed. The syntax of the CLOSE function is
close(data-set-id)
where data-set-id is the identifier that OPEN returned for the
data set. CLOSE returns 0 if the operation succeeds, and its
return value may be assigned to any variable in the data step. As
an example, you may add the statement
rc=close(dsid);
to the data step in Example 1 as the last statement of the data
step.
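Put together, the modified step might look like the following sketch. It assumes the income data set created earlier exists in the WORK library. The output data set name ex1_close is hypothetical, chosen so that ex1 is not overwritten.

```sas
data ex1_close;
  dsid=open('income','i');
  n_obs=attrn(dsid,'nobs');
  n_vars=attrn(dsid,'nvars');
  dslabel=attrc(dsid,'label');
  rc=close(dsid);   /* 0 indicates that the data set was closed */
run;
proc print;run;
```

Closing the data set explicitly matters most in longer data steps, where an open data set would otherwise stay open until the step ends.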
Just as the OPEN and CLOSE functions work with SAS data sets, the
FOPEN and FCLOSE functions work with external files and the
DOPEN and DCLOSE functions work with directories. Both FOPEN
and DOPEN take as an argument the file reference (fileref)
assigned to the file or the directory. The FILENAME function can
be used to assign or deassign a file reference for an external file,
directory, or output device. The FILEREF function is used to
verify whether a file reference has been defined. A file can also
be opened by directory ID and member name using the MOPEN
function. We will show the use of these functions along the way.
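As a first illustration of the external file functions, the sketch below assigns a file reference, checks it with FILEREF, and then opens, writes, and closes the file. It is an assumption-laden example rather than one of the paper's own: the fileref demoref, the file name fn_demo.txt, the output data set fileref_demo, and the choice of the WORK directory (located with the PATHNAME function and joined with a forward slash) are all hypothetical and may need adjusting for your operating environment.

```sas
data fileref_demo;
  length fref $8 path $200;
  fref='demoref';                               /* hypothetical fileref */
  path=trim(pathname('work'))||'/fn_demo.txt';  /* hypothetical file    */

  rc_assign=filename(fref,path);  /* 0 if the fileref was assigned      */
  rc_before=fileref(fref);        /* >0: fileref not assigned           */
                                  /* <0: assigned, file does not exist  */
                                  /*  0: assigned and file exists       */

  fid=fopen(fref,'o');            /* open for output, creating the file */
  if fid>0 then do;
    rc=fput(fid,'hello');         /* place text in the file data buffer */
    rc=fwrite(fid);               /* write the buffer as one record     */
    rc=fclose(fid);               /* close the file                     */
  end;

  rc_after=fileref(fref);         /* 0 once the file exists             */
  drop rc;
run;
proc print;run;
```

FOPEN returns 0 when the file cannot be opened, so the write and close calls are skipped in that case.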
USING DATA SET FUNCTIONS
As discussed above, OPEN and CLOSE are data set functions. Other
useful data set functions introduced since version 6.12 are
ATTRC, ATTRN, EXIST, DSNAME, FETCH, FETCHOBS, CUROBS,
NOTE, DROPNOTE, POINT, and REWIND. The ATTRC and ATTRN
functions are used to retrieve the values of character and numeric
attributes of a SAS data set, respectively. The EXIST function is
used to verify