Before describing the regular expression needed for the task, some general comments about the way the PRX functions are
used in this example are in order. In the code in Figure 2A, the regular expression is compiled only once, at the first
iteration of the DATA step, using the PRXPARSE function. The value assigned to the variable _re by the compilation is
retained and available for later use as an argument to the PRXMATCH function. Error checking of the regular expression is
also done at this point: if the regular expression is not syntactically correct, a missing value is assigned to _re, a
polite reminder is written to the SAS log, and the DATA step terminates. The compile-once behavior can also be
accomplished using the /o modifier under most conditions. Future examples will have the regex entered directly in the
PRXMATCH function in the WHERE clause of PROC SQL, where one-time compilation of the regex occurs automatically.
Generally speaking, one-time compilation occurs whenever the regex is used in a WHERE clause of any PROC or DATA step.
However, you lose the ability to do any error checking on the regex.
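As a minimal sketch of the WHERE-clause usage described above, the following enters the regex directly in PRXMATCH inside PROC SQL. The data set name ae_demo, the variable ae_text, and the sample rows are hypothetical. The pattern is an assumption as well: it is the Figure 2C expression with \s swapped back to \b, and may differ from the exact regex in Figure 2A.

```sas
/* Hypothetical demo data: ae_demo and ae_text are illustrative names only */
data ae_demo;
   infile datalines truncover;
   input ae_text $char40.;
   datalines;
Dangerous glaucoma OS
migraine headache
O.D. has issues
;
run;

/* Regex entered as a constant in the WHERE clause; no PRXPARSE call    */
/* and therefore no opportunity to test for a missing pattern ID.       */
/* Assumed pattern: Figure 2C with \s replaced by \b.                    */
proc sql;
   select ae_text
   from ae_demo
   where prxmatch('/\b[o0]\.?[uds]\.?/i', ae_text) > 0;
quit;
```

Under the assumed pattern, the query should return the first and third rows. The trade-off is the one noted above: the code is shorter, but there is no _re value to inspect, so a malformed regex cannot be caught by your own error check.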
Comments on the regex defined in the PRXPARSE function are included in Figure 2A. The variable match_pos contains
the starting position of the first regex match in the variable being searched. If the regex does not match the variable
given as the second argument of the PRXMATCH function, match_pos is assigned the value 0. The field OcularYN is a
Boolean field that has a value of 1 if the observation is flagged as an ocular adverse event (i.e., match_pos is greater
than zero). Displayed in Figure 2B below are some observations from the Find_Ocular_AE data set. Notice that we were
indeed successful in flagging observation #4 for the OS part of the string and not the ou part of ‘Dangerous’, which is
confirmed by noting that the match starts at position 20 and not 7.
Figure 2B – Partial Results from Ocular AE Matching
Obs  Adverse Event Text      match_pos  OcularYN
4    Dangerous glaucoma OS   20         1
5    migraine headache       0          0
6    lower backache          0          0
7    festering headwound     0          0
8    O.D. has issues         1          1
Note on observation 7: the ‘ou’ from ‘headwound’ does not match the definition of an ocular adverse event.
Now consider the regular expression in Figure 2C below, where the metacharacter \b is replaced by the character class \s
(see footnote 3). Is the regular expression below equivalent to the one in Figure 2A?
Figure 2C – \b is a Zero-Width Assertion
_re=prxparse('/\s[o0]\.?[uds]\.?/i');
Note: the character class \s consumes at least one byte of space, but \b does not.
Obs  Adverse Event Text  match_pos  OcularYN
8    O.D. has issues     0          0
Inspecting the values of the fields in observation #8, we find that the two regexen are not equivalent. The reason is that
\s must consume a whitespace character from the field. The string O.D. occupies the first four bytes of the field, so there
is no leading space character for \s to match. The metacharacter \b does not consume any bytes; in Perl this is referred to
as a zero-width assertion. This supermodel-like phenomenon will show up again throughout this paper.
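The difference can be seen by running both patterns against the same strings. The sketch below is illustrative only: the variable names re_b, re_s, pos_b, and pos_s are hypothetical, and the \b pattern is assumed to be the Figure 2C expression with \s swapped back to \b, which may not be identical to the regex in Figure 2A.

```sas
data _null_;
   length text $40;
   /* This step has no input data set, so it executes exactly once */
   re_b = prxparse('/\b[o0]\.?[uds]\.?/i');   /* assumed \b form            */
   re_s = prxparse('/\s[o0]\.?[uds]\.?/i');   /* Figure 2C: \s instead of \b */
   if missing(re_b) or missing(re_s) then do;
      putlog 'ERROR: a regular expression failed to compile.';
      stop;
   end;
   /* Compare the starting position reported by each pattern */
   do text = 'O.D. has issues', 'Dangerous glaucoma OS';
      pos_b = prxmatch(re_b, text);
      pos_s = prxmatch(re_s, text);
      put text= pos_b= pos_s=;
   end;
run;
```

For 'O.D. has issues', pos_b should be 1 and pos_s should be 0, in agreement with Figures 2B and 2C. For 'Dangerous glaucoma OS', pos_b should be 20 while pos_s should be 19, because the space consumed by \s is part of the match and therefore shifts the reported starting position.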
3. To refresh your memory, \s is the whitespace character class, which includes the space, tab, and carriage return
characters and some other peculiar beasts.