6
denote the corresponding complements. Thus, strand w
i
and complement c
i
form a perfect match (duplex), whereas
strand w
i
and complement c
j
, and complements c
i
and c
j
, with j ≠ i, form mismatches. We will call probes the strands
attached to the surface of microarrays, while targets are the corresponding complements floating in the solution. We
adopt this terminology because it is widely used in the literature on DNA microarrays.
Finally, we formalize the optimization version of the DNA Strand Design Problem.
The DNA Strand Design (DSD) Problem is defined as follows: with respect to a fixed set C of constraints (which are
part of the input and will be detailed in the reminder of the section), given a strand length n, find the largest possible set
S with DNA strands of length n, satisfying all the constraints in
C
.
The constraints used in the design of DNA strands can be grouped into two classes: combinatorial and thermodynamic
constraints. The grouping is based on the measures that estimate the quality of one or more DNA strands upon which
the constraint is applied. Combinatorial constraints mostly rely on verifying the presence or absence of certain
nucleotides in a strand, and counting of matches and/or mismatches for pairs of DNA strands, whereas thermodynamic
constraints take into account the biochemical properties that govern DNA hybridization.
COMBINATORIAL CONSTRAINTS
Combinatorial constraints are widely used to design large sets of non-interacting DNA strands. The necessity of
designing sets of strands that avoid cross-hybridization can be superficially modeled by enforcing a high number of
mismatches between all possible pairs of strands and between pairs of strands and the complements of others. The
stability and uniformity of DNA strands can be also modeled with combinatorial constraints by either controlling the
presence or absence of undesired subsequences or counting specific bases within the same strand. Various types of
combinatorial constraints can be considered. In the following paragraphs we will formalize those of interest for our
study.
C1: Direct Mismatches. The number of mismatches in a perfect alignment of two strands (also referred to as Hamming
distance) must be above a given threshold d. Here, a perfect alignment is a pairing of bases from the two strands,
where the i
th
base of the first strand is paired with the i
th
base of the second strand, and a mismatch is a pairing of
two distinct bases. For example, G paired with A, C, or T is a mismatch whereas G paired with G is a match. For
example, the strands w=5‘-AACAA-3‘ and w
I
=5‘-AAGAA-3‘ have one mismatch at position i = 3, where i = 1,
2, …, 5.
C2: Complement Mismatches. The number of mismatches in a perfect alignment of a strand and a complement must
be above a given threshold d. For example, the complement of strand w=5‘-AACAA-3‘, which is c=5‘-TTGTT-
3‘ and strand w
I
=5‘-TTATT-3‘ have one mismatch at position i = 3, where i = 1, 2, …, 5. This constraint is also
referred to as the reverse-complement constraint.
C3: GC Content. The number of G‘s and/or C‘s in a strand must be in a given range. For example, all strands
presented in (Braich et al., 2003) have 4, 5 or 6 C‘s. One such strand is w=5‘-TTACACCAATCTCTT-3‘, which
contains five Cs.
Constraint C3 can be used to attain desired hybridization. Roughly, the stability of a strand-complement pair increases
as the content of G‘s and C‘s within strands increases. Constraint C3 is also used to ensure uniform melting
temperatures across desired strand-complement pairings. Constraints C1 and C2 can be used in various combinations to
avoid undesired hybridization. As it is widely known, the strength of hybridization between two strands (or between
bases of the same strand) depends roughly on the number of nucleotide bonds formed (longer perfectly paired regions
are more stable than shorter ones; this is modeled by C1 and C2), and on the types of nucleotides involved in bonding.
It can be observed that checking if a combinatorial constraint is satisfied can be done with computational complexity
O(n). This makes these constraints extremely tractable from a computational point of view.
THERMODYNAMIC CONSTRAINTS
While combinatorial constraints provide a good starting point for designing sets of strands for various biotechnological