followed by what you would type in:
>>> from Bio.Seq import Seq
>>> my_seq = Seq("AGTACACTGGT")
>>> my_seq
Seq('AGTACACTGGT')
>>> print(my_seq)
AGTACACTGGT
The Seq object differs from the Python string in the methods it supports. You can’t do this with a plain
string:
>>> my_seq
Seq('AGTACACTGGT')
>>> my_seq.complement()
Seq('TCATGTGACCA')
>>> my_seq.reverse_complement()
Seq('ACCAGTGTACT')
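To show what these two methods compute, here is a minimal plain-Python sketch of the same operations on an ordinary string. It assumes an unambiguous, upper-case DNA sequence (A, C, G and T only). The helper names complement and reverse_complement are hypothetical, and this is not Biopython's own implementation, which also handles IUPAC ambiguity codes and lower-case letters.

```python
# Plain-Python sketch of complement / reverse complement for a string.
# Assumption: upper-case, unambiguous DNA (A, C, G, T only).
# Illustration only; this is NOT how Biopython implements Seq methods.

# Translation table mapping each base to its Watson-Crick partner
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Hypothetical helper: return the base-by-base complement."""
    return seq.translate(_DNA_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Hypothetical helper: complement, then reverse the order."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    my_seq = "AGTACACTGGT"
    print(complement(my_seq))          # TCATGTGACCA
    print(reverse_complement(my_seq))  # ACCAGTGTACT
```

With a Seq object these operations come built in as methods, so there is no need to maintain a translation table yourself.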
The next most important class is the SeqRecord, or Sequence Record. This holds a sequence (as a Seq
object) with additional annotation, including an identifier, name and description. The Bio.SeqIO module
for reading and writing sequence file formats works with SeqRecord objects, which will be introduced below
and covered in more detail in Chapter 5.
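As a small illustration, the following sketch wraps the sequence from above in a SeqRecord and writes it out in FASTA format with Bio.SeqIO. The identifier, name and description values are made up for the example, and an in-memory StringIO buffer stands in for a real output file.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Wrap a Seq in a SeqRecord with an identifier, name and description.
# These annotation values are invented purely for illustration.
record = SeqRecord(
    Seq("AGTACACTGGT"),
    id="example_1",
    name="example",
    description="toy sequence for illustration",
)

print(record.id)           # example_1
print(record.description)  # toy sequence for illustration
print(len(record))         # 11 (the length of the wrapped sequence)

# Bio.SeqIO works with SeqRecord objects; write this one as FASTA
# into an in-memory text buffer rather than a file on disk.
handle = StringIO()
SeqIO.write(record, handle, "fasta")
print(handle.getvalue())
```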
This covers the basic features and uses of the Biopython sequence class. Now that you’ve got some idea
of what it is like to interact with the Biopython libraries, it’s time to delve into the fun, fun world of dealing
with biological file formats!
2.3 A usage example
Before we jump right into parsers and everything else to do with Biopython, let's set up an example to
motivate everything we do and make life more interesting. After all, if there wasn't any biology in this
tutorial, why would you want to read it?
Since I love plants, I think we're just going to have to have a plant-based example (sorry to all the fans
of other organisms out there!). Having just returned from a trip to our local greenhouse, we've suddenly
developed an incredible obsession with Lady Slipper Orchids (if you wonder why, have a look at some Lady
Slipper Orchid photos on Flickr, or try a Google Image Search).
Of course, orchids are not only beautiful to look at, they are also extremely interesting for people studying
evolution and systematics. So let’s suppose we’re thinking about writing a funding proposal to do a molecular
study of Lady Slipper evolution, and would like to see what kind of research has already been done and how
we can add to that.
After a little bit of reading up, we discover that the Lady Slipper Orchids are in the Orchidaceae family and
the Cypripedioideae subfamily, and are made up of five genera: Cypripedium, Paphiopedilum, Phragmipedium,
Selenipedium and Mexipedium.
That gives us enough to get started delving for more information. So, let's look at how the Biopython
tools can help us. We'll start with sequence parsing in Section 2.4, but the orchids will be back later on as
well. For example, we'll search PubMed for papers about orchids and extract sequence data from GenBank in
Chapter 9, extract data from Swiss-Prot for certain orchid proteins in Chapter 10, and work with ClustalW
multiple sequence alignments of orchid proteins in Section 6.5.1.
2.4 Parsing sequence file formats
A large part of bioinformatics work involves dealing with the many types of file formats designed to
hold biological data. These files are loaded with interesting biological data, and a special challenge is parsing