454 Sequencing System Software Manual, May 2011
General Overview and Data File Formats
GS De Novo Assembler
Input: SFF files, from one or multiple sequencing Runs, containing read flowgrams and basecalls, and per-base quality scores
Output: Sample consensus sequence, assembled de novo (and scaffold information, with Paired End option)
Main processing steps:
- Identify pairwise overlaps between reads, in nucleotide space
- Construct multiple alignments of reads that tile together (i.e. form contigs), based on the pairwise overlaps
- Generate consensus basecalls of the contigs by averaging the processed flow signals for each nucleotide flow included in the alignment, in flowspace
- Output the contig consensus sequences and corresponding quality scores, along with an ACE file of the multiple alignments and assembly metrics files
Additional steps with Paired End option:
- Identify pairwise overlaps between Paired End tags and the shotgun contigs
- Organize the contigs into scaffolds (order, orientation, and approximate distance)
- Output the scaffolded consensus sequences and corresponding quality scores, along with an AGP file of the scaffolds and specific metrics tables

GS Reference Mapper
Input: SFF files, as for the GS De Novo Assembler
Output: Sample consensus sequence, mapped to a reference sequence; and list of differences
Main processing steps:
- For each read, search for alignment(s) to the reference sequence, in nucleotide space
- Construct contigs and compute a consensus basecall sequence from the signals of the aligned reads (flowspace)
- Identify the positions where the consensus or subsets of the reads that comprise it differ from the reference sequence (or reads from one another); these are the “putative differences”
- Evaluate the putative differences to identify high-confidence differences
- Output contig consensus sequence(s) and corresponding quality scores, an ACE file of the multiple alignments of the reads and contigs to the reference, the list of identified differences, and mapping metrics files

GS Amplicon Variant Analyzer
Input: SFF files, as for the GS De Novo Assembler
Output: Quantitation of sequence variants
Main processing steps:
- Trim reads (remove primer sequences)
- Assign reads to “Samples” (demultiplex data sets)
- Align Sample reads to their reference sequences
- Quantitate variant frequency for each Sample

Table 2: The three applications of the data analysis phase of the 454 Sequencing System, with their inputs, outputs, and main
processing steps. Note that all data analysis applications take as input the reads and flowgrams output in SFF format by the
data processing phase (GS Run Processor application). For a full description of the various data analysis applications, see Parts C
and D in this manual.
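
The flowspace consensus step listed in Table 2 (averaging the processed flow signals for each nucleotide flow in the alignment) can be illustrated with a minimal sketch. It is not the software's actual implementation. It assumes that the aligned reads have already been placed in a common flow frame, so that position i of every read refers to the same nucleotide flow, and that the homopolymer length is taken as the averaged signal rounded to the nearest integer; the manual does not state the exact rule. The function name `flowspace_consensus`, the cyclic flow order `TACG`, and the signal values are illustrative assumptions.

```python
from statistics import mean

# Assumed cyclic nucleotide flow order, used here for illustration only.
FLOW_ORDER = "TACG"


def flowspace_consensus(aligned_signals, flow_order=FLOW_ORDER):
    """Average per-flow signals across reads and convert them to basecalls.

    aligned_signals: list of equal-length lists of floats, one list per read;
    position i of every list refers to the same nucleotide flow.
    """
    if not aligned_signals:
        return ""
    n_flows = len(aligned_signals[0])
    if any(len(read) != n_flows for read in aligned_signals):
        raise ValueError("all reads must cover the same flows")

    bases = []
    for i in range(n_flows):
        avg = mean(read[i] for read in aligned_signals)
        # Assumed rule: the averaged signal, rounded half up, is the
        # homopolymer length for this flow (0 means no base incorporated).
        length = int(avg + 0.5)
        bases.append(flow_order[i % len(flow_order)] * length)
    return "".join(bases)


# Hypothetical signals for three reads over one T, A, C, G flow cycle.
reads = [
    [1.02, 0.05, 2.10, 0.96],
    [0.95, 0.10, 1.85, 1.08],
    [1.10, 0.02, 2.30, 0.91],
]
consensus = flowspace_consensus(reads)
print(consensus)
```

Averaging in flowspace before calling bases means that a homopolymer length is decided once from the pooled signal, rather than by voting over per-read basecalls that may each have rounded a borderline signal differently.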
The software package described in this manual also includes a variety of applications that are used primarily
or exclusively off-instrument (on a DataRig or GS Junior Attendant PC). The GS Reporter and the GS Run
Browser applications are used to view and troubleshoot the results of a completed sequencing Run; the GS
Support Tool is used to package sequencing Run data to send to Roche Customer Support for further help
and troubleshooting; and the SFF Tools are a set of commands used to create, manipulate and access
sequencing trace data from SFF files. However, these applications and commands are not required steps of
data processing and analysis.