RESEARCH Open Access
Improving alignment accuracy on
homopolymer regions for semiconductor-
based sequencing technologies
Weixing Feng1, Sen Zhao1, Dingkai Xue1, Fengfei Song1, Ziwei Li1, Duojiao Chen1, Bo He1*, Yangyang Hao2,
Yadong Wang3 and Yunlong Liu1,2*
From The International Conference on Intelligent Biology and Medicine (ICIBM) 2015
Indianapolis, IN, USA. 13-15 November 2015
Abstract
Background: Ion Torrent and Ion Proton are semiconductor-based sequencing technologies that feature rapid
sequencing speed and low upfront and operating costs, thanks to the avoidance of modified nucleotides and
optical measurements. Despite these advantages, however, Ion semiconductor sequencing technologies suffer
from much reduced sequencing accuracy at genomic loci with homopolymer repeats of the same nucleotide. This
limitation significantly reduces their efficiency in biological applications that aim to accurately identify
genetic variants.
Results: In this study, we propose a Bayesian inference-based method that takes advantage of the signal
distributions of the electrical voltages that are measured for all the homopolymers of a fixed length. By
cross-referencing the length of homopolymers in the reference genome and the voltage signal distribution
derived from the experiment, the proposed integrated model significantly improves the alignment accuracy
around the homopolymer regions.
Conclusions: The proposed model improves alignment accuracy on homopolymer regions for semiconductor-based
sequencing technologies, and similar strategies can also be used on other high-throughput
sequencing technologies that share similar limitations.
Keywords: Homopolymer, Ion Torrent/Proton, Bayesian, Alignment
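As a rough illustration of the Bayesian idea summarized in the Results above, the following Python sketch combines a reference-derived prior on homopolymer length with a signal likelihood. It is not the authors' model: the Gaussian signal distribution, the linear growth of its standard deviation with length, the prior weight on the reference length, and all names (`length_posterior`, `prior_match`, `sd_base`, `sd_per_base`) are assumptions made only for this example.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def length_posterior(signal, ref_length, max_length=10,
                     prior_match=0.9, sd_base=0.15, sd_per_base=0.05):
    """Posterior over homopolymer lengths 1..max_length (illustrative only).

    Assumptions (hypothetical, not taken from the paper):
      - the normalized signal for a homopolymer of length n is Gaussian
        with mean n and standard deviation sd_base + sd_per_base * n;
      - the prior puts prior_match on the reference length and spreads
        the remaining mass evenly over the other candidate lengths.
    """
    other = (1.0 - prior_match) / (max_length - 1)
    unnormalized = {}
    for n in range(1, max_length + 1):
        prior = prior_match if n == ref_length else other
        likelihood = gaussian_pdf(signal, float(n), sd_base + sd_per_base * n)
        unnormalized[n] = prior * likelihood
    total = sum(unnormalized.values())
    return {n: p / total for n, p in unnormalized.items()}


# A signal between 4 and 5, at a locus where the reference has 5 bases.
posterior = length_posterior(signal=4.45, ref_length=5)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Under these assumed parameters, the signal alone would slightly favor a length of 4, while the reference prior shifts the call to 5. A signal far from the reference length (for example 3.0) still overrides the prior, which is the behavior one would want when a true variant is present.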
Background
The rapid development of high-throughput sequencing
technologies has led to the appearance of many innovative
sequencing platforms [1, 2]. Ion Torrent and Ion Proton
are semiconductor-based sequencing platforms that are
primarily designed for personal genome sequencing [3, 4].
Unlike sequencing techniques whose errors are mainly
substitutions [5, 6], Ion semiconductor sequencing
platforms suffer from inaccuracy in detecting the
length of homopolymer repeats of the same nucleotide
[7, 8]. These homopolymer errors often lead to
inaccurate local alignment results, and become a critical
barrier against accurate detection of genomic variations
[9–11] (http://www.broadinstitute.org/gatk/media/docs/Samtools.pdf).
In Ion semiconductor-based sequencing chemistry, the
incorporation of a deoxyribonucleotide triphosphate (dNTP)
into a strand of DNA is coupled with
the release of a hydrogen ion, which changes the pH of
the solution and produces a voltage
pulse in the ion sensor. Multiple identical bases on the
DNA strand often result in the detection of multiple
* Correspondence: bohe@hrbeu.edu.cn; yunliu@iupui.edu
1Automation College, Harbin Engineering University, Harbin, Heilongjiang
150001, People’s Republic of China
Full list of author information is available at the end of the article
© 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
The Author(s) BMC Genomics 2016, 17(Suppl 7):521
DOI 10.1186/s12864-016-2894-9