UNIT 12.2
RNA Secondary Structure Analysis Using the
Vienna RNA Package
The Vienna RNA package (Hofacker et al., 1994) is a free software package that
implements a variety of algorithms for the prediction and analysis of RNA secondary
structures. The various algorithms are usually accessed through several command-line
programs (discussed here), but the package also provides a C library that can be used to
develop new programs, as well as a Perl module that gives access to all functions of the
library from the Perl scripting language.
For structure prediction (see Basic Protocol 1), the package implements the classic
minimum free energy algorithm of Zuker and Stiegler (1981), the partition function
algorithm of McCaskill (1990), which calculates base pair probabilities in thermodynamic
equilibrium, and the suboptimal folding algorithm (Wuchty et al., 1999), which
generates all suboptimal structures within a given energy range of the optimal energy.
If several sequences are expected to share a common structure, highly accurate predictions
of the consensus structure can be obtained by combining thermodynamic rules with an
analysis of sequence variation and covariation. Such a method is implemented in the
RNAalifold program (Hofacker et al., 2002; see Basic Protocol 2).
Finally, the authors of the Vienna RNA package provide an algorithm for inverse folding,
i.e., to design sequences with a predefined structure (see Basic Protocol 3).
NOTE: Investigators who are unfamiliar with the Unix environment should refer to APPENDIX 1C and APPENDIX 1D.
BASIC
PROTOCOL 1
USING THE RNAfold PROGRAM TO PREDICT RNA SECONDARY
STRUCTURE
Secondary structure prediction from individual sequences is the most frequently performed
task. Basic structure prediction is done using the RNAfold program; for short
sequences the RNAsubopt program can also be used. The programs support quite a few
options that modify the way the prediction is done. Here, only the default settings will be
used; all other options are described in detail on the RNAfold man page, and a few are
further discussed in the Commentary of this unit (see Critical Parameters and Troubleshooting).
Necessary Resources
Hardware
A personal computer running Linux is recommended; a Unix workstation (e.g.,
from Sun, SGI, or IBM) or Macintosh under OS X may be used, but these
platforms are less well tested. PCs with MS Windows require significant extra
installation effort. For predictions on long sequences, sufficient memory should
be available: e.g., a complete HIV genome will require ∼1 GB of memory.
Software
Vienna RNA package (see Support Protocol)
A basic x-y plotting program (e.g., xmgrace; http://plasma-gate.weizmann.ac.il/Grace/)
for mountain plots; an alternative for use on most Unix systems would
be gnuplot (http://www.gnuplot.info)
Supplement 4
Contributed by Ivo L. Hofacker
Current Protocols in Bioinformatics
(2003) 12.2.1-12.2.12
Copyright © 2003 by John Wiley & Sons, Inc.
Analyzing RNA
Sequence and
Structure
Files
One or more RNA sequences. The RNAfold program uses a “trivial” sequence
format with each sequence on a single line without embedded whitespace. Each
sequence may be preceded by a line starting with the > character followed by a
sequence name, which will be used for output filenames later. Thus, sequences
in FASTA format (APPENDIX 1B) can be converted simply by removing
whitespace and newlines within the sequence. For sequence files in other
formats, the program Readseq (APPENDIX 1E) can be used. A modified version of
Readseq that writes output suitable for RNAfold is included in the package.
Lowercase characters will be converted to uppercase and T’s will be replaced by
U’s. Any remaining characters except for A, C, G, U, I, X, and K will be treated
as nonpairing bases (APPENDIX 1A).
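The FASTA conversion described above can be sketched in a few lines of Python. This is only an illustration, not part of the Vienna RNA package: the function name `fasta_to_rnafold` is hypothetical, and keeping only the first word of each header as the sequence name is an assumption made here so that the name is safe to use in output filenames.

```python
def fasta_to_rnafold(fasta_text):
    """Join each FASTA record into one header line plus one sequence line."""
    records = []                 # list of (name or None, sequence)
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue             # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None or chunks:
                records.append((name, "".join(chunks)))
            words = line[1:].split()
            name = words[0] if words else ""   # assumption: first word only
            chunks = []
        else:
            # Remove any whitespace embedded in the sequence line.
            chunks.append("".join(line.split()))
    if name is not None or chunks:
        records.append((name, "".join(chunks)))

    out = []
    for name, seq in records:
        if name is not None:
            out.append(">" + name)
        out.append(seq)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">seq1 test sequence\nACGU ACGU\nGGCC\n>seq2\nAAAA\n"
    print(fasta_to_rnafold(example), end="")
```

Case conversion and T-to-U replacement are left out on purpose, since the text states that RNAfold performs them itself.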
1. Download and install the Vienna RNA package (see Support Protocol).
Prepare the sequence file for input
2a. To compute a single optimal secondary structure (i.e., a structure with minimum free
energy, mfe): Assuming that the sequence file of interest is named file.seq, type:
RNAfold < file.seq > file.fold
2b. To compute optimal (mfe) structure, partition function, and pair probabilities: Type
the command in step 2a and add a -p option:
RNAfold -p < file.seq > file.fold
Note that the program reads from stdin and writes to stdout, i.e., the < and > above are
necessary to redirect input and output. It is also possible to start the program without an
input file and type the sequence(s) on the terminal, or use the program in a pipe (i.e., have
another program produce the input). Depending on the length of the sequences, the
computation will take between a fraction of a second (e.g., for tRNA) and several hours
(for a complete viral genome).
3. Examine and interpret the output file.
The output file (file.fold in our example) first repeats the input sequence; the next line
contains the predicted mfe structure in bracket notation and its free energy in kcal/mol (Fig.
12.2.1). In the bracket notation, unpaired positions are represented by dots, while base
pairs (i, j) are represented by a pair of matching parentheses at positions i and j. Thus the
secondary structure (((..((((...)))).))) describes a stem-loop structure consisting
of an outer helix of 3 base pairs interrupted by an interior loop of size 3, a second
helix of length 4, and a hairpin loop of size 3.
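To make the bracket notation concrete, the following Python sketch turns a structure string into a list of base pairs (i, j) using a stack of open positions. The function name `parse_bracket` is hypothetical and the code is not taken from the package; positions are numbered from 1, as in the text.

```python
def parse_bracket(structure):
    """Return the sorted list of base pairs (i, j), 1-based, for a bracket string."""
    open_positions = []   # stack of positions of unmatched "("
    pairs = []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            open_positions.append(pos)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unmatched ')' at position %d" % pos)
            # The most recently opened position pairs with this one.
            pairs.append((open_positions.pop(), pos))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r at position %d" % (symbol, pos))
    if open_positions:
        raise ValueError("unmatched '(' at position %d" % open_positions[-1])
    return sorted(pairs)


if __name__ == "__main__":
    for i, j in parse_bracket("(((..((((...)))).)))"):
        print(i, j)
```

For the example structure this yields seven pairs: three from the outer helix and four from the inner helix.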
If partition function folding was selected above (step 2b), the next line contains another
string giving a condensed representation of the pair probabilities followed by the ensemble
free energy in kcal/mol (Fig. 12.2.1). The structure string is similar to the bracket notation
but contains additional symbols: parentheses represent positions with strong tendency to
pair and dots represent positions that are mostly unpaired, while curly brackets and
commas represent positions with less clear pairing preferences. See the manual
(http://www.tbi.univie.ac.at/~ivo/RNA/RNAfold.html) for the exact definitions.
From the minimum free energy, E, and the ensemble free energy, F, the frequency of the mfe
structure in thermodynamic equilibrium can be computed as:
$$p = \exp\left(-\frac{E - F}{RT}\right)$$
This value is given on the last line. The mfe structure is well defined when the difference
E − F is small, and the two structure strings look similar. The more well defined the
structure, the more confidence one may have in the accuracy of the prediction.
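As a worked illustration of this formula, the Python sketch below evaluates p from two energies in kcal/mol. The function name `mfe_frequency` is hypothetical, and the default temperature of 37 °C and the value used for the gas constant are assumptions made for the example, not values quoted from the text.

```python
import math

# Gas constant in kcal/(mol*K); the exact value used by RNAfold is an assumption here.
GAS_CONSTANT = 1.98717e-3


def mfe_frequency(mfe_energy, ensemble_energy, temperature_celsius=37.0):
    """Frequency of the mfe structure: p = exp(-(E - F) / RT)."""
    rt = GAS_CONSTANT * (temperature_celsius + 273.15)
    return math.exp(-(mfe_energy - ensemble_energy) / rt)


if __name__ == "__main__":
    # Made-up energies for illustration: E = -21.6, F = -22.7 kcal/mol.
    print("%.3f" % mfe_frequency(-21.6, -22.7))
```

Because F is never larger than E, the result lies between 0 and 1; with a gap E − F of about 1 kcal/mol the mfe structure already accounts for well under half of the ensemble.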
4. View the PostScript figures.
Apart from the text output, RNAfold produces a PostScript structure drawing, suitable for
inclusion in publications as well as for printing on any PostScript-capable printer (Fig.
12.2.1). For on-screen viewing, a PostScript viewer such as Ghostscript (or one of its front
ends, e.g., gv or gsview; http://www.cs.wisc.edu/~ghost/) is needed. If the input defined a
sequence name (say seq1), it will be used to name the PostScript file (e.g., seq1_ss.ps);
otherwise the default filename rna.ps will be used.
Pair probabilities will be written in the form of a PostScript “dot plot.” The dot plot shows
an n × n matrix of squares, such that the area of the square at row i and column j in the
upper right half is proportional to the probability of the pair (i, j), while the lower left half
shows all pairs belonging to the mfe structure. The name of the dot plot file will again be
derived from the sequence name (e.g., seq1_dp.ps) or the default filename dot.ps
will be used.
Dot plots are an excellent way to visualize structural alternatives. For an RNA with
well-defined mfe structure, the upper right half should only contain a few small additional
dots compared to the lower left. The PostScript dot plot is constructed such that the actual
pair probabilities can be easily read from the file itself (see, e.g., step 5).
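Reading the probabilities back from a dot plot file might look like the Python sketch below. It rests on an assumption about the file layout that should be checked against an actual file: each upper-right square is taken to be written as a line of the form `i j value ubox`, where `value` is the square root of the pair probability. The function name `read_pair_probabilities` is hypothetical.

```python
import re

# Assumed line layout (verify against a real dot plot file): "<i> <j> <sqrt(p)> ubox"
UBOX_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$")


def read_pair_probabilities(postscript_text):
    """Return {(i, j): probability} for every line matching the assumed ubox layout."""
    probabilities = {}
    for line in postscript_text.splitlines():
        match = UBOX_LINE.match(line)
        if match:
            i, j = int(match.group(1)), int(match.group(2))
            # Under the assumed layout the file stores sqrt(p), so square it.
            probabilities[(i, j)] = float(match.group(3)) ** 2
    return probabilities


if __name__ == "__main__":
    sample = "1 20 0.9 ubox\n2 19 0.5 ubox\n1 20 0.95 lbox\n"
    for pair, p in sorted(read_pair_probabilities(sample).items()):
        print(pair, round(p, 4))
```

Lines that do not match the pattern, including the PostScript preamble and the lower-left (mfe) entries, are ignored.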
5. Produce a mountain plot.
Secondary structure graphs and dot plots both become cumbersome for long sequences.
A mountain plot is a structure representation that works well even for long sequences, and
which is well suited for comparing structures. A mountain plot is an x-y graph that plots
the number of base pairs enclosing a sequence position, or, for pair probabilities, the
average number of enclosing pairs. The Perl script mountain.pl can be used to produce
the coordinates for a mountain plot from a dot plot PostScript file. The result can then be
plotted with any x-y plotting program. Using, e.g., the xmgrace plotting program, the
following command is typed:
mountain.pl seq1_dp.ps | xmgrace -pipe
If a mountain.pl: Command not found error is encountered, use the full path in
the command (e.g., /usr/local/share/ViennaRNA/bin/mountain.pl).
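The mountain representation itself is easy to compute from a structure in bracket notation. The Python sketch below is an independent illustration, not a port of mountain.pl: the function name `mountain_heights` is hypothetical, and counting a paired position as enclosed by its own pair is a convention chosen here that may differ from the script's.

```python
def mountain_heights(structure):
    """Number of enclosing base pairs at each position of a bracket string.

    Convention (an assumption): a paired position counts its own pair.
    """
    heights = []
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1            # a pair opens here and encloses this position
            heights.append(depth)
        elif symbol == ")":
            heights.append(depth)
            depth -= 1            # the pair closes after this position
        else:
            heights.append(depth)
    return heights


if __name__ == "__main__":
    # Print "position height" lines, suitable for any x-y plotting program.
    for position, height in enumerate(mountain_heights("(((..((((...)))).)))"), start=1):
        print(position, height)
```

Helices appear as slopes, loops as plateaus, and hairpin loops as the peaks of the mountain.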
The resulting plot shows three curves: two mountain plots derived from the mfe structure and
the pair probabilities, and a positional entropy derived from the pair probabilities:
$$S_i = -\sum_j p_{ij} \log p_{ij} - p_i^u \log p_i^u$$
where $p_i^u$ is the probability of i being unpaired. Well-defined regions are marked by low
entropy.
6. Include experimental constraints.
Secondary structure prediction is of course error-prone, and no prediction should be
trusted blindly without experimental support. If any experimental results (such as chemical
probing data) are available, it is possible to test whether the prediction is compatible with
the experimental data. Furthermore, constraints can be used to ensure that RNAfold will
only consider structures compatible with the constraints.
To do constrained folding, open the sequence file in a text editor and add another line after
the sequence consisting of the symbols x, |, ., and matching parentheses, (). A pair of
matching parentheses signifies that the corresponding positions must form a base pair. A
vertical line (|) marks a position that must pair, and an x marks a position that must not
pair. The dot (.) marks positions without constraint. Refold the sequences with constraints
using the -C option:
RNAfold -p -C < file_c.seq > file_c.fold
One can now compare the constrained and unconstrained foldings. Ideally, the constraints
should only lead to a small change in energy.
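Before refolding, it can help to check a constraint line mechanically. The Python sketch below is an illustration only: the function names are hypothetical, and the requirement that the constraint line has exactly one symbol per sequence position is an assumption made here for strictness rather than a rule stated in the text.

```python
ALLOWED_SYMBOLS = set("x|.()")


def check_constraint(sequence, constraint):
    """Raise ValueError if the constraint line does not fit the sequence."""
    # Assumption: one constraint symbol per sequence position.
    if len(sequence) != len(constraint):
        raise ValueError("constraint length %d differs from sequence length %d"
                         % (len(constraint), len(sequence)))
    depth = 0
    for pos, symbol in enumerate(constraint, start=1):
        if symbol not in ALLOWED_SYMBOLS:
            raise ValueError("unexpected symbol %r at position %d" % (symbol, pos))
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')' at position %d" % pos)
    if depth != 0:
        raise ValueError("unmatched '(' in constraint")


def constrained_record(name, sequence, constraint):
    """Return name line, sequence line, and constraint line as one text record."""
    check_constraint(sequence, constraint)
    return ">%s\n%s\n%s\n" % (name, sequence, constraint)


if __name__ == "__main__":
    # Made-up example: force a 3-pair helix and keep position 5 unpaired.
    print(constrained_record("seq1", "GGGAAAUCCC", "(((.x..)))"), end="")
```

The resulting text can be saved as the constrained sequence file (file_c.seq in the command above).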