RNA Secondary Structure Prediction Using Vienna RNA


UNIT 12.2

RNA Secondary Structure Analysis Using the Vienna RNA Package

The Vienna RNA package (Hofacker et al., 1994) is a free software package that implements a variety of algorithms for the prediction and analysis of RNA secondary structures. The algorithms are usually accessed through several command-line programs (discussed here), but the package also provides a C library that can be used to develop new programs, as well as a Perl module that gives access to all functions of the library from the Perl scripting language.

For structure prediction (see Basic Protocol 1), the package implements the classic minimum free energy algorithm of Zuker and Stiegler (1981), the partition function algorithm of McCaskill (1990), which calculates base pair probabilities in thermodynamic equilibrium, and the suboptimal folding algorithm (Wuchty et al., 1999), which generates all suboptimal structures within a given energy range of the optimal energy.

If several sequences are expected to share a common structure, highly accurate predictions of the consensus structure can be obtained by combining thermodynamic rules with an analysis of sequence variation and covariation. Such a method is implemented in the RNAalifold program (Hofacker et al., 2002; see Basic Protocol 2).

Finally, the authors of the Vienna RNA package provide an algorithm for inverse folding, i.e., for designing sequences that fold into a predefined structure (see Basic Protocol 3).

NOTE: Investigators who are unfamiliar with the Unix environment should refer to APPENDIX 1C and APPENDIX 1D.

BASIC PROTOCOL 1

USING THE RNAfold PROGRAM TO PREDICT RNA SECONDARY STRUCTURE

Secondary structure prediction from individual sequences is the most frequently performed task. Basic structure prediction is done using the RNAfold program; for short sequences the RNAsubopt program can also be used. The programs support quite a few options that modify the way the prediction is done. Here, only the default settings will be used; all other options are described in detail on the RNAfold man page, and a few are further discussed in the Commentary of this unit (see Critical Parameters and Troubleshooting).

Necessary Resources

Hardware

A personal computer running Linux is recommended; a Unix workstation (e.g., from Sun, SGI, or IBM) or a Macintosh under OS X may be used, but these platforms are less well tested. PCs with MS Windows require significant extra installation effort. For predictions on long sequences, sufficient memory should be available: e.g., a complete HIV genome will require approximately 1 GB of memory.

Software

Vienna RNA package (see Support Protocol)

A basic plotting program (e.g., xmgrace; http://plasma-gate.weizmann.ac.il/Grace/) for mountain plots; an alternative for use on most Unix systems would be gnuplot (http://www.gnuplot.info)

Contributed by Ivo L. Hofacker

Current Protocols in Bioinformatics (2003) 12.2.1-12.2.12, Supplement 4

Analyzing RNA Sequence and Structure

Files

One or more RNA sequences. The RNAfold program uses a “trivial” sequence format with each sequence on a single line without embedded whitespace. Each sequence may be preceded by a line starting with the > character followed by a sequence name, which will be used for output filenames later. Thus, sequences in FASTA format (APPENDIX 1B) can be converted simply by removing whitespace and newlines within the sequence. For sequence files in other formats, the program Readseq (APPENDIX 1E) can be used. A modified version of Readseq that writes output suitable for RNAfold is included in the package. Lowercase characters will be converted to uppercase and T’s will be replaced by U’s. Any remaining characters except for A, C, G, U, I, X, and K will be treated as nonpairing bases (APPENDIX 1A).
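
The FASTA conversion described above can be sketched in Python. This is a minimal illustration, not part of the Vienna RNA package: the function name fasta_to_rnafold is hypothetical, and the sketch only joins wrapped sequence lines, leaving case conversion and T-to-U replacement to RNAfold itself.

```python
def fasta_to_rnafold(text):
    """Join wrapped FASTA sequence lines so each sequence sits on one line.

    Header lines (starting with '>') are passed through unchanged;
    whitespace inside sequence lines is removed.
    """
    out = []
    seq_parts = []

    def flush():
        # Emit the sequence collected so far, if any, as a single line.
        if seq_parts:
            out.append("".join(seq_parts))
            seq_parts.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            out.append(line)
        else:
            seq_parts.append("".join(line.split()))
    flush()
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">seq1\nGGGA AAC\nuuu\n>seq2\nACGT\n"
    print(fasta_to_rnafold(example), end="")
```

The result can be written to a file such as file.seq and passed to RNAfold as described in the steps that follow.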

1. Download and install the Vienna RNA package (see Support Protocol).

Prepare the sequence file for input

2a. To compute a single optimal secondary structure (i.e., a structure with minimum free energy, mfe): Assuming that the sequence file of interest is named file.seq, type:

RNAfold < file.seq > file.fold

2b. To compute optimal (mfe) structure, partition function, and pair probabilities: Type the command in step 2a and add a -p option:

RNAfold -p < file.seq > file.fold

Note that the program reads from stdin and writes to stdout, so the < and > above are necessary to redirect input and output. It is also possible to start the program without an input file and type the sequence(s) on the terminal, or to use the program in a pipe (i.e., have another program produce the input). Depending on the length of the sequences, the computation will take between a fraction of a second (e.g., for tRNA) and several hours (for a complete viral genome).
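
Because RNAfold reads stdin and writes stdout, it can also be driven from a script. The following Python sketch shows one way to do this under stated assumptions: the helper names rnafold_command and run_rnafold are hypothetical, only the -p and -C options named in this protocol are used, and the RNAfold executable is assumed to be on the PATH.

```python
import shutil
import subprocess


def rnafold_command(partition_function=False, constraints=False):
    """Build the RNAfold argument list using options from this protocol."""
    cmd = ["RNAfold"]
    if partition_function:
        cmd.append("-p")  # also compute partition function and pair probabilities
    if constraints:
        cmd.append("-C")  # read a constraint line after each sequence
    return cmd


def run_rnafold(sequence_text, **options):
    """Pipe sequence_text into RNAfold and return its text output."""
    if shutil.which("RNAfold") is None:
        raise FileNotFoundError("RNAfold executable not found on PATH")
    result = subprocess.run(
        rnafold_command(**options),
        input=sequence_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the command that would be run for step 2b.
    print(" ".join(rnafold_command(partition_function=True)))
```

Calling run_rnafold has the same side effects as the shell command: any PostScript files are written to the current working directory.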

3. Examine and interpret the output file.

The output file (file.fold in our example) first repeats the input sequence; the next line contains the predicted mfe structure in bracket notation and its free energy in kcal/mol (Fig. 12.2.1). In the bracket notation, unpaired positions are represented by dots, while base pairs (i, j) are represented by a pair of matching parentheses at positions i and j. Thus the secondary structure (((..((((...)))).))) describes a stem-loop structure consisting of an outer helix of 3 base pairs interrupted by an interior loop of size 3, a second helix of length 4, and a hairpin loop of size 3.
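
The bracket notation can be decoded with a stack, as in the Python sketch below. The function names are hypothetical, and parse_fold_line assumes the structure line has the layout described above (the structure string, a space, then the energy in parentheses); the energy value used in the example is made up for illustration.

```python
def base_pairs(structure):
    """Return the sorted list of base pairs (i, j), 1-based, of a bracket string."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def parse_fold_line(line):
    """Split a line such as '((...)) ( -1.20)' into (structure, energy)."""
    structure, _, rest = line.strip().partition(" ")
    energy = float(rest.strip().strip("()").strip())
    return structure, energy


if __name__ == "__main__":
    # The energy below is a placeholder, not a real RNAfold result.
    structure, energy = parse_fold_line("(((..((((...)))).))) ( -3.40)")
    print(base_pairs(structure), energy)
```

For the example structure, the pairs fall into the two helices described in the text: (1,20), (2,19), (3,18) and (6,16), (7,15), (8,14), (9,13).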

If partition function folding was selected above (step 2b), the next line contains another string giving a condensed representation of the pair probabilities followed by the ensemble free energy in kcal/mol (Fig. 12.2.1). The structure string is similar to the bracket notation but contains additional symbols: parentheses represent positions with a strong tendency to pair and dots represent positions that are mostly unpaired, while curly brackets and commas represent positions with less clear pairing preferences. See the manual (http://www.tbi.univie.ac.at/~ivo/RNA/RNAfold.html) for the exact definitions.

From the minimum free energy, E, and the ensemble free energy, F, the frequency of the mfe structure in thermodynamic equilibrium can be computed as:

$$\exp\left(-\frac{E - F}{RT}\right)$$

where R is the gas constant and T the absolute temperature. This value is given on the last line. The mfe structure is well defined when the difference E − F is small and the two structure strings look similar. The more well defined the structure, the more confidence one may have in the accuracy of the prediction.
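
As a worked illustration, the frequency of the mfe structure, exp(−(E − F)/RT), can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, energies are in kcal/mol, the temperature defaults to 37 °C, and the energies in the example are invented.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol K)


def mfe_frequency(mfe, ensemble_energy, temperature_c=37.0):
    """Boltzmann weight of the mfe structure relative to the whole ensemble.

    mfe is the minimum free energy E, ensemble_energy is the ensemble
    free energy F (F <= E), both in kcal/mol.
    """
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    return math.exp(-(mfe - ensemble_energy) / rt)


if __name__ == "__main__":
    # Illustrative energies only: E = -20.0 and F = -21.0 kcal/mol.
    print(round(mfe_frequency(-20.0, -21.0), 3))
```

With a gap of 1 kcal/mol between E and F, the mfe structure accounts for roughly a fifth of the ensemble, which shows how quickly the frequency drops as E − F grows.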


4. View the PostScript figures.

Apart from the text output, RNAfold produces a PostScript structure drawing, suitable for inclusion in publications as well as for printing on any PostScript-capable printer (Fig. 12.2.1). For on-screen viewing, a PostScript viewer such as GhostScript (or one of its front ends, e.g., gv or gsview; http://www.cs.wisc.edu/~ghost/) is needed. If the input defined a sequence name (say seq1), it will be used to name the PostScript file (e.g., seq1_ss.ps); otherwise the default filename rna.ps will be used.

Pair probabilities will be written in the form of a PostScript “dot plot.” The dot plot shows an n × n matrix of squares, such that the area of the square at row i and column j in the upper right half is proportional to the probability of the pair (i, j), while the lower left half shows all pairs belonging to the mfe structure. The name of the dot plot file will again be derived from the sequence name (e.g., seq1_dp.ps); otherwise the default filename dot.ps will be used.

Dot plots are an excellent way to visualize structural alternatives. For an RNA with a well-defined mfe structure, the upper right half should only contain a few small additional dots compared to the lower left. The PostScript dot plot is constructed such that the actual pair probabilities can be easily read from the file itself (see, e.g., step 5).
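
Reading pair probabilities back from a dot plot file might look like the Python sketch below. It rests on an assumption about the file layout that should be checked against an actual dot plot: each upper-right entry is taken to be a line of the form "i j value ubox", where value is the square root of the pair probability. The function name is hypothetical.

```python
import re

# Assumed entry format: "<i> <j> <sqrt(p)> ubox" on a line of its own.
UBOX_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d*\.?\d+(?:[eE][+-]?\d+)?)\s+ubox\s*$"
)


def read_pair_probabilities(ps_text):
    """Return {(i, j): probability} for all ubox entries in the PostScript text."""
    probs = {}
    for line in ps_text.splitlines():
        match = UBOX_LINE.match(line)
        if match:
            i, j = int(match.group(1)), int(match.group(2))
            # The stored value is assumed to be sqrt(p), so square it.
            probs[(i, j)] = float(match.group(3)) ** 2
    return probs


if __name__ == "__main__":
    sample = "1 20 0.9 ubox\n2 19 0.5 ubox\n1 20 0.95 lbox\n"
    for pair, p in sorted(read_pair_probabilities(sample).items()):
        print(pair, round(p, 4))
```

Lines that do not match the assumed pattern, including the lower-left entries for the mfe structure, are ignored.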

5. Produce a mountain plot.

Secondary structure graphs and dot plots both become cumbersome for long sequences. A mountain plot is a structure representation that works well even for long sequences, and which is well suited for comparing structures. A mountain plot is an x-y graph that plots the number of base pairs enclosing a sequence position, or, for pair probabilities, the average number of enclosing pairs. The Perl script mountain.pl can be used to produce the coordinates for a mountain plot from a dot plot PostScript file. The result can then be plotted with any x-y plotting program. Using, e.g., the xmgrace plotting program, the following command is typed:

mountain.pl seq1_dp.ps | xmgrace -pipe

If a mountain.pl: Command not found error is encountered, use the full path in the command (e.g., /usr/local/share/ViennaRNA/bin/mountain.pl).

The resulting plot shows three curves: two mountain plots derived from the mfe structure and the pair probabilities, and a positional entropy derived from the pair probabilities:

$$S_i = -\sum_{j \neq i} p_{ij} \log p_{ij} - p_{ii} \log p_{ii}$$

where $p_{ii}$ is the probability of i being unpaired. Well-defined regions are marked by low entropy.
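
The quantities plotted in step 5 can be sketched in Python as follows. This is an illustration, not a port of mountain.pl: the function names are hypothetical, the input is a dictionary mapping pairs (i, j) with i < j to probabilities, a position counts as enclosed by (i, j) when i < k < j, and base-2 logarithms are an assumed choice of unit. The entropy at position i sums −p log p over all pairing partners of i plus the unpaired state.

```python
import math


def mountain_heights(probs, length):
    """Average number of base pairs enclosing each position 1..length."""
    diff = [0.0] * (length + 2)
    for (i, j), p in probs.items():
        diff[i + 1] += p  # pair (i, j) starts enclosing at position i + 1
        diff[j] -= p      # and stops enclosing at position j
    heights, running = [], 0.0
    for k in range(1, length + 1):
        running += diff[k]
        heights.append(running)
    return heights


def positional_entropy(probs, length):
    """Entropy (in bits) of the pairing-state distribution at each position."""
    paired = [0.0] * (length + 1)
    entropy = [0.0] * (length + 1)
    for (i, j), p in probs.items():
        if p > 0:
            term = p * math.log2(p)
            for k in (i, j):
                paired[k] += p
                entropy[k] -= term
    for k in range(1, length + 1):
        unpaired = 1.0 - paired[k]
        if unpaired > 0:
            entropy[k] -= unpaired * math.log2(unpaired)
    return entropy[1:]


if __name__ == "__main__":
    # An mfe structure is the special case where every pair has probability 1.
    print(mountain_heights({(1, 5): 1.0}, 5))
    print(positional_entropy({(1, 5): 0.5}, 5))
```

A position that pairs with a single partner half of the time has an entropy of one bit, while positions that are always paired or always unpaired have zero entropy.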

6. Include experimental constraints.

Secondary structure prediction is of course error-prone, and no prediction should be trusted blindly without experimental support. If any experimental results (such as chemical probing data) are available, it is possible to test whether the prediction is compatible with the experimental data. Furthermore, constraints can be used to ensure that RNAfold will only consider structures compatible with the constraints.

To do constrained folding, open the sequence file in a text editor and add another line after the sequence consisting of the symbols |, x, ., and matching parentheses, (). A pair of matching parentheses signifies that the corresponding positions must form a base pair. A vertical line (|) marks a position that must pair, and an x marks a position that must not pair. The dot (.) marks positions without constraint. Refold the sequences with constraints using the -C option:

RNAfold -p -C < file_c.seq > file_c.fold

One can now compare the constrained and unconstrained foldings. Ideally, the constraints should only lead to a small change in energy.
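
To test whether an unconstrained prediction is already compatible with experimental data, the constraint line can be checked against the predicted structure. The Python sketch below is a minimal illustration with a hypothetical function name; it handles only the four symbols described in this step (|, x, ., and matching parentheses).

```python
def satisfies_constraints(structure, constraint):
    """Return True if a bracket-notation structure obeys a constraint string."""
    if len(structure) != len(constraint):
        raise ValueError("structure and constraint must have equal length")

    def partners(brackets):
        # Map each paired position (0-based) to its partner.
        stack, table = [], {}
        for pos, symbol in enumerate(brackets):
            if symbol == "(":
                stack.append(pos)
            elif symbol == ")":
                if not stack:
                    raise ValueError("unbalanced parentheses")
                opening = stack.pop()
                table[opening] = pos
                table[pos] = opening
        if stack:
            raise ValueError("unbalanced parentheses")
        return table

    predicted = partners(structure)
    required = partners(constraint)
    for pos, symbol in enumerate(constraint):
        if symbol == "|" and pos not in predicted:
            return False  # position must pair but is unpaired
        if symbol == "x" and pos in predicted:
            return False  # position must not pair but is paired
        if symbol in "()" and predicted.get(pos) != required[pos]:
            return False  # required base pair is missing
    return True


if __name__ == "__main__":
    print(satisfies_constraints("(((...)))", "|..xxx..."))
    print(satisfies_constraints("(((...)))", "x........"))
```

If the check fails, refolding with the -C option shows how much energy the constraints cost relative to the unconstrained mfe.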

