Python Programming in Bioinformatics

Updated 2024-07-17 · PDF, 8.72 MB
"Bioinformatics Programming Using Python.pdf" 是一本由 Mitchell L. Model 撰写的书籍,主要讨论如何使用 Python 语言进行生物信息学编程。该书由 O'Reilly Media 出版,适用于教育、商业或销售推广用途,并提供在线版本。 在生物信息学领域,Python 是一个非常流行的语言选择,因其简洁的语法和强大的库支持而备受青睐。本书旨在教授读者如何利用 Python 来解决生物信息学中的问题,包括数据处理、序列比对、基因组分析、蛋白质结构预测等多个方面。 书中可能涵盖的知识点可能包括: 1. **Python基础知识**:介绍 Python 的基本语法,如变量、数据类型(如字符串、列表、元组、字典)、控制流(条件语句、循环)以及函数定义。 2. **生物信息学数据格式**:讲解常见的生物信息学数据格式,如 FASTA、GenBank、BED 和 GFF 文件,以及如何用 Python 读取和解析这些格式的数据。 3. **序列比对**:探讨如何使用 Python 实现序列比对算法,如 Smith-Waterman 和 Needleman-Wunsch,以及比对工具的使用,例如 ClustalW 或 MUSCLE。 4. **基因组分析**:介绍基因组数据的处理,如基因预测、SNP 检测、重测序数据分析等,可能会涉及 BioPython 库的使用。 5. **蛋白质结构与功能预测**:探讨如何使用 Python 进行蛋白质结构预测,如二级结构预测、折叠识别,以及如何通过序列相似性搜索预测蛋白质功能。 6. **生物数据库交互**:讲解如何使用 Python 访问和检索 NCBI、UniProt 等公共生物信息学数据库,以及 EBI 的 Web Services。 7. **数据可视化**:介绍如何使用 Python 工具(如 Matplotlib 和 Seaborn)来可视化生物信息学数据,帮助理解分析结果。 8. **机器学习与数据挖掘**:可能涉及到使用 Python 的 Scikit-learn 库进行分类、回归和聚类分析,用于预测基因表达、蛋白质相互作用等生物问题。 9. **算法实现**:书中可能会讲解一些基础的生物信息学算法,如 BLAST、PHYLIP 等,鼓励读者自己实现这些算法以增进理解。 10. **软件工程实践**:介绍良好的编程习惯,如代码组织、版本控制(Git)和测试,以确保生物信息学程序的可靠性和可维护性。 这本书是针对希望利用 Python 进行生物信息学研究或分析的初学者和专业人士,它提供了从基础到高级的全面教程,帮助读者构建必要的技能来处理生物学数据和解决实际问题。
Uploaded 2017-09-19
A recent book on using Python for bioinformatics programming; hope you enjoy it. The table of contents is as follows:

Conventions
1.2.2 Python Versions
1.2.3 Code Style
1.2.4 Get the Most from This Book without Reading It All
1.2.5 Online Resources Related to This Book
1.3 WHY LEARN TO PROGRAM?
1.4 BASIC PROGRAMMING CONCEPTS
1.4.1 What Is a Program?
1.5 WHY PYTHON?
1.5.1 Main Features of Python
1.5.2 Comparing Python with Other Languages
1.5.3 How Is It Used?
1.5.4 Who Uses Python?
1.5.5 Flavors of Python
1.5.6 Special Python Distributions
1.6 ADDITIONAL RESOURCES

Chapter 2 First Steps with Python
2.1 INSTALLING PYTHON
2.1.1 Learn Python by Using It
2.1.2 Install Python Locally
2.1.3 Using Python Online
2.1.4 Testing Python
2.1.5 First Use
2.2 INTERACTIVE MODE
2.2.1 Baby Steps
2.2.2 Basic Input and Output
2.2.3 More on the Interactive Mode
2.2.4 Mathematical Operations
2.2.5 Exit from the Python Shell
2.3 BATCH MODE
2.3.1 Comments
2.3.2 Indentation
2.4 CHOOSING AN EDITOR
2.4.1 Sublime Text
2.4.2 Atom
2.4.3 PyCharm
2.4.4 Spyder IDE
2.4.5 Final Words about Editors
2.5 OTHER TOOLS
2.6 ADDITIONAL RESOURCES
2.7 SELF-EVALUATION

Chapter 3 Basic Programming: Data Types
3.1 STRINGS
3.1.1 Strings Are Sequences of Unicode Characters
3.1.2 String Manipulation
3.1.3 Methods Associated with Strings
3.2 LISTS
3.2.1 Accessing List Elements
3.2.2 List with Multiple Repeated Items
3.2.3 List Comprehension
3.2.4 Modifying Lists
3.2.5 Copying a List
3.3 TUPLES
3.3.1 Tuples Are Immutable Lists
3.4 COMMON PROPERTIES OF THE SEQUENCES
3.5 DICTIONARIES
3.5.1 Mapping: Calling Each Value by a Name
3.5.2 Operating with Dictionaries
3.6 SETS
3.6.1 Unordered Collection of Objects
3.6.2 Set Operations
3.6.3 Shared Operations with Other Data Types
3.6.4 Immutable Set: Frozenset
3.7 NAMING OBJECTS
3.8 ASSIGNING A VALUE TO A VARIABLE VERSUS BINDING A NAME TO AN OBJECT
3.9 ADDITIONAL RESOURCES
3.10 SELF-EVALUATION

Chapter 4 Programming: Flow Control
4.1 IF-ELSE
4.1.1 Pass Statement
4.2 FOR LOOP
4.3 WHILE LOOP
4.4 BREAK: BREAKING THE LOOP
4.5 WRAPPING IT UP
4.5.1 Estimate the Net Charge of a Protein
4.5.2 Search for a Low-Degeneration Zone
4.6 ADDITIONAL RESOURCES
4.7 SELF-EVALUATION

Chapter 5 Handling Files
5.1 READING FILES
5.1.1 Example of File Handling
5.2 WRITING FILES
5.2.1 File Reading and Writing Examples
5.3 CSV FILES
5.4 PICKLE: STORING AND RETRIEVING THE CONTENTS OF VARIABLES
5.5 JSON FILES
5.6 FILE HANDLING: OS, OS.PATH, SHUTIL, AND PATH.PY MODULE
5.6.1 path.py Module
5.6.2 Consolidate Multiple DNA Sequences into One FASTA File
5.7 ADDITIONAL RESOURCES
5.8 SELF-EVALUATION

Chapter 6 Code Modularizing
6.1 INTRODUCTION TO CODE MODULARIZING
6.2 FUNCTIONS
6.2.1 Standard Way to Make Python Code Modular
6.2.2 Function Parameter Options
6.2.3 Generators
6.3 MODULES AND PACKAGES
6.3.1 Using Modules
6.3.2 Packages
6.3.3 Installing Third-Party Modules
6.3.4 Virtualenv: Isolated Python Environments
6.3.5 Conda: Anaconda Virtual Environment
6.3.6 Creating Modules
6.3.7 Testing Modules
6.4 ADDITIONAL RESOURCES
6.5 SELF-EVALUATION

Chapter 7 Error Handling
7.1 INTRODUCTION TO ERROR HANDLING
7.1.1 Try and Except
7.1.2 Exception Types
7.1.3 Triggering Exceptions
7.2 CREATING CUSTOMIZED EXCEPTIONS
7.3 ADDITIONAL RESOURCES
7.4 SELF-EVALUATION

Chapter 8 Introduction to Object Orienting Programming (OOP)
8.1 OBJECT PARADIGM AND PYTHON
8.2 EXPLORING THE JARGON
8.3 CREATING CLASSES
8.4 INHERITANCE
8.5 SPECIAL METHODS
8.5.1 Create a New Data Type Using a Built-in Data Type
8.6 MAKING OUR CODE PRIVATE
8.7 ADDITIONAL RESOURCES
8.8 SELF-EVALUATION

Chapter 9 Introduction to Biopython
9.1 WHAT IS BIOPYTHON?
9.1.1 Project Organization
9.2 INSTALLING BIOPYTHON
9.3 BIOPYTHON COMPONENTS
9.3.1 Alphabet
9.3.2 Seq
9.3.3 MutableSeq
9.3.4 SeqRecord
9.3.5 Align
9.3.6 AlignIO
9.3.7 ClustalW
9.3.8 SeqIO
9.3.9 AlignIO
9.3.10 BLAST
9.3.11 Biological Related Data
9.3.12 Entrez
9.3.13 PDB
9.3.14 PROSITE
9.3.15 Restriction
9.3.16 SeqUtils
9.3.17 Sequencing
9.3.18 SwissProt
9.4 CONCLUSION
9.5 ADDITIONAL RESOURCES
9.6 SELF-EVALUATION

Section II Advanced Topics

Chapter 10 Web Applications
10.1 INTRODUCTION TO PYTHON ON THE WEB
10.2 CGI IN PYTHON
10.2.1 Configuring a Web Server for CGI
10.2.2 Testing the Server with Our Script
10.2.3 Web Program to Calculate the Net Charge of a Protein (CGI version)
10.3 WSGI
10.3.1 Bottle: A Python Web Framework for WSGI
10.3.2 Installing Bottle
10.3.3 Minimal Bottle Application
10.3.4 Bottle Components
10.3.5 Web Program to Calculate the Net Charge of a Protein (Bottle Version)
10.3.6 Installing a WSGI Program in Apache
10.4 ALTERNATIVE OPTIONS FOR MAKING PYTHON-BASED DYNAMIC WEB SITES
10.5 SOME WORDS ABOUT SCRIPT SECURITY
10.6 WHERE TO HOST PYTHON PROGRAMS
10.7 ADDITIONAL RESOURCES
10.8 SELF-EVALUATION

Chapter 11 XML
11.1 INTRODUCTION TO XML
11.2 STRUCTURE OF AN XML DOCUMENT
11.3 METHODS TO ACCESS DATA INSIDE AN XML DOCUMENT
11.3.1 SAX: cElementTree Iterparse
11.4 SUMMARY
11.5 ADDITIONAL RESOURCES
11.6 SELF-EVALUATION

Chapter 12 Python and Databases
12.1 INTRODUCTION TO DATABASES
12.1.1 Database Management: RDBMS
12.1.2 Components of a Relational Database
12.1.3 Database Data Types
12.2 CONNECTING TO A DATABASE
12.3 CREATING A MYSQL DATABASE
12.3.1 Creating Tables
12.3.2 Loading a Table
12.4 PLANNING AHEAD
12.4.1 PythonU: Sample Database
12.5 SELECT: QUERYING A DATABASE
12.5.1 Building a Query
12.5.2 Updating a Database
12.5.3 Deleting a Record from a Database
12.6 ACCESSING A DATABASE FROM PYTHON
12.6.1 PyMySQL Module
12.6.2 Establishing the Connection
12.6.3 Executing the Query from Python
12.7 SQLITE
12.8 NOSQL DATABASES: MONGODB
12.8.1 Using MongoDB with PyMongo
12.9 ADDITIONAL RESOURCES
12.10 SELF-EVALUATION

Chapter 13 Regular Expressions
13.1 INTRODUCTION TO REGULAR EXPRESSIONS (REGEX)
13.1.1 REGEX Syntax
13.2 THE RE MODULE
13.2.1 Compiling a Pattern
13.2.2 REGEX Examples
13.2.3 Pattern Replace
13.3 REGEX IN BIOINFORMATICS
13.3.1 Cleaning Up a Sequence
13.4 ADDITIONAL RESOURCES
13.5 SELF-EVALUATION

Chapter 14 Graphics in Python
14.1 INTRODUCTION TO BOKEH
14.2 INSTALLING BOKEH
14.3 USING BOKEH
14.3.1 A Simple X-Y Plot
14.3.2 Two Data Series Plot
14.3.3 A Scatter Plot
14.3.4 A Heatmap
14.3.5 A Chord Diagram

Section III Python Recipes with Commented Source Code

Chapter 15 Sequence Manipulation in Batch
15.1 PROBLEM DESCRIPTION
15.2 PROBLEM ONE: CREATE A FASTA FILE WITH RANDOM SEQUENCES
15.2.1 Commented Source Code
15.3 PROBLEM TWO: FILTER NOT EMPTY SEQUENCES FROM A FASTA FILE
15.3.1 Commented Source Code
15.4 PROBLEM THREE: MODIFY EVERY RECORD OF A FASTA FILE
15.4.1 Commented Source Code

Chapter 16 Web Application for Filtering Vector Contamination
16.1 PROBLEM DESCRIPTION
16.1.1 Commented Source Code
16.2 ADDITIONAL RESOURCES

Chapter 17 Searching for PCR Primers Using Primer3
17.1 PROBLEM DESCRIPTION
17.2 PRIMER DESIGN FLANKING A VARIABLE LENGTH REGION
17.2.1 Commented Source Code
17.3 PRIMER DESIGN FLANKING A VARIABLE LENGTH REGION, WITH BIOPYTHON
17.4 ADDITIONAL RESOURCES

Chapter 18 Calculating Melting Temperature from a Set of Primers
18.1 PROBLEM DESCRIPTION
18.1.1 Commented Source Code
18.2 ADDITIONAL RESOURCES

Chapter 19 Filtering Out Specific Fields from a GenBank File
19.1 EXTRACTING SELECTED PROTEIN SEQUENCES
19.1.1 Commented Source Code
19.2 EXTRACTING THE UPSTREAM REGION OF SELECTED PROTEINS
19.2.1 Commented Source Code
19.3 ADDITIONAL RESOURCES

Chapter 20 Inferring Splicing Sites
20.1 PROBLEM DESCRIPTION
20.1.1 Infer Splicing Sites with Commented Source Code
20.1.2 Sample Run of Estimate Intron Program

Chapter 21 Web Server for Multiple Alignment
21.1 PROBLEM DESCRIPTION
21.1.1 Web Interface: Front-End. HTML Code
21.1.2 Web Interface: Server-Side Script. Commented Source Code
21.2 ADDITIONAL RESOURCES

Chapter 22 Drawing Marker Positions Using Data Stored in a Database
22.1 PROBLEM DESCRIPTION
22.1.1 Preliminary Work on the Data
22.1.2 MongoDB Version with Commented Source Code

Section IV Appendices