Solving Bioinformatics Problems with Python Programming

Updated 2024-07-20. 4.3 MB PDF.
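"Bioinformatics Programming Using Python, by Mitchell L Model, is a book about using Python for bioinformatics analysis. It aims to teach readers how to solve bioinformatics problems with the Python language and suits readers interested in bioinformatics and programming."

Python is widely used in bioinformatics because its syntax is concise, it is easy to learn, and it has a rich library ecosystem. Bioinformatics Programming Using Python introduces bioinformatics programming in an accessible way, moving from basic to advanced topics.

The fundamentals the book may cover include the core concepts of the Python language: variables, data types, control structures (loops and conditional statements), function definitions, and module imports. The author may use worked examples to show how to write and run code in a Python environment for initial data processing and analysis.

The book may then describe common bioinformatics data formats, such as FASTA, GenBank, and BED, and how to read and parse them with Python. It may also explain how to approach core bioinformatics tasks such as sequence alignment, gene prediction, and phylogenetic tree construction.

In the algorithms and data analysis part, readers can learn how bioinformatics algorithms are implemented in Python, for example BLAST searches, sequence alignment by dynamic programming, and the Smith-Waterman algorithm. The role of scientific computing libraries such as NumPy and SciPy in handling large data sets may also be covered, along with data cleaning and analysis using Pandas.

In practical bioinformatics projects, Python may be used to retrieve records from public databases over the network, for example through NCBI's Entrez interface. Matplotlib and Seaborn are used for data visualization, which helps in interpreting complex biological data. Machine learning and deep learning frameworks such as Scikit-learn and TensorFlow may be introduced for building predictive models for problems such as protein structure prediction or disease association studies.

Finally, the book may discuss how to package Python code as reusable scripts or software, and how to manage code with a version control system such as Git to support team collaboration and project maintenance.

Bioinformatics Programming Using Python is a broad guide that teaches Python programming alongside the relevant biology, so that readers can apply it to bioinformatics research and work more efficiently. Readers who work through it should gain the key skills needed for bioinformatics analysis with Python.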
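As an illustration of the FASTA parsing mentioned above, here is a minimal sketch using only the standard library. It is not taken from the book; the function name `parse_fasta` and the sample records are hypothetical, and it assumes well-formed input in which every sequence follows a `>` header line.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return a {header: sequence} mapping from FASTA-formatted lines.

    Hypothetical helper for illustration; assumes unique headers.
    """
    parts: Dict[str, List[str]] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            parts[header] = []
        elif header is not None:
            parts[header].append(line)  # sequence may span several lines
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Made-up sample data, not from any real database.
example = """>seq1 hypothetical record
ACGT
TTGA
>seq2
GGCC
""".splitlines()

records = parse_fasta(example)
print(records)
```

For real projects, a maintained parser such as Biopython's `SeqIO` module is usually a better choice than a hand-written one, since it also handles GenBank and other formats.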
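The dynamic-programming approach to local alignment mentioned above (Smith-Waterman) can be sketched as follows. This is a simplified version that returns only the best local alignment score, with a linear gap penalty; the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen for illustration, not values from the book.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Simplified sketch: linear gap penalty, score only (no traceback).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

The clamp at zero is what distinguishes this from global (Needleman-Wunsch) alignment: a poorly matching prefix never drags down the score of a good local match.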
Uploaded 2017-09-19
A recent book on using Python for bioinformatics programming; I hope you find it useful. The table of contents is as follows:

    Conventions
    1.2.2 Python Versions
    1.2.3 Code Style
    1.2.4 Get the Most from This Book without Reading It All
    1.2.5 Online Resources Related to This Book
  1.3 WHY LEARN TO PROGRAM?
  1.4 BASIC PROGRAMMING CONCEPTS
    1.4.1 What Is a Program?
  1.5 WHY PYTHON?
    1.5.1 Main Features of Python
    1.5.2 Comparing Python with Other Languages
    1.5.3 How Is It Used?
    1.5.4 Who Uses Python?
    1.5.5 Flavors of Python
    1.5.6 Special Python Distributions
  1.6 ADDITIONAL RESOURCES

Chapter 2 First Steps with Python
  2.1 INSTALLING PYTHON
    2.1.1 Learn Python by Using It
    2.1.2 Install Python Locally
    2.1.3 Using Python Online
    2.1.4 Testing Python
    2.1.5 First Use
  2.2 INTERACTIVE MODE
    2.2.1 Baby Steps
    2.2.2 Basic Input and Output
    2.2.3 More on the Interactive Mode
    2.2.4 Mathematical Operations
    2.2.5 Exit from the Python Shell
  2.3 BATCH MODE
    2.3.1 Comments
    2.3.2 Indentation
  2.4 CHOOSING AN EDITOR
    2.4.1 Sublime Text
    2.4.2 Atom
    2.4.3 PyCharm
    2.4.4 Spyder IDE
    2.4.5 Final Words about Editors
  2.5 OTHER TOOLS
  2.6 ADDITIONAL RESOURCES
  2.7 SELF-EVALUATION

Chapter 3 Basic Programming: Data Types
  3.1 STRINGS
    3.1.1 Strings Are Sequences of Unicode Characters
    3.1.2 String Manipulation
    3.1.3 Methods Associated with Strings
  3.2 LISTS
    3.2.1 Accessing List Elements
    3.2.2 List with Multiple Repeated Items
    3.2.3 List Comprehension
    3.2.4 Modifying Lists
    3.2.5 Copying a List
  3.3 TUPLES
    3.3.1 Tuples Are Immutable Lists
  3.4 COMMON PROPERTIES OF THE SEQUENCES
  3.5 DICTIONARIES
    3.5.1 Mapping: Calling Each Value by a Name
    3.5.2 Operating with Dictionaries
  3.6 SETS
    3.6.1 Unordered Collection of Objects
    3.6.2 Set Operations
    3.6.3 Shared Operations with Other Data Types
    3.6.4 Immutable Set: Frozenset
  3.7 NAMING OBJECTS
  3.8 ASSIGNING A VALUE TO A VARIABLE VERSUS BINDING A NAME TO AN OBJECT
  3.9 ADDITIONAL RESOURCES
  3.10 SELF-EVALUATION

Chapter 4 Programming: Flow Control
  4.1 IF-ELSE
    4.1.1 Pass Statement
  4.2 FOR LOOP
  4.3 WHILE LOOP
  4.4 BREAK: BREAKING THE LOOP
  4.5 WRAPPING IT UP
    4.5.1 Estimate the Net Charge of a Protein
    4.5.2 Search for a Low-Degeneration Zone
  4.6 ADDITIONAL RESOURCES
  4.7 SELF-EVALUATION

Chapter 5 Handling Files
  5.1 READING FILES
    5.1.1 Example of File Handling
  5.2 WRITING FILES
    5.2.1 File Reading and Writing Examples
  5.3 CSV FILES
  5.4 PICKLE: STORING AND RETRIEVING THE CONTENTS OF VARIABLES
  5.5 JSON FILES
  5.6 FILE HANDLING: OS, OS.PATH, SHUTIL, AND PATH.PY MODULE
    5.6.1 path.py Module
    5.6.2 Consolidate Multiple DNA Sequences into One FASTA File
  5.7 ADDITIONAL RESOURCES
  5.8 SELF-EVALUATION

Chapter 6 Code Modularizing
  6.1 INTRODUCTION TO CODE MODULARIZING
  6.2 FUNCTIONS
    6.2.1 Standard Way to Make Python Code Modular
    6.2.2 Function Parameter Options
    6.2.3 Generators
  6.3 MODULES AND PACKAGES
    6.3.1 Using Modules
    6.3.2 Packages
    6.3.3 Installing Third-Party Modules
    6.3.4 Virtualenv: Isolated Python Environments
    6.3.5 Conda: Anaconda Virtual Environment
    6.3.6 Creating Modules
    6.3.7 Testing Modules
  6.4 ADDITIONAL RESOURCES
  6.5 SELF-EVALUATION

Chapter 7 Error Handling
  7.1 INTRODUCTION TO ERROR HANDLING
    7.1.1 Try and Except
    7.1.2 Exception Types
    7.1.3 Triggering Exceptions
  7.2 CREATING CUSTOMIZED EXCEPTIONS
  7.3 ADDITIONAL RESOURCES
  7.4 SELF-EVALUATION

Chapter 8 Introduction to Object Orienting Programming (OOP)
  8.1 OBJECT PARADIGM AND PYTHON
  8.2 EXPLORING THE JARGON
  8.3 CREATING CLASSES
  8.4 INHERITANCE
  8.5 SPECIAL METHODS
    8.5.1 Create a New Data Type Using a Built-in Data Type
  8.6 MAKING OUR CODE PRIVATE
  8.7 ADDITIONAL RESOURCES
  8.8 SELF-EVALUATION

Chapter 9 Introduction to Biopython
  9.1 WHAT IS BIOPYTHON?
    9.1.1 Project Organization
  9.2 INSTALLING BIOPYTHON
  9.3 BIOPYTHON COMPONENTS
    9.3.1 Alphabet
    9.3.2 Seq
    9.3.3 MutableSeq
    9.3.4 SeqRecord
    9.3.5 Align
    9.3.6 AlignIO
    9.3.7 ClustalW
    9.3.8 SeqIO
    9.3.9 AlignIO
    9.3.10 BLAST
    9.3.11 Biological Related Data
    9.3.12 Entrez
    9.3.13 PDB
    9.3.14 PROSITE
    9.3.15 Restriction
    9.3.16 SeqUtils
    9.3.17 Sequencing
    9.3.18 SwissProt
  9.4 CONCLUSION
  9.5 ADDITIONAL RESOURCES
  9.6 SELF-EVALUATION

Section II Advanced Topics

Chapter 10 Web Applications
  10.1 INTRODUCTION TO PYTHON ON THE WEB
  10.2 CGI IN PYTHON
    10.2.1 Configuring a Web Server for CGI
    10.2.2 Testing the Server with Our Script
    10.2.3 Web Program to Calculate the Net Charge of a Protein (CGI version)
  10.3 WSGI
    10.3.1 Bottle: A Python Web Framework for WSGI
    10.3.2 Installing Bottle
    10.3.3 Minimal Bottle Application
    10.3.4 Bottle Components
    10.3.5 Web Program to Calculate the Net Charge of a Protein (Bottle Version)
    10.3.6 Installing a WSGI Program in Apache
  10.4 ALTERNATIVE OPTIONS FOR MAKING PYTHON-BASED DYNAMIC WEB SITES
  10.5 SOME WORDS ABOUT SCRIPT SECURITY
  10.6 WHERE TO HOST PYTHON PROGRAMS
  10.7 ADDITIONAL RESOURCES
  10.8 SELF-EVALUATION

Chapter 11 XML
  11.1 INTRODUCTION TO XML
  11.2 STRUCTURE OF AN XML DOCUMENT
  11.3 METHODS TO ACCESS DATA INSIDE AN XML DOCUMENT
    11.3.1 SAX: cElementTree Iterparse
  11.4 SUMMARY
  11.5 ADDITIONAL RESOURCES
  11.6 SELF-EVALUATION

Chapter 12 Python and Databases
  12.1 INTRODUCTION TO DATABASES
    12.1.1 Database Management: RDBMS
    12.1.2 Components of a Relational Database
    12.1.3 Database Data Types
  12.2 CONNECTING TO A DATABASE
  12.3 CREATING A MYSQL DATABASE
    12.3.1 Creating Tables
    12.3.2 Loading a Table
  12.4 PLANNING AHEAD
    12.4.1 PythonU: Sample Database
  12.5 SELECT: QUERYING A DATABASE
    12.5.1 Building a Query
    12.5.2 Updating a Database
    12.5.3 Deleting a Record from a Database
  12.6 ACCESSING A DATABASE FROM PYTHON
    12.6.1 PyMySQL Module
    12.6.2 Establishing the Connection
    12.6.3 Executing the Query from Python
  12.7 SQLITE
  12.8 NOSQL DATABASES: MONGODB
    12.8.1 Using MongoDB with PyMongo
  12.9 ADDITIONAL RESOURCES
  12.10 SELF-EVALUATION

Chapter 13 Regular Expressions
  13.1 INTRODUCTION TO REGULAR EXPRESSIONS (REGEX)
    13.1.1 REGEX Syntax
  13.2 THE RE MODULE
    13.2.1 Compiling a Pattern
    13.2.2 REGEX Examples
    13.2.3 Pattern Replace
  13.3 REGEX IN BIOINFORMATICS
    13.3.1 Cleaning Up a Sequence
  13.4 ADDITIONAL RESOURCES
  13.5 SELF-EVALUATION

Chapter 14 Graphics in Python
  14.1 INTRODUCTION TO BOKEH
  14.2 INSTALLING BOKEH
  14.3 USING BOKEH
    14.3.1 A Simple X-Y Plot
    14.3.2 Two Data Series Plot
    14.3.3 A Scatter Plot
    14.3.4 A Heatmap
    14.3.5 A Chord Diagram

Section III Python Recipes with Commented Source Code

Chapter 15 Sequence Manipulation in Batch
  15.1 PROBLEM DESCRIPTION
  15.2 PROBLEM ONE: CREATE A FASTA FILE WITH RANDOM SEQUENCES
    15.2.1 Commented Source Code
  15.3 PROBLEM TWO: FILTER NOT EMPTY SEQUENCES FROM A FASTA FILE
    15.3.1 Commented Source Code
  15.4 PROBLEM THREE: MODIFY EVERY RECORD OF A FASTA FILE
    15.4.1 Commented Source Code

Chapter 16 Web Application for Filtering Vector Contamination
  16.1 PROBLEM DESCRIPTION
    16.1.1 Commented Source Code
  16.2 ADDITIONAL RESOURCES

Chapter 17 Searching for PCR Primers Using Primer3
  17.1 PROBLEM DESCRIPTION
  17.2 PRIMER DESIGN FLANKING A VARIABLE LENGTH REGION
    17.2.1 Commented Source Code
  17.3 PRIMER DESIGN FLANKING A VARIABLE LENGTH REGION, WITH BIOPYTHON
  17.4 ADDITIONAL RESOURCES

Chapter 18 Calculating Melting Temperature from a Set of Primers
  18.1 PROBLEM DESCRIPTION
    18.1.1 Commented Source Code
  18.2 ADDITIONAL RESOURCES

Chapter 19 Filtering Out Specific Fields from a GenBank File
  19.1 EXTRACTING SELECTED PROTEIN SEQUENCES
    19.1.1 Commented Source Code
  19.2 EXTRACTING THE UPSTREAM REGION OF SELECTED PROTEINS
    19.2.1 Commented Source Code
  19.3 ADDITIONAL RESOURCES

Chapter 20 Inferring Splicing Sites
  20.1 PROBLEM DESCRIPTION
    20.1.1 Infer Splicing Sites with Commented Source Code
    20.1.2 Sample Run of Estimate Intron Program

Chapter 21 Web Server for Multiple Alignment
  21.1 PROBLEM DESCRIPTION
    21.1.1 Web Interface: Front-End. HTML Code
    21.1.2 Web Interface: Server-Side Script. Commented Source Code
  21.2 ADDITIONAL RESOURCES

Chapter 22 Drawing Marker Positions Using Data Stored in a Database
  22.1 PROBLEM DESCRIPTION
    22.1.1 Preliminary Work on the Data
    22.1.2 MongoDB Version with Commented Source Code

Section IV Appendices