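Mastering Bioinformatics Data Skills: Achieving Reproducible and Robust Research
Bioinformatics Data Skills is a practical guide that gives scientists the skills they need to work with large sequencing datasets, so that their biological findings are reproducible and robust. Traditional bioinformatics textbooks tend to emphasize algorithms and theory, whereas this book is practice oriented: it covers the techniques, tools, and best practices of genomic analysis and stresses a data-driven approach. The author concentrates on modern data-processing techniques rather than outdated theoretical concepts, so that readers can keep pace with a constantly evolving field. The book is divided into three parts:
1. Ideology: Part I discusses how data skills form the foundation of robust and reproducible bioinformatics analysis. It includes Chapter 1, "How to Learn Bioinformatics," which helps readers understand the nature of the discipline and adopt the right attitude toward learning it.
2. Prerequisites: Part II covers the essentials, including how to set up and manage a bioinformatics project (project organization and version control), strengthening Unix shell skills, working with remote machines, using Git in scientific work, and a deeper understanding of bioinformatics data.
3. Practice: Part III offers extensive skills training, including Unix data tools, the basics of R, working with different types of data (range data, sequence data, and alignment data), writing bioinformatics scripts, building workflows and running tasks in parallel, and out-of-memory approaches for large datasets (such as Tabix and SQLite).
The book not only teaches readers how to carry out bioinformatics analyses but also stresses the importance of choosing and applying the tool best suited to the job, helping them grow into bioinformaticians who can solve complex problems. Graduate students, postdocs, faculty, and enthusiasts alike can gain valuable hands-on experience for doing research at a time when the life sciences depend heavily on data skills. Reasonably priced, it is a valuable resource for readers who want to use open source tools to make their research reproducible and robust.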
this new data. Bioinformatics Data Skills is written to provide you with training in
these core tools and help you develop these same skills.
The Approach of This Book
Many biologists starting out in bioinformatics tend to equate “learning bioinformatics”
with “learning how to run bioinformatics software.” This is an unfortunate and
misinformed idea of what bioinformaticians actually do. This is analogous to thinking
“learning molecular biology” is just “learning pipetting.” Other than a few simple
examples used to generate data in Chapter 11, this book doesn’t cover running
bioinformatics software like aligners, assemblers, or variant callers. Running
bioinformatics software isn’t all that difficult, doesn’t take much skill, and it doesn’t embody any
of the significant challenges of bioinformatics. I don’t teach how to run these types of
bioinformatics applications in Bioinformatics Data Skills for the following reasons:
• It’s easy enough to figure out on your own
• The material would go rapidly out of date as new versions of software or entirely
new programs are used in bioinformatics
• The original manuals for this software will always be the best, most up-to-date
resource on how to run a program
Instead, the approach of this book is to focus on the skills bioinformaticians use to
explore and extract meaning from complex, large bioinformatics datasets. Exploring
and extracting information from these datasets is the fun part of bioinformatics
research. The goal of Bioinformatics Data Skills is to teach you the computational
tools and data skills you need to explore these large datasets as you please. These data
skills give you freedom; you’ll be able to look at any bioinformatics data (in any
format, and files of any size) and begin exploring data to extract biological meaning.
Throughout Bioinformatics Data Skills, I emphasize working in a robust and
reproducible manner. I believe these two qualities, reproducibility and robustness, are too
often overlooked in modern computational work. By robust, I mean that your work is
resilient against silent errors, confounders, software bugs, and messy or noisy data. In
contrast, a fragile approach is one that does not decrease the odds of some type of
error adversely affecting your results. By reproducible, I mean that your work can be
repeated by other researchers and they can arrive at the same results. For this to be
the case, your work must be well documented, and your methods, code, and data all
need to be available so that other researchers have the materials to reproduce
everything. Reproducibility also relies on your work being robust: if a workflow run on a
different machine yields a different outcome, it is neither robust nor fully
reproducible. I introduce these concepts in more depth in Chapter 2, and these are themes that
reappear throughout the book.
xiv | Preface
Why This Book Focuses on Sequencing Data
Bioinformatics is a broad discipline, and spans subfields like proteomics,
metabolomics, structural bioinformatics, comparative genomics, machine learning, and image
processing. Bioinformatics Data Skills focuses primarily on handling sequencing data
for a few reasons.
First, sequencing data is abundant. Currently, no other “omics” data is as abundant as
high-throughput sequencing data. Sequencing data has broad applications across
biology: variant detection and genotyping, transcriptome sequencing for gene
expression studies, protein-DNA interaction assays like ChIP-seq, and bisulfite sequencing
for methylation studies, to name just a few examples. The ways in which sequencing
data can be used to answer biological questions will only continue to increase.
Second, sequencing data is terrific for honing your data skills. Even if your goal is to
analyze other types of data in the future, sequencing data serves as great example data
to learn with. Developing the text-processing skills necessary to work with
sequencing data will be applicable to working with many other data types.
Third, other subfields of bioinformatics are much more domain specific. The wide
availability and declining costs of sequencing have allowed scientists from all
disciplines to use genomics data to answer questions in their systems. In contrast,
bioinformatics subdisciplines like proteomics or high-throughput image processing are
much more specialized and less widespread. Still, if you’re interested in these fields,
Bioinformatics Data Skills will teach you useful computational and data skills that will
be helpful in your research.
Audience
In my experience teaching bioinformatics to friends, colleagues, and students of an
intensive week-long course taught at UC Davis, most people wishing to learn
bioinformatics are either biologists or computer scientists/programmers. Biologists wish
to develop the computational skills necessary to analyze their own data, while the
programmers and computer scientists wish to apply their computational skills to
biological problems. Although these two groups differ considerably in biological
knowledge and computational experience, Bioinformatics Data Skills covers material that
should be helpful to both.
If you’re a biologist, Bioinformatics Data Skills will teach you the core data skills you
need to work with bioinformatics data. It’s important to note that Bioinformatics Data
Skills is not a how-to bioinformatics book; such a book on bioinformatics would
quickly go out of date or be too narrow in focus to help the majority of biologists. You
will need to supplement this book with knowledge of your specific research and
system, as well as the modern statistical and bioinformatics methods that your subfield
uses. For example, if your project involves aligning sequencing reads to a reference
genome, this book won’t tell you the newest and best alignment software for your
particular system. But regardless of which aligner you use, you will need to have a
thorough understanding of alignment formats and how to manipulate alignment data
—a topic covered in Chapter 11. Throughout this book, these general computational
and data skills are meant to be a solid, widely applicable foundation on which the
majority of biologists can build.
If you’re a computer scientist or programmer, you are likely already familiar with
some of the computational tools I teach in this book. While the material presented in
Bioinformatics Data Skills may overlap knowledge you already have, you will still
learn about the specific formats, tools, and approaches bioinformaticians use in their
work. Also, working through the examples in this book will give you good practice in
applying your computational skills to genomics data.
The Difficulty Level of Bioinformatics Data Skills
Bioinformatics Data Skills is designed to be a thorough—and in parts, dense—book.
When I started writing this book, I decided the greatest misdeed I could do would be
to treat bioinformatics as a subject that’s easier than it truly is. Working as a
professional bioinformatician, I routinely saw how very subtle issues could crop up and
adversely change the outcome of the analysis had they not been caught. I don’t want
your bioinformatics work to be incorrect because I’ve made a topic artificially simple.
The depth at which I cover topics in Bioinformatics Data Skills is meant to prepare
you to catch similar issues in your own work so your results are robust.
The result is that sections of this book are quite advanced and will be difficult for
some readers. Don’t feel discouraged! Like most of science, this material is hard, and
may take a few reads before it fully sinks in. Throughout the book, I try to indicate
when certain sections are especially advanced so that you can skip over these and
return to them later.
Lastly, I often use technical jargon throughout the book. I don’t like using jargon, but
it’s necessary to communicate technical concepts in computing. Primarily it will help
you search for additional resources and help. It’s much easier to Google successfully
for “left outer join” than “data merge where null records are included in one table.”
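
To make the jargon concrete, here is a minimal sketch of a left outer join using Python's built-in sqlite3 module. The table names, column names, and values are hypothetical, made up purely for illustration. Every row of the left table is kept, and a sample with no match in the right table gets a null (None in Python) in place of the missing value.

```python
import sqlite3

# Hypothetical toy tables: three samples, but read counts for only two of them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (sample_id TEXT PRIMARY KEY, tissue TEXT);
    CREATE TABLE read_counts (sample_id TEXT, n_reads INTEGER);
    INSERT INTO samples VALUES ('s1', 'leaf'), ('s2', 'root'), ('s3', 'stem');
    INSERT INTO read_counts VALUES ('s1', 1200), ('s3', 950);
""")

# LEFT OUTER JOIN keeps every row of the left table (samples), even when
# the right table (read_counts) has no matching row for that sample.
rows = conn.execute("""
    SELECT s.sample_id, s.tissue, r.n_reads
    FROM samples AS s
    LEFT OUTER JOIN read_counts AS r ON s.sample_id = r.sample_id
    ORDER BY s.sample_id
""").fetchall()
conn.close()

for row in rows:
    print(row)  # the row for 's2' carries None where n_reads is missing
```

An inner join on the same tables would silently drop sample s2, which is exactly the kind of difference the precise term lets you search for.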
Assumptions This Book Makes
Bioinformatics Data Skills is meant to be an intermediate book on bioinformatics. To
make sure everyone starts out on the same foot, the book begins with a few simple
chapters. In Chapter 2, I cover the basics of setting up a bioinformatics project, and in
Chapter 3 I teach some remedial Unix topics meant to ensure that you have a solid
grasp of Unix (because Unix is a large component in later chapters). Still, as an
intermediate book, I make a few assumptions about you:
You know a scripting language
This is the biggest assumption of the book. Except for a few Python programs
and the R material (R is introduced in Chapter 8), this book doesn’t directly rely
on using lots of scripting. However, in learning a scripting language, you’ve
already encountered many important computing concepts such as working with
a text editor, running and executing programs on the command line, and basic
programming. If you do not know a scripting language, I would recommend
learning Python while reading this book. Books like Bioinformatics Programming
Using Python by Mitchell L. Model (O’Reilly, 2009), Learning Python, 5th Edition,
by Mark Lutz (O’Reilly, 2013), and Python in a Nutshell, 2nd Edition, by Alex Martelli
(O’Reilly, 2006) are great to get started. If you know a scripting language other
than Python (e.g., Perl or Ruby), you’ll be prepared to follow along with most
examples (though you will need to translate some examples to your scripting
language of choice).
You know how to use a text editor
It’s essential that you know your way around a text editor (e.g., Emacs, Vim,
TextMate2, or Sublime Text). Using a word processor (e.g., Microsoft Word) will not
work, and I would discourage using text editors such as Notepad or OS X’s
TextEdit, as they lack syntax highlighting support for common programming
languages.
You have basic Unix command-line skills
For example, I assume you know the difference between a terminal and a shell,
understand how to enter commands, what command-line options/flags and
arguments are, and how to use the up arrow to retrieve your last entered
command. You should also have a basic understanding of the Unix file hierarchy
(including concepts like your home directory, relative versus absolute paths,
and root directories). You should also be able to move about and manipulate the
directories and files in Unix with commands like cd, ls, pwd, mv, rm, rmdir, and
mkdir. Finally, you should have a basic grasp of Unix file ownership and
permissions, and changing these with chown and chmod. If these concepts are unclear, I
would recommend you play around in the Unix command line first (carefully!)
and consult a good beginner-level book such as Practical Computing for Biologists
by Steven Haddock and Casey Dunn (Sinauer, 2010) or UNIX and Perl to the
Rescue by Keith Bradnam and Ian Korf (Cambridge University Press, 2012).
You have a basic understanding of biology
Bioinformatics Data Skills is a BYOB book—bring your own biology. The examples
don’t require a lot of background in biology beyond what DNA, RNA, proteins,
and genes are, and the central dogma of molecular biology. You should also be
familiar with some very basic genetics and genomic concepts (e.g., single nucleotide
polymorphisms, genotypes, GC content, etc.). All biological examples in the
book are designed to be quite simple; if you’re unfamiliar with any topic, you
should be able to quickly skim a Wikipedia article and proceed with the example.
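
As an illustration of how simple these concepts are in practice, here is a minimal Python sketch that computes GC content, the fraction of bases in a DNA sequence that are G or C. The function name gc_content and the choice to ignore ambiguous bases such as N are assumptions made for this sketch, not a convention taken from the book.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the A, C, G, and T bases in seq.

    Assumption for this sketch: any other character (e.g., the ambiguity
    code N) is ignored rather than counted in the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        # Fail loudly instead of silently returning a misleading value.
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / total


# 4 of the 8 unambiguous bases are G or C; the two N bases are skipped.
print(gc_content("ATGCGCNNAT"))  # 0.5
```

Raising an error on a sequence with no countable bases, rather than returning zero, follows the spirit of working robustly: a silent zero could pass unnoticed into later results.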
You have a basic understanding of regular expressions
Occasionally, I’ll make use of regular expressions in this book. In most cases, I try
to quickly step through the basics of how a regular expression works so that you
can get the general idea. If you’ve encountered regular expressions while learning
a scripting language, you’re ready to go. If not, I recommend you learn the basics
—not because regular expressions are used heavily throughout the book, but
because mastering regular expressions is an important skill in bioinformatics.
Introducing Regular Expressions by Michael Fitzgerald (O’Reilly) is a great
introduction. Nowadays, writing, testing, and debugging regular expressions is easier
than ever thanks to online tools like http://regex101.com and
http://www.debuggex.com. I recommend using these tools in your own work and when stepping
through my regular expression examples.
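
For readers who have not yet met regular expressions, here is a minimal sketch using Python's standard re module. The region-string layout (chromosome name, colon, start, hyphen, end) and the names REGION_RE and parse_region are assumptions chosen for illustration, not an example from the book.

```python
import re

# Pattern pieces: a name of word characters, a colon, then two runs of
# digits separated by a hyphen. Named groups make the match self-describing.
REGION_RE = re.compile(r"^(?P<chrom>\w+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text):
    """Split a string like 'chr1:12345-67890' into (chrom, start, end)."""
    match = REGION_RE.match(text)
    if match is None:
        # Reject malformed input instead of guessing at its meaning.
        raise ValueError(f"not a valid region string: {text!r}")
    return match.group("chrom"), int(match.group("start")), int(match.group("end"))


print(parse_region("chr1:12345-67890"))  # ('chr1', 12345, 67890)
```

Pasting the pattern into one of the online tools mentioned above is a quick way to see which part of the input each group captures.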
You know how to get help and read documentation
Throughout this book, I try to minimize teaching information that can be found
in manual pages, help documentation, or online. This is for three reasons:
• I want to save space and focus on presenting material in a way you can’t find
elsewhere
• Manual pages and documentation will always be the best resource for this
information
• The ability to quickly find answers in documentation is one of the most
important skills you can develop when learning computing
This last point is especially important: you don’t need to remember all arguments of a
command or R function; you just need to know where to find this information.
Programmers consult documentation constantly in their work, which is why
documentation tools like man (in Unix) and help() (in R) exist.
You can manage your computer system (or have a system administrator)
This book does not teach you system administration skills like setting up a
bioinformatics server or cluster, managing user accounts, network security, managing
disks and disk space, RAID configurations, data backup, and high-performance
computing concepts. There simply isn’t the space to adequately cover these
important topics. However, these are all very, very important—if you don’t have a
system administrator and need to fill that role for your lab or research group, it’s
essential for you to master these skills, too. Frankly, system administration skills
take years to master and good sysadmins have incredible patience and experience