Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby


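"Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby" is an introductory guide to medical informatics. It focuses on handling clinical and public health data with rigorous methods and with tools that are widely used, inexpensive, and powerful. It bridges the gap between programming instruction and work with specialized medical data, which makes it possible to teach relevant programming courses in a biomedical setting. The book covers three major dynamic programming languages, Perl, Python, and Ruby, to broaden its audience. Written for healthcare students and professionals who have some grounding in open source programming languages, it gives guidance on applying basic informatics algorithms to medical data sets. Each algorithm is scripted in all three languages and explained step by step. The algorithms retrieve, organize, merge, and analyze data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) project, the National Library of Medicine's PubMed service, the Centers for Disease Control and Prevention's mortality records, the U.S. Census, and the Online Mendelian Inheritance in Man (OMIM) database of inherited conditions.

The book stresses that medical informatics is not as complicated as many people assume: a few dozen simple algorithms and basic programming knowledge are enough to make full use of the medical information held in clinical and research data sets. It describes the structure of each data source, gives instructions for downloading the data, and explains each algorithm clearly. Equivalent Perl, Python, and Ruby scripts show readers how to write short, effective scripts with a small number of commands, and so to master the basic informatics methods of data retrieval, organization, merging, and analysis. Case studies show how biomedical scientists can use public data and open source programming languages to ask and answer questions. The book suits readers who already know Perl, Python, or Ruby; even with only basic programming knowledge, they can quickly start writing powerful programs. Its content covers the methods and implementations needed to complete many of the projects that arise in a biomedical career.

The book is part of the Mathematical and Computational Biology series, which aims to promote the integration of mathematical, statistical, and computational methods into biology. The series is addressed to students, researchers, and professionals in mathematics, statistics, computational science, basic biology, and bioengineering, as well as to interdisciplinary researchers. The inclusion of concrete examples, applications, and programming techniques is encouraged.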

Preface

Without becoming a full-time programmer, you can write powerful programs, in just a few minutes and a few lines of code, with any of these languages.

We will use a minimal selection of commands to write short scripts that can be learned quickly by biomedical students and professionals. This book demonstrates that, with a few programming methods, biomedical professionals can master any kind of data collection.

Though there are numerous books that introduce programming techniques to biomedical professionals (including several that I have written), no other book has these important features:

1. All of the data, nomenclatures, programming scripts, and programming languages used in this book are free and publicly available. Most of the data comes from U.S. government sources, providing gigabytes of high-quality, curated biomedical data to a global community of scientists, healthcare experts, clinicians, nurses, and students. Every student should become familiar with these data sources, and understand their medical value. This book provides instructions for downloading all of the data sources discussed in the book.

2. Data come in many different forms. We describe the structure of every data source used. In the case of image formats, we provide instructions for converting between the different file types.

3. Most medical informatics books are written for one specific language, or are written as "concept books" that describe algorithms without actually providing programming instruction. We provide equivalent scripts in Perl, Python, and Ruby, so that anyone with some programming skill will benefit. Each trio of scripts is preceded by a step-by-step explanation of the algorithm, in plain English. You may wish to confine your attention to scripts written in your preferred language. Over the years, you may find it valuable to reread this book, paying attention to the languages you ignored on the first pass.

4. It is nearly impossible to begin a new data analysis project without first observing some case examples. With step-by-step instructions, you will learn the basic informatics methods for retrieving, organizing, merging, and analyzing the following data sources.

Here are the public resources used in this book:

Data Sets and Services

SEER: The National Cancer Institute's Surveillance Epidemiology and End Results project, containing deidentified records for nearly 4 million cancer cases.

PubMed: The National Library of Medicine's Web-based bibliographic retrieval service. The title, author(s), journal publication information, and, in most cases, article summaries, are provided for over 19 million medical citations.


CDC mortality data sets: The Centers for Disease Control and Prevention's collection of mortality records containing computer-parsable data on virtually every death occurring in the U.S.

U.S. Census: Every 10 years, the U.S. Bureau of Census counts the number of people living in the U.S., and collects basic demographic information in the process. Much of the information collected by the census is freely available to the public.

OMIM: The Online Mendelian Inheritance in Man is a large data set containing detailed information on over 20,000 inherited conditions of humans, made publicly available by the National Library of Medicine's National Center for Biotechnology Information.

Nomenclatures and Ontologies

MeSH: Medical Subject Headings, a comprehensive, hierarchical listing of medical topics, developed by the National Library of Medicine.

ICD and ICD-O: The World Health Organization's disease nomenclatures, the International Classification of Diseases and the International Classification of Diseases for Oncology.

Taxonomy: A computer-parsable classification of organisms, used by biotechnology centers.

Developmental Lineage Classification and Taxonomy of Neoplasms: The largest nomenclature of tumors in existence, with synonymous terms grouped under concepts and organized as a hierarchical biological classification.

Internet Protocols, Markup Languages, and Interfaces

HTML: HyperText Markup Language, the markup language used in Web pages.

HTTP: Hypertext Transfer Protocol, the Internet protocol supporting the Internet's World Wide Web.

XML: eXtensible Markup Language, a syntax for describing the data and including both data and data descriptors in a format that can be read by humans and computers.

RDF: Resource Description Framework, a method of organizing information in statements that bind data, and descriptors for the data, to an identified object. RDF is expressed in the XML markup language.

CGI: Common Gateway Interface, an Internet protocol, used by Perl, Python, Ruby, and other languages, that receives input values submitted through Web pages.


The included scripts will call upon a few programming skills, in either Perl, Python, or Ruby. You should know the basic syntax of a language, the minimum structural requirements for a script, how command lines are written, how iterating loops are structured, how files are opened, read, and written, how values can be assigned to and retrieved from data structures, how simple regular expressions are interpreted, and how scripts are launched. The scripts are written in a style that sacrifices elegance for readability. If your knowledge of Perl, Python, or Ruby is shaky, there are numerous beginner-level books, and many Web-based tutorials for each of these languages.

The book is divided into four parts: Part I, Fundamental Algorithms and Methods of Medical Informatics; Part II, Medical Data Resources; Part III, Primary Tasks of Medical Informatics; and Part IV, Medical Discovery.

Part I, Fundamental Algorithms and Methods of Medical Informatics (Chapters 1 to 4), provides simple methods for viewing text and image files, and for parsing through large data sets line by line, retrieving, counting, and indexing selected items. The primary purpose of these chapters is to introduce the basic computational subroutines that are used in more complex scripts later in the book. The secondary purpose of these chapters is to demonstrate that Perl, Python, and Ruby are quite similar to one another, and provide equivalent functionality.
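As a rough illustration of the line-by-line approach described for Part I, the Python sketch below scans text one line at a time, counts each matched term, and records the line numbers where it occurs. It is not taken from the book: the function name `count_and_index`, the sample records, and the search pattern are all hypothetical and chosen only for the example.

```python
import re
from collections import Counter, defaultdict
from typing import Iterable


def count_and_index(lines: Iterable[str], pattern: str):
    """Scan lines once, counting matched terms and indexing their line numbers."""
    regex = re.compile(pattern, re.IGNORECASE)
    counts = Counter()          # term -> number of occurrences
    index = defaultdict(list)   # term -> line numbers where the term occurs
    for line_number, line in enumerate(lines, start=1):
        for match in regex.finditer(line):
            term = match.group(0).lower()
            counts[term] += 1
            index[term].append(line_number)
    return counts, dict(index)


# Hypothetical sample records, invented for this sketch (not real data).
sample = [
    "Patient A: adenocarcinoma of colon",
    "Patient B: squamous carcinoma of lung",
    "Patient C: Adenocarcinoma of lung",
]

# Match any word that ends in "carcinoma".
counts, index = count_and_index(sample, r"\b\w*carcinoma\b")
print(counts.most_common())
print(index)
```

Because an open text file is itself an iterable of lines, the same function could be given a file object in place of the list, so that a large data set is read one line at a time rather than loaded into memory all at once.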

Part II, Medical Data Resources (Chapters 5 to 13), demonstrates uses of some freely available biomedical data sets. These data sets have cost hundreds of millions of dollars to assemble, yet many healthcare workers are unaware of their enormous clinical value. In these chapters, you will learn the intended uses of data sets, how the data sets are organized, and how you can select, retrieve, and analyze information from the files.

Part III, Primary Tasks of Medical Informatics (Chapters 14 to 18), covers some of the computational methods of biomedical informatics, including autocoding, data scrubbing, and data deidentification.

A good question is hard to find. Part IV, Medical Discovery (Chapters 19 through 27), provides examples of the kinds of questions that biomedical scientists can ask and answer with public data and open source programming languages. In these chapters, we combine methods developed in the earlier chapters, using freely available data sources to answer specific questions or to develop new medical hypotheses. Many of the informatics projects that you will use in your biomedical career can be completed with the basic methods and implementations described in these chapters.

This book is intended to be used as a textbook in medical informatics courses. Because the methods in the book are generalized, the book will also serve as a convenient reference source of script snippets that can be freely used by students and professionals. The scripts are written in a syntax appropriate for the most current popular version of Perl, Python, or Ruby, and based on the availability of about a dozen large, public data sets, each with a consistent data structure. Over time, programming languages change; the availability, Internet location, and organization of the large public
