SAS and R: Data Management and Visualization in Practice


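"SAS and R: Data Management, Statistical Analysis, and Graphics" is a book that covers SAS and R side by side, focusing on data management, statistical analysis, and graphics. It was written by Ken Kleinman and Nicholas J. Horton and published by Chapman & Hall/CRC, an imprint of Taylor & Francis Group. The book discusses how to manage data efficiently in both SAS and R, and how to use the two tools for statistical analysis and for producing high-quality graphics.

The data management part may explain how to import, clean, transform, and organize data in SAS using PROC steps, and how to work with, filter, and merge data frames in R. SAS is known for its data-processing capabilities, such as transforming data in a DATA step, while R offers flexible data structures and packages such as dplyr for data manipulation. Topics here may include data type conversion, handling of missing values, sorting, and grouping.

The statistical analysis chapters may range from basic statistical inference (descriptive statistics, hypothesis tests) to more advanced modeling (linear regression, logistic regression, principal component analysis). SAS has a full suite of statistical procedures, such as PROC REG and PROC LOGISTIC, while R offers functions such as lm() and glm() together with a large number of contributed packages, which allows more customized analyses. This part may also discuss how to assess model performance and validate hypotheses in SAS and R.

In the visualization part, readers are introduced to SAS/GRAPH and ODS Graphics in SAS and to R libraries such as ggplot2, lattice, and ggvis. These tools help users create professional charts such as scatterplots, boxplots, histograms, and complex maps. The authors may discuss how to choose an appropriate plot type and how to adjust colors, labels, and legends to strengthen the story the data tell.

The book may also contain practical case studies that help readers apply the theory to real problems. SAS users may learn how to write macro programs that are submitted to a SAS server, while R users will learn the basics of writing R scripts and managing packages.

The book is a valuable resource for SAS and R users, from newcomers to data analysis to experienced professionals. It aims to improve readers' data-processing and visualization skills so that they can better understand and communicate the insights in their data. By comparing the approaches taken in SAS and R, readers can choose the most suitable tool according to project needs and personal preference.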


“book” — 2009/6/16 — 16:53 — page xvii — #17

Preface

SAS™ (SAS Institute, 2009) and R (R Development Core Team, 2009) are two statistical software packages used in many fields of research. SAS is commercial software developed by SAS Institute; it includes well-validated statistical algorithms. It can be licensed but not purchased. Paying for a license entitles the licensee to professional customer support. However, licensing is expensive and SAS sometimes incorporates new statistical methods only after a significant lag. In contrast, R is free, open-source software, developed by a large group of people, many of whom are volunteers. It has a large and growing user and developer base. Methodologists often release applications for general use in R shortly after they have been introduced into the literature. Professional customer support is not provided, though there are many resources for users. There are settings in which one of these useful tools is needed, and users who have spent many hours gaining expertise in the other often find it frustrating to make the transition.

We have written this book as a reference text for users of SAS and R. Our primary goal is to provide users with an easy way to learn how to perform an analytic task in both systems, without having to navigate through the extensive, idiosyncratic, and sometimes (often?) unwieldy documentation each provides. We expect the book to function in the same way that an English–French dictionary informs users of both the equivalent nouns and verbs in the two languages as well as the differences in grammar. We include many common tasks, including data management, descriptive summaries, inferential procedures, regression analysis, multivariate methods, and the creation of graphics. We also show some more complex applications. In toto, we hope that the text will allow easier mobility between systems for users of any statistical system.

We do not attempt to exhaustively detail all possible ways available to accomplish a given task in each system. Neither do we claim to provide the most elegant solution. We have tried to provide a simple approach that is easy to understand for a new user, and have supplied several solutions when it seems likely to be helpful. Carrying forward the analogy to an English–French dictionary, we suggest language that will communicate the point effectively, without listing every synonym or providing guidance on native idiom or eloquence.

Who should use this book

Those with an understanding of statistics at the level of multiple-regression analysis will find this book helpful. This group includes professional analysts who use statistical packages almost every day as well as statisticians, epidemiologists, economists, engineers, physicians, sociologists, and others engaged in research or data analysis. We anticipate that this tool will be particularly useful for sophisticated users, those with years of experience in only one system, who need or want to use the other system. However, intermediate-level analysts should reap the same benefit. In addition, the book will bolster the analytic abilities of a relatively new user of either system, by providing a concise reference manual and annotated examples executed in both packages.


Using the book

The book has three indices, in addition to the comprehensive table of contents. These include: 1) a detailed topic (subject) index in English; 2) a SAS index, organized by SAS syntax; and 3) an R index, describing R syntax. SAS users can use the SAS index to look up a task for which they know the SAS code and turn to a page with that code as well as the associated R code to carry out that task. R users can use the dictionary in an analogous fashion using the R index.

Extensive example analyses are presented; see Table C.1 (p. 277) for a comprehensive list. These employ a single dataset (from the HELP study), described in Appendix C. Readers are encouraged to download the dataset and code from the book website. The examples demonstrate the code in action and facilitate exploration by the reader.

Differences between SAS and R

SAS and R are so fundamentally distinct that an enumeration of their differences would be counter-productive. However, some differences are important for new users to bear in mind.

SAS includes data management tools that are primarily intended to prepare data for analysis. After preparation, analysis is performed in a distinct step, the implementation of which effectively cannot be changed by the user, though often extensive options are available. R is a programming environment tailored for data analysis. Data management and analysis are integrated. This means, for example, that calculating the BMI from weight and height can be treated as a function of the data, and as such is as likely to appear within a data analysis as in making a “new” piece of data to keep.
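The BMI remark can be illustrated with a minimal R sketch. The data frame ds, its variable names (weight_kg, height_m, sbp), and the values are hypothetical and chosen only for illustration; they are not taken from the HELP dataset.

```r
# Hypothetical toy data; names and values are assumptions for illustration.
ds <- data.frame(
  weight_kg = c(60, 72, 85, 95, 68),
  height_m  = c(1.65, 1.80, 1.75, 1.70, 1.60),
  sbp       = c(112, 118, 130, 141, 120)
)

# Option 1: store BMI as a new column, then use it in the model.
ds$bmi <- ds$weight_kg / ds$height_m^2
fit_stored <- lm(sbp ~ bmi, data = ds)

# Option 2: compute BMI inside the model formula.
# I() makes / and ^ act as arithmetic rather than formula operators.
fit_inline <- lm(sbp ~ I(weight_kg / height_m^2), data = ds)

# Both fits estimate the same slope for BMI.
slope_stored <- unname(coef(fit_stored)[2])
slope_inline <- unname(coef(fit_inline)[2])
```

Either form may be preferable depending on whether the derived variable is needed again later.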

SAS Institute makes decisions about how to change the software or expand the scope of included analyses. These decisions are based on the needs of the user community and on corporate goals for profitability. For example, when changes are made, backwards-compatibility is almost always maintained, and documentation of exceptions is extensive. SAS Institute’s corporate conservatism means that techniques are sometimes not included in SAS until they have been discussed in the peer-reviewed literature for many years. While the R Core Team controls base functionality, a very large number of users have developed functions for R. Methodologists often release R functions to implement their work concurrently with publication. While this provides great flexibility, it comes at some cost. A user-contributed function may implement a desired methodology, but code quality may be unknown, documentation scarce, and paid support nonexistent. Sometimes a function which once worked may become defunct due to a lack of backwards-compatibility and/or the author’s inability to, or lack of interest in, updating it.

Other differences between SAS and R are worth noting. Data management in SAS is undertaken using row by row (observation-level) operations. R is inherently a vector-based language, where columns (variables) are manipulated. R is case-sensitive, while SAS is generally not.
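A short R sketch of these two points, using hypothetical values and object names:

```r
# Hypothetical values, used only for illustration.
weight_lb <- c(150, 182, 201)

# One vectorized expression converts the whole column;
# no explicit loop over observations is needed.
weight_kg <- weight_lb * 0.45359237

# R is case-sensitive: Weight_kg is a different name, not defined here.
has_lower <- exists("weight_kg")
has_mixed <- exists("Weight_kg")
```

Here has_lower is TRUE, while has_mixed is FALSE unless an object called Weight_kg happens to exist elsewhere in the session.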

Where to begin

We do not anticipate that the book will be read cover to cover. Instead, we hope that the extensive indexing, cross-referencing, and worked examples will make it possible for readers to directly find and then implement what they need. A user new to either SAS or R should begin by reading the appropriate Appendix for that software package, which includes a sample session and overview.


CHAPTER 1. DATA MANAGEMENT

Note: The file sasfilename.sas7bdat is created by using a libref in a data statement; see 1.2.1.

load(file="dir_location/savedfile")   # works on all OS including Windows
load(file="dir_location\\savedfile")  # Windows only

Note: Forward slash is supported as a directory delimiter on all operating systems; a double backslash is supported under Windows. The file savedfile is created by save() (see 1.2.1).
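The round trip between save() and load() can be sketched as follows. The data frame and the use of a temporary directory are assumptions for illustration; in practice the path would point to dir_location.

```r
# Hypothetical data frame, for illustration only.
ds <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# file.path() joins path components with a forward slash.
savedfile <- file.path(tempdir(), "savedfile")
save(ds, file = savedfile)

# Remove the object, then restore it from disk.
# load() returns the names of the restored objects.
rm(ds)
restored <- load(file = savedfile)
```

After load(), the object ds is available again under its original name.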

1.1.2 Fixed format text files

See also 1.1.3 (read more complex fixed files) and 6.4 (read variable format files)

SAS

data ds;
   infile 'C:\file_location\filename.ext';
   input varname1 ... varnamek;
run;

filename filehandle 'file_location/filename.ext';
proc import datafile=filehandle
   out=ds dbms=dlm;
   getnames=yes;
run;

Note: The infile approach allows the user to limit the number of rows read from the data file using the obs option. Character variables are noted with a trailing ’$’, e.g., use a statement such as input varname1 varname2 $ varname3 if the second position contains a character variable (see 1.1.3 for examples). The input statement allows many options and can be used to read files with variable format (6.4.1).

In proc import, the getnames=yes statement is used if the first row of the input file contains variable names (the variable types are detected from the data). If the first row does not contain variable names then the getnames=no option should be specified. The guessingrows option (not shown) will base the variable formats on other than the default 20 rows. The proc import statement will accept an explicit file location rather than a file associated by the filename statement as in section 4.6.

Note that in Windows installations, SAS accepts either slashes or backslashes to denote directory structures. For Linux, only forward slashes are allowed. Behavior in other operating systems may vary.

In addition to these methods, files can be read by selecting the Import Data option on the file menu in the GUI.

R

ds <- read.table("dir_location\\file.txt", header=TRUE)  # Windows only
ds <- read.table("dir_location/file.txt", header=TRUE)   # all OS (including Windows)

Note: Forward slash is supported as a directory delimiter on all operating systems; a double backslash is supported under Windows. If the first row of the file includes the name of the
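A self-contained sketch of the read.table() call above: it writes a small space-delimited file to a temporary directory and reads it back. The file contents and column names are hypothetical. The nrows argument, which is not discussed in the text, limits the number of data rows read.

```r
# Write a small space-delimited file with a header row (hypothetical contents).
txtfile <- file.path(tempdir(), "file.txt")
writeLines(c("id group score",
             "1 a 2.5",
             "2 b 3.0",
             "3 a 4.5"), txtfile)

# header=TRUE takes the variable names from the first row of the file.
ds <- read.table(txtfile, header = TRUE)

# nrows limits how many data rows are read.
first_two <- read.table(txtfile, header = TRUE, nrows = 2)
```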
