R Data Import/Export Guide

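"R Data Import_Export.pdf" is a guide to importing and exporting data in R, covering the facilities provided by R's standard library and by packages available on CRAN. It applies to R version 3.6.1 (2019-07-05) and mainly discusses how to handle different kinds of data sources, such as text files, spreadsheet-like data and XML. It also touches on encodings, data reshaping and contingency tables.

In R, importing and exporting data is a key part of the data analysis workflow. The guide first introduces the basic concepts of importing data, including how different data sources are handled. For import it stresses the importance of encodings, because the encoding determines whether non-ASCII characters are read correctly. UTF-8, for example, can represent characters from virtually all languages, but some older datasets were written in other encodings, which have to be identified and converted on import.

The guide then explains in detail how to export data to text files, a common form for data exchange and storage. Fixed-width-format files are another focus: every column occupies a fixed number of characters, and such files can be parsed with `read.fwf`.

For spreadsheet-like data such as CSV files, the guide covers `read.table` and its variants such as `read.csv`. Excel files need separate tools, for example `read_excel`, which belongs to the readxl package rather than to the `read.table` family. The guide also discusses DIF (Data Interchange Format) files and reading data directly with `scan`.

On data reshaping, the guide introduces tools for converting data between long and wide formats, which is useful for analysis and visualization. The `melt` and `dcast` functions in the `reshape2` package are one example of such tools.

The guide also covers XML, a general-purpose data interchange format often used for web services and for storing structured data. Packages such as `XML` and `xml2` let users parse and generate XML documents, which helps when handling scraped web data or interacting with other systems.

In short, "R Data Import_Export.pdf" gives comprehensive guidance on managing and transforming data from different sources, and is a useful reference for both beginners and experienced R users.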

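As a minimal sketch of the fixed-width case mentioned in the summary, the following writes a small file and parses it with `read.fwf`. The file contents, column widths and column names are made up for illustration and do not come from the guide.

```r
# Create a hypothetical fixed-width file: 3 chars id, 6 chars name, 2 chars age.
tmp <- tempfile(fileext = ".txt")
writeLines(c("001Alice 34",
             "002Bob   27"), tmp)

# 'widths' gives the number of characters in each column.
# Extra arguments are passed on to read.table(): colClasses keeps the
# leading zeros in 'id', and strip.white drops the padding in 'name'.
df <- read.fwf(tmp,
               widths = c(3, 6, 2),
               col.names = c("id", "name", "age"),
               colClasses = c("character", "character", "integer"),
               strip.white = TRUE)

unlink(tmp)  # remove the temporary file
print(df)
```

Reading `id` as character is a deliberate choice here: numeric conversion would silently turn "001" into 1.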
XML (https://CRAN.R-project.org/package=XML): Duncan Temple Lang

Brian Ripley is the author of the support for connections.

1 Introduction

Reading data into a statistical system for analysis and exporting the results to some other system for report writing can be frustrating tasks that can take far more time than the statistical analysis itself, even though most readers will find the latter far more appealing.

This manual describes the import and export facilities available either in R itself or via packages which are available from CRAN or elsewhere.

Unless otherwise stated, everything described in this manual is (at least in principle) available on all platforms running R.

In general, statistical systems like R are not particularly well suited to manipulations of large-scale data. Some other systems are better than R at this, and part of the thrust of this manual is to suggest that rather than duplicating functionality in R we can make another system do the work! (For example Therneau & Grambsch (2000) commented that they preferred to do data manipulation in SAS and then use package survival (https://CRAN.R-project.org/package=survival) in S for the analysis.) Database manipulation systems are often very suitable for manipulating and extracting data: several packages to interact with DBMSs are discussed here.

There are packages to allow functionality developed in languages such as Java, perl and python to be directly integrated with R code, making the use of facilities in these languages even more appropriate. (See the rJava (https://CRAN.R-project.org/package=rJava) package from CRAN and the SJava, RSPerl and RSPython packages from the Omegahat project, http://www.omegahat.net.)

It is also worth remembering that R, like S, comes from the Unix tradition of small re-usable tools, and it can be rewarding to use tools such as awk and perl to manipulate data before import or after export. The case study in Becker, Chambers & Wilks (1988, Chapter 9) is an example of this, where Unix tools were used to check and manipulate the data before input to S. The traditional Unix tools are now much more widely available, including for Windows.

This manual was first written in 2000, and the number and scope of R packages have increased a hundredfold since. For specialist data formats it is worth searching to see if a suitable package already exists.

1.1 Imports

The easiest form of data to import into R is a simple text file, and this will often be acceptable for problems of small or medium scale. The primary function to import from a text file is scan, and this underlies most of the more convenient functions discussed in Chapter 2 [Spreadsheet-like data], page 8.
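To make the relationship between scan and the more convenient functions concrete, here is a minimal sketch that reads the same small text file both ways. The file contents and column names are invented for illustration.

```r
# Write a small whitespace-separated text file to a temporary location.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id height weight",
             "1 1.70 65.2",
             "2 1.82 80.5",
             "3 1.65 58.9"), tmp)

# scan() reads fields into a list of vectors; 'what' declares the type
# of each field, and skip = 1 jumps over the header line.
cols <- scan(tmp,
             what = list(id = integer(), height = numeric(), weight = numeric()),
             skip = 1, quiet = TRUE)

# read.table() does the same job at a higher level and returns a data frame,
# taking the column names from the header line.
df <- read.table(tmp, header = TRUE)

unlink(tmp)  # remove the temporary file
str(cols)
str(df)
```

With scan the caller states the field types up front, whereas read.table infers them, which is usually more convenient for rectangular data.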

However, all statistical consultants are familiar with being presented by a client with a memory stick (formerly, a floppy disc or CD-R) of data in some proprietary binary format, for example ‘an Excel spreadsheet’ or ‘an SPSS file’. Often the simplest thing to do is to use the originating application to export the data as a text file (and statistical consultants will have copies of the most common applications on their computers for that purpose). However, this is not always possible, and Chapter 3 [Importing from other statistical systems], page 14, discusses what facilities are available to access such files directly from R. For Excel spreadsheets, the available methods are summarized in Chapter 9 [Reading Excel spreadsheets], page 29.

In a few cases, data have been stored in a binary form for compactness and speed of access. One application of this that we have seen several times is imaging data, which is normally stored as a stream of bytes as represented in memory, possibly preceded by a header. Such data formats are discussed in Chapter 5 [Binary files], page 22, and Section 7.5 [Binary connections], page 26.
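A header followed by a stream of bytes can be read with base R's `readBin` on a binary connection. The sketch below assumes a made-up layout (two 4-byte little-endian integers giving rows and columns, then one unsigned byte per pixel); real imaging formats differ and need their own specification.

```r
tmp <- tempfile(fileext = ".bin")

# Write a hypothetical file: header (rows, columns), then the pixel bytes.
con <- file(tmp, "wb")
writeBin(c(2L, 3L), con, size = 4, endian = "little")
writeBin(as.raw(c(0, 51, 102, 153, 204, 255)), con)
close(con)

# Read it back: first the header, then as many bytes as the header announces.
con <- file(tmp, "rb")
dims <- readBin(con, what = "integer", n = 2, size = 4, endian = "little")
pixels <- readBin(con, what = "integer", n = prod(dims), size = 1, signed = FALSE)
close(con)
unlink(tmp)  # remove the temporary file

# Arrange the byte stream as a matrix, assuming row-major storage.
img <- matrix(pixels, nrow = dims[1], ncol = dims[2], byrow = TRUE)
print(img)
```

Stating `size`, `signed` and `endian` explicitly matters here, because the defaults depend on the type being read and on the platform.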

For much larger databases it is common to handle the data using a database management system (DBMS). There is once again the option of using the DBMS to extract a plain file, but
