Second, data wrangling requires the ability to work with different forms of data.
Analysts and organizations are finding new and unique ways to leverage all forms
of data, so it’s important to be able to work not only with numbers but also with
character strings, categorical variables, logical variables, regular expressions, and
dates. Part II explains how to work with these different classes of data so that when
you start to learn how to manage the different data structures, which combine these
data classes into multiple dimensions, you will have a strong knowledge base.
Third, modern-day datasets often contain variables of different lengths and
classes. Furthermore, many statistical and mathematical calculations operate on
different types of data structures. Consequently, data wrangling requires a strong
knowledge of the different structures to hold your datasets. Part III covers the
different types of data structures available in R, how they differ by dimensionality, and
how to create, add to, and subset the various data structures. Lastly, I cover how to
deal with missing values in data structures. As a result, this part provides a robust
understanding of managing various forms of datasets.
Fourth, data are arriving from multiple sources at an alarming rate, and analysts
and organizations are seeking ways to leverage these new sources of information.
Consequently, analysts need to understand how to get data from these sources.
Furthermore, since analysis is often a collaborative effort, analysts also need to know
how to share their data. Part IV covers the basics of importing tabular and
spreadsheet data, scraping data stored online, and exporting data for sharing purposes.
Fifth, minimizing duplication and writing simple and readable code are important
to becoming an effective and efficient data analyst. Moreover, clarity should always
be a goal throughout the data analysis process. Part V introduces the art of writing
functions and using loop control statements to reduce redundancy in code. I also
discuss how to simplify your code using pipe operators to make your code more
readable. Consequently, this part will help you to perform data wrangling tasks
more effectively, efficiently, and with more clarity.
Last, data wrangling is all about getting your data into the right form in order to
feed it into the visualization and modeling stages. This typically requires a large
amount of reshaping and transforming of your data. Part VI introduces some of the
fundamental functions for “tidying” your data and for manipulating, sorting,
summarizing, and joining your data. These tasks will help to significantly reduce the
time you spend on the data wrangling process.
Individually, each part will provide you with important tools for performing individual
data wrangling tasks. Combined, these tools will help to make you more effective
and efficient in the front end of the data analysis process so that you can spend more
of your time visualizing and modeling your data and communicating your results!
Fig. 1.1 Data Wrangling
1 The Role of Data Wrangling