2 INTRODUCTION
gavote
         equip econ perAA rural    atlanta gore bush other votes ballots
APPLING  LEVER poor 0.182 rural notAtlanta 2093 3940    66  6099    6617
ATKINSON LEVER poor 0.230 rural notAtlanta  821 1228    22  2071    2149
....
The output in this text is shown in typewriter font. I have deleted most of the
output to save space. This dataset is small enough to be comfortably examined in
its entirety. Sometimes, we simply want to look at the first few cases. The head
command is useful for this:
head(gavote)
         equip   econ perAA rural    atlanta gore bush other votes ballots
APPLING  LEVER   poor 0.182 rural notAtlanta 2093 3940    66  6099    6617
ATKINSON LEVER   poor 0.230 rural notAtlanta  821 1228    22  2071    2149
BACON    LEVER   poor 0.131 rural notAtlanta  956 2010    29  2995    3347
BAKER    OS-CC   poor 0.476 rural notAtlanta  893  615    11  1519    1607
BALDWIN  LEVER middle 0.359 rural notAtlanta 5893 6041   192 12126   12785
BANKS    LEVER middle 0.024 rural notAtlanta 1220 3202   111  4533    4773
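For readers who want to try head without the full dataset, the following is a minimal sketch that types in only the six rows printed above. The name gavote6 is hypothetical and the object is not the complete gavote data; the n argument and the tail function are standard R.

```r
# Rows copied by hand from the head(gavote) output; NOT the full dataset.
# gavote6 is a hypothetical name chosen for this illustration.
gavote6 <- data.frame(
  equip   = factor(c("LEVER", "LEVER", "LEVER", "OS-CC", "LEVER", "LEVER")),
  econ    = factor(c("poor", "poor", "poor", "poor", "middle", "middle")),
  perAA   = c(0.182, 0.230, 0.131, 0.476, 0.359, 0.024),
  rural   = factor(rep("rural", 6)),
  atlanta = factor(rep("notAtlanta", 6)),
  gore    = c(2093L, 821L, 956L, 893L, 5893L, 1220L),
  bush    = c(3940L, 1228L, 2010L, 615L, 6041L, 3202L),
  other   = c(66L, 22L, 29L, 11L, 192L, 111L),
  votes   = c(6099L, 2071L, 2995L, 1519L, 12126L, 4533L),
  ballots = c(6617L, 2149L, 3347L, 1607L, 12785L, 4773L),
  row.names = c("APPLING", "ATKINSON", "BACON", "BAKER", "BALDWIN", "BANKS")
)

# head() returns the first six rows by default; n changes that number.
first3 <- head(gavote6, n = 3)
print(first3)

# tail() is the counterpart for the last rows.
last2 <- tail(gavote6, n = 2)
print(last2)
```

The same calls work unchanged on the full gavote data frame.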
The cases in this dataset are the counties of Georgia and the variables are (in order)
the type of voting equipment used, the economic level of the county, the proportion
of African Americans (recorded as a fraction between 0 and 1), whether the county is
rural or urban, whether the county is part of the Atlanta metropolitan area, the
number of votes for Al Gore, the number of votes for George Bush, the number of
votes for other candidates, the total number of votes cast, and the number of
ballots issued.
The str command is another useful way to examine an R object:
str(gavote)
'data.frame': 159 obs. of 10 variables:
$ equip : Factor w/ 5 levels "LEVER","OS-CC",..: 1 1 1 2 1 1 2 3 3 2 ...
$ econ : Factor w/ 3 levels "middle","poor",..: 2 2 2 2 1 1 1 1 2 2 ...
$ perAA : num 0.182 0.23 0.131 0.476 0.359 0.024 0.079 0.079 0.282 0.107 ...
$ rural : Factor w/ 2 levels "rural","urban": 1 1 1 1 1 1 2 2 1 1 ...
$ atlanta: Factor w/ 2 levels "Atlanta","notAtlanta": 2 2 2 2 2 2 2 1 2 2 ...
$ gore : int 2093 821 956 893 5893 1220 3657 7508 2234 1640 ...
$ bush : int 3940 1228 2010 615 6041 3202 7925 14720 2381 2718 ...
$ other : int 66 22 29 11 192 111 520 552 46 52 ...
$ votes : int 6099 2071 2995 1519 12126 4533 12102 22780 4661 4410 ...
$ ballots: int 6617 2149 3347 1607 12785 4773 12522 23735 5741 4475 ...
We can see that some of the variables, such as the equipment type, are factors.
Factor variables are categorical. Other variables are quantitative. The perAA variable
is continuous while the remaining quantitative variables are integer-valued counts.
We also see the sample size is 159.
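The integers that str prints after the level names of a factor are the internal codes of each observation. As a small sketch, using only the six econ, perAA and gore values visible in the head output (so the factor here has just the two levels that occur in those rows, not all the levels of the full dataset), the codes and storage types can be inspected directly:

```r
# econ values for the first six counties, copied from the head(gavote) output.
econ <- factor(c("poor", "poor", "poor", "poor", "middle", "middle"))

# Levels are sorted alphabetically by default, so "middle" is 1 and "poor" is 2.
print(levels(econ))

# The integer codes, matching the start of the econ line in the str() output.
print(as.integer(econ))

# A continuous variable is stored as numeric, a count as integer.
perAA <- c(0.182, 0.230, 0.131, 0.476, 0.359, 0.024)
gore  <- c(2093L, 821L, 956L, 893L, 5893L, 1220L)
print(class(perAA))
print(class(gore))

# Counts per category, a common first check on a factor.
print(table(econ))
```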
A potential voter goes to the polling station where it is determined whether he
or she is registered to vote. If so, a ballot is issued. However, a vote is not recorded
if the person fails to vote for President, votes for more than one candidate or the
equipment fails to record the vote. For example, we can see that in Appling County,
6617 − 6099 = 518 ballots did not result in votes for President. This is called the
undercount. The purpose of our analysis will be to determine what factors affect the
undercount. We will not attempt a full and conclusive analysis here because our main
purpose is to illustrate the use of linear models and R. We invite the reader to fill in
some of the gaps in the analysis.
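The Appling calculation extends to every county by vectorized subtraction. The sketch below uses only the ballots and votes values of the six counties printed earlier. The relative version (undercount divided by ballots issued) is an assumption added here for illustration, not a definition taken from the text above.

```r
# Ballots issued and votes recorded, copied from the head(gavote) output.
ballots <- c(APPLING = 6617L, ATKINSON = 2149L, BACON = 3347L,
             BAKER = 1607L, BALDWIN = 12785L, BANKS = 4773L)
votes   <- c(APPLING = 6099L, ATKINSON = 2071L, BACON = 2995L,
             BAKER = 1519L, BALDWIN = 12126L, BANKS = 4533L)

# Undercount: ballots that did not result in a vote for President.
undercount <- ballots - votes
print(undercount)

# Assumed normalization: undercount as a fraction of ballots issued,
# which makes counties of different sizes comparable.
rel_undercount <- undercount / ballots
print(round(rel_undercount, 3))
```

With the full data frame, the same subtraction would read gavote$ballots - gavote$votes.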
Initial Data Analysis: The first stage in any data analysis should be an initial
graphical and numerical look at the data. A compact numerical overview is: