List of R Commands from each lab

HANDOUTS ON VARIOUS R FUNCTIONS

1. Importing data into R.pdf

2. Cross-tabulating and Recoding Variables in R

3. R recoding.R

PLS 300

Political Analysis

Lab 2

To tabulate variable contents or calculate summary statistics, we use two functions: summary( ) and tapply( ).

summary( ) # lists min, max, median, mean, and the 25th and 75th percentiles

Example:

> summary(poll$cheneyft)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   0.00   30.00   50.00   49.51   70.00  100.00   72.00

In the example, "NA's" refers to missing observations. (In this poll, some people skipped the question.)
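
To see how summary( ) treats missing values, here is a minimal sketch on made-up data. The vector scores is hypothetical and is not part of the poll data; it only stands in for a thermometer variable in which two people skipped the question.

```r
# Hypothetical thermometer scores; NA marks a skipped question.
scores <- c(0, 30, 50, 70, 100, NA, 85, NA)

# summary() drops the NAs before computing and reports how many there were.
s <- summary(scores)
print(s)

# The same quantities computed one at a time. Unlike summary(),
# these functions return NA unless na.rm = TRUE is supplied.
mean(scores, na.rm = TRUE)
median(scores, na.rm = TRUE)
sum(is.na(scores))   # number of missing observations
```

The last line is a quick way to count missing observations without running the full summary.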

We can inspect variable contents across groups of a second, factor variable. We use the function

tapply(VARIABLE NAME, GROUPING VARIABLE, FUNCTION NAME, na.rm=TRUE)

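The function name can be any function that summarizes a vector, such as mean, min, max, sd, var, or quantile. The na.rm=TRUE argument is passed along to that function so that missing observations are dropped. For example, to calculate the mean feminist feeling thermometer score within each broad party identification group, we use: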

tapply(poll$feministsft, poll$party, mean, na.rm=TRUE)

 Republican    Democrat Independent
   48.96497    64.78370    55.59221

The above output shows the means for feeling toward feminists given party groups.
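
The same call can be tried on a small made-up data frame. This is only a sketch: toy and its six rows are hypothetical stand-ins for poll, chosen so the group means are easy to check by hand.

```r
# Hypothetical stand-in for the poll data: six respondents, two variables.
toy <- data.frame(
  feministsft = c(40, 60, NA, 80, 50, 70),
  party = factor(c("Republican", "Democrat", "Democrat",
                   "Democrat", "Independent", "Republican"))
)

# Mean thermometer score within each party group, ignoring the NA.
group_means <- tapply(toy$feministsft, toy$party, mean, na.rm = TRUE)
print(group_means)

# Without na.rm = TRUE, any group that contains an NA returns NA.
tapply(toy$feministsft, toy$party, mean)
```

Here the Democrat group contains the missing value, so it is the one that turns into NA in the second call.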

For quantiles:

> tapply(poll$feministsft, poll$party, quantile, na.rm=TRUE)

$Republican
  0%  25%  50%  75% 100%
   0   40   50   60  100

$Democrat
  0%  25%  50%  75% 100%
   0   50   60   85  100

$Independent
  0%  25%  50%  75% 100%
   0   50   50   70  100
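
Because quantile returns several numbers per group, tapply( ) hands back a list with one element per group. The sketch below, again on hypothetical data (toy is made up), shows one way to pull out a single group and to ask for different percentiles through the probs argument of quantile.

```r
# Hypothetical data: five scores for each of two party groups.
toy <- data.frame(
  feministsft = c(0, 40, 50, 60, 100, 0, 50, 60, 85, 100),
  party = factor(rep(c("Republican", "Democrat"), each = 5))
)

q <- tapply(toy$feministsft, toy$party, quantile, na.rm = TRUE)

q[["Democrat"]]          # all five quantiles for one group
q[["Democrat"]]["50%"]   # just the median for that group

# Extra arguments such as probs are passed through to quantile().
tapply(toy$feministsft, toy$party, quantile,
       probs = c(0.1, 0.9), na.rm = TRUE)
```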

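The other portion of the lab consists of graphical displays of data. The plotting functions used here come from the lattice package, which must be loaded first with library(lattice). I demonstrate those commands here: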

histogram(~ feministsft, data=poll, main="Feelings toward feminists, 2004", xlab="Feminists feeling thermometer score")

The basic structure of the command is histogram(~ VARIABLE NAME, data=DATA SOURCE)
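
A self-contained version of the histogram command, for trying it without the poll file. The data frame toy is simulated and purely illustrative; only the structure of the call matches the lab.

```r
library(lattice)  # histogram() comes from the lattice package

set.seed(1)  # reproducible simulated data
toy <- data.frame(
  feministsft = sample(seq(0, 100, by = 10), 200, replace = TRUE)
)

h <- histogram(~ feministsft, data = toy,
               main = "Feelings toward feminists (simulated)",
               xlab = "Feminists feeling thermometer score")

# lattice plots are objects; print() draws them. At the console the
# plot is printed automatically, but inside a script or function
# print() is needed.
print(h)
```

By default the vertical axis shows percent of total; adding type="count" to the call shows raw counts instead.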

Density plots can be thought of as smoothed versions of histograms. (We’ll consider the technical details at the end of the course. For now, when plotting these figures, focus your attention on the visual distribution of each variable.)

To plot two or more feeling thermometers together, separate the variable names with a + sign. You can plot as many as you like. Here’s a density plot comparing Bush, Kerry, and Hillary Clinton:

densityplot(~ bushft + kerryft + hclintonft, data = poll, plot.points=FALSE, auto.key=TRUE)

Here are a couple more examples of density plots with paneling.

densityplot(~ cheneyft + bushft + kerryft + edwardsft | party, data = poll, plot.points = FALSE, auto.key=TRUE)

densityplot(~ cheneyft + bushft + kerryft + edwardsft | party, data = poll, groups = gender, plot.points = FALSE, auto.key=TRUE)
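
The paneled density plot can be sketched on simulated data as follows. The variables in toy are random numbers that only borrow the names bushft, kerryft, and party for illustration; they say nothing about the real poll.

```r
library(lattice)

set.seed(2)  # reproducible simulated data
n <- 300
toy <- data.frame(
  # Normal draws clipped to the 0-100 thermometer range.
  bushft  = pmin(pmax(rnorm(n, mean = 55, sd = 25), 0), 100),
  kerryft = pmin(pmax(rnorm(n, mean = 50, sd = 25), 0), 100),
  party   = factor(sample(c("Republican", "Democrat", "Independent"),
                          n, replace = TRUE))
)

# Variables joined by + are overlaid within a panel;
# the variable after | defines the panels (one per party).
d <- densityplot(~ bushft + kerryft | party, data = toy,
                 plot.points = FALSE, auto.key = TRUE)
print(d)
```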

A box-and-whisker plot (boxplot). The | symbol panels the display by a second variable: it appears after the main variable to be shown as a boxplot and before the grouping variable, so the command below draws one boxplot of cheneyft for each party.

> bwplot(~ cheneyft | factor(party), data=poll, xlab="Feelings toward Richard Cheney" )
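
For comparison, here is a sketch on simulated data (toy is hypothetical) that draws the paneled version used in the lab and, as an alternative layout not used in the lab, a single panel with the grouping factor on the left of the ~ sign.

```r
library(lattice)

set.seed(4)  # reproducible simulated data
toy <- data.frame(
  cheneyft = sample(seq(0, 100, by = 5), 150, replace = TRUE),
  party    = sample(c("Republican", "Democrat", "Independent"),
                    150, replace = TRUE)
)

# One panel per party, as in the lab command.
b1 <- bwplot(~ cheneyft | factor(party), data = toy,
             xlab = "Feelings toward Richard Cheney (simulated)")
print(b1)

# Alternative: put the grouping factor on the left of ~ so that all
# boxes share a single panel and one common axis.
b2 <- bwplot(factor(party) ~ cheneyft, data = toy,
             xlab = "Feelings toward Richard Cheney (simulated)")
print(b2)
```

The single-panel layout can make medians easier to compare across groups, since the boxes sit directly above one another.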

Two scatterplots, one without and one with jittering:

First we plot two feeling thermometers:

xyplot(bushft ~ cheneyft, data=poll, ylab="Bush", xlab="Cheney", main="Feeling Thermometer Comparisons")

The plot obscures a lot of covariation, due to overplotting. For example, all the people who gave Bush and Cheney 100s (or nearly so) in their evaluations have their points plotted on top of one another.

Next, we add jitter to the plot: a tiny random amount (such as plus or minus .01) is added to or subtracted from each score, so that identical scores are plotted next to one another instead of directly on top of one another.

xyplot(jitter(bushft) ~ jitter(cheneyft), data=poll, ylab="Bush", xlab="Cheney", main="Feeling Thermometer Comparisons")

While better, the overplotting is still apparent. You can add stronger jittering with the factor= argument. (Values of factor from 1 to 3 add noise without too much distortion; try a larger number such as 5 and you will see what results.)

xyplot(jitter(bushft, factor=2) ~ jitter(cheneyft, factor=2), data=poll, ylab="Bush", xlab="Cheney", main="Feeling Thermometer Comparisons")
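
To see what jitter( ) does to the numbers themselves, here is a small sketch on a hypothetical vector of tied scores. According to the jitter( ) help page, when no amount is given the noise is drawn uniformly within plus or minus factor/5 times the smallest gap between distinct values, which works out to 0.2 for these scores when factor is 1.

```r
set.seed(3)  # reproducible random noise

# Hypothetical scores with ties; the smallest gap between
# distinct values is 1 (between 99 and 100).
x <- c(0, 50, 50, 50, 99, 100, 100)

j1 <- jitter(x)               # default: factor = 1
j3 <- jitter(x, factor = 3)   # three times as much noise

round(j1 - x, 3)   # noise added to each score with factor = 1
round(j3 - x, 3)   # noise added to each score with factor = 3

max(abs(j1 - x))   # stays within 0.2
max(abs(j3 - x))   # stays within 0.6
```

If the default noise is too small to separate the points, jitter( ) also accepts an amount argument, for example jitter(x, amount=2), which sets the size of the noise directly.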