9 Graphics

In this lecture, you will learn how to create graphics in R.

You can create graphics using both base R functions or the ggplot2 R package. Personally, I like to use base R functions to preview my graphs, but I use ggplot2 for graphs I want to show to the masses.

Here are some packages that we will use for this lesson.

library("here")
library("readxl") # read excel data in R
library("countrycode")

And let’s refer to some files and folders.

# setup folders and directories
here("data")
here("code")

9.1 I am going to pay lip service to the comparativists…

By now, students of comparative politics students are probably familiar with the Correlates of War (CoW) and Polity IV datasets. These are extremely popular datasets that scholars tends to use separately or in tandem. If you ever do want to use them in tandem, you are in luck! Because they are so popular, there is already an R package that allows you to merge the two datasets painlessly.

And Americanists: it’s good to know about the flagship datasets in other subfields. It makes you a well-rounded scholar.

First, download the CoW national mateiral capabilities dataset and Polity IV coup d’etat dataset. Then, load them into R.

Please download the codebooks as well. You should always download the codebooks or bookmark an online version if they’re available.

# read excel data in R: coup data
# http://www.systemicpeace.org/inscrdata.html
coups_polity <- read_excel(here("data", "CSPCoupsListv2018.xls"))

# read csv file in R: national trade data
# https://correlatesofwar.org/data-sets/bilateral-trade
power_cow <- read.csv(here("data", "NMC_v4_0.csv"))

You have installed the countrycode R package earlier in the lesson. Now it is time to use it. Here, you basically want to make sure the country IDs fo reach dataset “link” together. Many datasets label their countries differently–some with numbers, some with abbreviations, some with different abbreviations. Think of this linking exercise in this logical sequence: A = B, B = B, B = C.

In the example below, we can see that the Polity scode variable is equal to the p4c variable in the countrycode package. We want to link Polity data to CoW numeric data, which is in the CoW dataset we downloaded. Once we merge the Polity ID to the CoW ID, we can merge the coup and trade datasets and use that common ID to link to a country name.

coups_polity$p4c <- countrycode(coups_polity$scode, origin = "p4c", destination = "cown")

power_cow$cown <- countrycode(power_cow$ccode, origin = "cown", destination = "cown")

coups_power <- merge(coups_polity, power_cow, id="cown")

coups_power$country <- countrycode(coups_power$cown, origin = "cown", destination = "country.name")

As a sanity check—and because these datasets aren’t that big anyway—I’d like to View() my dataset in R and look at their variable names().

# view the dataset
View(coups_power)

# look at the variable names
names(coups_power)

9.2 base R graphs

Sometimes you want to look at the distribution of a particular variable or the correlation of two variables, so you want a quick and dirty function that allows you to do those things, without having to think too much about other stuff. That’s where base R graphs can be useful.

9.2.1 Scatter plots

To learn more about scatterplots, review the STHDA tutorial.

So say for some reason, we want to answer the question: Does a higher urban population lead to a higher total population? (This is a random question with no theoretical backing. I don’t know anything about these datasets.)

We want to define our X (urban population) and Y (total population).

x <- coups_power$upop
y <- coups_power$tpop

You want to use the plot() command to create a scatterplot. Let’s break it down.

  1. You have already defined x and y in the previous two lines of code.

  2. main defines the title of the graph.

  3. xlab and ylab are the labels for the x and y axes.

  4. pch are plot symbols. You can obtain the full list of plot symbols at STHDA.

  5. frame = TRUE creates a box around the graph. If frame = FALSE, the box disappears.

  6. abline(lm(y ~ x, data = coups_power), col = "blue") adds a blue regression line.

plot(x, y, main = "Effect of urban population on total population",
     xlab = "urban population", ylab = "total population",
     pch = 19, frame = TRUE)
abline(lm(y ~ x, data = coups_power), col = "blue")

base-scatter

9.2.2 Box plots

You want to create a boxplot of the variable the counts the number of successful coups.

# Box plot of one variable: success of coups
boxplot(coups_power$success)

base-boxplot

9.2.3 Bar plots

Now we can create a simple barplot of the number of successful coups.

barplot(coups_power$success)

base-barplot

Well, that doesn’t look too great. Why don’t you plop the counts into a table instead?

barplot(table(coups_power$success))

base-barplot-table

You can even consider looking at the number of successful coups per year.

barplot(table(coups_power$success, coups_power$year))

base-barplot-xtable

9.2.4 Line plots

Instead of pre-defining the X and Y variables like we did for the scatterplots, we can directly plop the X and Y variables into the plot() function. Let’s review the different components.

  1. type indicates the type of line. So type = l indicates a line, and type = p indicates points. Quick-R has a list of type values.
plot(coups_power$year, coups_power$success, type = "l", lty = 1)

base-lineplot

9.2.5 Histogram and density plots

You can also create a histogram of the number of successful coups. You can also specify the breaks by indicating the algorithm you prefer.

hist(coups_power$success, breaks = "Sturges")

base-hist

9.2.6 Pie charts

I’m not going to teach how to create a pie chart. There is extensive documentation on the Internet about why pie charts are evil. Look them up. In any case, social scientists are not fans of pie charts. Do not be surprised if you create a pie chart and someone tells you to change it into a bar plot. If you somehow learn how to create a pie chart in R, I just want it to be on the record that you did not learn it from me.

9.3 World of ggplot2

The gg in ggplot2 apparently stands for “grammar of graphics.” This sounds way too hifaluten for me to actually explain, but essentially, it is a package that abides by a certain structure. What structure? you may ask. Well, there are components of a ggplot2 graph that you will see all the time, not unlike prepositions or punctuations in the English language.

If you are curious about the nitty gritty theory of graphic-building, Hadley Wickham’s “A Layered Grammar of Graphics” (Wickham created ggplot2) might make a good nightcap.

In general, I highly recommend looking at the official ggplot2 website because they do a pretty good job explaining everything. If you care about making things look pretty, I would point you to the R Gallery for some ggplot2 inspo.

I am also indebted to Selva Prabhakaran and Mike DeCrescenzo for their excellent tutorials.

9.3.1 Load packages

library("magrittr")
library("tidyverse")
library("ggplot2")
library("scales")

9.3.2 Load dataset

The truth is, you’re probably not going to write a ggplot2 graph from scratch because you’re not a machine. You’re probably going to copy and paste the code from the Internet or an old .R file. The goal of this lesson is to help you learn how to recycle the ggplot2 structure so you can customize it to your intended use. In short, I just want you to understand what to do with a ggplot2 structure and mold it into something you can use.

9.3.3 Components

Here are some ggplot2 components that you need to know. We’ll revisit them again and discuss how to use them, but here’s a general overview.

  • data: the data source that you are using to grab variables from to put on the graph

  • aesthetics: how your graph should look like (e.g., colors)

  • geoms: how you want to plot your graph (e.g, line, points, bargraph)

  • scale: how you want to present your graph (e.g., by color, which axes)

  • coordinates: what kind of plane you are plotting (for the purpose of not being confusing, we are going to use the x and y coordinates)

  • facets: split up your graph into different categories

  • theme: change font size, font color, etc.

9.3.4 Create plot

Create a plot with ggplot2.

  1. You put everything in the command ggplot().

  2. data allows you to refer to your dataset and in this case, it is power_cow.

  3. Within the aes command, you put in your X and Y variables. In this case, the x-axis is year and the y-axis is milex (military expenditures).

# milex: military expenditures
ggplot(data = power_cow, aes(x = year, y = milex))

ggplot-create

You’ve created the plot. But it’s not populated…

9.3.5 Populate plot

You want to “color in the lines” and decide how to populate the graph.

9.3.5.1 Dot plot

  • geom_point: create a dot plot
ggplot(data = power_cow, aes(x = milper, y = milex)) +
  geom_point()

ggplot-point

9.3.5.2 Fit a line

You can also fit a regression line on your plot. Notice that, instead of lm, you use geom_smooth and define the method to use to create your line. All options are on the ggplot2 smoothed conditional means page, under the methods definition.

ggplot(data = power_cow, aes(x = milper, y = milex)) +
  geom_point() +
  geom_smooth(method="lm")

ggplot-point-lm

You can also turn your ggplot into an object.

g <- ggplot(power_cow, aes(x = milper, y = milex)) + geom_point() + geom_smooth(method="lm")

Then you can access the ggplot by just referring to the object g.

g

9.3.6 Aesthetics

You can control how your plot looks in ggplot. Of course, you can do this with base R as well, but ggplot is much more versatile with what you can do with your graph.

9.3.6.1 Labels

g + labs(title="Military Personnel and Expenditures", subtitle="From power COW dataset", y="Expenditures", x="Personnel", caption="Military")

ggplot-scatter-labels

9.3.6.2 Colors

Be careful. Color-blind people hate colors. People with colorblindness have a difficult time reading graphs with too many colors/too many similar colors. Please consider reading about best practices when it comes to using colors in a way that is accessible to all people.

There are some palettes that are supposedly colorblind-friendly. I cannot attest to their effectiveness. In general, you should try to use black and white as much as possible just because your paper will likely bepublished in black and white. Colors can be fun for presentations, but not always useful otherwise.

9.3.6.2.1 Change color and size of points
ggplot(power_cow, aes(x = milper, y = milex)) + 
  # Set color and size for points
  # here are some colors (starts from page 2): http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
  geom_point(col="steelblue", size=3) + 
  geom_smooth(method="lm", col="firebrick") +  # change the color of line
  labs(title="Military Personnel and Expenditures", subtitle="From power COW dataset", y="Expenditures", x="Personnel", caption="Military")

ggplot-point-size-color

You may also change colors to reflect the category in another variable.

ggplot(power_cow, aes(x = milper, y = milex)) + 
  geom_point(aes(col=year), size=3) + 
  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) +
  labs(title="Military Personnel and Expenditures", subtitle="From power COW dataset", y="Expenditures", x="Personnel", caption="Military")

ggplot-point-year-gradient

Notice how the year is a gradient. You may want to change it into factors…

ggplot(power_cow, aes(x = milper, y = milex)) + 
  geom_point(aes(col=factor(year)), size=3) + 
  # Set color to vary based on state categories.
  geom_smooth(method="lm", col="firebrick", size=2) +
  labs(title="Military Personnel and Expenditures", subtitle="From power COW dataset", y="Expenditures", x="Personnel", caption="Military")

ggplot-point-toomanyyears

You may notice that there are just too many years. Instead, you may want to filter to the last five years in the dataset. You can do this by leveraging your data manipulation skills.

power_cow2 <- dplyr::filter(power_cow, year == 1995:2000)
power_cow2

Now you can plot a slightly less annoying graph.

  • by doing factor(year), I am grouping the points together by year.
# I turned it into an object so we can play around with it later...
gg2 <- ggplot(power_cow2, aes(x = milper, y = milex)) + 
  geom_point(aes(col=factor(year)), size=3) + 
  geom_smooth(method="lm", col="firebrick", size=2) +
  labs(title="Military Personnel and Expenditures", subtitle="From power COW dataset", y="Expenditures", x="Personnel", caption="Military")
  
# check out the graph-object
gg2

ggplot-factor-years

You may also change the color schemes if you’re not happy with it. You can use the RColorBrewer package. All the palettes in the library are available here. There are also many other palette packages you can find on the web.

library(RColorBrewer)
gg2 + scale_colour_brewer(palette = "Set1")

ggplot-year-colorbrewer

9.3.6.3 Bar graphs

  • geom_bar: you want to create a bar graph this time so you are going to use geom_bar

  • stat = 'identity': sums the number of military expenditures per year

  • scale_y_continuous(labels = scales::comma): tells the graph to create a y-axis that has labels separated by commas. For more options. review ggplot2’s guide on position scales for continuous data.

ggplot(power_cow2, aes(x=factor(year), y=milex, label=milex)) + 
  geom_bar(stat='identity', width=.5) +
  labs(title="Military Expenditures by Year", subtitle="Around the world", y="Expenditures ($)", x="Year") +
  scale_y_continuous(labels = scales::comma)

ggplot-bargraph

9.3.6.4 Faceting

What if I want to separate my graphs by states (i.e., countries)? You can use facet_wrap. You can use facets to put your graphs in a grid. Review the ggplot2 guide on facets for more information.

ggplot(power_cow2, aes(x=factor(year), y=milex, label=milex)) + 
  geom_bar(stat='identity', width=.5) +
  labs(title="Military Expenditures by Year", subtitle="Around the world", y="Expenditures ($)", x="Year") +
  scale_y_continuous(labels = scales::comma) +
  facet_wrap(~ stateabb)

ggplot-facetwrap

Sigh, but the x-axis looks ugly and I can’t read it. Why don’t we rotate it?

  • angle = 90: rotate it by 90 degrees
  • hjust: horizontal justification (right-justified) if 1; left-justified if hjust = 0
ggplot(power_cow2, aes(x=factor(year), y=milex, label=milex)) + 
  geom_bar(stat='identity', width=.5) +
  labs(title="Military Expenditures by Year", subtitle="Around the world", y="Expenditures ($)", x="Year") +
  scale_y_continuous(labels = scales::comma) +
  facet_wrap(~ stateabb) +
  theme(axis.text.x=element_text(angle=90, hjust=1))

ggplot-facetwrap-rotate

This is a lot to digest…

We can do some more tweaking…

  • scale_y_continuous: you can change the scale to 0, 10, 20, 30 M

  • theme(axis.text.x=element_text(angle=90, hjust=1, size = 4)): size = 4 is the size of the font in the x-axis

ggplot(power_cow2, aes(x=stateabb, y=milex, label=milex)) + 
  geom_bar(stat='identity', width=.5) +
  labs(title="Military Expenditures by Year", subtitle="Around the world", y="Expenditures ($)", x="Year") +
  scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6)) +
  facet_wrap(~ year) +
  theme(axis.text.x=element_text(angle=90, hjust=1, size = 4))

ggplot-facetwrap-yaxis

You can play around with the y-axis even more…

We can: * scale_y_log10: rescale the y-axis to “zoom in” more * this is when we take a log10 of the y values * many people handle expenditures in the millions * logarithms are an easier (and clearer) way to express large numbers

  • we are using facet_wrap again but you may use facet_grid(var1 ~ var2) if you have data and want to split the graph into more categories (e.g., year ~ gender)
ggplot(power_cow2, aes(x=stateabb, y=milex, label=milex)) + 
  geom_bar(stat='identity', width=.5) +
  labs(title="Military Expenditures by Year", subtitle="Around the world", y="Expenditures ($)", x="Year") +
  scale_y_log10(labels = scales::label_number_si()) +
  facet_wrap(~ year) +
  theme(axis.text.x=element_text(angle=90, hjust=1, size = 4))

ggplot-facetwrap-year

The best way to learn ggplot2 is through experience. Trust me on this.

9.3.6.5 Extensions

You can do a bunch of other things with ggplot2, such as creating animated graphs, adding themes to graphs, making your graphs interactive, and so on! I recommend looking at the ggplot2 extensions for a full suite of options.