Chapter 13 Plotting with ggplot

The goal of this set of exercises is to become more familiar with some of the common ggplot functions that we as economists use frequently. By the end of the exercises we should have a reasonably polished figure that we would be happy to show someone.

The data we will use comes from the UN Human Development Report from 2011. We are interested in understanding the relationship between the Human Development Index (HDI) and the Corruption Perceptions Index (CPI).

First let us load the libraries we will need:

library(readr)
library(dplyr)
library(ggplot2)

Let’s get started!

13.1 Load Data and Clean Variable Names

  1. Load the data from the file data/EconomistData.csv.
  2. Convert all column names to snake case (e.g. my_variable).
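
One way to approach this step is sketched below. It is only a sketch: the helper to_snake_case() and the toy column names are our own illustrations, not something taken from the data file. The toy data frame is there so the code runs even without the CSV on disk. The same helper is applied to the real file if it exists.

```r
library(readr)
library(dplyr)

# Hypothetical helper (not from any package): replace runs of
# non-alphanumeric characters with "_", trim stray underscores,
# and lower-case the result.
to_snake_case <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)
  x <- gsub("^_+|_+$", "", x)
  tolower(x)
}

# Toy stand-in with made-up column names, so the sketch is self-contained.
toy <- tibble(Country = c("A", "B"), HDI.Rank = c(1, 2))
toy_clean <- rename_with(toy, to_snake_case)
names(toy_clean)

# The same idea applied to the exercise file, if it is available.
if (file.exists("data/EconomistData.csv")) {
  econ <- read_csv("data/EconomistData.csv") %>%
    rename_with(to_snake_case)
}
```

This helper does not split camelCase names. If the janitor package is installed, janitor::clean_names() is a ready-made alternative.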

13.2 One Variable Graphs

First we work with some single variable plots.

  1. Create a histogram of the human development index. Customize the number of bins to make the plot look nicer than the default.
  2. Instead of a histogram, create a density plot of the HDI. Extend your plot by:
    1. In one graph plotting the densities by region.
    2. Creating separate plots per region, with the area under each density curve filled in blue.
  3. Repeat (1) and (2) for the Corruption Perceptions Index.
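
The single variable plots above could be sketched as follows. The data frame here is simulated, and the names hdi and region are assumptions about what the cleaned columns are called. With the real data you would pass your own data frame instead.

```r
library(ggplot2)

# Simulated stand-in for the cleaned data; values are made up.
set.seed(1)
toy <- data.frame(
  hdi = runif(60, min = 0.3, max = 0.95),
  region = rep(c("Region A", "Region B", "Region C"), each = 20)
)

# Histogram with an explicit number of bins instead of the default.
p_hist <- ggplot(toy, aes(x = hdi)) +
  geom_histogram(bins = 15)

# Densities by region, overlaid in one graph.
p_dens_region <- ggplot(toy, aes(x = hdi, colour = region)) +
  geom_density()

# One panel per region, with the area under each curve filled in blue.
p_dens_facet <- ggplot(toy, aes(x = hdi)) +
  geom_density(fill = "blue") +
  facet_wrap(~ region)
```

Printing any of the three objects (for example p_hist) draws the plot. Replacing hdi with the corruption index column gives the plots asked for in (3).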

13.3 Two Variable Graphs

Now we are going to build up a ‘pretty’ graph that plots the corruption index (along the x-axis) against the human development index (along the y-axis).

  1. Create the simple scatter plot.
  2. Let’s extend the plot in different ways. Modify the plot as follows (each item should be a separate plot):
    1. Make the points blue.
    2. Color the points by region.
    3. Color the points by region and make the size of each point vary by HDI.
  3. Let’s extend the plot in (1) by adding some summary functions to it.
    1. Add a loess smoother
    2. Add a linear smoother, without the confidence interval. Color the line red.
    3. Add the line y ~ x + log(x), without the confidence interval. Color the line red.
  4. Now we will add the country names to the plot from (1).
    1. Use geom_text() to add country names to our plot.
    2. We might not want all the points labelled. Create the vector
    points_to_label <- c("Russia", "Venezuela", "Iraq", "Myanmar", "Sudan",
                   "Afghanistan", "Congo", "Greece", "Argentina", "Brazil",
                   "India", "Italy", "China", "South Africa", "Spain",
                   "Botswana", "Cape Verde", "Bhutan", "Rwanda", "France",
                   "United States", "Germany", "Britain", "Barbados", "Norway", "Japan",
                   "New Zealand", "Singapore")
    Now adjust the code from part 1 to label only these points.
    3. Install the package ggrepel. Use the function geom_text_repel() to repeat part 2. Use the help to figure out how it works.
  5. Now let’s combine what we learned above and in the class notes to build a presentable figure. Proceed as follows:
    1. Create the simple scatter plot.
    2. Make the points hollow and colored by region. Adjust the size of the dots to make them easier to see.
    3. Add the line y ~ x + log(x), without the confidence interval. Color the line red.
    4. Change the color of the dots to be less ugly. I used scale_color_manual(), but you don’t need to.
    5. Add meaningful x and y labels, and a centered title. Can you add a note near the bottom of the figure to say that the data comes from “Transparency International and UN Human Development Report”?
    6. Label the points from points_to_label, the vector created in exercise 4.
    7. Adjust the x and y axes to have a better range and set of axis ticks. You are free to choose what you like.
    8. Move the legend to the bottom of the plot. Adjust the legend names so that they are easier to read and more meaningful. The easiest way to do this is to use dplyr to recode the region variable as a factor and give it appropriate labels. The help page for factor should help you here.
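
To show how these pieces fit together, here is a minimal sketch of the final figure. It is one possible answer, not the only one. The data are simulated, and the column names (country, cpi, hdi, region), the region codes, the colors, the axis ranges and the short points_to_label vector are all assumptions made for illustration. With the real data you would use the cleaned data frame and the full vector from exercise 4.

```r
library(dplyr)
library(ggplot2)
library(ggrepel)  # install once with install.packages("ggrepel")

# Simulated stand-in for the cleaned data; all values are made up.
set.seed(42)
toy <- tibble(
  country = paste("Country", 1:40),
  cpi = runif(40, min = 1, max = 10),
  hdi = pmin(0.25 + 0.3 * log(cpi) + rnorm(40, sd = 0.05), 1),
  region = rep(c("reg_a", "reg_b"), each = 20)  # hypothetical raw codes
) %>%
  # Recode region as a factor with readable labels for the legend.
  mutate(region = factor(region,
                         levels = c("reg_a", "reg_b"),
                         labels = c("Region A", "Region B")))

# Short hypothetical vector; the exercise uses the longer one.
points_to_label <- c("Country 3", "Country 17", "Country 30")

p <- ggplot(toy, aes(x = cpi, y = hdi)) +
  # Hollow points (shape 1), colored by region, slightly enlarged.
  geom_point(aes(colour = region), shape = 1, size = 3, stroke = 1) +
  # Fitted line y ~ x + log(x), no confidence band, in red.
  geom_smooth(method = "lm", formula = y ~ x + log(x),
              se = FALSE, colour = "red") +
  # Label only the selected countries, nudged to avoid overlap.
  geom_text_repel(data = filter(toy, country %in% points_to_label),
                  aes(label = country)) +
  scale_colour_manual(values = c("Region A" = "#1b9e77",
                                 "Region B" = "#7570b3")) +
  scale_x_continuous(limits = c(1, 10), breaks = 1:10) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, by = 0.2)) +
  labs(x = "Corruption Perceptions Index (CPI)",
       y = "Human Development Index (HDI)",
       title = "Corruption and human development",
       caption = "Sources: Transparency International and UN Human Development Report",
       colour = NULL) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),  # center the title
        legend.position = "bottom")
```

Printing p draws the figure. Swapping the geom_smooth() call for geom_smooth(method = "loess") gives the loess smoother from exercise 3, and replacing geom_text_repel() with geom_text() shows why the repelled labels are easier to read.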