Objectives


Related to: Data Computing, “Frames, Glyphs, and other Components of Graphics”, Ch. 6; “Graphics and Their Grammar”, Ch. 8


ggplot: R’s graphical “sandwich artist”

Have you ever admired the “sandwich artists” who so skillfully put together your sandwich at Subway? Have you ever wanted to learn how to create and serve up graphics that are a feast for the eyes and as tasty as your favorite sub sandwich? Well, today’s your lucky day! Pull on your latex gloves and get ready to become a “sandwich”…erm… “graphics” artist!

At this point, you should know how to make a few different basic plots (e.g., barplot, histogram, scatterplot). Up until now, you’ve been making these basic plots using built-in “base R” functions and graphics settings. This approach works just fine for exploratory analysis as you’re first getting acquainted with your data. But if you want to make more polished graphics to share as part of a report or publication, you’re going to want to up your graphics game a bit. This is where the ggplot library comes in!

Putting together graphics using ggplot is a lot like making a sub sandwich: you need to decide what kind of “bread” to have on the outside, what kinds of “toppings” you’re going to have on the inside, and how you’re going to “slice it up” before you serve it. Do you need a graphic that’s as simple and delicious as a single-serving ham and cheese sub? Or do you need a graphic that’s as loaded as a giant party sub, with lots of different toppings and slices? Either way, ggplot can help you out.

To get started with ggplot, you’ll need to make sure to install it the first time you use it:

install.packages("ggplot2")
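If you would rather not reinstall the package every time you rerun a script, one common pattern is to install it only when it is missing. This is just a sketch of that idea, using base R’s requireNamespace() to check whether ggplot2 is already available:

```r
# Install ggplot2 only if it is not already available on this machine
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
```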

Once ggplot is installed, load the library by running the following line of code whenever you want to use it. You should only need to run this once, at the beginning of each R session:

library(ggplot2)

Before we get started making some graphics, let’s load up the Minneapolis buildings energy benchmarking dataset. Make sure the .csv file containing the data is where the path in the code below points (here, a “datasets” folder one level above your R working directory), or adjust the path to match your setup. Then load the dataset into your R environment:

data <- read.csv("../datasets/mpls_energy_benchmarking_2015.csv", header=TRUE, na.strings=c("N/A", "Not Available", "NA", "0"))
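The na.strings argument tells read.csv() which text values to treat as missing (NA). Note that the list above includes "0", so zeros in the file are read in as missing values too. Here is a small sketch of that behavior; the three-row file and the “toy” name are made up for illustration and are not part of the real dataset:

```r
# A tiny, made-up CSV standing in for the real benchmarking file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "building,year_built,site_EUI",
  "A,1985,72.4",
  "B,N/A,0",
  "C,2003,Not Available"
), csv_path)

# Same import settings as the tutorial's read.csv() call
toy <- read.csv(csv_path, header = TRUE,
                na.strings = c("N/A", "Not Available", "NA", "0"))

str(toy)              # column types after import
colSums(is.na(toy))   # number of missing values in each column
```

Running str() and counting the NA values right after import is a quick way to confirm that the numeric columns were read as numbers and that the missing-value codes were caught.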

Pro tip: Why “ggplot2”?

You may be wondering: Why do I need to load “ggplot2” instead of simply “ggplot”? What happened to ggplot number 1? ggplot is a relatively old library, and its original version was simply called “ggplot”. Around 2007, however, the library was rewritten and released as “ggplot2”. The original ggplot is now out of date, and you will need to load ggplot2 to get the current version. To learn more about ggplot2, check out the ggplot2.org website.

The whole shebang

ggplot graphics always share the same general format, so try to memorize this pattern as you work through the examples in this tutorial. You always start with a frame to hold it all together, then you add glyphs like “toppings” to display your actual data using different visual styles and symbology, and finally you can slice up your graphic using facets. You string these components together using the plus symbol (+), which in ggplot adds each new component on top of the ones before it. Here’s a big picture look at how an entire ggplot graphic can be assembled, sandwich-style:

[Figure: the format for ggplot(data, aes(x= , y= )), equated to sandwich making]
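As a rough sketch of that overall pattern in code, the example below builds a complete graphic from a frame, one glyph layer, and a set of facets. It uses mpg, a small example dataset bundled with ggplot2, rather than the Minneapolis data; the choice of geom_point() as the glyph and facet_wrap() as the facet is an assumption made purely for illustration:

```r
library(ggplot2)

# mpg is a small example dataset bundled with ggplot2 (not the Minneapolis data)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +  # the frame: data + aesthetics
  geom_point() +                             # a glyph layer: one point per car
  facet_wrap(~ class)                        # facets: one panel per vehicle class

print(p)
```

Each line after the first is joined on with +, so you can add or remove a “topping” by adding or deleting a single line.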

Now, let’s take a look at each of these components separately…

The Frame: The “bread” for your graphics

The frame acts a lot like the bread that will hold your graphical “sandwich” together. The frame is simply the grid upon which you’ll build the rest of your graphic. When you’re setting up the frame for a graphic, you need to call the ggplot() function and pass it two arguments: 1) the dataset you want to plot, and 2) a set of parameters called “aesthetics”, wrapped in aes().

The “aesthetics” parameters can be a little confusing at first, so pay attention to how they are used throughout the examples that follow. In general, the aesthetics tell ggplot which variables in your dataset to map to which visual properties of the plot, starting with the “x” and “y” positions.

Some graphics only need a single “x” value in the aesthetic. For example, here’s the frame for a plot that looks at one variable: the site energy use intensity (EUI), stored in the “site_EUI” column, of the various buildings in our dataset:

ggplot(data, aes(x=site_EUI))

Some graphics need both an “x” and a “y” value in the aesthetic. For example, here’s the frame for a plot comparing the year a building was built (explanatory variable) to its site EUI (response variable):

ggplot(data, aes(x=year_built, y=site_EUI))
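A frame on its own is a valid plot, just an empty one: printing it draws the axes and grid but no data, because no glyphs have been added yet. The sketch below shows this with a small made-up data frame (“toy”); only its column names match the tutorial’s dataset, and the values are invented for illustration:

```r
library(ggplot2)

# Made-up stand-in for the benchmarking data; only the column names match the tutorial
toy <- data.frame(
  year_built = c(1925, 1968, 1985, 2003, 2011),
  site_EUI   = c(98.1, 85.3, 72.4, 60.2, 48.7)
)

# The frame: a dataset plus the aesthetics that map columns to the x and y axes
frame <- ggplot(toy, aes(x = year_built, y = site_EUI))

print(frame)           # draws axes and a grid, but no data yet
length(frame$layers)   # number of glyph layers added so far
```

Storing the frame in a variable like this also lets you reuse the same “bread” with different “toppings” later on.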