
Use color to add additional variables or emphasis to graphics

Sometimes you want to add a splash of color to your graphic, kind of like adding a little bit of “special sauce” to a subway sandwich. Color is useful when you want to emphasize one of the relationships in your graphic, or when you want to show an additional variable alongside the relationships already displayed.

Before we get started making colorful graphics, let’s again load both the ggplot2 library and the Minneapolis buildings energy benchmarking dataset:

library(ggplot2)
data <- read.csv("../datasets/mpls_energy_benchmarking_2015.csv", header=TRUE, na.strings=c("N/A", "Not Available", "NA", "0"))

Generally, you can add color to your plot by simply adding a col=variable_name or fill=variable_name argument to your aesthetics when you’re setting up the plot:

[Figure: sketch of ggplot aesthetics with the color setting added inside of aes()]
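As a minimal sketch of that pattern, the snippet below colors the points of a scatterplot by a categorical column. The `toy` data frame and its values are made up for illustration and only borrow column names from the examples in this lesson; they are not taken from the real dataset.

```r
library(ggplot2)

# Toy data frame standing in for the benchmarking dataset.
# All values and organization names here are invented for illustration.
toy <- data.frame(
  year_built = c(1920, 1955, 1978, 1990, 2004, 2011),
  site_EUI   = c(110, 95, 88, 72, 60, 55),
  org_name   = c("Org A", "Org B", "Org A", "Org C", "Org B", "Org C")
)

# col= sits inside aes(), so each point is colored by its org_name value
p <- ggplot(toy, aes(x = year_built, y = site_EUI, col = org_name)) +
  geom_point()

print(p)
```

Because the color is set inside aes(), ggplot picks one color per category and adds a legend automatically.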

Pro tip: col= or fill=?

There are two different aesthetics that affect your graphic’s color: col and fill. Each of them has a slightly different effect, and it may take a bit of trial and error before you learn which one is most appropriate for which glyphs.

In general, col is what you should use if you want to add colors to the points on a scatterplot, or add colors to the outline of your glyphs. And fill is what you should use if you want to fill in colors on a boxplot, violin plot, or other plot that has fillable geometric shapes. Here’s a quick table that summarizes when you should use col and when you should use fill:

Plot type   | col effect                                           | fill effect
boxplot     | adds color to the outline of each boxplot            | adds color to the inside of each boxplot
histogram   | adds color to the outline of each bar in histogram   | adds color to the inside of each bar in histogram
barchart    | adds color to the outline of each bar                | adds color to the inside of each bar
scatterplot | colors the points in the scatterplot                 | N/A

Sometimes it’s hard to remember if your glyph requires you to use col= or fill= inside of your ggplot aesthetics. If you aren’t sure which one to use, you can always simply try both and see which one works best!
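To see the difference for yourself, here is a small sketch that draws the same boxplot twice, once with col= and once with fill=. The `toy` data frame, its `group` and `value` columns, and all of the numbers are hypothetical and exist only for this comparison.

```r
library(ggplot2)

# Made-up data for illustration: two hypothetical groups of buildings
toy <- data.frame(
  group = rep(c("Group A", "Group B"), each = 5),
  value = c(50, 62, 58, 71, 66, 80, 85, 78, 92, 88)
)

# col= colors the outline of each box
p_outline <- ggplot(toy, aes(x = group, y = value, col = group)) +
  geom_boxplot()

# fill= colors the inside of each box
p_fill <- ggplot(toy, aes(x = group, y = value, fill = group)) +
  geom_boxplot()

print(p_outline)
print(p_fill)
```

Running both plots side by side is a quick way to decide which aesthetic suits your glyph.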

Colors for categorical variables

Colors can be used to illustrate categorical variables. For example, for our Minneapolis buildings energy benchmarking dataset, you can make a scatterplot comparing each building’s “year_built” vs. its “site_EUI”. You can then add an additional dimension to the plot by color-coding the points based on which organization the building belongs to (“org_name”):

ggplot(data, aes(x=year_built, y=site_EUI, col=org_name)) +
  geom_point()

If you don’t like ggplot’s default colors, you can also define your own custom colors to use in your graphic. Simply add scale_colour_manual(values = c()) to your plot to choose custom colors to use with a categorical variable. Just make sure you supply at least as many color values as there are categories within your variable, or ggplot will stop with an error. For example, the “org_name” variable has 5 different categories within it, so we can manually assign each of these categories a distinct color as follows:

ggplot(data, aes(x=year_built, y=site_EUI, col=org_name)) +
  geom_point() +
  scale_colour_manual(values = c("red", "orange", "yellow", "green", "blue"))
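If you want to control exactly which category gets which color, scale_colour_manual() also accepts a named vector, where each name is a category and each value is its color. The sketch below assumes three hypothetical organization names (“Org A”, “Org B”, “Org C”) and made-up values; swap in the real category names from your own data.

```r
library(ggplot2)

# Toy data with invented values and hypothetical organization names
toy <- data.frame(
  year_built = c(1920, 1955, 1990),
  site_EUI   = c(110, 95, 72),
  org_name   = c("Org A", "Org B", "Org C")
)

# Named vector: colors are matched to categories by name, not by position,
# so the order of the entries here does not matter
org_colors <- c("Org C" = "blue", "Org A" = "red", "Org B" = "orange")

p <- ggplot(toy, aes(x = year_built, y = site_EUI, col = org_name)) +
  geom_point() +
  scale_colour_manual(values = org_colors)

print(p)
```

Naming the colors makes the plot easier to maintain, since adding or reordering categories will not silently shift which color goes with which organization.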