Introduction

Getting Started

First, let’s load up our required packages and data.

library(ggplot2) #for plotting
library(dplyr) #for data manipulation
library(plotly) #for interactivity
data(diamonds)

Diamonds

What variables are included in the diamonds dataset, and what are their types? Getting a good sense of the variables in your data is the first step to making great visualizations!

glimpse(diamonds)
## Observations: 53,940
## Variables: 10
## $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, ...
## $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very G...
## $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, ...
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI...
## $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, ...
## $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54...
## $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339,...
## $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, ...
## $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, ...
## $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, ...
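If you would rather check the types programmatically than read them off the glimpse() output, something like the following should work. This is only a minimal sketch; the names col_types and ordered_cols are illustrative and not part of the original walkthrough.

```r
library(ggplot2) # provides the diamonds dataset

data(diamonds)

# First class of each column; ordered factors report c("ordered", "factor"),
# so keeping the first entry labels them "ordered"
col_types <- vapply(diamonds, function(col) class(col)[1], character(1))
print(col_types)

# Names of the columns stored as ordered factors (cut, color, clarity)
ordered_cols <- names(diamonds)[vapply(diamonds, is.ordered, logical(1))]
print(ordered_cols)

# Levels of an ordered factor, listed from lowest to highest
print(levels(diamonds$cut))
```

Knowing which columns are ordered factors matters later on, because ggplot2 treats them as discrete when they are mapped to color or used for faceting.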

Initial Graphing Ideas

Let’s try a scatter plot of price against carat, adding in a smoothed regression line to help us see the relationship between these variables. ggplot2 makes it very easy to “stack” layers in this way.

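# Draw a random sample of 5,000 rows (this overwrites diamonds)
diamonds <- diamonds[sample(nrow(diamonds), 5000), ]
# Scatter plot of price against carat with a smoothed trend line on top
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth()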

The Plot

Let’s check out the plot we just created.

Expanding on our Aesthetics

The plot looks nice, but seems to be missing something…color! Notice how easy it is to map an additional aesthetic onto the plot we just created.

p2 <- p + aes(col = color)

Graphing with Color

Looks much better now. However, because color is mapped for the whole plot, geom_smooth() also fits a separate regression line for each color, making the plot more cluttered and harder to interpret.
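As a side note, the extra lines come from where the color mapping lives, not from color itself. The sketch below is one possible variation, not part of the original walkthrough: it maps color inside geom_point() only, so geom_smooth() should see a single group. The names d and p_single and the seed value are illustrative assumptions.

```r
library(ggplot2)

data(diamonds)
set.seed(1) # arbitrary seed, only so the sketch is repeatable
d <- diamonds[sample(nrow(diamonds), 5000), ]

# color is mapped only inside geom_point(), so the points are colored
# while geom_smooth() inherits just x and y and fits one line overall
p_single <- ggplot(d, aes(x = carat, y = price)) +
  geom_point(aes(col = color)) +
  geom_smooth()

p_single
```

This keeps the plot readable, but it also gives up the per-color trends, which is the trade-off the interactive version avoids.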

Plotly to the Rescue

Here’s where plotly comes in! The ggplotly() function can be applied to (almost) any ggplot2 object to make it interactive, with additional features like toggling which data to show and the ability to zoom in and hover over specific data points.

Interactive ggplot 1

ggplotly(p2)
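The hover text can also be trimmed down. The sketch below rebuilds the colored plot from scratch so it runs on its own, then uses ggplotly()'s tooltip argument to show only the x and y values on hover. The names d, p_col, and widget and the seed value are illustrative assumptions.

```r
library(ggplot2)
library(plotly)

data(diamonds)
set.seed(1) # arbitrary seed, only so the sketch is repeatable
d <- diamonds[sample(nrow(diamonds), 5000), ]

# Same structure as the colored plot above
p_col <- ggplot(d, aes(x = carat, y = price, col = color)) +
  geom_point() +
  geom_smooth()

# tooltip limits the hover text to the listed aesthetics
widget <- ggplotly(p_col, tooltip = c("x", "y"))
widget
```

Clicking a color in the legend should still hide or show that group, since the legend behavior does not depend on the tooltip setting.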

ggplot2

Making a similar comparison in ggplot2 alone takes extra effort, such as filtering the data by hand, and the result is a static plot without the interactivity that plotly adds.

# Keep only the two colors we want to compare, then plot as before
p <- diamonds %>%
  filter(color %in% c("D", "J")) %>%
  ggplot(aes(x = carat, y = price, col = color)) +
  geom_point() +
  geom_smooth()

ggplot Attempt

More Plotly Examples

We can try a plot similar to the previous one, faceting by an additional variable (clarity) and changing the regression line from smoothed to linear.

p <- ggplot(diamonds, aes(x = carat, y = price, col = color)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(~clarity)

Interactive ggplot 2
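A minimal sketch of how the faceted plot could be made interactive, assuming the same ggplotly() call used for the first interactive plot. The names d, p_facet, and widget_facet and the seed value are illustrative assumptions.

```r
library(ggplot2)
library(plotly)

data(diamonds)
set.seed(1) # arbitrary seed, only so the sketch is repeatable
d <- diamonds[sample(nrow(diamonds), 5000), ]

# One panel per clarity grade, with a linear fit for each color
p_facet <- ggplot(d, aes(x = carat, y = price, col = color)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(~clarity)

# Convert the static faceted plot into an interactive widget
widget_facet <- ggplotly(p_facet)
widget_facet
```

Some clarity and color combinations contain very few diamonds in a 5,000-row sample, so a few of the fitted lines may come with warnings or wide confidence bands.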