Plotting in R

Explore different types of plots in ggplot2

ggplot2 is an R package for creating graphics based on The Grammar of Graphics¹. The Grammar of Graphics is a language for talking about the different parts of a plot, and it allows you to build plots creatively and iteratively. The following material was developed by Maria Pachiadaki, Sarah K. Hu, Brett Longworth and David Geller.

R version and required packages

The demonstration material was developed and tested in R 4.1.0. It requires the following packages:

palmerpenguins

DT

tidyverse

ggpubr

plotly

dygraphs

cowplot

patchwork

viridis
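Before starting, it can help to check which of these packages are already installed. The snippet below is a minimal sketch, not part of the original material: the variable names required and missing_pkgs are chosen here for illustration, and skimr is added to the list because it is used in the data summary later on. The install line is left commented out so that nothing is installed unintentionally.

```r
# Packages listed above, plus skimr, which is used in the data summary below
required <- c("palmerpenguins", "DT", "tidyverse", "ggpubr", "plotly",
              "dygraphs", "cowplot", "patchwork", "viridis", "skimr")

# Names of the packages that are not installed yet
missing_pkgs <- setdiff(required, rownames(installed.packages()))
print(missing_pkgs)

# Uncomment to install whatever is missing (needs an internet connection)
# install.packages(missing_pkgs)
```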

Dataset

We will use the palmerpenguins dataset. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network. Alison Horst gathered the data into an R package and is responsible for all the great penguin illustrations.

We will briefly check the structure of the data table (penguins) before we start plotting. Here I am using the datatable function from the DT package, which facilitates the display of data frames, matrices or tibbles on HTML pages.

library(palmerpenguins) # load palmerpenguins package
library(DT) # load DT package
datatable(penguins) # check table structure

And summarize the penguins table using the summary function:

summary(penguins)  #summarize data
      species          island    bill_length_mm  bill_depth_mm  
 Adelie   :152   Biscoe   :168   Min.   :32.10   Min.   :13.10  
 Chinstrap: 68   Dream    :124   1st Qu.:39.23   1st Qu.:15.60  
 Gentoo   :124   Torgersen: 52   Median :44.45   Median :17.30  
                                 Mean   :43.92   Mean   :17.15  
                                 3rd Qu.:48.50   3rd Qu.:18.70  
                                 Max.   :59.60   Max.   :21.50  
                                 NA's   :2       NA's   :2      
 flipper_length_mm  body_mass_g       sex           year     
 Min.   :172.0     Min.   :2700   female:165   Min.   :2007  
 1st Qu.:190.0     1st Qu.:3550   male  :168   1st Qu.:2007  
 Median :197.0     Median :4050   NA's  : 11   Median :2008  
 Mean   :200.9     Mean   :4202                Mean   :2008  
 3rd Qu.:213.0     3rd Qu.:4750                3rd Qu.:2009  
 Max.   :231.0     Max.   :6300                Max.   :2009  
 NA's   :2         NA's   :2                                 

skimr is another useful package for summarizing data tables:

# load the skimr package (install it first if needed)
library(skimr)
skim(penguins)  #summarize data
Data summary

Name                     penguins
Number of rows           344
Number of columns        8
Column type frequency:
  factor                 3
  numeric                5
Group variables          None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
species                0           1.00  FALSE           3  Ade: 152, Gen: 124, Chi: 68
island                 0           1.00  FALSE           3  Bis: 168, Dre: 124, Tor: 52
sex                   11           0.97  FALSE           2  mal: 168, fem: 165

Variable type: numeric

skim_variable      n_missing  complete_rate     mean      sd      p0      p25      p50     p75    p100  hist
bill_length_mm             2           0.99    43.92    5.46    32.1    39.23    44.45    48.5    59.6  ▃▇▇▆▁
bill_depth_mm              2           0.99    17.15    1.97    13.1    15.60    17.30    18.7    21.5  ▅▅▇▇▂
flipper_length_mm          2           0.99   200.92   14.06   172.0   190.00   197.00   213.0   231.0  ▂▇▃▅▂
body_mass_g                2           0.99  4201.75  801.95  2700.0  3550.00  4050.00  4750.0  6300.0  ▃▇▆▃▂
year                       0           1.00  2008.03    0.82  2007.0  2007.00  2008.00  2009.0  2009.0  ▇▁▇▁▇

As we can see from the summary table, three different species of penguins were recorded on three different islands.

Scatterplots

Let’s explore if there is a correlation between the body mass of the penguins and the flipper length of the penguins using geom_point():

library(tidyverse) # load the tidyverse package (contains ggplot2)

ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g))+
  # add points
  geom_point()

Let’s add the trend line (fitting linear model) using geom_smooth():

ggplot(penguins, aes(x=flipper_length_mm,y=body_mass_g))+
  geom_point()+ 
  #add trend line
  geom_smooth(method="lm") 

Let’s add a trend line together with the equation and the R² value using the package ggpubr:

library(ggpubr) # package that facilitates the display of the equation

ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g))+
  geom_point()+ 
  geom_smooth(method="lm") + 
  # add equation use label.y to define the position
  stat_regline_equation(label.y = 5800, aes(label = ..eq.label..)) + 
  stat_regline_equation(label.y = 5600, aes(label = ..rr.label..))
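The equation and R² printed on the plot describe a linear model of body mass against flipper length. As a cross-check, the same kind of fit can be computed directly with base R's lm(). This is a sketch that assumes the default formula y ~ x is the one being displayed; the names fit and r2 are chosen here for illustration.

```r
library(palmerpenguins)

# Ordinary least squares fit of body mass on flipper length
# (lm() drops rows with missing values by default)
fit <- lm(body_mass_g ~ flipper_length_mm, data = penguins)

# Intercept and slope of the trend line
coef(fit)

# Coefficient of determination (R squared)
r2 <- summary(fit)$r.squared
r2
```

If the numbers differ from those on the plot, the usual reason is rounding in the label rather than a different model.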

Are there any differences between the species? Use color in aesthetics to color and group by species:

#regression equations will overlap, we will use faceting for them (below)
ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g, color=species))+
  geom_point()+ 
  geom_smooth(method="lm")

Besides using different colors for the data points, we can also use different shapes:

ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g, color=species, shape=sex))+
  geom_point()

Geoms that draw points have a shape parameter. Legal shape values are the numbers 0 to 25, and the numbers 32 to 127.

  • Shapes 0 to 14 are outline only: use color to change the outline color

  • Shapes 15 to 20 are fill only: use color to change the fill color

  • Shapes 21 to 25 are outline + fill: use color to change the outline color and fill to change the fill color

  • Shapes 32 to 127 represent the corresponding ASCII characters

We can also change the point size:

ggplot(penguins, aes(x=flipper_length_mm, y=bill_length_mm, color=species))+
  geom_point(aes(shape=sex, size=body_mass_g))

Challenge: Create a similar plot where flipper length is the x-axis and body mass is along the y-axis. Use a scatterplot where the shapes will all be triangles that all have a black outline and filled in color associated with each penguin species.

The Grammar of Graphics

In the exploration of the palmerpenguins data, we started with a simple plot and added to it. We added a linear model as a trend line, added model parameters to the plot, grouped the data by species, and changed things like point size and shape.

Starting with the first plot, we started by using the ggplot() function to create a ggplot object. As parameters, we told ggplot() we wanted to use penguins as data for the plot, and used the aes() function to define how we wanted to map the penguin data to the plot aesthetics. We mapped flipper_length_mm onto the x axis and body_mass_g onto the y axis.

Next, we have to tell ggplot how we want to display the data. Geoms take mapped data and make it visible on the plot. geom_point() is a geom that (you guessed it) plots points. Note that we’ve added our first layer to the plot by sending the object created by ggplot() to geom_point() using the +. Why not use the pipe (%>%)? ggplot2 was developed before the magrittr pipe, so + it is. This has made a lot of people very angry and been widely regarded as a bad move².

Layers are functions, so they take parameters that control what they do. For instance, when we used geom_point() as a layer to display available plot symbols above, we used this line:

geom_point(aes(shape = shape), size = 5, fill = 'red')

This uses aes() to map shape from the data to the shape displayed. size = 5 and fill = 'red' define the size and fill color of all points plotted. Assigning a constant to an aesthetic sets it for the entire geom, while mapping data to an aesthetic with aes() allows it to vary with the data mapped. More details on aesthetic specifications can be found in the ggplot2 vignette "Aesthetic specifications" (vignette("ggplot2-specs")).
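To see the difference between a mapped and a constant aesthetic in a complete plot, the line quoted above can be dropped into a small chart of the available point shapes. This is a sketch and not the original code behind the workshop figure: the data frame shapes and its layout columns x and y are made up for illustration.

```r
library(ggplot2)

# Hypothetical helper table: one row per point shape (0 to 25), laid out on a grid
shapes <- data.frame(
  shape = 0:25,
  x = (0:25) %% 7,      # column in the grid
  y = -((0:25) %/% 7)   # row in the grid, top to bottom
)

shape_chart <- ggplot(shapes, aes(x = x, y = y)) +
  # shape is mapped from the data; size and fill are constants for every point
  geom_point(aes(shape = shape), size = 5, fill = "red") +
  # print the shape number under each symbol
  geom_text(aes(label = shape), nudge_y = -0.35) +
  # use the numbers in the shape column directly as shape codes
  scale_shape_identity() +
  theme_void()

shape_chart
```

Only shapes 21 to 25 pick up the red fill, because they are the only ones with a separate fill area.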

Each additional layer adds something to the plot or modifies the plot defaults. We can add layers to add additional plots with the same or different data mapped to the plot aesthetics, modify the plot scales, change the coordinate system of the axes, change the theme of the plot, and break the plot into subplots by a categorical variable, which we’ll look at next.

Faceting

Faceting splits the plotting window into several small panels (a grid) and displays a similar chart in each one. Each panel usually shows the same graph for a specific group of the dataset. We will be working with facet_wrap():

ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g, color=species))+
  geom_point(aes(shape=sex))+
  # lay out panels  horizontally, split species, set the x axis free
  facet_wrap(~species, scales="free_x")+
  geom_smooth(method="lm", se=FALSE)+
  stat_regline_equation(label.y = 6000, aes(label = ..eq.label..)) + 
  stat_regline_equation(label.y = 5800, aes(label = ..rr.label..))

And facet_grid():

ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g, color=species))+
  geom_point(aes(shape=sex))+
  # lay out panels  horizontally by species and vertically by sex
  facet_grid(sex~species, scales="free_x")+
  geom_smooth(method="lm", se=FALSE)

Themes

There are built-in ggplot2 themes, as well as theme packages. There is also a long list of cosmetic changes you can make with theme(). Let’s try changing themes in another type of plot, a histogram, using geom_histogram(). Let’s plot the distribution of the flipper length for each species. We will use Maria’s favourite theme, theme_bw():

#use fill to color the different species. What would happen if you used color instead?
ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  #set the transparency at 0.8 in order to be able to observe overlap, use the position "identity" so that the bins are not stacked
  geom_histogram(alpha=0.8, position="identity")+
  #use theme bw
  theme_bw()

theme_void() is another built-in theme:

ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+
  theme_void()

As we will see below, themes can be modified.

Labels

Labels can be modified using labs(x = "Title on x axis", y = "Title on y axis") and theme(axis.title.x = element_text(family, face, colour, size), axis.title.y = element_text(family, face, colour, size)):

ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+
  theme_bw() +
  #set the labels for x and y axis
  labs(x = "Flipper length (mm)", y = "Counts")+
  #modify the color, the face and the size of the label text
  theme(axis.title.x = element_text(color = "grey30", face = "bold", size = 14),
        axis.title.y = element_text(color = "grey30", face = "bold", size = 14))

Axis

The appearance of the text on the axis can be modified using theme(axis.text.x = element_text(family, face, colour, size), axis.text.y = element_text(family, face, colour, size)):

#modify axis font size
ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts")+
  theme(axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        #change color and size of axis text
        axis.text.x = element_text(color = "grey30", size = 12),
        axis.text.y = element_text(color = "grey30", size = 12))

Modify the x axis to add more tick marks:

#add a tick mark every 10 mm on the x axis
ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts")+
  #add breaks
  scale_x_continuous(breaks=seq(170, 230,10))+
  #force the y axis to start at 0
  scale_y_continuous(expand = c(0,0))+
  theme(axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        axis.text.x = element_text(color = "grey30", size = 12),
        axis.text.y = element_text(color = "grey30", size = 12))

We can change the angle and justification of the axis text:

#rotate the x axis text
ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts")+
  #change the text angle on the x axis
  theme(axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        #rotate x text angle to 45
        axis.text.x = element_text(color = "black", size = 12, angle = 45),
        axis.text.y = element_text(color = "black", size = 12))

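The horizontal and vertical justification (hjust and vjust) can also be adjusted. Both arguments take values between 0 and 1, measured along the direction of the text: hjust = 0 left-aligns the text and hjust = 1 right-aligns it, while vjust = 0 aligns it to the bottom and vjust = 1 to the top [Source: Stack Overflow].

For example: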

ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+
  theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts")+
  theme(axis.title.x = element_text(color = "grey30", face = "bold", size = 14),
        axis.title.y = element_text(color = "grey30", face = "bold", size = 14),
        axis.text.x = element_text(color = "black", size = 12, angle = 45, hjust = 1, vjust = 1),
        axis.text.y = element_text(color = "black", size = 12))

Legends

Legends can be modified inside theme using legend.title=element_text(family, face, size, color) and legend.text=element_text(family, face, size, color). The position of the legend can be modified using legend.position:

ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+
  theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts", fill="Species")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        #change appearance of legend title
        legend.title=element_text(color = "black", face = "bold", size=14),
        #change appearance of legend text
        legend.text=element_text(size=12), 
        legend.position="top")

It is also possible to position the legend inside the plotting area. The numeric position below is relative to the entire area, including titles and labels, not just the plotting area; x,y runs from 0,0 (bottom left) to 1,1 (top right):

ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts", fill="Species")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        legend.title=element_text(color = "black", face = "bold", size=14), 
        legend.text=element_text(size=12),
        #adjust legend position
        legend.position=c(0.9, 0.85))

Challenge: Remake this plot, remove the legend and the x-axis labels.

Other modifications

Adjust the appearance of the facet text using strip.text and the background using strip.background:

ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g, color=species))+
  geom_point(aes(shape=sex))+
  facet_wrap(~species, scales="free_x")+
  geom_smooth(method="lm", se=FALSE)+
  stat_regline_equation(label.y = 6000, aes(label = ..eq.label..)) + 
  stat_regline_equation(label.y = 5800, aes(label = ..rr.label..))+
  labs(x = "Flipper length (mm)", y = "Body mass (g)", color="Species", shape="Sex")+
  theme_bw()+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        legend.title=element_text(color = "black", face = "bold", size=14), 
        legend.text=element_text(size=12),
        #modify text
        strip.text.x = element_text(colour = "grey30", face = "bold", size=16),
        #modify background
        strip.background =element_rect(fill="white"))

To completely remove the background of the facet text you can use strip.background = element_blank():

ggplot(penguins, aes(x=flipper_length_mm, y=body_mass_g, color=species))+
  geom_point(aes(shape=sex))+
  facet_grid(island~species, scales="free_x")+
  geom_smooth(method="lm", se=FALSE)+
  stat_regline_equation(label.y = 6000, aes(label = ..eq.label..)) + 
  stat_regline_equation(label.y = 5600, aes(label = ..rr.label..))+
  labs(x = "Flipper length (mm)", y = "Body mass (g)", color="Species", shape="Sex")+
  theme_bw()+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        legend.title=element_text(color = "black", face = "bold", size=14), 
        legend.text=element_text(size=12),
        #modify text
        strip.text = element_text(colour = "grey30", face = "bold", size=16),
        #modify background
        strip.background =element_blank())
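The same theme() settings are repeated in almost every example in this tutorial. One way to avoid the repetition, sketched here as a suggestion rather than as part of the original material, is to store the theme in an object and add it to each plot. The names theme_penguins and p_hist are hypothetical.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical name: a theme object holding the settings repeated in this tutorial
theme_penguins <- theme_bw() +
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        legend.title = element_text(color = "black", face = "bold", size = 14),
        legend.text = element_text(size = 12))

# Add the stored theme to a plot like any other layer
p_hist <- ggplot(penguins, aes(flipper_length_mm, fill = species)) +
  geom_histogram(alpha = 0.8, position = "identity") +
  labs(x = "Flipper length (mm)", y = "Counts", fill = "Species") +
  theme_penguins

p_hist
```

A further theme() call can still be added after theme_penguins to override individual settings for one plot.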

Colors

There are several ways colors can be modified in ggplot2.

  • Manually with scale_color_manual() or scale_fill_manual() (can accept hex numbers or names):
ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+
  scale_fill_manual(values=c("orange" , "purple", "#69b3a2"))+
  theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts", fill="Species")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        legend.title=element_text(color = "black", face = "bold", size=14), 
        legend.text=element_text(size=12),
        legend.position=c(0.9, 0.85))

  • By changing the hue range of the default palette with scale_fill_hue():
ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.8, position="identity")+
  scale_fill_hue(h = c(0, 90))+
  theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts", fill="Species")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        legend.title=element_text(color = "black", face = "bold", size=14), 
        legend.text=element_text(size=12),
        legend.position=c(0.9, 0.85))

  • Using packages that contain palettes e.g. RColorBrewer, viridis, or Paletteer:
library(viridis)
ggplot(penguins, aes(flipper_length_mm, fill=species)) + 
  geom_histogram(alpha=0.6, position="identity")+
  scale_fill_viridis(discrete=TRUE)+
  theme_bw()+
  labs(x = "Flipper length (mm)", y = "Counts", fill="Species")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        legend.title=element_text(color = "black", face = "bold", size=14), 
        legend.text=element_text(size=12),
        legend.position=c(0.9, 0.85))

Factor colors

# Insert custom colors with factoring:
species_order <- c("Adelie", "Chinstrap", "Gentoo")
species_color <- c("pink", "lightgreen", "grey")

# Set new column equal to factor of correct ORDER
penguins$SPECIES_ORDER <- factor(penguins$species, levels = species_order)

# Set order equal to names of colors
names(species_color) <- species_order

colnames(penguins) # New column has been added
[1] "species"           "island"            "bill_length_mm"   
[4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
[7] "sex"               "year"              "SPECIES_ORDER"    

Then modify the fill= in the ggplot code and add scale_fill_manual(values = ...)

ggplot(penguins, aes(flipper_length_mm, fill = SPECIES_ORDER)) +
  geom_histogram(alpha = 0.8, position = "identity") +
  scale_fill_manual(values = species_color) + 
  theme_bw() + 
  labs(x = "Flipper length (mm)",
       y = "Counts", fill = "Species") + 
  theme(axis.text.x = element_text(color = "black", size = 12), 
        axis.text.y = element_text(color = "black", size = 12), 
        axis.title.x = element_text(color = "black",face = "bold", size = 14), 
        axis.title.y = element_text(color = "black", face = "bold",size = 14), 
        legend.title = element_text(color = "black", face = "bold", size = 14),
        legend.text = element_text(size = 12), legend.position = c(0.9, 0.85))

An example of why this is important:

# If we take out one of the species, like Chinstrap, we want the colors to remain the same for the species. This way you can link colors throughout your whole analysis
penguins %>% 
  filter(species != "Chinstrap") %>% 
  ggplot(aes(flipper_length_mm, fill = SPECIES_ORDER)) +
  geom_histogram(alpha = 0.8, position = "identity") +
  scale_fill_manual(values = species_color) + 
  theme_bw() + 
  labs(x = "Flipper length (mm)",
       y = "Counts", fill = "Species") + 
  theme(axis.text.x = element_text(color = "black", size = 12), 
        axis.text.y = element_text(color = "black", size = 12), 
        axis.title.x = element_text(color = "black",face = "bold", size = 14), 
        axis.title.y = element_text(color = "black", face = "bold",size = 14), 
        legend.title = element_text(color = "black", face = "bold", size = 14),
        legend.text = element_text(size = 12), legend.position = c(0.9, 0.85))

# By removing Chinstrap, the green data were removed, and we kept the same colors for Adelie and Gentoo.
# The same approach works for shapes
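What keeps the colors stable is that the color vector is named, so each color is looked up by species name rather than by position. The sketch below checks this on the built plot; it is an illustration under stated assumptions, using base R's subset() instead of filter() and the hypothetical names no_chinstrap and p_named.

```r
library(ggplot2)
library(palmerpenguins)

# Named vector: each color is tied to a species name, not to a position
species_color <- c(Adelie = "pink", Chinstrap = "lightgreen", Gentoo = "grey")

# Drop one species (base R equivalent of the filter() step)
no_chinstrap <- subset(penguins, species != "Chinstrap")

p_named <- ggplot(no_chinstrap, aes(flipper_length_mm, fill = species)) +
  geom_histogram(alpha = 0.8, position = "identity", bins = 30) +
  scale_fill_manual(values = species_color)

# Fill colors actually used, taken from the data ggplot2 computed for the layer
used_fills <- unique(layer_data(p_named)$fill)
used_fills
```

layer_data() returns the data frame ggplot2 builds for a layer, so used_fills shows which entries of species_color were picked for the remaining species.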

Barplots

A simple barplot can be created using the function geom_bar(). We will plot how many individuals from each species were recorded on each island:

ggplot(penguins,aes(x=island, fill=species))+
  geom_bar()+ 
  scale_fill_manual(values = c("orange" , "purple", "#69b3a2"))+
  theme_bw()+
  labs(x = "Island", y = "Number of individuals")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

If we want to produce a non-stacked barplot we need to use the argument position = position_dodge2() in geom_bar():

ggplot(penguins,aes(x=island, fill=species))+
  geom_bar(position=position_dodge2(preserve = "single"))+
  scale_fill_manual(values = c("orange" , "purple", "#69b3a2"))+
  theme_bw()+
  labs(x = "Island", y = "Number of individuals")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

Changing the position to "fill" will give us the relative abundance in a stacked plot:

ggplot(penguins,aes(x=island, fill=species))+
  geom_bar(position="fill")+ 
  scale_fill_manual(values = c("orange" , "purple", "#69b3a2"))+
  theme_bw()+
  labs(x = "Island", y = "Relative abundance")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

We can also create the same plot horizontally with coord_flip():

ggplot(penguins,aes(x=island, fill=species))+
  geom_bar(position="fill")+ 
  scale_fill_manual(values = c("orange" , "purple", "#69b3a2"))+
  theme_bw()+
  labs(x = "Island", y = "Relative abundance")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))+
  coord_flip()

For axis scales adjustments and transformations, please see these examples.

Boxplots

Let’s visualize the distribution of flipper length in the different species and sexes using the function geom_boxplot() to create a box and whiskers plot:

ggplot(penguins, aes(x=species, y=flipper_length_mm, fill=sex))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Species", y = "Flipper length (mm)")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

Let’s omit the NA values during plotting and reduce the outlier size:

ggplot(na.omit(penguins), aes(x=species, y=flipper_length_mm, fill=sex))+
  geom_boxplot(outlier.size = 1)+ 
  theme_bw()+
  labs(x = "Species", y = "Flipper length (mm)")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

You can add the individual points on top of the boxplots. geom_jitter() is the usual shortcut for this, but here we will use geom_point() with position_jitterdodge() so that the jittered points line up with the dodged boxplots:

ggplot(na.omit(penguins), aes(x=species, y=flipper_length_mm, fill=sex))+
  geom_boxplot()+
  geom_point(position = position_jitterdodge(), size=0.4, alpha=0.9) + 
  theme_bw()+
  labs(x = "Species", y = "Flipper length (mm)")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

Violin plots

We can visualize the same distribution using the function geom_violin() to create a violin plot, a mirrored density plot displayed in the same way as a boxplot:

ggplot(na.omit(penguins), aes(x=species, y=flipper_length_mm, fill=sex))+
  geom_violin()+ 
  theme_bw()+
  labs(x = "Species", y = "Flipper length (mm)")+
  theme(axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

We can also overlay different types of plots:

ggplot(na.omit(penguins), aes(x = species, y = flipper_length_mm, fill = sex)) +
  geom_violin() + 
  geom_boxplot(position = position_dodge(width = 0.9), width = 0.2) +
  theme_bw() + 
  labs(x = "Species", y = "Flipper length (mm)") + 
  theme(axis.text.x = element_text(color = "black", size = 12), 
        axis.text.y = element_text(color = "black", size = 12), 
        axis.title.x = element_text(color = "black", size = 14), 
        axis.title.y = element_text(color = "black", face = "bold", size = 14),
        #remove internal grid
        panel.grid = element_blank())

Combining plots

Let’s assign two plots to R objects and combine them:

# Horizontal bar:
horizontal_bar <- ggplot(penguins, aes(x = island, fill = species)) + 
  geom_bar(position = "fill") +
  scale_fill_manual(values = c("orange" , "purple", "#69b3a2")) + 
  theme_bw() + labs(x = "Island", y = "Relative abundance") + 
  theme(axis.text.x = element_text(color = "black", size = 12), 
        axis.text.y = element_text(color = "black", size = 12), 
        axis.title.x = element_text(color = "black", face = "bold", size = 14), 
        axis.title.y = element_text(color = "black", face = "bold", size = 14)) + 
  coord_flip()

# Violin plot (slightly different than above)
violin_mod<-ggplot(na.omit(penguins), aes(x=sex, y=flipper_length_mm, fill=species))+
  geom_violin()+
  geom_boxplot(position=position_dodge(width=0.9), width=0.2) +
  scale_fill_manual(values = c("orange" , "purple", "#69b3a2"))+
  theme_bw()+
  labs(x = "Sex", y = "Flipper length (mm)")+
  theme(legend.position = "none",
        axis.text.x = element_text(color = "black", size = 12),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "black", face = "bold", size = 14),
        axis.title.y = element_text(color = "black", face = "bold", size = 14))

And combine them:

library(cowplot)
library(patchwork)
cowplot::plot_grid(horizontal_bar,
          violin_mod,
          ncol = 1)

# Patchwork
horizontal_bar + violin_mod + patchwork::plot_layout(ncol = 1)

Challenge: add labels (“A” and “B”) to the plots in the cowplot function call.

Saving plots

ggsave() is a function for saving the last plot displayed. It guesses the type of graphics device from the extension.

#make directory called "plots"
dir.create("plots", showWarnings = FALSE)

#and save the last plot as png, adjust width and height
ggsave("plots/combined.png", width = 15, height = 10, units = "cm")
#save the last plot as svg (requires the svglite package), adjust width and height
ggsave("plots/combined.svg", width = 15, height = 10, units = "cm")
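Because ggsave() defaults to the last plot displayed, it is easy to save the wrong figure in a longer script. A minimal sketch of a more explicit call passes the plot object through the plot argument. The object p_scatter and the temporary output path are assumptions made for this example, and PDF is used only because that device is available in every R installation.

```r
library(ggplot2)
library(palmerpenguins)

p_scatter <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

# Hypothetical output location: a temporary folder, so nothing in the project is overwritten
out_file <- file.path(tempdir(), "scatter.pdf")

# plot = ... saves this object even if another plot was displayed more recently
ggsave(out_file, plot = p_scatter, width = 15, height = 10, units = "cm")

file.exists(out_file)
```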

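Plots can also be saved by opening a graphics device such as png(), printing the plot with print() and closing the device with dev.off():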

p <- horizontal_bar + violin_mod + patchwork::plot_layout(ncol = 1)
png("plots/combined_print.png", width=1800,height=1600,res=300)
print(p)
dev.off()

Plot interactively with plotly or dygraphs

Dataset

The following material is from the “Reproducible Reporting with R (R3) for marine ecological indicators” webinar designed and instructed by Ben Best, who graciously allowed us to use it.

Get URL to CSV

  1. Visit the ERDDAP server https://oceanview.pfeg.noaa.gov/erddap and do a Full Text Search for Datasets using “cciea” in the text box before clicking Search. These are all the California Current IEA datasets.

  2. From the listing of datasets, click on data for the “CCIEA Anthropogenic Drivers” dataset. Note the filtering options for time and other variables like consumption_fish (Millions of metric tons) and cps_landings_coastwide (1000s metric tons).

  3. Set the time filter from being only the most recent time to the entire range of available time.

  4. Scroll to the bottom and Submit with the default .htmlTable view. You get a web table of the data. Notice the many missing values in earlier years.

  5. Go back in your browser and change the File type to .csv. Now, instead of clicking Submit, click on Just generate the URL.

  6. The generated URL lists all variables to include, but including all of them is the default, so we can strip off everything after the .csv, starting with the query parameters (?).

Download CSV

Let’s use this URL to download a new file

# set variables
csv_url  <- "https://oceanview.pfeg.noaa.gov/erddap/tabledap/cciea_AC.csv"
# if ERDDAP server down (Error in download.file) with URL above, use this:
#    csv_url <- "https://raw.githubusercontent.com/microbiaki/workshop_t2/main/data/cciea_AC.csv"
dir_data <- "data"
# derived variables
csv <- file.path(dir_data, basename(csv_url))
# create directory
dir.create(dir_data, showWarnings = FALSE)
# download file
if (!file.exists(csv))
  download.file(csv_url, csv)

Read table

Now open the file by going into the Files RStudio pane, More -> Show Folder in New Window. Then double click on data/cciea_AC.csv to open it in your spreadsheet program (like Microsoft Excel, Apple Numbers or LibreOffice Calc).

# attempt to read csv
d <- read.csv(csv)
# show the data frame
head(d)

Note how the presence of the 2nd line with units turns the values into the character (<chr>) data type. But we want numeric values. So we could manually delete that second line of units or look at the help documentation for this function (?read.csv in the Console pane; or the F1 key with the cursor on the function in the code editor pane). Notice the skip argument, which we can implement like so:

# read csv, skipping the first two lines (column names and units), so there is no header
d <- read.csv(csv, skip = 2, header = FALSE)

# restore the original column names
names(d) <- names(read.csv(csv))

# keep only the year from the time column
d$time <- sub("-.+", "", d$time)
d$time <- as.integer(d$time)

# overwrite the csv with the cleaned table for future reuse
write.csv(d, csv, row.names = FALSE)
datatable(d)
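The effect of the units line can be reproduced without downloading anything. The sketch below writes a tiny CSV with made-up values (the column landings and its numbers are invented for illustration) and reads it both ways.

```r
# Write a tiny made-up CSV with a header line followed by a units line
toy_csv <- tempfile(fileext = ".csv")
writeLines(c("time,landings",
             "UTC,1000s metric tons",
             "1990-01-01T00:00:00Z,12.5",
             "1991-01-01T00:00:00Z,14.0"),
           toy_csv)

# Reading it naively keeps the units row as data, so landings is not numeric
naive <- read.csv(toy_csv)
class(naive$landings)

# Skipping the two non-data lines lets R detect numbers; names are restored afterwards
clean <- read.csv(toy_csv, skip = 2, header = FALSE)
names(clean) <- names(naive)
class(clean$landings)

# Keep only the year, using the same pattern as the tutorial code
clean$time <- as.integer(sub("-.+", "", clean$time))
clean
```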

Series line plot

Next, let’s show the regional values (CA, OR and WA; not coastwide) in the plot as separate series with different colors, using aes(color = region). To do this, we need to tidy the data into long format, with one column holding the total_fisheries_revenue values and another holding the region. The region column then supplies the group and color aesthetics, both of which are available for geom_line():

d_rgn <- d %>% 
  # select columns
  select(
    time, 
    starts_with("total_fisheries_revenue")) %>% 
  # exclude column
  select(-total_fisheries_revenue_coastwide) %>% 
  # pivot longer
  pivot_longer(-time) %>% 
  # mutate region by stripping other
  mutate(region = name %>% 
      str_replace("total_fisheries_revenue_", "") %>% 
      str_to_upper()) %>% 
  # filter for not NA
  filter(!is.na(value)) %>% 
  # select columns
  select(time, region, value)
  
# create plot object
p_rgn <- ggplot(
  d_rgn,
  # aesthetics
  aes(x= time, y = value, group = region, color = region))+
    theme_bw()+
    labs(x = "Year", y = "Millions $")+
    theme(axis.text.x = element_text(color = "black", size = 12, hjust = 1, vjust = 1),
        axis.text.y = element_text(color = "black", size = 12),
        axis.title.x = element_text(color = "grey30", face = "bold", size = 14),
        axis.title.y = element_text(color = "grey30", face = "bold", size = 14))+
  # geometry
  geom_line()
# show plot
p_rgn
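The reshaping step is easier to follow on a small table. The sketch below applies the same verbs to made-up numbers; toy_wide, toy_long and the values in them are invented for illustration, and only the column naming pattern is taken from the real dataset.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Made-up wide table: one revenue column per region
toy_wide <- tibble(
  time = c(2000L, 2001L),
  total_fisheries_revenue_ca = c(10, 12),
  total_fisheries_revenue_or = c(5, NA),
  total_fisheries_revenue_wa = c(7, 8)
)

toy_long <- toy_wide %>%
  # stack every column except time into name/value pairs
  pivot_longer(-time) %>%
  # turn "total_fisheries_revenue_ca" into "CA"
  mutate(region = str_to_upper(str_replace(name, "total_fisheries_revenue_", ""))) %>%
  # drop missing values
  filter(!is.na(value)) %>%
  select(time, region, value)

toy_long
```

Each row of toy_long is now one observation (a year, a region and a value), which is the shape ggplot2 expects when region is mapped to color.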

Make interactive ggplots with ggplotly()

When rendering to HTML, you can render most ggplot objects interactively with ggplotly(). The plotly library is an R htmlwidget providing simple R functions to render interactive JavaScript visualizations.

library(plotly)
ggplotly(p_rgn)

More information on plotly can be found on “Improving plotly”.

Create interactive time series with dygraph()

Another htmlwidget plotting library written more specifically for time series data is dygraphs. Unlike the ggplot2 data input, a series is expected in wide (not tidy long) format. So we use tidyr’s pivot_wider() first.

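library(dygraphs)
# reshape to wide format: one column per region
d_rgn_wide <- d_rgn %>% 
  rename(Year = time) %>%
  pivot_wider(names_from = region, values_from = value)
datatable(d_rgn_wide)
# line chart with a range selector below the plot
d_rgn_wide %>% 
  dygraph() %>% 
  dyRangeSelector()
# same data as a stacked graph
d_rgn_wide %>%
  dygraph() %>%
  dyOptions(stackedGraph = TRUE) %>%
  dyRangeSelector()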

The example above is rather simple since we used a table that contained time as a numeric variable. If time is a date variable the process is a bit more involved. Information on how to work with dates and how to get your data into date format can be found here.
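One common route for date-based data, sketched here as an assumption rather than as the method from the linked material, is to convert the table to an xts time series before calling dygraph(). The data frame toy_dates and its values are made up for illustration; xts is a package that dygraphs itself depends on.

```r
library(dygraphs)
library(xts)

# Made-up monthly series with a real Date column
toy_dates <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "month", length.out = 6),
  CA = c(10, 12, 11, 13, 15, 14),
  WA = c(7, 8, 8, 9, 9, 10)
)

# xts object: the values form the data, the dates form the index
toy_xts <- xts(toy_dates[, c("CA", "WA")], order.by = toy_dates$date)

# Build the interactive widget with a range selector
toy_widget <- dyRangeSelector(dygraph(toy_xts))
toy_widget
```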


  1. Leland Wilkinson. The Grammar of Graphics (Statistics and Computing), 2nd Edition.

  2. Douglas Adams. The Hitchhiker’s Guide to the Galaxy.