Adding regression lines to a plot

You can add regression lines to your plots via geom_smooth().
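As a minimal sketch, assuming the built-in mtcars data set (chosen here only for illustration), a scatter plot with a fitted line might look like this:

```r
library(ggplot2)

# mtcars ships with R; wt (weight) and mpg (fuel economy) are used purely as example columns
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # method = "lm" draws a straight least-squares line;
  # the grey band is the confidence interval (se = TRUE is the default)
  geom_smooth(method = "lm", formula = y ~ x)
print(p)
```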

Regression lines for different groups

You can add one regression line per group of data by mapping the color aesthetic, inside aes(), to the column that defines the groups.
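A short sketch of the idea, again assuming the built-in mtcars data set and using its cyl column as a hypothetical grouping variable:

```r
library(ggplot2)

# cyl is numeric in mtcars; factor() turns it into a discrete grouping variable
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  # because color is mapped in aes(), geom_smooth() fits one line per cyl group
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)
```

Mapping color in the top-level aes() means both the points and the lines inherit the grouping.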

Linear and nonlinear regression methods

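The method argument defines which regression model to use:

  • lm: linear model, fits a straight line by least squares (assuming normally distributed residuals); combined with the formula argument it can also fit a curve with a predefined shape
  • glm: generalised linear model, extends the linear model to responses whose errors are not normally distributed (e.g. counts); the fitted line is straight on the scale of the link function
  • loess: local regression, fits a smooth curve that follows the data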
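The three methods can be compared side by side. This is only a sketch on the built-in mtcars data set (an assumption, not part of the original example); note that glm without a family argument uses a gaussian family and therefore draws the same line as lm:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Straight least-squares line
p_lm <- base + geom_smooth(method = "lm", formula = y ~ x)

# glm defaults to family = gaussian, so this reproduces the lm line
p_glm <- base + geom_smooth(method = "glm", formula = y ~ x)

# Local smoother: a flexible curve that follows the data
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x)

print(p_loess)
```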

The formula of the regression model

The formula argument specifies the formula of the model:

  • y ~ x: model the Y values as a straight-line function of the X values
  • y ~ poly(x, 2): model the Y values using a second-degree polynomial (a parabola)
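Both formulas can be drawn on the same plot to see the difference. A minimal sketch, assuming the built-in mtcars data set; the grey color is just a styling choice to tell the lines apart:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # straight line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey40") +
  # parabola: still fitted with lm, only the formula changes
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)
print(p)
```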

How does regression work?

Regression fits the line that best matches your data. For a linear model, this is the line that minimises the sum of the squared residuals, where a residual is the vertical distance between a data point and the line. Check out the video for a detailed explanation of how it works.
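The least-squares idea can be checked numerically. In this sketch (built-in mtcars data assumed, and rss() is a hypothetical helper written for the illustration), any line other than the fitted one gives a larger sum of squared residuals:

```r
# Fit a straight line with base R
fit <- lm(mpg ~ wt, data = mtcars)
b <- unname(coef(fit))          # b[1] = intercept, b[2] = slope

# Hypothetical helper: residual sum of squares for an arbitrary line
rss <- function(intercept, slope) {
  predicted <- intercept + slope * mtcars$wt
  sum((mtcars$mpg - predicted)^2)
}

rss_fit     <- rss(b[1], b[2])        # the fitted line
rss_shifted <- rss(b[1] + 1, b[2])    # same slope, shifted upwards
rss_tilted  <- rss(b[1], b[2] + 0.5)  # same intercept, different slope

c(fitted = rss_fit, shifted = rss_shifted, tilted = rss_tilted)
```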

Add the equation of the regression line

To add the equation to the plot you need the ggpmisc package, which provides extra features for ggplot2:

library(ggpmisc)

The function to add the equation is stat_poly_eq(). It generates a label with the equation and/or the coefficient of determination (R²). You add it as a layer to your ggplot:

ggplot(...) + geom_point() + geom_smooth(...) + stat_poly_eq(use_label(c("eq","R2")))
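A complete version of this pattern might look as follows. This is a sketch that assumes the built-in mtcars data set and a version of ggpmisc recent enough to provide use_label(); the same formula is passed to both layers so that the label matches the drawn line:

```r
library(ggplot2)
library(ggpmisc)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  # use_label() builds the label mapping: equation plus R squared
  stat_poly_eq(use_label(c("eq", "R2")), formula = y ~ x)
print(p)
```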

The R² value measures how well the line fits the data: it is the proportion of the variance in Y that the model explains. If it’s close to 1, the line fits well; if it’s close to 0, the line doesn’t fit well.
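For a linear model, this value can be reproduced by hand from the residuals. A small sketch, assuming the built-in mtcars data set:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Variation left over after fitting the line
ss_res <- sum(residuals(fit)^2)
# Total variation of Y around its mean
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

r2_manual  <- 1 - ss_res / ss_tot
r2_summary <- summary(fit)$r.squared  # value reported by R itself

c(manual = r2_manual, summary = r2_summary)
```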

For the glm method you can use the same approach. Suppose you have count data (non-negative integers with many low values and a few high ones). In this case, you should fit a Poisson or negative binomial model to the data:

p <- p + geom_smooth(method="glm", method.args=list(family=poisson(link="log")))

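Here too you can add a label using the stat_poly_eq() function. Keep in mind that stat_poly_eq() fits its own model (lm by default) rather than reusing the fit drawn by geom_smooth(), so the equation and R² it shows describe a least-squares fit, not the Poisson model: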

p + stat_poly_eq(use_label(c("eq","R2")))
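To read off the coefficients of the Poisson model itself, one option is to fit the same model directly with glm(). The sketch below uses simulated count data (the data frame d and its parameters are invented for illustration only):

```r
library(ggplot2)

set.seed(1)  # simulated data, for illustration only
d <- data.frame(x = runif(100, 0, 3))
d$y <- rpois(100, lambda = exp(0.3 + 0.8 * d$x))

# Poisson regression curve drawn by geom_smooth()
p <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = poisson(link = "log")))
print(p)

# The same model fitted directly; coefficients are on the log scale,
# i.e. log(expected count) = intercept + slope * x
fit <- glm(y ~ x, family = poisson(link = "log"), data = d)
coef(fit)
```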

More info, including usage examples and extra features, is available in the ggpmisc documentation.