Randomness in R

Randomness is a double-edged sword: sometimes you want it, sometimes you want to control it. Many processes in R include an element of randomness:

  • performing random picks from a collection of values
  • dimensionality reduction methods such as t-SNE, UMAP and randomized (approximate) PCA
  • clustering, for example k-means with randomly chosen starting centers

Because these processes involve random choices, running the same code twice can yield slightly different results.
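
A minimal sketch of this behaviour, using only base R. The variable names first_draw and second_draw are illustrative and not part of any API:

```r
# Two identical calls without resetting the generator: each call
# advances the random number generator, so the draws differ.
first_draw  <- rnorm(5)
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # FALSE
```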

Ensuring reproducibility

To make random processes behave consistently, R lets you fix the state of its random number generator:

set.seed(n) 

As long as you use the same seed value n, along with the same R version and random number generator settings, R will generate the same sequence of random numbers, and thus the same results, every time the code is executed.
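
The effect can be checked directly. The sketch below assumes an arbitrary seed value of 123, and the names run_a, run_b and run_c are illustrative. It also shows that the seed has to be set again before each run you want to match, because every draw moves the generator forward:

```r
set.seed(123)            # fix the generator state (123 is an arbitrary choice)
run_a <- rnorm(5)

set.seed(123)            # reset to the same state before the second run
run_b <- rnorm(5)

identical(run_a, run_b)  # TRUE

run_c <- rnorm(5)        # no reset: the generator has moved on
identical(run_b, run_c)  # FALSE
```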

Generating reproducible random data

To create a reproducible random data set, first set a seed and then call functions that draw random numbers, such as:

set.seed(42)
x <- rnorm(10)         # 10 values from a standard normal distribution
y <- runif(10)         # 10 values from a uniform distribution on [0, 1]
z <- sample(1:20, 10)  # 10 distinct integers between 1 and 20
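
One way to package this pattern, sketched under the assumption that you want the seed and the draws kept together. The helper make_data is hypothetical, not a base R function:

```r
# Hypothetical helper: bundles the seed and the draws so that the same
# seed always returns the same data set.
make_data <- function(seed) {
  set.seed(seed)  # note: this also changes the global generator state
  list(
    x = rnorm(10),
    y = runif(10),
    z = sample(1:20, 10)
  )
}

first  <- make_data(42)
second <- make_data(42)
other  <- make_data(43)

identical(first, second)  # TRUE
identical(first, other)   # FALSE
```

Because the draws happen in a fixed order after the seed is set, changing that order (for example, calling runif before rnorm) would give different values for the same seed.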