Randomness in R
Randomness is a double-edged sword: sometimes you want it, sometimes you want to control it. Many processes in R include an element of randomness:
- performing random picks from a collection of values
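- dimensionality reduction methods such as t-SNE and UMAP (and PCA when it is computed with a randomized or approximate solver)
- clustering (for example, k-means with randomly chosen starting centers)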
Because these processes involve random choices, running the same code twice can yield different results.
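As a small illustration of this point, the sketch below draws from the same distribution twice without touching the random number generator. The variable names a and b are chosen only for this example.

```r
# Two draws from the same normal distribution, with no seed set in between
a <- rnorm(5)
b <- rnorm(5)

# The second call continues the random stream instead of restarting it,
# so the two vectors are not expected to match
identical(a, b)
```

Running the script again in a fresh R session would also be expected to give values different from the previous run.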
Ensuring reproducibility
To make random processes behave consistently, R lets you fix the state of the random number generator:
set.seed(n)
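As long as you use the same seed value n and make the same random calls in the same order, R will generate the same sequence of random numbers, and thus the same results, every time the code is executed. Note that the default generators have changed between R versions (for example, sample() in R 3.6.0), so identical results are only expected under the same R version and generator settings.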
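A minimal sketch of this behavior, assuming an arbitrary seed of 123 picked only for illustration: resetting the seed restarts the sequence, while drawing again without resetting continues it.

```r
set.seed(123)        # 123 is an arbitrary value chosen for this example
first <- runif(3)

set.seed(123)        # reset the generator to the same state
second <- runif(3)

third <- runif(3)    # no reset: continues the sequence after 'second'

identical(first, second)  # TRUE: same seed, same calls, same numbers
identical(first, third)   # FALSE: 'third' comes from later in the sequence
```

The seed value itself carries no meaning. What matters is that the same value is used each time the code is run.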
Generating reproducible random data
To create a reproducible random data set, first set a seed and then call functions that generate random numbers, such as rnorm(), runif(), or sample():
set.seed(42)
x <- rnorm(10) # 10 values from a normal distribution
y <- runif(10) # 10 values from a uniform distribution
z <- sample(1:20, 10) # 10 distinct integers drawn from 1 to 20 (without replacement)
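One way to package the steps above is a small helper that sets the seed and builds the data in one place. The function name make_data and its arguments are hypothetical, introduced only for this sketch. They are not part of base R.

```r
# make_data() is a hypothetical helper written for this example
make_data <- function(seed, n = 10) {
  set.seed(seed)            # fix the generator state before any random call
  data.frame(
    x = rnorm(n),           # n values from a normal distribution
    y = runif(n),           # n values from a uniform distribution
    z = sample(1:20, n)     # n distinct integers from 1 to 20 (needs n <= 20)
  )
}

d1 <- make_data(42)
d2 <- make_data(42)
d3 <- make_data(7)

identical(d1, d2)  # TRUE: same seed, same sequence of calls
identical(d1, d3)  # FALSE: a different seed gives different data
```

Keep in mind that calling set.seed() inside a function also resets the generator for any code that runs afterwards, so this pattern fits best in scripts where that side effect is acceptable.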