apply() as a substitute for for-loops in R
apply() is one of my favorite functions in R. It makes it easy to calculate statistics on the rows of a table, and more generally it lets you apply any function that works on vectors to either the rows or the columns of a table.
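As a minimal sketch of both directions, assuming a small made-up matrix m (it is not part of the tutorial data):

```r
# Toy matrix, invented for illustration: 3 rows, 2 columns
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b")))

# MARGIN = 1: apply the function to each row
row_means <- apply(m, 1, mean)

# MARGIN = 2: apply any function that takes a vector to each column
col_ranges <- apply(m, 2, function(x) max(x) - min(x))
```

Here row_means holds one mean per row and col_ranges one range per column, each named after the row or column it came from.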
Write multiple files with lapply()
To save multiple variables automatically, use lapply() in combination with seq_along().
As an example we use the data of Exercise 27B (Volcano plot) in the tutorial. It consists of a set of DE genes from a bulk RNA-Seq experiment. I split the data into a list with:
- a data frame with the upregulated genes (log fold change > 0)
- a data frame with the downregulated genes (lfc < 0)
L <- split(DE,DE$log2FoldChange > 0)
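Splitting on a logical condition names the list elements after the values of that condition. A small sketch with an invented data frame shows what to expect (the gene names and values in DE_toy are made up; only the column name log2FoldChange comes from the example):

```r
# Toy stand-in for DE, invented for illustration
DE_toy <- data.frame(gene = c("g1", "g2", "g3", "g4"),
                     log2FoldChange = c(1.5, -2.0, 0.7, -0.3))

# Split on the logical condition: one data frame per value of the condition
L_toy <- split(DE_toy, DE_toy$log2FoldChange > 0)

names(L_toy)  # "FALSE" "TRUE"
```

The element named "TRUE" holds the upregulated genes and "FALSE" the rest. If you prefer more descriptive names, you can rename the list, for example with names(L) <- c("down", "up").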
Now I want to write both data frames to a file. I use lapply() to repeat the write function on both objects in the list. However, the write function expects a file name. If I specify a name, it will use the same name for both files and the first data frame will be overwritten. I need to specify a different name for each file.
The easiest solution is to give each file the same name as the corresponding object in the list. This means I need access to the names of the objects in the list. A simple way to achieve this is to loop over the indices of the list instead of over the objects themselves, as you normally do in lapply(). The function seq_along() does just this: it generates the indices of the objects in the list:
lapply(seq_along(L),function(i) write.table(L[[i]], file=paste0(names(L)[[i]],".txt"), quote=FALSE, col.names=FALSE, row.names=FALSE))
You can then use the index i to retrieve both the name of the ith object (via names(L)[[i]]) and the object itself (via L[[i]]).
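To try the pattern without touching your working directory, here is a minimal sketch on an invented list. L_toy, its names and its contents are assumptions for illustration. Unlike the call above, it writes to a temporary folder, keeps the header line and returns the file paths:

```r
# Toy list standing in for L; names and contents are invented
L_toy <- list(down = data.frame(gene = c("g2", "g4"), lfc = c(-2.0, -0.3)),
              up   = data.frame(gene = c("g1", "g3"), lfc = c(1.5, 0.7)))

out_dir <- tempdir()  # temporary folder, so nothing is overwritten

files <- lapply(seq_along(L_toy), function(i) {
  # Build the file name from the name of the ith list element
  f <- file.path(out_dir, paste0(names(L_toy)[[i]], ".txt"))
  write.table(L_toy[[i]], file = f, quote = FALSE, row.names = FALSE)
  f  # return the path so we can check what was written
})

basename(unlist(files))  # "down.txt" "up.txt"
```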
Save multiple plots with lapply()
As an example we use data of Exercise 41E (exercise on correlations) in the tutorial.
In this example we have two data frames:
- pval
- dist
I want to make a scatter plot of each column of pval against each column of dist and save these scatter plots to .png files. If you have no experience with plotting in R, first watch the lesson on making graphs with ggplot2.
First I create the plots and save them to a list. On each column (MARGIN=2) of pval I apply a function that makes a faceted plot (via facet_wrap()) of scatter plots (via geom_point()), with one panel for each column in dist. To be able to use facets I need the data in long format (via melt()).
Before the melt, I add the column from pval that was retrieved by apply() to the dist data frame (via cbind()). The argument id.vars=1 ensures that the first column of the combined data frame (the column from pval) is kept as an identifier column and is not melted along with the others.
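To see what the long format looks like, here is a rough sketch on invented data. So that it runs without extra packages, it swaps in base R's stack() for melt(); the shape matches what is described above, but the column names differ (values and ind instead of value and variable). The objects x and dist_toy are made up for illustration:

```r
# Toy stand-ins, invented for illustration (not the tutorial data)
x <- c(0.01, 0.20, 0.05)              # plays the role of one column from pval
dist_toy <- data.frame(d1 = c(1, 2, 3),
                       d2 = c(4, 5, 6))

# Base R analogue of melt(cbind(x, dist), id.vars = 1):
# the columns of dist_toy are stacked and x is repeated for each of them
long <- data.frame(x = x, stack(dist_toy))
long
# columns: x (kept as is), values (stacked numbers), ind (source column name)
```

Each row of long is one point in one panel: x goes on the horizontal axis, the stacked value on the vertical axis, and the source column name decides the panel.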
The result is a list of faceted plots (one for each column of pval):
L <- apply(pval,2, function(x) ggplot(melt(cbind(x,dist), id.vars=1),aes(x,value)) + geom_point() + facet_wrap(~variable))
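The file names used in the next step depend on this list being named after the columns of pval. A small sketch on invented data suggests how that works: when the function returns a list for each column (used here as a stand-in for a plot object), apply() returns a list with one element per column, named after the columns. pval_toy and its values are made up for illustration:

```r
# Toy data frame, invented for illustration
pval_toy <- data.frame(p1 = c(0.01, 0.20, 0.05),
                       p2 = c(0.30, 0.04, 0.50))

# The function returns a list per column, as a stand-in for a plot object
res <- apply(pval_toy, 2, function(x) list(n = length(x), smallest = min(x)))

is.list(res)  # TRUE
names(res)    # "p1" "p2"
```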
Then I save the plots to my computer, using the same strategy as described above. I use lapply() on the list, but instead of looping over the objects, I loop over the indices generated by seq_along(). This makes it possible to retrieve the name of each object (via names(L)) and use it as the name of the .png file created by ggsave():
lapply(seq_along(L), function(i) ggsave(filename=paste0(names(L)[[i]],".png"), plot=L[[i]]))