Finding doublets

Contributors to the code: Liesbet Martens.

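Below you can find a tutorial and R code showing how to use the DoubletFinder tool to find doublets (droplets containing more than one cell) in your data. DoubletFinder relies on the expression data alone, so it can be applied to any data set, with or without hash tags or genotype information.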

The topic on cell hashing describes how to remove doublets made up of cells that originate from different samples (and thus carry different hash tags). This is only possible for HTO data, that is, when you used hash tags to label the cells of each sample. Hash tag-based removal is typically combined with DoubletFinder, because the hash tags cannot reveal doublets of two cells from the same sample.
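
To get a feeling for how many doublets the hash tags can miss, here is a minimal sketch in base R. It assumes that the two cells of a doublet are drawn independently, each in proportion to its sample's share of the pooled cells; real experiments may deviate from this. The function name and the pool compositions are hypothetical and only serve as illustration.

```r
# Expected fraction of doublets that contain cells from two different samples,
# i.e. the fraction that hash tags can reveal.
# Assumption: both cells of a doublet are drawn independently, each with a
# probability equal to the sample's share of the pooled cells.
hashing_detectable_fraction <- function(sample_fractions) {
  stopifnot(all(sample_fractions >= 0),
            abs(sum(sample_fractions) - 1) < 1e-8)
  # P(both cells come from the same sample) = sum of squared sample fractions
  1 - sum(sample_fractions^2)
}

# Four equally sized samples (hypothetical pool)
equal_pool <- hashing_detectable_fraction(rep(1 / 4, 4))
# An unbalanced pool of three samples (hypothetical proportions)
skewed_pool <- hashing_detectable_fraction(c(0.7, 0.2, 0.1))

print(equal_pool)   # 0.75
print(skewed_pool)  # 0.46
```

Under this assumption, a quarter of the doublets in the balanced pool of four samples come from a single sample and stay invisible to the hash tags, and the share grows when one sample dominates the pool. These are the doublets that an expression-based tool such as DoubletFinder is meant to catch.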

A third option is to identify doublets based on genetic polymorphisms (SNPs) using demuxlet or freemuxlet. This only works if the cells you sequenced come from different individuals.

Freemuxlet identifies SNPs in the reads and tries to deduce a representative set of SNPs for each individual. If a droplet contains SNPs from more than one individual, it is considered a doublet.
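
The decision rule can be illustrated with a toy example in base R. This is not how freemuxlet works internally: the real tool uses a statistical model over the allele reads. The read counts, the names and the 20% threshold below are all made up for illustration.

```r
# Toy illustration only, not the freemuxlet algorithm.
# Rows are droplets (barcodes), columns are individuals; each entry is a
# hypothetical count of reads carrying SNP alleles specific to that individual.
snp_reads <- matrix(
  c(40,  1,  0,
     0, 35,  2,
    22, 19,  0,
     1,  0, 50),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("droplet", 1:4), c("indA", "indB", "indC"))
)

# Hypothetical rule: an individual counts as "present" in a droplet when it
# accounts for at least 20% of the droplet's informative reads.
min_fraction <- 0.2
read_fractions <- snp_reads / rowSums(snp_reads)  # per-droplet fractions
individuals_present <- rowSums(read_fractions >= min_fraction)

# More than one individual present in the same droplet: flag it as a doublet
droplet_call <- ifelse(individuals_present > 1, "doublet", "singlet")
print(droplet_call)
```

In this toy data only droplet3 is flagged, because its reads are split almost evenly between indA and indB. The few stray reads in the other droplets stay below the threshold and are ignored.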

Demuxlet applies the same doublet criterion as freemuxlet, but it assumes that you genotyped the individuals before the single-cell experiment and thus already have a set of SNPs for each individual. Freemuxlet only needs a set of general SNPs that occur in the population. Both are command line tools and will not be discussed further.