Finding Candidate Reference Genes in the Free Version of Genevestigator

Now we will make a more elaborate exercise on finding candidate reference genes. We will do the analysis in the free version of RefGenes but the analysis in the commercial version is very similar. Suppose we want to compare the expression stability of the 4 commonly used reference genes for qPCR on mouse liver samples (ACTB, GAPDH, HPRT and TUBB4B) to that of 4 reference genes that are suggested by Genevestigator. To this end we open the RefGenes tool and select the liver samples of the mouse 430_2 arrays.

Check the expression stability of the 4 commonly used reference genes?

  • Click the New button in the Gene Selection panel to create a new selection. The selection of samples defines which data are used for the analysis.
  • Enter the name of your target gene in the text area (for example: ACTB) and click OK.

When you are using the commercial version, you may enter multiple genes at the same time, in the free version you have to enter them one by one. This means that you have to add the first gene as described above and then add the next gene by clicking the Add button and so on …

Finally you end up with an expandable list of the genes you asked for and you can tick or untick them to control the display of their expression data in the main window. When you tick the 4 commonly used reference genes you can see how stable they are expressed in the 651 mouse liver samples that are stored in Genevestigator:

As you can see, the expression levels of the commonly used reference genes in the selected mouse liver samples is pretty variable which is also confirmed by their relatively high SD values. Often there are multiple probe sets for the same gene. When you use the free version you may only choose one probe set per gene so you have to make a choice. How to make that choice? Affymetrix probe set IDs have a certain meaning: what comes after the underscore tells you something about the quality of the probes:

  • _at means that all the probes of the probe set hit one known transcript. This is what you want: probes specifically targeting one transcript of one gene
  • _a_at means that all the probes in the probe set hit alternate transcripts from the same gene. This is still ok the probes bind to multiple transcripts but at least the transcripts come from the same gene (splice variants)
  • _x_at means that some of the probes hit transcripts from different genes. This is still not what you want: the expression level is based on a combination of signals of all the probes in a probe set so also probes that cross-hybridize
  • _s_at means that all the probes in the probe set hit transcripts from different genes. This is definitely not what you want: if the probes bind to multiple genes you have no idea whose expression you have measured on the array

So I always ignore probe sets with s or x. If you have two specific probe sets for a gene, they should more or less give similar signals. If this is not the case, I base my choice upon the expression level that I expect for that gene based on previous qPCR results.

As you can see, each of these 4 commonly used reference genes has a high expression level. Most genes do not have such high expression levels. In most qPCR experiments your genes of interest will have low or medium expression levels, so these reference genes will not be representative for the genes of interest.

Reference genes should ideally have similar expression levels as the genes of interest. Therefore, we will select the four most stably expressed genes with a medium expression level (between 8 and 12) according to the RefGenes tool.

Select the 4 most stably expressed candidate reference gene with medium expression levels

  • Untick all target genes.
  • Click the Run button at the top of the main window and check if the range is set correctly.

Select the 4 candidates with the lowest SD: Then, we performed qPCR on a representative set of 16 of our liver samples to measure the expression of these 8 candidate reference genes and analyzed the data (See how to select the best reference genes using geNorm in qbase+).