Create a new Experiment called GeNormHuman in Project1
You can find the details on how to create a new experiment in Creating a project and an experiment
Import Run6. This file is in qBase format.
You can find the details on how to start the data import in Loading data into qbase+. Unlike the previous exercise, qbase+ does not allow you to do a quick import this time. In the Import Run window, Manual import is selected:
Make sure that Upload file to Biogazelle support for further analysis is NOT selected and click Next. Select the correct File type (qBase) and click Finish.
This file contains the data of the geNorm pilot experiment. In the pilot experiment, 10 candidate reference genes were measured in 20 representative samples.
In this experiment we want to select the ideal reference genes for our next experiments, so we choose selection of reference genes (geNorm).
You can find the details on how to check the quality of the replicates in the Checking the quality of technical replicates and controls section of Analyzing gene expression data in qbase+
All replicates and controls have met the quality criteria so there’s no need to inspect them further.
You can find the details on how to select the Amplification efficiencies strategy in the Taking into account amplification efficiencies section of Analyzing gene expression data in qbase+. We haven’t included a dilution series, nor do we have data from previous qPCR experiments regarding the amplification efficiencies, so we choose to use the same efficiency (E=2) for all genes.
It is of course better to include a dilution series for each gene to have an idea of the amplification efficiencies of each primer pair.
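As a rough illustration of what a dilution series provides, the sketch below estimates the amplification efficiency from the slope of a standard curve (Cq plotted against log10 of the input quantity), using the standard relation E = 10^(-1/slope). The function name and the Cq values are made up for this example and are not part of the course data; qbase+ does this calculation for you when a dilution series is present.

```python
import math

def amplification_efficiency(log10_quantities, cq_values):
    """Estimate the amplification base E from a dilution series.

    Fits Cq = slope * log10(quantity) + intercept by least squares
    and returns E = 10 ** (-1 / slope). E = 2 means perfect doubling
    of the product in every cycle.
    """
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope)

# Hypothetical 10-fold dilution series (made-up Cq values).
log10_q = [0, -1, -2, -3, -4]
ideal_step = math.log2(10)  # about 3.32 cycles per 10-fold dilution when E = 2
cq = [20 + ideal_step * i for i in range(5)]

efficiency = amplification_efficiency(log10_q, cq)
print(round(efficiency, 3))
```

A primer pair whose Cq shifts by about 3.32 cycles per 10-fold dilution gives E close to 2, which is the value assumed for all genes in this exercise.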
You can convert all the genes simultaneously by selecting Use all targets as candidate reference genes.
Click Finish.
Upon clicking Finish, the geNorm window containing the analysis results opens automatically. The geNorm window consists of three tabs, located at the bottom of the window: geNorm M, geNorm V and Interpretation.
The first tab, geNorm M, shows a ranking of candidate genes according to their stability, expressed in M values, from the most unstable genes at the left (highest M value) to the best reference genes at the right (lowest M value).
The second tab, geNorm V, shows a bar chart that helps determine the optimal number of reference genes to be used in subsequent analyses.
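To make the M value on the first tab more concrete, here is a minimal sketch of the stability measure as defined in the original geNorm publication: for each gene, M is the average, over all other candidate genes, of the standard deviation of the log2 expression ratios across samples. The gene names and quantities are invented, and the sketch does a single pass only; the full geNorm procedure also removes the least stable gene step by step and recalculates M, and qbase+ may differ in its details.

```python
import math
from statistics import stdev

def genorm_m(rq):
    """Return gene -> M for a dict of gene -> relative quantities per sample.

    M of a gene is the mean, over all other genes, of the standard
    deviation of the log2 ratios between the two genes across samples.
    """
    genes = list(rq)
    m_values = {}
    for j in genes:
        pairwise_sds = []
        for k in genes:
            if k == j:
                continue
            log_ratios = [math.log2(a / b) for a, b in zip(rq[j], rq[k])]
            pairwise_sds.append(stdev(log_ratios))
        m_values[j] = sum(pairwise_sds) / len(pairwise_sds)
    return m_values

# Invented relative quantities for three genes in four samples.
rq = {
    "A": [1.0, 2.0, 4.0, 8.0],
    "B": [2.0, 4.0, 8.0, 16.0],   # constant ratio to A: perfectly co-regulated
    "C": [1.0, 4.0, 2.0, 16.0],   # ratio to A and B varies between samples
}
m = genorm_m(rq)

# Rank as on the geNorm M tab: most unstable gene (highest M) first.
ranking = sorted(m, key=m.get, reverse=True)
print(ranking)
```

Genes A and B keep a constant ratio in every sample, so they get the lowest M values, while gene C ends up at the unstable end of the ranking.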
The number of reference genes is a trade-off between practical considerations and accuracy. It is a waste of resources to quantify more genes than necessary if all candidate reference genes are relatively stably expressed and if normalization factors do not significantly change when more genes are included. However, Biogazelle recommends using at least the 3 most stable candidate reference genes, with stepwise inclusion of more reference genes until the next gene has no significant contribution to the normalization factors.
To determine the need to include more than 3 genes for normalization, pairwise variations Vn/n+1 are calculated between two sequential normalization factors. Simply stated: V is a measure of the added value of adding a next reference gene to the analysis. A large variation means that the added gene has a significant effect and should be included.
In normal experiments like the Gene expression experiment (see Analyzing_gene_expression_data_in_qbase+), we only have 3 reference genes, so we will see only 1 bar here. In this geNorm pilot experiment, however, we analyzed 10 candidate reference genes, so we see 8 bars (V2/3 up to V9/10). All pairwise variations are very low, so even the inclusion of a third gene has no significant effect.
Based on a preliminary experiment that was done by Biogazelle, 0.15 is taken as a cut-off value for V, below which the inclusion of an additional reference gene is not required. Normally this threshold is indicated by a green line on the geNorm V bar chart. However, since all V-values fall below the threshold in this geNorm pilot experiment, you don’t see this line on the bar chart.
So, these results mean that for all subsequent experiments on these samples, two reference genes, HPRT1 and GADP, would be sufficient. However, as stated before, Biogazelle recommends always including at least three reference genes in case something goes wrong with one of the reference genes (so also include YWHAZ).
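The pairwise variation can be sketched in a few lines. The example below assumes the definition from the original geNorm publication: the normalization factor NFn is the geometric mean of the n most stable genes, and Vn/n+1 is the standard deviation, across samples, of log2(NFn / NFn+1). The gene names, the ranking and the quantities are invented; only the 0.15 cut-off comes from the text above.

```python
import math
from statistics import stdev

V_CUTOFF = 0.15  # cut-off for V mentioned in the text

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def pairwise_variations(rq, ranked_genes):
    """Return {"V2/3": ..., "V3/4": ...} for genes ranked most stable first."""
    n_samples = len(rq[ranked_genes[0]])

    def normalization_factor(n):
        # Geometric mean of the n most stable genes, one value per sample.
        return [geometric_mean([rq[g][s] for g in ranked_genes[:n]])
                for s in range(n_samples)]

    result = {}
    for n in range(2, len(ranked_genes)):
        nf_n = normalization_factor(n)
        nf_next = normalization_factor(n + 1)
        log_ratios = [math.log2(a / b) for a, b in zip(nf_n, nf_next)]
        result[f"V{n}/{n + 1}"] = stdev(log_ratios)
    return result

# Ten invented candidate genes measured in six samples.
genes = [f"G{i}" for i in range(1, 11)]
rq = {g: [1.0 + 0.05 * ((i + s) % 4) for s in range(6)]
      for i, g in enumerate(genes)}

v = pairwise_variations(rq, genes)
for name, value in v.items():
    print(name, round(value, 4), "add next gene" if value > V_CUTOFF else "enough")
```

With 10 candidate genes the loop produces 8 values (V2/3 up to V9/10), which matches the 8 bars on the geNorm V tab; with 3 genes it would produce a single value.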
In this example we will analyze data from an artificial expression study containing the following samples:
In this study, the expression of the following genes was measured:
In general, the lower the expression level, the noisier the qPCR results will become. For each of the genes of interest we have included a run in which a 2-fold difference in expression between control and treated samples was created (Low1, Medium1 and HighVar1) and a run with a 4-fold difference in expression (Low2, Medium2 and HighVar2). There are three technical replicates per reaction. In a second experiment we used the reference genes that were obtained via Genevestigator and that proved to be more stably expressed in mouse liver samples than the commonly used references. The data can be found in the NormGenes folder on the BITS laptops or can be downloaded from our website.
Create a new Experiment called NormGenes1 in Project1
You can find the details on how to create a new experiment in Creating a project and an experiment
Import Run1 to Run5. These files are in qBase format.
You can find the details on how to import the data file in the Loading the data into qbase+ section of Analyzing data from a geNorm pilot experiment in qbase+.
We are going to compare expression in treated versus untreated samples, so we need to tell qbase+ which samples are treated and which are not. To this end, we have constructed a sample properties file in Excel containing the grouping annotation as a custom property called Treatment.
You can find the details on how to import the data file in the Adding annotation to the data section of Loading data into qbase+.
So as you can see we have 6 treated and 6 untreated samples and we have measured the expression of the 4 commonly used reference genes and 6 genes of interest:
You don’t have data of serial dilutions of representative template to build standard curves so the only choice you have is to use the default amplification efficiency (E = 2) for all the genes.
Appoint the reference genes. You can find the details on how to appoint reference targets in the Normalization section of Analyzing gene expression data in qbase+. ACTB, GAPDH, HPRT and TUBB4B are the reference genes.
The M and CV values of the reference genes are shown in green so the stability of the reference genes is ok.
Since you have a treated and a control group, it seems logical to use the average of the control group for scaling.
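The sketch below shows, under stated assumptions, what normalization and scaling to the control group amount to: Cq values are converted to relative quantities with E = 2, each sample is divided by the geometric mean of its reference genes, and the result is divided by the average of the control samples. The function names, gene names and Cq values are hypothetical, and taking the control average as a geometric mean is an assumption of this example rather than a documented qbase+ detail.

```python
import math

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_quantities(cq_values, efficiency=2.0):
    """Convert the Cq values of one gene to quantities relative to its mean Cq."""
    mean_cq = sum(cq_values) / len(cq_values)
    return [efficiency ** (mean_cq - cq) for cq in cq_values]

def normalize_and_scale(target_cq, reference_cq, control_idx, efficiency=2.0):
    """Return normalized quantities of the target, scaled to the control group."""
    target_rq = relative_quantities(target_cq, efficiency)
    ref_rq = [relative_quantities(cq, efficiency) for cq in reference_cq.values()]
    n_samples = len(target_cq)
    # Normalization factor: geometric mean of the reference genes per sample.
    nf = [geometric_mean([r[s] for r in ref_rq]) for s in range(n_samples)]
    nrq = [t / f for t, f in zip(target_rq, nf)]
    # Assumption: the control group average is a geometric mean.
    control_mean = geometric_mean([nrq[s] for s in control_idx])
    return [value / control_mean for value in nrq]

# Made-up Cq values: samples 0 and 1 are controls, samples 2 and 3 are treated.
reference_cq = {"REF1": [20.0, 20.0, 20.0, 20.0], "REF2": [22.0, 22.0, 22.0, 22.0]}
target_cq = [25.0, 25.0, 24.0, 24.0]  # one cycle earlier in treated samples

scaled = normalize_and_scale(target_cq, reference_cq, control_idx=[0, 1])
print([round(value, 3) for value in scaled])
```

After scaling, the control samples sit around 1 and the treated samples show their fold change directly, which is why scaling to the control group makes the bar charts easy to read.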
You can find the details on how to specify the scaling strategy in the Scaling section of Analyzing gene expression data in qbase+.
Look at the target bar charts.
In the Grouping section at the bottom of the chart you can select Plot group average.
Now do exactly the same for the second experiment with the same genes of interest but with other reference genes. This means that you have to return to the Analysis wizard. To this end, click the Launch wizard button at the top of the page.
You can find the details on how to create a new experiment in Creating a project and an experiment.
Import Run5 to Run9. These files are in qBase format.
You can find the details on how to import the data file in the Loading the data into qbase+ section of Analyzing data from a geNorm pilot experiment in qbase+.
Import the Sample Properties file.
You can find the details on how to import the data file in the Adding annotation to the data section of Loading data into qbase+. Select to import the custom property.
So as you can see we have 6 treated and 6 untreated samples and we have measured the expression of the 4 new reference genes and 6 genes of interest.
Appoint the reference genes
You can find the details on how to appoint reference targets in the Normalization section of Analyzing gene expression data in qbase+.
The M and CV values of the reference genes are shown in green so the stability of the reference genes is ok.
As you can see, the M and CV values of these reference genes are much lower than those of the 4 commonly used reference genes, which indicates that these genes are more stably expressed. It’s not that the commonly used reference genes are bad references; if they were, qbase+ would not display them in green. It’s just that the other reference genes are more stable. But this can have a big impact on the results of your analysis…
Use the average of the control group for scaling.
You can find the details on how to specify the scaling strategy in the Scaling section of Analyzing gene expression data in qbase+.
Plot the average expression level of each group. Now we will compare the target bar charts of the second and the first experiment to assess the influence of the stability of the reference targets on the analysis results.
You can display the bar charts next to each other by clicking the tab of the bar chart of the second experiment. Drag the tab to the right while you hold down the mouse button until you see an arrow at the right side of the qbase+ window and a dark grey box in the right half of the qbase+ window. Release the mouse button when you see the arrow and the box. Now the two bar charts should be next to each other. Some laptop screens are too small to nicely display the two bar charts next to each other. If this is the case, switch to full screen mode by double-clicking the tab of the first experiment.
Now you can compare the expression of each gene in the first and in the second experiment.
When we do this for HighVar1, for instance, you see that the average expression levels of both groups are the same in the first and the second experiment (check the scales of the Y-axis!). Both experiments detect the two-fold difference in expression level between the groups. However, the error bars are much larger in the first experiment than in the second. The variability of the reference genes does have a strong influence on the errors, and the size of the error bars will influence the outcome of the statistical test to determine if a gene is differentially expressed or not. The larger the error bars, the less likely it is that the test will say that the groups differ.
Remember that the error bars represent 95% confidence intervals:
Check out the results of HighVar2. Here, you clearly see the influence of the reference genes. Again, the fourfold difference in expression is detected by both experiments but:
This means that in experiment 2, a statistical test will probably declare that HighVar2 is differentially expressed while in experiment 1 this will not be the case. We will test this assumption by performing a statistical test.
You can find full details on statistical analyses in qbase+ in the statistical analysis section of analyzing gene expression data in qbase+. In brief, you need to perform the following steps:
Open the Statistical wizard
The goal of this analysis is to compare the mean expression levels of our genes of interest in treated and untreated samples
Use the Treatment property to identify treated and untreated samples
Analyze all genes of interest
Use the default settings to perform the non-parametric Mann-Whitney test
As you can see, none of the genes is considered DE by the very conservative non-parametric test. Additionally, most genes have the same p-value. That’s normal when you don’t have many replicates. In our case, we have 6 replicates. Non-parametric tests are based on a ranking of the data values, and there are not so many ways to rank two groups of 6 data points. This is why you see the same p-values for many genes. As said before, the non-parametric test is very stringent. If the data do come from a normal distribution, the test will generate false negatives: some of the genes might have been labeled not DE while in fact they are DE, so you might have missed some differential expression. The choice of statistical test with 6 biological replicates depends on what you prefer: false negatives or false positives. Most people will choose false negatives since they don’t want to invest time and money in research on a gene that was labeled DE while in fact it is not DE.
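The limited number of possible p-values can be checked by brute force. The sketch below enumerates every way of dividing the ranks 1 to 12 over two groups of 6 (assuming no ties) and derives the exact two-sided p-value for each possible Mann-Whitney U statistic. It is an illustration of the ranking argument only: qbase+ may compute its p-values differently, for example with an approximation or a multiple testing correction, so the numbers need not match the table in the software.

```python
from collections import Counter
from itertools import combinations

def mann_whitney_null(n1, n2):
    """Exact null distribution of U for two groups without ties."""
    ranks = range(1, n1 + n2 + 1)
    counts = Counter()
    for group1 in combinations(ranks, n1):
        # U statistic of group 1 derived from its rank sum.
        u = sum(group1) - n1 * (n1 + 1) // 2
        counts[u] += 1
    return counts

def two_sided_p(u_obs, counts, n1, n2):
    """Share of rank arrangements at least as far from the centre as u_obs."""
    total = sum(counts.values())
    centre = n1 * n2 / 2
    extreme = sum(c for u, c in counts.items()
                  if abs(u - centre) >= abs(u_obs - centre))
    return extreme / total

counts = mann_whitney_null(6, 6)
n_arrangements = sum(counts.values())          # 924 ways to split 12 ranks
smallest_p = two_sided_p(0, counts, 6, 6)      # complete separation: 2/924
distinct_p = sorted({two_sided_p(u, counts, 6, 6) for u in counts})

print(n_arrangements, round(smallest_p, 4), len(distinct_p))
```

Every gene whose treated samples all rank above (or all below) the control samples gets the same smallest possible p-value of about 0.0022, however large the fold change is, which explains the repeated p-values in the results table.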
Suppose I don’t mind false positives but I don’t want to miss any potential DE genes. In that case, it’s better to go for a t-test. Let’s repeat the test now choosing a parametric t-test.
You can find full details on statistical analyses in qbase+ in the statistical analysis section of analyzing gene expression data in qbase+. In brief, you need to perform the following steps:
Open the Statistical wizard
The goal of this analysis is to compare the mean expression levels of our genes of interest in treated and untreated samples
Use the Treatment property to identify treated and untreated samples
Analyze all genes of interest
Describe the data set as log-normally distributed
Still none of the genes is considered DE, but you do see that the p-values of the t-test are lower than those of the Mann-Whitney test.
You can find full details on statistical analyses in qbase+ in the statistical analysis section of analyzing gene expression data in qbase+. In brief, you need to perform the following steps:
Open the Statistical wizard.
The goal of this analysis is to compare the mean expression levels of our genes of interest in treated and untreated samples
Use the Treatment property to identify treated and untreated samples
Analyze all genes of interest
Use default settings
Now you see that 4 out of the 6 genes are considered DE. This is also what we expected since 3 of our genes of interest have a 4-fold difference in expression level between the two groups. It’s understandable that it’s hard to detect 2-fold differences in expression especially when the expression of the gene is somewhat variable as is the case for Low1 and HighVar1 but a 4-fold difference is a difference that you would like to detect.
You can find full details on statistical analyses in qbase+ in the statistical analysis section of analyzing gene expression data in qbase+. In brief, you need to perform the following steps:
Open the Statistical wizard.
The goal of this analysis is to compare the mean expression levels of our genes of interest in treated and untreated samples
Use the Treatment property to identify treated and untreated samples
Analyze all genes of interest
Describe the data as log-normally distributed
Again the t-test generates lower p-values than the Mann-Whitney test but realize that choosing the t-test when the data is not normally distributed will generate false positives!
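As a rough illustration of what describing the data as log-normally distributed implies, the sketch below log2-transforms the expression values and then calculates an ordinary two-sample t statistic with pooled variance. The function name and the expression values are invented, and whether qbase+ uses this exact variant of the t-test is an assumption. The statistic is compared with 2.228, the two-sided 5% critical value of Student's t distribution with 10 degrees of freedom (6 + 6 - 2).

```python
import math
from statistics import mean, variance

T_CRIT_DF10 = 2.228  # two-sided 5% critical value, Student's t, 10 df

def log_t_statistic(group_a, group_b):
    """Student t statistic (pooled variance) on log2-transformed values."""
    log_a = [math.log2(v) for v in group_a]
    log_b = [math.log2(v) for v in group_b]
    n_a, n_b = len(log_a), len(log_b)
    pooled_var = ((n_a - 1) * variance(log_a) + (n_b - 1) * variance(log_b)) \
        / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(log_a) - mean(log_b)) / standard_error

# Invented scaled expression values for 6 control and 6 treated samples.
control = [0.8, 1.0, 1.1, 0.9, 1.2, 1.0]
treated = [1.9, 2.2, 1.8, 2.1, 2.4, 1.7]

t = log_t_statistic(treated, control)
significant = abs(t) > T_CRIT_DF10
print(round(t, 2), significant)
```

Unlike the rank-based test, the t statistic grows with the size of the difference and shrinks with the spread within the groups, which is why it can yield lower p-values with only 6 replicates, provided that the log-transformed data are roughly normally distributed.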