Statistical Analysis
Once you have generated the target bar charts, you leave the Analysis wizard and return to the regular qbase+ interface. Suppose that you want to perform a statistical test to show that the difference in expression that you see in the target chart is significant. At some point, qbase+ will ask you whether your data come from a normal distribution. If you don't know, you can select I don't know and qbase+ will assume the data do not come from a normal distribution and perform a stringent non-parametric test. However, when you have 7 or more replicates per group, you can check if the data are normally distributed using a statistical test. If they are, qbase+ will perform a regular t-test. The upside is that the t-test is less stringent than the non-parametric tests and will find more differentially expressed (DE) genes. However, you may only perform it on normally distributed data. If you perform the t-test on data that are not normally distributed, you will generate false positives, i.e. qbase+ will say that genes are DE while in fact they are not. Performing a non-parametric test on normally distributed data will generate false negatives, i.e. you will miss DE genes.
Checking if the data is normally distributed can be easily done in GraphPad Prism. To this end you have to export the data.
How to export the data?
To export the results, click the upward pointing arrow in the qbase+ toolbar. You want to export the normalized data, so select Export Result Table (CNRQ). You will be given the choice to export results only (CNRQs) or to include the errors (standard error of the mean) as well. We don't need the errors in Prism, so we do not select this option. The scale of the Result table can be linear or logarithmic (base 10). Without user intervention, qbase+ will automatically log10 transform the CNRQs prior to doing statistics, so we need to check in Prism whether the log transformed data are normally distributed. Additionally, you need to tell qbase+ where to store the file containing the exported data. Click the Browse button for this.
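To make the difference between the two scales concrete, here is a minimal Python sketch of a log10 transformation. The function name and the CNRQ values are illustrative placeholders, not part of qbase+ and not values from this experiment.

```python
import math

def log10_transform(cnrqs):
    """Return the base-10 logarithm of each linear-scale CNRQ value."""
    # Relative quantities must be strictly positive before taking a logarithm.
    if any(value <= 0 for value in cnrqs):
        raise ValueError("CNRQ values must be positive to take a logarithm")
    return [math.log10(value) for value in cnrqs]

# Hypothetical linear-scale CNRQs for one gene (placeholders only).
linear_cnrqs = [0.1, 1.0, 10.0]
log_cnrqs = log10_transform(linear_cnrqs)
```

On the logarithmic scale a tenfold decrease and a tenfold increase lie at the same distance from zero, which is why the normality check is done on the log transformed values.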
Exporting will generate an Excel file in the location that you specified. However, the file contains the results for all samples and we need to check the two groups (treated and untreated) separately. The sample properties show that the even samples belong to the treated group and the odd samples to the untreated group. This means we have to generate two files:
- a file containing the data of the untreated samples
- a file containing the data of the treated samples
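If you prefer not to split the file by hand, the separation can be sketched in Python. This is only an illustration under stated assumptions: the exported table is semicolon-separated text, the sample name sits in the first column, and sample names look like Sample01, Sample02 and so on. The name pattern, the function name and the example values are all hypothetical, so adapt them to your own export.

```python
import csv
import io
import re

def split_by_sample_parity(csv_text, name_pattern=r"^Sample(\d+)$", delimiter=";"):
    """Split exported rows into (header, untreated_rows, treated_rows)."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data_rows = rows[0], rows[1:]
    untreated, treated = [], []
    for row in data_rows:
        if not row:
            continue
        # Assumption: the first column holds a name such as "Sample07".
        match = re.match(name_pattern, row[0].strip())
        if match is None:
            continue  # skips rows such as standards or water samples
        if int(match.group(1)) % 2 == 0:
            treated.append(row)      # even sample numbers: treated group
        else:
            untreated.append(row)    # odd sample numbers: untreated group
    return header, untreated, treated

# Made-up example table, not data from the experiment.
example = (
    "Sample;GeneA;GeneB\n"
    "Sample01;0,12;-0,40\n"
    "Sample02;0,35;0,08\n"
    "Sample03;-0,05;0,21\n"
    "Water;;\n"
)
header, untreated, treated = split_by_sample_parity(example)
```

Each of the two row lists can then be written to its own file together with the header.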
Now we can open these files in Prism to check if the data is normally distributed.
How to import the data of the untreated samples in Prism?
- Open Prism
- Expand File in the top menu
- Select New
- Click New Project File
- In the left menu select to create a Column table. Data representing different groups (in our case measurements for different genes) should always be loaded into a column table.
- Select Enter replicate values, stacked into columns (this is normally the default selection) since the replicates (measurements for the same gene) are stacked in the columns.
- Click Create
Prism has now created a table to hold the data of the untreated samples but at this point the table is still empty. To load the data:
- Expand File in the top menu
- Select Import
- Browse to the resultslog.csv file, select it and click Open
- In the Source tab select Insert data only
- Since this is a European csv file, commas are used as decimal separators. So, in contrast to what the name csv might imply, semicolons and not commas are used to separate the columns in the file (you can open the file in a text editor to take a look). In American csv files dots are used as decimal separators and commas are used to separate the columns. Prism doesn't know the format of your csv file, so you have to tell it the role of the comma in your file. Select Separate decimals
- Go to the Filter tab and specify the rows you want to import (the last rows are those of the standard and the water samples, which you don't want to include)
- Click Import
Once the file is opened in Prism, you see that the first column, which contains the sample names, is treated as a data column. Right click the header of the first column and select Delete.
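The European csv conventions described in the import steps (semicolons between columns, commas as decimal separators) can also be handled in a few lines of Python. The sketch below is an illustration only: the function name, gene names and values are made up, and it assumes the first column holds the sample names, which it drops just as you delete that column in Prism.

```python
import csv
import io

def read_european_csv(csv_text):
    """Parse semicolon-separated text with decimal commas into {gene: [values]}."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    header = next(reader)
    gene_names = header[1:]  # assumption: first column holds the sample names
    columns = {name: [] for name in gene_names}
    for row in reader:
        for name, cell in zip(gene_names, row[1:]):
            if cell.strip():
                # Turn the decimal comma into a dot so float() can parse it.
                columns[name].append(float(cell.replace(",", ".")))
    return columns

# Made-up example content, not data from the experiment.
example = (
    "Sample;GeneA;GeneB\n"
    "Sample01;0,12;-0,40\n"
    "Sample03;-0,05;0,21\n"
)
columns = read_european_csv(example)
```

Reading the same text with a comma as column separator would split every number in two, which is the confusion the Separate decimals option avoids in Prism.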
How to check if the data of the untreated samples comes from a normal distribution?
- Click the Analyze button in the top menu
- Select the Column statistics analysis in the Column analyses section of the left menu
- In the right menu, deselect Flexible. It's a bad reference gene that you will not include in the qbase+ analysis, so there's no point checking its normality (it is probably not normally distributed). In that respect you could also deselect the other two reference genes, since you will do the DE test on the target genes and not on the reference genes.
- Click OK
- In the Descriptive statistics and the Confidence intervals sections deselect everything except Mean, SD, SEM. These statistics are not what we are interested in: we want to know if the data come from a normal distribution. The only reason we select Mean, SD, SEM is that Prism throws an error if we make no selection here.
- In the Test if the values come from a Gaussian distribution section select the D'Agostino-Pearson omnibus test to test if the data are drawn from a normal distribution. Although Prism offers three tests for this, the D'Agostino-Pearson test is the safest option.
- Click OK
Prism now generates a table holding the results of the statistical analysis. The table shows that the data for Palm are not normally distributed.
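To give an idea of what such a normality test computes, here is a minimal Python sketch of a D'Agostino-Pearson omnibus statistic, using the commonly published skewness (D'Agostino) and kurtosis (Anscombe-Glynn) transformations. It is an illustration under assumptions, not Prism's implementation, so its P values may differ from Prism's output. The example values are placeholders, and the approximation assumes at least 8 values per group.

```python
import math

def dagostino_pearson(values):
    """Return (K2, p_value) of a D'Agostino-Pearson omnibus normality test."""
    n = len(values)
    if n < 8:
        raise ValueError("this approximation assumes at least 8 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    skewness = sum((v - mean) ** 3 for v in values) / n / m2 ** 1.5
    kurtosis = sum((v - mean) ** 4 for v in values) / n / m2 ** 2

    # z-score for skewness (D'Agostino transformation)
    y = skewness * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    z_skew = delta * math.asinh(y / alpha)

    # z-score for kurtosis (Anscombe-Glynn transformation)
    expected = 3.0 * (n - 1) / (n + 1)
    variance = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (kurtosis - expected) / math.sqrt(variance)
    root_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / root_beta1 * (2.0 / root_beta1
                                  + math.sqrt(1.0 + 4.0 / root_beta1 ** 2))
    denom = 1.0 + x * math.sqrt(2.0 / (a - 4.0))
    if denom == 0:
        raise ValueError("kurtosis z-score is undefined for these values")
    cube_root = math.copysign(((1.0 - 2.0 / a) / abs(denom)) ** (1.0 / 3.0), denom)
    z_kurt = ((1.0 - 2.0 / (9.0 * a)) - cube_root) / math.sqrt(2.0 / (9.0 * a))

    k2 = z_skew ** 2 + z_kurt ** 2
    p_value = math.exp(-k2 / 2.0)  # chi-square tail probability, 2 degrees of freedom
    return k2, p_value

# Placeholder values only: one symmetric group and one with a large outlier.
symmetric = [-1.4, -0.8, -0.45, -0.15, 0.15, 0.45, 0.8, 1.4]
skewed = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 10.0]
_, p_symmetric = dagostino_pearson(symmetric)
_, p_skewed = dagostino_pearson(skewed)
```

A P value below 0.05 is read as evidence that the values do not come from a normal distribution, which is how the result for a gene is interpreted in the Prism table.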
Since we found that there’s one group of data that does not follow a normal distribution, it’s no longer necessary to check if the treated data are normally distributed but you can do it if you want to. We will now proceed with the statistical analysis in qbase+. Statistical analyses can be performed via the Statistics wizard.
How to open the Statistics wizard?
You can open it in the Project Explorer (window at the left):
- expand Project1 if it's not yet expanded
- expand the Experiments folder in the project if it's not yet expanded
- expand the GeneExpression experiment if it's not yet expanded
- expand the Analysis section if it's not yet expanded
- expand the Statistics section
- double click Stat wizard
This opens the Statistics wizard that allows you to perform various kinds of statistical analyses.
Which kind of analysis are you going to do?
On the Goal page: select Mean comparison, since you want to compare expression between two groups of samples. In other words, you want to compare the mean expression of each gene in the treated samples with its mean expression in the untreated samples. Click Next.
How to define the groups that you are going to compare?
On the Groups page: specify how to define the two groups of samples that you want to compare. Select Treatment as the grouping variable to compare treated and untreated samples. Click Next.
How to define the genes that you want to analyze?
On the Targets page: specify for which targets of interest you want to do the test. Deselect Flexible since you do not want to include it in the analysis: it's just a bad reference gene. Click Next.
On the Settings page you have to describe the characteristics of your data set, allowing qbase+ to choose the appropriate test for your data.
The first thing you need to tell qbase+ is whether the data were drawn from a normal or a non-normal distribution. Since we have 8 biological replicates per group, we can do a test in Prism to check if the data are normally distributed.
Which gene(s) is/are differentially expressed?
On the Settings page you describe the characteristics of your data set so that qbase+ can choose the most suitable test for your data. For our data set we can use the default settings. Click Next. In the results table you can see that the p-value for Palm is below 0.05, so Palm is differentially expressed.
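To illustrate what a non-parametric comparison of two groups looks like, here is a minimal Python sketch of an exact two-sided Mann-Whitney U test. This is an assumption for illustration: the text above does not name the non-parametric test that qbase+ runs, and qbase+ may process the data differently. The function name and the log10 CNRQ values are placeholders, not results from this experiment.

```python
from itertools import combinations

def mann_whitney_exact(group_a, group_b):
    """Exact two-sided Mann-Whitney p-value by enumerating all group splits."""
    pooled = list(group_a) + list(group_b)
    n_a, n = len(group_a), len(pooled)

    # Rank the pooled values; tied values share the average of their ranks.
    order = sorted(range(n), key=lambda i: pooled[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1

    # Compare the observed rank sum of group_a with every possible split.
    expected = n_a * (n + 1) / 2
    observed_dev = abs(sum(ranks[:n_a]) - expected)
    extreme = total = 0
    for combo in combinations(range(n), n_a):
        total += 1
        if abs(sum(ranks[k] for k in combo) - expected) >= observed_dev - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical log10 CNRQs for one gene, 8 replicates per group (placeholders).
untreated = [0.10, -0.05, 0.02, 0.15, -0.12, 0.07, 0.01, -0.03]
treated = [0.45, 0.60, 0.38, 0.52, 0.71, 0.41, 0.55, 0.48]
p_value = mann_whitney_exact(untreated, treated)
```

Because the test only uses ranks, it does not rely on the values being normally distributed. Enumerating every split is only practical for small groups such as the ones used here.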