Statistical Analysis

Once you have generated the target bar charts, you leave the Analysis wizard and return to the regular qbase+ interface. Suppose that you want to perform a statistical test to check whether the difference in expression that you see in the target chart is significant. At some point, qbase+ will ask you whether your data come from a normal distribution. If you don’t know, you can select I don’t know; qbase+ will then assume that the data do not come from a normal distribution and perform a stringent non-parametric test. However, when you have 7 or more replicates per group, you can check whether the data are normally distributed using a statistical test. If they are, qbase+ will perform a regular t-test. The advantage is that the t-test is less stringent than the non-parametric tests and will find more differentially expressed (DE) genes. However, you may only perform it on normally distributed data. If you perform the t-test on data that are not normally distributed, you risk false positives, i.e. qbase+ will report genes as DE while in fact they are not. Performing a non-parametric test on normally distributed data risks false negatives, i.e. you will miss DE genes.

Checking whether the data are normally distributed is easy to do in GraphPad Prism. To do so, you first have to export the data from qbase+.

How to export the data?

To export the results, click the upward-pointing arrow in the qbase+ toolbar. You want to export the normalized data, so select Export Result Table (CNRQ). You will be given the choice to export the results only (CNRQs) or to include the errors (standard error of the mean) as well. We don’t need the errors in Prism, so we do not select this option. The scale of the Result table can be linear or logarithmic (base 10). Unless you change this, qbase+ log10-transforms the CNRQs before doing statistics, so we need to check in Prism whether the log-transformed data are normally distributed. Additionally, you need to tell qbase+ where to store the file containing the exported data. Click the Browse button to do so.
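As a small illustration of what the logarithmic scale means, the sketch below applies a base-10 log transform to a few linear-scale CNRQs. The function name and the values are hypothetical and only serve the example; they do not come from qbase+.

```python
import math

def log10_transform(cnrqs):
    """Return the base-10 logarithm of each CNRQ (all values must be > 0)."""
    return [math.log10(value) for value in cnrqs]

# Hypothetical linear-scale CNRQs for one gene, not real data.
linear = [0.5, 1.0, 2.0, 10.0]
for value, logged in zip(linear, log10_transform(linear)):
    print(f"{value:>5} -> {logged:+.3f}")
```

On the log scale, a halving and a doubling of expression lie at the same distance from zero, which is one reason expression ratios are usually analysed after log transformation.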

Exporting will generate an Excel file in the location that you specified. However, the file contains the results for all samples and we need to check the two groups (treated and untreated) separately. The sample properties show that the even samples belong to the treated group and the odd samples to the untreated group. This means we have to generate two files, one with the treated samples and one with the untreated samples.
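If you prefer to split the table with a script rather than by hand, the idea can be sketched as follows. This is only an illustration under assumptions: the sample names (Sample1, Sample2, ...) and the values are hypothetical, and rows that do not match that naming pattern are skipped.

```python
import re

def split_by_parity(rows):
    """Split (sample_name, values) rows into (untreated, treated) lists.

    Assumes hypothetical names of the form 'Sample<number>': odd numbers
    are untreated, even numbers are treated. Other rows are ignored.
    """
    untreated, treated = [], []
    for name, values in rows:
        match = re.fullmatch(r"Sample(\d+)", name)
        if match is None:
            continue  # e.g. standards or water samples
        number = int(match.group(1))
        target = treated if number % 2 == 0 else untreated
        target.append((name, values))
    return untreated, treated

# Hypothetical rows: a sample name plus one value per gene.
rows = [
    ("Sample1", [0.12]),
    ("Sample2", [0.45]),
    ("Sample3", [0.10]),
    ("Sample4", [0.51]),
    ("Water", []),
]
untreated, treated = split_by_parity(rows)
print("untreated:", [name for name, _ in untreated])
print("treated:", [name for name, _ in treated])
```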

Now we can open these files in Prism to check if the data is normally distributed.

How to import the data of the untreated samples in Prism?

  • Open Prism
  • Expand File in the top menu
  • Select New
  • Click New Project File
  • In the left menu select to create a Column table. Data representing different groups (in our case measurements for different genes) should always be loaded into a column table.
  • Select Enter replicate values, stacked into columns (this is normally the default selection) since the replicates (measurements for the same gene) are stacked in the columns.
  • Click Create

Prism has now created a table to hold the data of the untreated samples but at this point the table is still empty. To load the data:

  • Expand File in the top menu
  • Select Import
  • Browse to the resultslog.csv file, select it and click Open
  • In the Source tab select Insert data only
  • Since this is a European csv file, commas are used as decimal separators. So, in contrast to what the name csv might imply, semicolons rather than commas separate the columns (you can open the file in a text editor to take a look). In American csv files, dots are used as decimal separators and commas separate the columns. Prism doesn’t know the format of your csv file, so you have to tell it the role of the comma in your file. Select Separate decimals
  • Go to the Filter tab and specify the rows you want to import (the last rows are those of the standard and the water samples; you don’t want to include them)
  • Click Import

Once the file is opened in Prism, you see that the first column, containing the sample names, is treated as a data column. Right-click the header of the first column and select Delete.
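To make the European csv layout concrete, here is a minimal sketch that reads such a file outside Prism: semicolons separate the columns, commas mark the decimals, and the first column holds the sample names. The gene names and values in the example text are hypothetical, and the sketch assumes every data cell is filled in.

```python
import csv
import io

def read_european_csv(text):
    """Parse semicolon-separated text with decimal commas.

    Returns (header, rows), where each row is (sample_name, [floats]).
    """
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    rows = []
    for record in reader:
        name, *cells = record
        # Turn the decimal comma into a dot before converting to float.
        rows.append((name, [float(cell.replace(",", ".")) for cell in cells]))
    return header, rows

# Hypothetical file content, not real qbase+ output.
example = "Sample;GeneA;GeneB\nSample1;0,125;-0,5\nSample2;1,25;0,75\n"
header, rows = read_european_csv(example)
print(header)
for name, values in rows:
    print(name, values)
```

Keeping the sample name apart from the numeric cells mirrors the step above, where the name column is removed so that only data columns remain.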

How to check if the data of the untreated samples comes from a normal distribution?

  • Click the Analyze button in the top menu
  • Select to do the Column statistics analysis in the Column analyses section of the left menu
  • In the right menu, deselect Flexible. It is a bad reference gene that you will not include in the qbase+ analysis, so there’s no point in checking its normality (it is probably not normally distributed). For the same reason you could also deselect the other two reference genes, since you will do the DE test on the target genes and not on the reference genes.
  • Click OK
  • In the Descriptive statistics and the Confidence intervals sections, deselect everything except Mean, SD, SEM. These statistics are not what we are interested in: we want to know whether the data come from a normal distribution. The only reason to select Mean, SD, SEM is that Prism throws an error if no selection is made here.
  • In the Test if the values come from a Gaussian distribution section, select the D’Agostino-Pearson omnibus test to test whether the data are drawn from a normal distribution. Although Prism offers three tests for this, the D’Agostino-Pearson test is the safest option.
  • Click OK

Prism now generates a table to hold the results of the statistical analysis. As you can see, the data for Palm are not normally distributed.
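For readers who want to see what such a normality test computes, the following is a minimal sketch of the D’Agostino-Pearson omnibus statistic using the textbook transformations of sample skewness and kurtosis. It is not Prism’s code, and Prism’s output may differ in detail. The function names and the two example samples are hypothetical. The sketch requires at least 8 values per group and assumes the values are not all identical.

```python
import math

def _central_moment(values, order):
    """Biased central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)

def skewness_z(values):
    """Z-score of the sample skewness (D'Agostino transformation)."""
    n = len(values)
    b1 = _central_moment(values, 3) / _central_moment(values, 2) ** 1.5
    y = b1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.asinh(y / alpha)

def kurtosis_z(values):
    """Z-score of the sample kurtosis (Anscombe-Glynn transformation)."""
    n = len(values)
    b2 = _central_moment(values, 4) / _central_moment(values, 2) ** 2
    expected = 3.0 * (n - 1) / (n + 1)
    variance = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x = (b2 - expected) / math.sqrt(variance)
    root_beta1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
                  * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + 8.0 / root_beta1 * (
        2.0 / root_beta1 + math.sqrt(1.0 + 4.0 / root_beta1 ** 2))
    ratio = (1.0 - 2.0 / a) / (1.0 + x * math.sqrt(2.0 / (a - 4.0)))
    cube_root = math.copysign(abs(ratio) ** (1.0 / 3.0), ratio)
    return ((1.0 - 2.0 / (9.0 * a)) - cube_root) / math.sqrt(2.0 / (9.0 * a))

def dagostino_pearson(values):
    """Return (K2, p); a small p suggests the data are not Gaussian."""
    if len(values) < 8:
        raise ValueError("this sketch needs at least 8 values")
    k2 = skewness_z(values) ** 2 + kurtosis_z(values) ** 2
    # K2 is compared to a chi-square distribution with 2 degrees of
    # freedom, whose survival function is exp(-K2 / 2).
    return k2, math.exp(-k2 / 2.0)

# Hypothetical log10 CNRQs for one gene (8 replicates), not real data.
symmetric = [-1.2, -0.8, -0.4, -0.1, 0.1, 0.4, 0.8, 1.2]
skewed = [0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.3, 5.0]
for label, sample in (("symmetric", symmetric), ("skewed", skewed)):
    k2, p = dagostino_pearson(sample)
    print(f"{label}: K2={k2:.2f}, p={p:.4f}")
```

A p-value below the chosen threshold (commonly 0.05) is read as evidence against normality. In Python, `scipy.stats.normaltest` offers a maintained implementation of this test and is preferable to hand-written code for real analyses.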

Since we found one group of data that does not follow a normal distribution, it’s no longer necessary to check whether the treated data are normally distributed, but you can do so if you want to. We will now proceed with the statistical analysis in qbase+. Statistical analyses can be performed via the Statistics wizard.

How to open the Statistics wizard?

You can open it in the Project Explorer (window at the left):

  • expand Project1 if it’s not yet expanded
  • expand the Experiments folder in the project if it’s not yet expanded
  • expand the GeneExpression experiment if it’s not yet expanded
  • expand the Analysis section if it’s not yet expanded
  • expand the Statistics section
  • double click Stat wizard

This opens the Statistics wizard that allows you to perform various kinds of statistical analyses.

Which kind of analysis are you going to do?

On the Goal page: select Mean comparison, since you want to compare the mean expression of each gene in the treated samples with its mean expression in the untreated samples. Click Next.

How to define the groups that you are going to compare?

On the Groups page: specify how to define the two groups of samples that you want to compare. Select Treatment as the grouping variable to compare treated and untreated samples. Click Next.

How to define the genes that you want to analyze?

On the Targets page: specify for which targets of interest you want to do the test. Deselect Flexible since you do not want to include it in the analysis. It’s just a bad reference gene. Click Next.

On the Settings page you have to describe the characteristics of your data set, allowing qbase+ to choose the appropriate test for your data.

The first thing you need to tell qbase+ is whether the data was drawn from a normal or a non-normal distribution. Since we have 8 biological replicates per group we can do a test in Prism to check if the data are normally distributed.

Which gene(s) is/are differentially expressed?

For our data set we can use the default settings on the Settings page. Click Next. In the results table you can see that the p-value for Palm is below 0.05, so Palm is differentially expressed.
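To give an impression of how a non-parametric comparison of two groups produces a p-value, here is a minimal sketch of an exact two-sided Mann-Whitney U test. This is an assumption made for illustration: the text above does not name the non-parametric test that qbase+ applies, and the function names and values below are hypothetical. The exact p-value is only valid when there are no tied values.

```python
from functools import lru_cache
from math import comb

def u_statistic(x, y):
    """Count the (x, y) pairs in which x is larger; ties count one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)

@lru_cache(maxsize=None)
def _count(m, n, u):
    """Number of orderings of m x-values and n y-values with U == u."""
    if u < 0 or u > m * n:
        return 0
    if m == 0 or n == 0:
        return 1
    # The largest value is either an x (it beats all n y-values)
    # or a y (no x beats it).
    return _count(m - 1, n, u - n) + _count(m, n - 1, u)

def mann_whitney_exact(x, y):
    """Return (U, two-sided exact p-value); assumes no tied values."""
    m, n = len(x), len(y)
    u = u_statistic(x, y)
    u_small = int(min(u, m * n - u))
    tail = sum(_count(m, n, k) for k in range(u_small + 1))
    p = min(1.0, 2.0 * tail / comb(m + n, m))
    return u, p

# Hypothetical log10 CNRQs for one gene, 8 replicates per group.
untreated = [-0.42, -0.35, -0.31, -0.28, -0.22, -0.19, -0.11, -0.05]
treated = [0.02, 0.08, 0.15, 0.21, 0.27, 0.33, 0.38, 0.46]
u, p = mann_whitney_exact(untreated, treated)
print(f"U={u}, p={p:.6f}")
```

Because the test only uses the ordering of the values, it makes no assumption about normality, which is why it is a safe choice when the normality check fails.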