This material accompanies an online training. It is not meant to be used as a standalone e-learning course, because not all information is included here.
Exercises
- Belgian Galaxy server: you have to create a free account
- European Galaxy server: you have to create a free account
Data sets
- Arabidopsis control SE RNASeq sample (SRR074285.fastq.gz)
- Arabidopsis disease SE RNASeq sample (SRR074262.fastq.gz)
- shared raw Arabidopsis control SE RNASeq data on Galaxy
- shared groomed Arabidopsis control SE RNASeq data on Galaxy
- E. coli SE ChIP sample (SRR576933.fastq.gz)
- E. coli SE control sample (SRR576938.fastq.gz)
- human PE RNASeq sample, file 1 (SRR1039509_1.fastq.gz)
- human PE RNASeq sample, file 2 (SRR1039509_2.fastq.gz)
Tools used during this course
- Groomer
- FastQC
- Trimmomatic
- Bowtie
- BWA-MEM
- STAR
- HISAT2
- Kallisto
- Salmon
- Picard
- samtools
- Qualimap
- RSeQC
- IGV
Links
- ASCII table to convert Phred scores into single characters
- FastQC manual
- Examples of good FastQC reports
- Explanation of the SAM/BAM file format
- Interpretation of SAM flags
- Overview of common NGS problems
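The Phred and SAM flag links above both describe simple encodings, which can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material. It assumes Phred+33 (Sanger) quality encoding, and the names `phred_scores`, `error_probability` and `decode_sam_flag` are hypothetical helpers introduced here. The flag bit meanings follow the SAM specification.

```python
# Assumption: qualities use the Phred+33 (Sanger) offset.
PHRED_OFFSET = 33

# Bit values of the SAM FLAG field, as defined in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def phred_scores(quality_string):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(char) - PHRED_OFFSET for char in quality_string]


def error_probability(score):
    """Return the base-calling error probability for a Phred score."""
    return 10 ** (-score / 10)


def decode_sam_flag(flag):
    """Return the names of all bits that are set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    print(phred_scores("II5!"))   # [40, 40, 20, 0]
    print(error_probability(20))  # a Phred score of 20 means 1 error in 100
    print(decode_sam_flag(99))    # ['paired', 'proper_pair', 'mate_reverse', 'first_in_pair']
```

For example, a flag of 99 is the sum 1 + 2 + 32 + 64, which describes the first read of a properly mapped pair whose mate maps to the reverse strand.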
Software for ChIPSeq
Software for variant calling
Toy data sets for variant calling
The data sets of the variant analysis training are human paired-end data from the 1000 Genomes Project (sample ID: NA18507):
- the two files at the bottom consist of reads that map to chromosome 21
- the two files in the middle contain a random 10% of the reads in the chromosome 21 files (smaller and therefore better suited for mapping)
- the two files at the top contain a random 1% of the reads in the chromosome 21 files
These are small, workable files generated from real DNASeq data.
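Randomly picking a fraction of paired-end reads only works if both files stay in sync, so that every kept read still has its mate. The sketch below shows one way to do this in Python. It is an illustration under stated assumptions, not the procedure used to create the files above: `read_fastq` and `subsample_pairs` are hypothetical helpers, each read is assumed to occupy exactly four lines, and both files are assumed to list mates in the same order.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, separator, qualities)."""
    iterator = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(iterator, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(lines_1, lines_2, fraction, seed=0):
    """Keep each read pair with probability `fraction`.

    One random draw decides the fate of both mates, so the two
    outputs always contain the same pairs in the same order.
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    kept_1, kept_2 = [], []
    for record_1, record_2 in zip(read_fastq(lines_1), read_fastq(lines_2)):
        if rng.random() < fraction:
            kept_1.append(record_1)
            kept_2.append(record_2)
    return kept_1, kept_2
```

The inputs can be any iterables of text lines, for example two files opened with `gzip.open(path, "rt")`. Because each pair is kept independently, the output holds approximately, not exactly, the requested fraction of the reads.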