Overview
Here is what you need to know to get started quickly:
- We are maintaining six workflows for different sequencing platforms (Illumina or Oxford Nanopore) and library preparation strategies (Ampliconic or Metatranscriptomic).
- All workflows can be used to analyze any number of samples.
- All workflows can be used right now on any of our global instances in EU (https://usegalaxy.eu), US (https://usegalaxy.org), or Australia (https://usegalaxy.org.au) via Galaxy's graphical user interface as shown in this tutorial.
- All workflows can also be accessed programmatically by either submitting a list of accession numbers to our Request an analysis service or via Galaxy's API for automatically triggered analyses.
- We provide powerful computational infrastructure for data analysis supported by national supercomputing resources in the US, EU, and Australia.
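For the programmatic route mentioned in the list above, the following is a minimal sketch of preparing an authenticated request that lists the workflows visible to an account on one of the three public servers. It uses only the Python standard library. The `/api/workflows` path and the `x-api-key` header are assumptions about the Galaxy REST API and should be verified against the API documentation of the server you use. `GALAXY_SERVERS` and `build_workflow_list_request` are hypothetical names introduced for illustration. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Public Galaxy servers named in this overview.
GALAXY_SERVERS = {
    "EU": "https://usegalaxy.eu",
    "US": "https://usegalaxy.org",
    "AU": "https://usegalaxy.org.au",
}


def build_workflow_list_request(region: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for the workflow listing.

    Assumption: the server exposes /api/workflows and accepts the user's
    API key in an 'x-api-key' header.
    """
    base_url = GALAXY_SERVERS[region]
    return Request(
        f"{base_url}/api/workflows",
        headers={"x-api-key": api_key},
        method="GET",
    )


# Hypothetical placeholder key; a real key comes from your Galaxy user preferences.
request = build_workflow_list_request("EU", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) requires a valid API key. For anything beyond a quick check, a dedicated client library such as BioBlend is likely a more convenient choice than hand-built requests.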
Workflows for discovery of sequence variants
We developed a number of workflows for the analysis of SARS-CoV-2 sequencing data. The workflows are available from WorkflowHub in the EU and Dockstore in the US. The workflows listed in the table below were specifically designed for identifying sequence variants in SARS-CoV-2 raw read datasets and for reporting them in tabular formats and/or as a viral consensus sequence:
Link | Workflow | Inputs | Outputs | Aligner | Caller |
---|---|---|---|---|---|
WorkflowHub, Dockstore | Illumina ARTIC (ILL-AMP): Variant analysis from ampliconic data produced with ARTIC protocol v1, v2, v3, or v4, or any alternative primer scheme. | 1. Paired reads [fastqsanger]; 2. SARS-CoV-2 reference [fasta]; 3. Primer coordinates [bed]; 4. Primer pairs table [tsv] | Variants [vcf] | BWA-MEM | lofreq |
WorkflowHub, Dockstore | Oxford Nanopore ARTIC (ONT-AMP): Variant analysis from ampliconic data produced with ARTIC protocol v1, v2, v3, or v4, or any alternative primer scheme. | 1. Reads [fastqsanger]; 2. SARS-CoV-2 reference [fasta]; 3. Primer coordinates [bed] | Variants [vcf] | minimap2 | medaka |
WorkflowHub, Dockstore | Illumina metatranscriptomic PE (ILL-MT-PE): Variant analysis from metatranscriptomic data. | 1. Paired reads [fastqsanger]; 2. SARS-CoV-2 reference [fasta] | Variants [vcf] | BWA-MEM | lofreq |
WorkflowHub, Dockstore | Illumina metatranscriptomic SE (ILL-MT-SE): Variant analysis from metatranscriptomic data. | 1. Reads [fastqsanger]; 2. SARS-CoV-2 reference [fasta] | Variants [vcf] | Bowtie2 | lofreq |
WorkflowHub, Dockstore | Report generation (REPORTING): Generation of final variant analysis reports/plots. | 1. Variants [vcf]; 2. Gene name translation table [tsv] | Reports [tsv], overview [svg] | - | - |
WorkflowHub, Dockstore | Consensus construction (CONSENSUS): Generation of sample consensus sequences. | 1. Variants [vcf]; 2. SARS-CoV-2 reference [fasta]; 3. Mapped reads [bam] | Consensus [fasta] | - | - |
vcf = variant call format; tsv = TAB-separated values; svg = scalable vector graphics; fastqsanger = fastq format with Sanger encoding of base quality values; bed = browser extensible data format; bam = binary alignment/map format (BGZF-compressed)
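Before launching a workflow it can help to confirm that every input from the table above is at hand. The sketch below transcribes the inputs column into a lookup and reports what is missing. `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names introduced for illustration and are not part of Galaxy or of the workflows themselves.

```python
# Required inputs per workflow, transcribed from the table above
# as (input name, Galaxy datatype) pairs.
REQUIRED_INPUTS = {
    "ILL-AMP": [
        ("Paired reads", "fastqsanger"),
        ("SARS-CoV-2 reference", "fasta"),
        ("Primer coordinates", "bed"),
        ("Primer pairs table", "tsv"),
    ],
    "ONT-AMP": [
        ("Reads", "fastqsanger"),
        ("SARS-CoV-2 reference", "fasta"),
        ("Primer coordinates", "bed"),
    ],
    "ILL-MT-PE": [
        ("Paired reads", "fastqsanger"),
        ("SARS-CoV-2 reference", "fasta"),
    ],
    "ILL-MT-SE": [
        ("Reads", "fastqsanger"),
        ("SARS-CoV-2 reference", "fasta"),
    ],
    "REPORTING": [
        ("Variants", "vcf"),
        ("Gene name translation table", "tsv"),
    ],
    "CONSENSUS": [
        ("Variants", "vcf"),
        ("SARS-CoV-2 reference", "fasta"),
        ("Mapped reads", "bam"),
    ],
}


def missing_inputs(workflow_tag, provided):
    """Return the names of required inputs not in `provided`, in table order."""
    return [
        name
        for name, _datatype in REQUIRED_INPUTS[workflow_tag]
        if name not in provided
    ]


# Example: an ILL-AMP run for which only reads and reference are available.
print(missing_inputs("ILL-AMP", {"Paired reads", "SARS-CoV-2 reference"}))
```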
The following tutorial explains how to import workflows into your Galaxy instance.
Which workflow do I use?
Each of the four variant calling workflows from the table above is designed to be usable together with the reporting and consensus workflows. The table below shows which workflows to use for a full analysis depending on the combination of library prep and sequencing platform:
Library prep ↓ / Platform (1) → | Illumina | ONT |
---|---|---|
Ampliconic | ILL-AMP + REPORTING + CONSENSUS | ONT-AMP + REPORTING + CONSENSUS |
Metatranscriptomic | (ILL-MT-PE or ILL-MT-SE) + REPORTING + CONSENSUS | - (2) |

(1) There is a growing amount of PacBio data. Our workflows can be easily adapted for these data as well. Use OPEN CHAT below to let us know.
(2) A workflow for this combination would be conceptually identical to ILL-MT-SE, except that the mapper is replaced with minimap2 and the variant caller with medaka.
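The selection table above can be expressed as a small lookup. This is a minimal sketch: `select_workflows` and its `paired_end` parameter are hypothetical names introduced for illustration, and the combination of ONT with metatranscriptomic data raises an error because no workflow is listed for it.

```python
from typing import List


def select_workflows(library_prep: str, platform: str, paired_end: bool = True) -> List[str]:
    """Return the workflow tags for a full analysis, following the table above."""
    prep = library_prep.strip().lower()
    plat = platform.strip().lower()

    if prep == "ampliconic" and plat == "illumina":
        caller = "ILL-AMP"
    elif prep == "ampliconic" and plat == "ont":
        caller = "ONT-AMP"
    elif prep == "metatranscriptomic" and plat == "illumina":
        # Paired-end and single-end reads use different variant calling workflows.
        caller = "ILL-MT-PE" if paired_end else "ILL-MT-SE"
    else:
        # Includes ONT metatranscriptomic data, for which no workflow is listed.
        raise ValueError(
            f"No workflow listed for library prep {library_prep!r} on platform {platform!r}"
        )

    # Every variant calling workflow is followed by reporting and consensus.
    return [caller, "REPORTING", "CONSENSUS"]


print(select_workflows("Metatranscriptomic", "Illumina", paired_end=False))
```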
How do I use it and where do I run my analyses?
This depends on who you are:
You are a ... | Where do you start ... |
---|---|
Biomedical researcher | Use any of the three global Galaxy instances in EU (https://usegalaxy.eu), US (https://usegalaxy.org), or Australia (https://usegalaxy.org.au). Take a look at the following tutorial to begin: Mutation calling, viral genome reconstruction and lineage/clade assignment from SARS-CoV-2 sequencing data - a Galaxy Training Network Tutorial. |
Bioinformatician or data scientist | You have two options: |
These analysis capabilities are supported by public computational infrastructure provided by the XSEDE consortium in the US, the de.NBI and ELIXIR consortia in the EU, and Nectar Cloud in Australia. The figure below illustrates current processing times (in the EU) for the analysis of SARS-CoV-2 data. Most analyses complete within 1-2 hours.