
Workflows

Overview


Here is what you need to get started quickly:

  • We maintain six workflows covering two sequencing platforms (Illumina and Oxford Nanopore) and two library preparation strategies (ampliconic and metatranscriptomic).
  • All workflows can be used to analyze any number of samples.
  • All workflows can be used right now on any of our global instances in the EU (https://usegalaxy.eu), the US (https://usegalaxy.org), or Australia (https://usegalaxy.org.au) through Galaxy's graphical user interface, as shown in this tutorial.
  • All workflows can also be accessed programmatically, either by submitting a list of accession numbers to our "Request an analysis" service or through Galaxy's API for automatically triggered analyses.
  • We provide powerful computational infrastructure for data analysis supported by national supercomputing resources in the US, EU, and Australia.
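A minimal sketch of the API route, assuming the BioBlend Python library (not mentioned on this page), a workflow already imported into your Galaxy account, and input datasets already uploaded to a history. All IDs are placeholders, and mapping the four ILL-AMP inputs from the workflow table to step indices 0-3 is an assumption; check the real indices in the workflow editor or with `gi.workflows.show_workflow()`. The helper names `build_inputs` and `invoke` are hypothetical; `GalaxyInstance` and `workflows.invoke_workflow` are BioBlend calls.

```python
"""Sketch: invoke a Galaxy workflow through the API with BioBlend.

Assumptions: BioBlend is installed, the workflow is already in your account,
and the inputs are already in a history. All IDs below are placeholders.
"""


def build_inputs(dataset_ids):
    """Map workflow input step indices to Galaxy datasets.

    dataset_ids: list of (step_index, src, galaxy_id) tuples, where src is
    "hda" for a single dataset or "hdca" for a dataset collection
    (for example, a collection of paired reads).
    """
    inputs = {}
    for step_index, src, galaxy_id in dataset_ids:
        if src not in ("hda", "hdca"):
            raise ValueError(f"unsupported source type: {src}")
        inputs[str(step_index)] = {"src": src, "id": galaxy_id}
    return inputs


def invoke(url, api_key, workflow_id, history_id, inputs):
    # Imported lazily so build_inputs() works without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=url, key=api_key)
    # inputs_by="step_index" tells Galaxy that the keys are step indices.
    return gi.workflows.invoke_workflow(
        workflow_id,
        inputs=inputs,
        history_id=history_id,
        inputs_by="step_index",
    )


if __name__ == "__main__":
    # Hypothetical IDs for the four ILL-AMP inputs; step order is assumed.
    ill_amp_inputs = build_inputs([
        (0, "hdca", "PAIRED_READS_COLLECTION_ID"),
        (1, "hda", "REFERENCE_FASTA_ID"),
        (2, "hda", "PRIMER_BED_ID"),
        (3, "hda", "PRIMER_PAIRS_TSV_ID"),
    ])
    print(ill_amp_inputs)
    # invoke("https://usegalaxy.eu", "YOUR_API_KEY", "WORKFLOW_ID",
    #        "HISTORY_ID", ill_amp_inputs)
```

Keeping the dictionary construction separate from the network call makes it easy to check the input mapping before anything is submitted to a public server.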

Workflows for discovery of sequence variants


We developed a number of workflows for the analysis of SARS-CoV-2 sequencing data. The workflows are available from WorkflowHub in the EU and Dockstore in the US. The workflows listed in the table below were specifically designed to identify sequence variants in raw SARS-CoV-2 read datasets and to report them in tabular form and/or as a viral consensus sequence:

Link | Workflow | Inputs | Outputs | Aligner | Caller
WorkflowHub, Dockstore | Illumina ARTIC (ILL-AMP): variant analysis from ampliconic data produced with ARTIC protocol v1, v2, v3, or v4, or any alternative primer scheme | 1. Paired reads [fastqsanger]; 2. SARS-CoV-2 reference [fasta]; 3. Primer coordinates [bed]; 4. Primer pairs table [tsv] | Variants [vcf] | BWA MEM | lofreq
WorkflowHub, Dockstore | Oxford Nanopore ARTIC (ONT-AMP): variant analysis from ampliconic data produced with ARTIC protocol v1, v2, v3, or v4, or any alternative primer scheme | 1. Reads [fastqsanger]; 2. SARS-CoV-2 reference [fasta]; 3. Primer coordinates [bed] | Variants [vcf] | minimap2 | medaka
WorkflowHub, Dockstore | Illumina metatranscriptomic PE (ILL-MT-PE): variant analysis from metatranscriptomic data | 1. Paired reads [fastqsanger]; 2. SARS-CoV-2 reference [fasta] | Variants [vcf] | BWA MEM | lofreq
WorkflowHub, Dockstore | Illumina metatranscriptomic SE (ILL-MT-SE): variant analysis from metatranscriptomic data | 1. Reads [fastqsanger]; 2. SARS-CoV-2 reference [fasta] | Variants [vcf] | Bowtie2 | lofreq
WorkflowHub, Dockstore | Report generation (REPORTING): generation of final variant analysis reports/plots | 1. Variants [vcf]; 2. Gene name translation table [tsv] | Reports [tsv], overview [svg] | - | -
WorkflowHub, Dockstore | Consensus construction (CONSENSUS): generation of sample consensus sequences | 1. Variants [vcf]; 2. SARS-CoV-2 reference [fasta]; 3. Mapped reads [bam] | Consensus [fasta] | - | -

vcf = Variant Call Format; tsv = TAB-separated values; svg = Scalable Vector Graphics; fastqsanger = FASTQ format with Sanger encoding of base quality values; fasta = FASTA sequence format; bed = Browser Extensible Data format; bam = BGZF-compressed binary form of the Sequence Alignment/Map (SAM) format
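The ILL-AMP workflow takes both primer coordinates [bed] and a primer pairs table [tsv]. As a sketch, the pairs table could be derived from the BED file, assuming ARTIC-style primer names in the fourth BED column (for example `nCoV-2019_1_LEFT` / `nCoV-2019_1_RIGHT`, optionally with an `_alt` suffix) and a two-column, TAB-separated output with one left/right pair per amplicon. Both conventions are assumptions to check against the workflow's own documentation; `primer_pairs` and `to_tsv` are hypothetical names.

```python
import re

# Assumed naming: <amplicon>_<LEFT|RIGHT>, optionally followed by "_alt<N>".
PRIMER_NAME = re.compile(r"(?P<amplicon>.+)_(?P<side>LEFT|RIGHT)(?:_alt\d*)?")


def primer_pairs(bed_lines):
    """Return one (left, right) primer-name pair per amplicon, in BED order."""
    sides = {}  # amplicon -> {"LEFT": [names], "RIGHT": [names]}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header/track lines and lines without a name column.
        if len(fields) < 4 or line.startswith(("#", "track", "browser")):
            continue
        match = PRIMER_NAME.fullmatch(fields[3])
        if match is None:
            raise ValueError(f"unexpected primer name: {fields[3]}")
        entry = sides.setdefault(match["amplicon"], {"LEFT": [], "RIGHT": []})
        entry[match["side"]].append(fields[3])
    pairs = []
    for amplicon, entry in sides.items():
        if not entry["LEFT"] or not entry["RIGHT"]:
            raise ValueError(f"amplicon {amplicon} lacks a LEFT or RIGHT primer")
        # _alt primers are grouped with their amplicon above; this sketch
        # simply pairs the first-listed LEFT and RIGHT names.
        pairs.append((entry["LEFT"][0], entry["RIGHT"][0]))
    return pairs


def to_tsv(pairs):
    """Serialize pairs as TAB-separated lines: left<TAB>right."""
    return "".join(f"{left}\t{right}\n" for left, right in pairs)
```

Failing loudly on an unrecognized name or an unpaired amplicon is deliberate: a silently incomplete pairs table would affect primer-related processing without any obvious error.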

The following tutorial explains how to import workflows into your Galaxy instance.

Which workflow do I use?


Each of the four variant-calling workflows in the table above is designed to be used together with the reporting and consensus workflows. The table below shows which workflows to use for a full analysis, depending on the combination of library preparation strategy and sequencing platform:

Library prep ↓ / Platform¹ → | Illumina | ONT
Ampliconic | ILL-AMP + REPORTING + CONSENSUS | ONT-AMP + REPORTING + CONSENSUS
Metatranscriptomic | (ILL-MT-PE or ILL-MT-SE) + REPORTING + CONSENSUS | none²

1 - An increasing amount of data is being produced on PacBio instruments. Our workflows can easily be adapted to these data as well; use OPEN CHAT below to let us know.
2 - No dedicated workflow is listed for this combination. Conceptually it would be identical to ILL-MT-SE, except that the mapper is replaced with minimap2 and the variant caller with medaka.
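The selection logic in the table above can be written as a small lookup. This is an illustrative helper (`select_workflows` is a hypothetical name, not part of Galaxy); the `paired_end` flag is an assumption used to choose between ILL-MT-PE and ILL-MT-SE and to reject single-end ampliconic Illumina data, since ILL-AMP takes paired reads. It is ignored for ONT.

```python
def select_workflows(library_prep, platform, paired_end=True):
    """Return the workflow combination for a full analysis, per the table."""
    prep = library_prep.lower()
    plat = platform.lower()
    if prep == "ampliconic":
        if plat == "illumina":
            if not paired_end:
                raise ValueError("ILL-AMP expects paired-end Illumina reads")
            caller = "ILL-AMP"
        elif plat == "ont":
            caller = "ONT-AMP"
        else:
            # e.g. PacBio: not covered by the current workflows (note 1).
            raise ValueError(f"no ampliconic workflow for {platform!r}")
    elif prep == "metatranscriptomic":
        if plat == "illumina":
            caller = "ILL-MT-PE" if paired_end else "ILL-MT-SE"
        else:
            # No ONT metatranscriptomic workflow is listed (note 2).
            raise ValueError(f"no metatranscriptomic workflow for {platform!r}")
    else:
        raise ValueError(f"unknown library prep {library_prep!r}")
    # Every variant-calling workflow is followed by reporting and consensus.
    return [caller, "REPORTING", "CONSENSUS"]
```

For example, `select_workflows("metatranscriptomic", "Illumina", paired_end=False)` returns `["ILL-MT-SE", "REPORTING", "CONSENSUS"]`.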

How do I use it and where do I run my analyses?


This depends on who you are:

  • Biomedical researcher: Use any of the three global Galaxy instances in the EU (https://usegalaxy.eu), the US (https://usegalaxy.org), or Australia (https://usegalaxy.org.au). To begin, take a look at the Galaxy Training Network tutorial "Mutation calling, viral genome reconstruction and lineage/clade assignment from SARS-CoV-2 sequencing data".
  • Bioinformatician or data scientist: You have two options:
      1. Use our "Request an analysis" service to submit a list of datasets to us and trigger automated analyses.
      2. Configure your own Galaxy instance to trigger the analyses automatically. Use this option if you run your own Galaxy installation.
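If you use the "Request an analysis" service, it can help to tidy the accession list before submitting it. This sketch rests on two assumptions not stated on this page: that the service expects INSDC run accessions (SRR, ERR or DRR followed by digits) and that one accession per line is an acceptable format. `accession_list` and `to_submission_file` are hypothetical names.

```python
import re

# Assumed format: INSDC run accessions (SRR = NCBI SRA, ERR = ENA,
# DRR = DDBJ) followed by at least six digits.
RUN_ACCESSION = re.compile(r"[EDS]RR\d{6,}")


def accession_list(text):
    """Split on whitespace/commas; return (valid, invalid), de-duplicated in order."""
    valid, invalid, seen = [], [], set()
    for token in re.findall(r"[^\s,]+", text):
        if token in seen:
            continue  # drop repeated accessions
        seen.add(token)
        if RUN_ACCESSION.fullmatch(token):
            valid.append(token)
        else:
            invalid.append(token)
    return valid, invalid


def to_submission_file(accessions):
    """One accession per line (assumed submission format)."""
    return "".join(f"{acc}\n" for acc in accessions)
```

Returning the rejected tokens instead of discarding them lets you fix typos before the list is sent.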

These analysis capabilities are supported by public computational infrastructure provided by the XSEDE consortium in the US, the de.NBI and ELIXIR consortia in the EU, and Nectar Cloud in Australia. The figure below illustrates current processing times (in the EU) for the analysis of SARS-CoV-2 data: most analyses complete within 1 to 2 hours.