Workflows

Overview
Workflows for discovery of sequence variants
Which workflow do I use?
How do I use it and where do I run my analyses?

Overview

Here is the info to get you started quickly:

We maintain six workflows covering different sequencing platforms (Illumina or Oxford Nanopore) and library preparation strategies (ampliconic or metatranscriptomic).
All workflows can be used to analyze any number of samples.
All workflows can be used right now on any of our global instances in EU (https://usegalaxy.eu), US (https://usegalaxy.org), or Australia (https://usegalaxy.org.au) via Galaxy's graphical user interface as shown in this tutorial.
All workflows can also be accessed programmatically, either by submitting a list of accession numbers to our "Request an analysis" service or by using Galaxy's API for automatically triggered analyses.
We provide powerful computational infrastructure for data analysis supported by national supercomputing resources in the US, EU, and Australia.
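
As a rough illustration of the programmatic route mentioned above, the sketch below builds (but does not send) a request that lists the workflows visible to a user through Galaxy's REST API. It assumes you have generated an API key in your Galaxy user preferences; the function name `build_workflow_list_request` and the placeholder key are hypothetical, and only the Python standard library is used.

```python
import urllib.request


def build_workflow_list_request(base_url, api_key):
    """Build a GET request for <base_url>/api/workflows without sending it.

    base_url: address of a Galaxy server, e.g. https://usegalaxy.eu
    api_key:  personal Galaxy API key (placeholder value in this sketch)
    """
    url = base_url.rstrip("/") + "/api/workflows"
    # Galaxy accepts the API key in the "x-api-key" request header.
    return urllib.request.Request(url, headers={"x-api-key": api_key}, method="GET")


request = build_workflow_list_request("https://usegalaxy.eu", "YOUR_API_KEY")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` with a valid key should return a JSON list of workflows. For anything beyond a quick check, a client library such as BioBlend is likely more convenient than hand-built requests.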

Workflows for discovery of sequence variants

We developed a number of workflows for the analysis of SARS-CoV-2 sequencing data. The workflows are available from WorkflowHub in the EU and Dockstore in the US. The workflows listed in the table below were designed specifically for identifying sequence variants in SARS-CoV-2 raw read datasets and for reporting them in tabular formats and/or as a viral consensus sequence:

Link	Workflow	Inputs	Outputs	Aligner	Caller
WorkflowHub Dockstore	Illumina ARTIC: Variant analysis from ampliconic data produced with ARTIC protocol v1, v2, v3, or v4, or any alternative primer scheme. ILL-AMP	1. Paired reads [`fastqsanger`] 2. SARS-CoV-2 reference [`fasta`] 3. Primer coordinates [`bed`] 4. Primer pairs table [`tsv`]	Variants [`vcf`]	`BWA MEM`	`lofreq`
WorkflowHub Dockstore	Oxford Nanopore ARTIC: Variant analysis from ampliconic data produced with ARTIC protocol v1, v2, v3, or v4, or any alternative primer scheme. ONT-AMP	1. Reads [`fastqsanger`] 2. SARS-CoV-2 reference [`fasta`] 3. Primer coordinates [`bed`]	Variants [`vcf`]	`minimap2`	`medaka`
WorkflowHub Dockstore	Illumina metatranscriptomic PE: Variant analysis from metatranscriptomic data. ILL-MT-PE	1. Paired reads [`fastqsanger`] 2. SARS-CoV-2 reference [`fasta`]	Variants [`vcf`]	`BWA MEM`	`lofreq`
WorkflowHub Dockstore	Illumina metatranscriptomic SE: Variant analysis from metatranscriptomic data. ILL-MT-SE	1. Reads [`fastqsanger`] 2. SARS-CoV-2 reference [`fasta`]	Variants [`vcf`]	`Bowtie2`	`lofreq`
WorkflowHub Dockstore	Report generation: Generation of final variant analysis reports/plots. REPORTING	1. Variants [`vcf`] 2. Gene name translation table [`tsv`]	Reports [`tsv`], overview [`svg`]	-	-
WorkflowHub Dockstore	Consensus construction: Generation of sample consensus sequences. CONSENSUS	1. Variants [`vcf`] 2. SARS-CoV-2 reference [`fasta`] 3. Mapped reads [`bam`]	Consensus [`fasta`]	-	-

vcf = variant call format; tsv = TAB-separated values; svg = scalable vector graphics; fastqsanger = fastq format with Sanger encoding of base quality values; bed = browser extensible data format; bam = binary alignment/map format, BGZF-compressed
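
The input requirements in the table above can be captured in a small lookup, which is handy for checking a dataset collection before launching a run. This is only a sketch: the workflow labels and formats come from the table, while the input key names (`paired_reads`, `primer_pairs`, and so on) and the helper `missing_inputs` are hypothetical names chosen for illustration.

```python
# Required inputs per workflow label, as {input name: expected format}.
# Input names are illustrative; formats follow the table above.
REQUIRED_INPUTS = {
    "ILL-AMP": {"paired_reads": "fastqsanger", "reference": "fasta",
                "primer_coordinates": "bed", "primer_pairs": "tsv"},
    "ONT-AMP": {"reads": "fastqsanger", "reference": "fasta",
                "primer_coordinates": "bed"},
    "ILL-MT-PE": {"paired_reads": "fastqsanger", "reference": "fasta"},
    "ILL-MT-SE": {"reads": "fastqsanger", "reference": "fasta"},
    "REPORTING": {"variants": "vcf", "gene_names": "tsv"},
    "CONSENSUS": {"variants": "vcf", "reference": "fasta", "mapped_reads": "bam"},
}


def missing_inputs(workflow, provided):
    """Return the sorted names of inputs that are absent or have the wrong format.

    provided: dict mapping input name to the format of the supplied dataset.
    """
    required = REQUIRED_INPUTS[workflow]
    return sorted(name for name, fmt in required.items()
                  if provided.get(name) != fmt)


print(missing_inputs("ILL-AMP", {"paired_reads": "fastqsanger", "reference": "fasta"}))
```

In this example the two primer-related inputs are reported as missing, since ILL-AMP needs both the primer coordinates and the primer pairs table.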

The following tutorial explains how to import workflows into your Galaxy instance.

Which workflow do I use?

Each of the four variant calling workflows from the table above is designed to be usable together with the reporting and consensus workflows. The table below shows which workflows to use for a full analysis depending on the combination of library prep and sequencing platform:

↓ Library Prep / Platform¹ →	Illumina	ONT
Ampliconic	ILL-AMP + REPORTING + CONSENSUS	ONT-AMP + REPORTING + CONSENSUS
Metatranscriptomic	(ILL-MT-PE or ILL-MT-SE) + REPORTING + CONSENSUS	-²

¹ - There is an increasing amount of PacBio data. Our workflows can easily be adapted for these data as well. Use OPEN CHAT below to let us know.
² - Conceptually, this is identical to ILL-MT-SE, except that the mapper is replaced with minimap2 and the variant caller with medaka.
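
The selection table above amounts to a simple lookup, sketched below. The function name `workflows_for` and the `paired_end` flag are hypothetical; the flag assumes that ILL-MT-PE is the choice for paired-end reads and ILL-MT-SE for single-end reads, as their names and inputs suggest. The ONT metatranscriptomic combination raises an error because no ready-made workflow is listed for it.

```python
def workflows_for(library_prep, platform, paired_end=True):
    """Return the workflow labels needed for a full analysis.

    library_prep: "ampliconic" or "metatranscriptomic" (case-insensitive)
    platform:     "illumina" or "ont" (case-insensitive)
    paired_end:   only consulted for Illumina metatranscriptomic data
    """
    prep = library_prep.strip().lower()
    plat = platform.strip().lower()
    if prep == "ampliconic" and plat == "illumina":
        caller = "ILL-AMP"
    elif prep == "ampliconic" and plat == "ont":
        caller = "ONT-AMP"
    elif prep == "metatranscriptomic" and plat == "illumina":
        caller = "ILL-MT-PE" if paired_end else "ILL-MT-SE"
    else:
        # Covers ONT metatranscriptomic data and any unrecognized combination.
        raise ValueError(f"no listed workflow for {library_prep!r} on {platform!r}")
    # Every variant calling workflow is followed by reporting and consensus.
    return [caller, "REPORTING", "CONSENSUS"]


print(workflows_for("Ampliconic", "ONT"))  # ['ONT-AMP', 'REPORTING', 'CONSENSUS']
```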

How do I use it and where do I run my analyses?

This depends on who you are:

You are a ...	Where do you start ...
Biomedical researcher	Use any of the three global Galaxy instances in EU (https://usegalaxy.eu), US (https://usegalaxy.org), or Australia (https://usegalaxy.org.au). Take a look at the following tutorial to begin: Mutation calling, viral genome reconstruction and lineage/clade assignment from SARS-CoV-2 sequencing data - a Galaxy Training Network Tutorial.
Bioinformatician or data scientist	You have two options: Option 1: Use our "Request an analysis" service to submit a list of datasets to us and trigger automated analyses. Option 2: Configure your own Galaxy instance to trigger the analyses automatically. Use this option if you run your own Galaxy installation.

These analysis capabilities are supported by public computational infrastructure provided by the XSEDE consortium in the US, the de.NBI and ELIXIR consortia in the EU, and Nectar Cloud in Australia. The figure below illustrates current processing times (in the EU) for the analysis of SARS-CoV-2 data. Most analyses complete within 1-2 hours.