Welcome everybody, and thank you for joining this course!

Everything you need for this course can be found on this webpage. More information, including links to all training materials, can be found by clicking on each session.


Note: Slack will not be as active anymore, but you can also ask your questions on Gitter.

Welcome & Practical Information

Start here. This will cover all the logistics and practical information for this training week.

Self-Study Tutorial

Start here; we will go over all the important things to know to get the most out of this workshop.

Speaker

Helena Rasche

Avans Hogeschool

Speaker

Saskia Hiltemann

Erasmus Medical Center

Supporting Materials

Self-Study Tutorial

Supporting Materials

  • Slack channel - Have a question about the training? Did you run into a problem? Just wanna chat?
  • Finished the session? - Let us know on Slack that you've finished it, and what you thought of it! Thanks!
  • Enjoyed it? - Like the video on YouTube, Tweet (hashtag #usegalaxy), and follow the GTN on Twitter! @gxytraining

Self-Study Tutorial

This short video gives an overview of the worldwide Galaxy community, and different ways you can get involved! Video created by Beatriz Serrano-Solano.

Speaker

The Global GTN Community

Supporting Materials

Self-Study Tutorial

Supporting Materials

  • Slack channel - Have a question about the training? Did you run into a problem? Just wanna chat?
  • Finished the session? - Let us know on Slack that you've finished it, and what you thought of it! Thanks!
  • Enjoyed it? - Like the video on YouTube, Tweet (hashtag #usegalaxy), and follow the GTN on Twitter! @gxytraining

Introduction to Galaxy

Start here if you are new to Galaxy. These videos will introduce you to the Galaxy platform, and walk you through your first analyses.

Self-Study Tutorial

This video will introduce the Galaxy data analysis platform, and give a short demo on how to use it.

Speaker

Anton Nekrutenko

Penn State University

Supporting Materials

Self-Study Tutorial

In this tutorial, we will walk you through your first (toy) analysis in Galaxy. This tutorial is aimed at familiarizing you with the Galaxy platform, and some basic NGS concepts.

Speaker

Dave Clements

Johns Hopkins University


Supported Servers

Supporting Materials

Self-Study Tutorial

This video tutorial provides a non-genomics-based first look at the Galaxy platform and how to use it, and discusses how you can get support for your data analysis.

Speaker

Anne Fouilloux

EOSC-Nordic


Supported Servers

Supporting Materials

Self-Study Tutorial

Each of the 4 webinars in this series, which ran earlier this year, highlights available Galaxy resources for a different audience. These videos are a nice way to get an overview of what Galaxy has to offer for different types of users.

Supporting Materials

Advanced Galaxy Features

These tutorials cover some more advanced Galaxy features.

Self-Study Tutorial

Through a series of examples, this tutorial aims to familiarize the reader with building Galaxy collections from tabular data containing URLs, sample sheets, lists of accessions or identifiers, etc.
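
To make the idea of turning tabular data into a collection concrete, here is a minimal Python sketch. It is not how Galaxy builds collections internally; the sample sheet, its column names (`sample`, `direction`, `url`) and the example.org URLs are all made up for illustration. It only shows the grouping step: rows that share a sample identifier become one paired element.

```python
import csv
import io

# Hypothetical tab-separated sample sheet: one row per file.
SAMPLE_SHEET = """sample\tdirection\turl
s1\tforward\thttps://example.org/s1_R1.fastq.gz
s1\treverse\thttps://example.org/s1_R2.fastq.gz
s2\tforward\thttps://example.org/s2_R1.fastq.gz
s2\treverse\thttps://example.org/s2_R2.fastq.gz
"""


def build_paired_collection(sheet_text):
    """Group rows by sample so each element holds a forward/reverse pair."""
    collection = {}
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter="\t")
    for row in reader:
        # The sample column becomes the element identifier.
        collection.setdefault(row["sample"], {})[row["direction"]] = row["url"]
    return collection


collection = build_paired_collection(SAMPLE_SHEET)
for sample, pair in collection.items():
    print(sample, pair["forward"], pair["reverse"])
```

The same grouping logic applies whether the table holds URLs or accessions; only the column that identifies the element changes.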

Speaker

Assunta DeSanto

Penn State University


Supported Servers

Supporting Materials

Self-Study Tutorial

This is a more technical tutorial which will teach you how to run workflows from the command line and scale your analyses. If you are on Windows, you will need to set up WSL/WSL2 before this session.


Supported Servers

Supporting Materials

Introduction to NGS Analysis

This module will introduce the basics of NGS analysis, from cleaning your data to mapping and assembly.

Self-Study Tutorial

In this demo video, we will show how to perform an NGS data analysis, using a SARS-CoV-2 example dataset. If you would like to run the full tutorial yourself, please find the link below (duration 1h-1.5h).

Speaker

Anton Nekrutenko

Penn State University


Supported Servers

Supporting Materials

Self-Study Tutorial

This session covers the basics about how to assess and improve the quality of your sequencing data.
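
As a small illustration of what assessing and improving read quality means, the following Python sketch decodes a FASTQ quality string and trims low-quality bases from the 3' end. It assumes the common Phred+33 encoding; the read, the threshold of 20 and the function names are illustrative choices, not taken from the session, and dedicated QC tools do considerably more than this.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line):
    """Average Phred score over the whole read."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores)


def trim_low_quality_tail(sequence, quality_line, min_quality=20):
    """Drop bases from the 3' end while their quality is below min_quality."""
    scores = phred_scores(quality_line)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_line[:end]


# Made-up read: six high-quality bases followed by two poor ones.
seq, qual = "ACGTACGT", "IIIIII#!"
print(mean_quality(qual))
print(trim_low_quality_tail(seq, qual))
```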


Supported Servers

Supporting Materials

Self-Study Tutorial

Mapping sequencing reads to a reference genome is often the next step after QC. This session covers the basic concepts of mapping, and the practical will guide you through performing a mapping step on sequencing data.
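
The core idea of mapping can be sketched in a few lines of Python. This is a toy seed-and-extend approach under strong simplifying assumptions (an exact match of the first k bases, no gaps, a tiny made-up reference); real mappers use far more sophisticated indexes and scoring, and the function names here are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Record every position at which each k-mer occurs in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return 0-based start positions where the read aligns without gaps."""
    hits = []
    # Seed: look up the first k bases of the read, which must match exactly.
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run off the end of the reference
        # Extend: count mismatches over the full read length.
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


reference = "ACGTTGCAACGTAGCT"  # made-up reference sequence
index = build_kmer_index(reference, k=4)
print(map_read("ACGTAGC", reference, index, k=4))
```

With one mismatch allowed the read is placed at two positions, which is the kind of ambiguity mapping quality scores are meant to express.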

Speaker

Peter van Heusden

SANBI/UWC


Supported Servers

Supporting Materials

  • Slides: Mapping
  • Tutorial: Mapping
  • FAQ Document - Have a question about this training? Check here to see if it has already been answered
  • Slack channel: #ngs_mapping - Have a question about the training? Did you run into a problem? Just wanna chat? Ask an Instructor on Slack!
  • Finished the session? - Let us know that you've finished it, and what you thought of it! On Slack: (Channel: #ngs_mapping). Thanks!
  • Enjoyed it? - Like the video on YouTube, Tweet (hashtag #usegalaxy), and follow the GTN on Twitter! @gxytraining

Self-Study Tutorial

When there is no reference genome available for your organism, you will have to assemble the short reads into larger segments (contigs).
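
The following Python sketch shows the basic operation behind assembling reads into contigs: finding a suffix-prefix overlap between two reads and merging them. It is a deliberately naive illustration with made-up reads and a hypothetical minimum overlap of 3 bases; real assemblers work on graphs of many reads and have to handle sequencing errors and repeats.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads into one contig, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    return left + right[size:] if size else None


# Made-up reads that share the four bases "GTAC".
print(merge_reads("ATGCGTAC", "GTACCTGA"))
```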


Supported Servers

Supporting Materials

Advanced NGS Analysis

Self-Study Tutorial

This tutorial guides you through the preprocessing of sequencing data of bronchoalveolar lavage fluid (BALF) samples obtained from early COVID-19 patients in China. Since such samples are expected to be significantly contaminated with human sequencing reads, the goal is to enrich the data for SARS-CoV-2 reads by identifying and discarding reads of human origin before trying to assemble the viral genome sequence.

Speaker

Cristóbal Gallardo

Supported Servers

Supporting Materials

Self-Study Tutorial

Genome annotation is a multi-level process that includes prediction of protein-coding genes, as well as other functional genome units such as structural RNAs, tRNAs, small RNAs, pseudogenes, control regions, direct and inverted repeats, insertion sequences, transposons and other mobile elements.

Supporting Materials

Self-Study Tutorial

After automatically annotating your genome using Prokka or Maker, it is important to visualize your results so you can understand what your organism looks like, and then to manually refine these annotations along with any additional data you might have. This process is most often done as part of a group, though smaller organisms may be annotated individually.

Apollo (Dunn et al. 2019) provides a platform to do this. It is a web-based, collaborative genome annotation editor. Think of it as “Google Docs” for genome annotation: multiple users can work together simultaneously to curate evidence and annotate a genome.

This tutorial is inspired by the Apollo User’s Guide, which provides additional guidance.

Speaker

Anthony Bretaudeau

INRAE


Supported Servers

Supporting Materials

Self-Study Tutorial

In many eukaryotic organisms, such as humans, the genome is tightly packed and organized with the help of nucleosomes (chromatin). A nucleosome is a complex formed by eight histone proteins that is wrapped with ~147bp of DNA. When the DNA is being actively transcribed into RNA, the DNA will be opened and loosened from the nucleosome complex. Many factors, such as the chromatin structure, the position of the nucleosomes, and histone modifications, play an important role in the organization and accessibility of the DNA. Consequently, these factors are also important for the activation and inactivation of genes. Assay for Transposase-Accessible Chromatin using sequencing (ATAC-Seq) is a method to investigate the accessibility of chromatin and thus a method to determine regulatory mechanisms of gene expression. The method can help identify promoter regions and potential enhancers and silencers.


Supported Servers

Supporting Materials

Self-Study Tutorial

Traditionally, after a list of run accessions has been filtered on the NCBI website, the accessions are used to download and extract FASTQ files using the SRA toolkit to enter into the next steps of the workflow. A newer compressed data type generated from raw submission data for sequences containing hits to SARS-CoV-2 is also accessible to Galaxy users from SRA in the Cloud.

SRA Aligned Read Format (SARF) provides further output options beyond basic FASTQ format, for example:

  1. contigs created from the raw reads in the run (FASTA format)
  2. reads aligned back to the contigs (SAM format)
  3. VCF files generated for each record relative to the SARS-CoV-2 RefSeq record.

We will demonstrate how to access just the reads (FASTA or FASTQ format), just the contigs, the reads aligned to the contigs and VCF files for selected SARFs. These formats can speed up workflows such as assembly and variant calling, and this data format is still referenced by the run accession and accessed using the SRA toolkit.

This workshop describes the SARF data objects and demonstrates a few ways to filter them using the metadata before accessing them and entering them into Galaxy workflows.


Supported Servers

Supporting Materials

Transcriptomics

The following modules cover transcriptomic analysis in Galaxy, from bulk RNA-seq to single-cell RNA-seq, and from quality control to visualisation.

Introduction to RNA-Seq

Self-Study Tutorial

RNA sequencing is used to assess the expression levels of genes. This video introduces the important concepts related to RNA-seq analysis.

Speaker

Fotis E. Psomopoulos

INAB|CERTH

Supporting Materials

Self-Study Tutorial

In this tutorial we will walk you through the process of an RNA-seq analysis, comparing gene expression levels between different conditions and assessing impacted gene pathways.

Speaker

Bérénice Batut

Supported Servers

Supporting Materials

Self-Study Tutorial

In this tutorial, we will use de novo transcript reconstruction to infer transcript structures from aligned reads. We will identify and quantify transcripts present in two different cell states and determine which transcripts are differentially expressed between the two states.

Speaker

Mallory Freeberg

EMBL-EBI


Supported Servers

Supporting Materials

Self-Study Tutorial

In this tutorial, we will analyze RNA sequencing data to extract information about potential genes regulated in response to abiotic stress in plants. For this purpose, the chosen approach is the identification of genes with complementary regions to miRNAs upregulated in response to brassinosteroids.

Speaker

Cristóbal Gallardo

Supported Servers

Supporting Materials

RNA-Seq analysis using RStudio in Galaxy

Self-Study Tutorial

This tutorial will show you how you can start RStudio from within Galaxy. This option is only available on Galaxy EU for the time being. If you are working on a different Galaxy server, you can use RStudio Cloud (https://rstudio.cloud/).

Speaker

Fotis E. Psomopoulos

INAB|CERTH

Supporting Materials

Self-Study Tutorial

This tutorial will provide an introduction to using R with RStudio in Galaxy.

Speaker

Fotis E. Psomopoulos

INAB|CERTH


Supported Servers

Supporting Materials

Self-Study Tutorial

An advanced tutorial covering downstream analysis of RNA-seq data using R.

Speaker

Fotis E. Psomopoulos

INAB|CERTH

Supporting Materials

Self-Study Tutorial

In this tutorial, we will visualize our RNA-seq analysis results using R.

Speaker

Fotis E. Psomopoulos

INAB|CERTH


Supported Servers

Supporting Materials

Self-Study Tutorial

Volcano plots are commonly used to display the results of RNA-seq or other omics experiments. A volcano plot is a type of scatterplot that shows statistical significance (P value) versus magnitude of change (fold change). It enables quick visual identification of genes with large fold changes that are also statistically significant. These tutorials will first teach you how to create such plots from your RNA-Seq results in Galaxy, and then how you can further customize the plots using R directly within Galaxy.
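
The two axes of a volcano plot are usually transformed values: log2 of the fold change on the x-axis and -log10 of the P value on the y-axis. The Python sketch below computes those coordinates and labels each gene; the gene names, values and the cut-offs (an absolute log2 fold change of at least 1, P below 0.05) are made up for illustration and are not prescribed by the tutorials.

```python
import math


def volcano_point(fold_change, p_value):
    """Return (x, y) = (log2 fold change, -log10 P value) for one gene."""
    return math.log2(fold_change), -math.log10(p_value)


def classify(fold_change, p_value, min_abs_log2fc=1.0, max_p=0.05):
    """Label a gene using illustrative, assumed thresholds."""
    log2fc, _ = volcano_point(fold_change, p_value)
    if p_value >= max_p or abs(log2fc) < min_abs_log2fc:
        return "not significant"
    return "up" if log2fc > 0 else "down"


# Made-up (fold change, P value) pairs, for illustration only.
genes = {
    "geneA": (4.0, 0.001),
    "geneB": (0.25, 0.01),
    "geneC": (1.2, 0.0001),
    "geneD": (8.0, 0.5),
}
for name, (fc, p) in genes.items():
    x, y = volcano_point(fc, p)
    print(name, round(x, 2), round(y, 2), classify(fc, p))
```

Genes that land in the upper left and upper right corners of the plot are the ones with both a large change and a small P value.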

Speaker

Maria Doyle

Supporting Materials

Single-cell RNA-Seq Analysis

Why smash tissues up to analyse them, when you can find what’s inside each individual cell? In these scRNA-seq tutorials, you’ll move from FASTQ to trajectory using the same dataset throughout – a case study of previously published, real mouse data. And for the plant enthusiasts, you’ll do the same thing in Arabidopsis! Enjoy life at a single cell scale!

Self-Study Tutorial

Single-cell RNA-seq analysis is a rapidly evolving field at the forefront of transcriptomic research, used in high-throughput developmental studies and rare transcript studies to examine cell heterogeneity within a population of cells. The cellular resolution and genome-wide scope make it possible to draw new conclusions that are not otherwise possible with bulk RNA-seq. Slides created by Mehmet Tekman.

Speaker

These slides are narrated by AWS Polly.

This helps us keep the video slides up-to-date more easily.

Supporting Materials

Self-Study Tutorial

This tutorial will take you from raw FASTQ files to a cell x gene data matrix in AnnData format. What’s a data matrix, and what’s AnnData format? Well you’ll find out! Importantly, this is the first step in processing single cell data in order to start analysing it.
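
As a rough preview of what a cell x gene matrix is, the Python sketch below counts how often each gene was observed in each cell and arranges the counts in rows (cells) and columns (genes). The barcodes, gene names and observations are made up, and plain lists stand in for the AnnData object, which additionally stores annotations about the cells and genes.

```python
from collections import Counter


def build_count_matrix(observations):
    """Build a cell x gene count matrix.

    observations: iterable of (cell_barcode, gene) pairs, one per counted molecule.
    Returns (cells, genes, matrix) where matrix[i][j] is the count of genes[j] in cells[i].
    """
    counts = Counter(observations)
    cells = sorted({cell for cell, _ in counts})
    genes = sorted({gene for _, gene in counts})
    # A missing (cell, gene) pair counts as 0.
    matrix = [[counts[(cell, gene)] for gene in genes] for cell in cells]
    return cells, genes, matrix


# Made-up observations for two cells and two genes.
observations = [
    ("AAAC", "Gapdh"),
    ("AAAC", "Gapdh"),
    ("AAAC", "Actb"),
    ("TTTG", "Actb"),
]
cells, genes, matrix = build_count_matrix(observations)
print(cells, genes, matrix)
```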

Speaker

Wendi Bacon

The Open University


Supported Servers

Supporting Materials

Self-Study Tutorial

You’ve done all the work to make a single cell matrix, with gene counts and mitochondrial counts and buckets of cell metadata from all your variables of interest. Now it’s time to fully process our data: to remove low quality cells, to reduce the many dimensions of data that make it difficult to work with, and ultimately to try to define our clusters and to find our biological meaning and insights!
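
Removing low quality cells usually comes down to a few per-cell thresholds. The Python sketch below keeps cells that express enough genes and do not have too large a share of mitochondrial counts. The field names, the example cells and the cut-offs (200 genes, 20% mitochondrial) are assumptions for illustration; suitable values depend on the dataset.

```python
def filter_cells(cells, min_genes=200, max_mito_fraction=0.2):
    """Return barcodes of cells passing two simple, assumed quality thresholds."""
    kept = []
    for cell in cells:
        total = cell["total_counts"]
        # Treat a cell with no counts at all as failing the mitochondrial check.
        mito_fraction = cell["mito_counts"] / total if total else 1.0
        if cell["n_genes"] >= min_genes and mito_fraction <= max_mito_fraction:
            kept.append(cell["barcode"])
    return kept


# Made-up per-cell metadata.
cells = [
    {"barcode": "c1", "n_genes": 1500, "total_counts": 4000, "mito_counts": 200},
    {"barcode": "c2", "n_genes": 90, "total_counts": 300, "mito_counts": 15},
    {"barcode": "c3", "n_genes": 800, "total_counts": 2000, "mito_counts": 900},
]
print(filter_cells(cells))
```

Here the second cell is dropped for expressing too few genes and the third for its high mitochondrial fraction.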

Speaker

Wendi Bacon

The Open University


Supported Servers

Supporting Materials

Self-Study Tutorial

You’ve done all the hard work of preparing a single cell matrix, processing it, plotting it, interpreting it, finding lots of lovely genes, all within the glorious Galaxy interface. Now you want to infer trajectories, or relationships between cells… and you’ve been threatened with learning Python to do so! Well, fear not. A run-through of a basic Python coding introduction will help you make more sense of this tutorial; however, you’ll be able to make and interpret glorious plots even without understanding the Python coding language. This is the beauty of Galaxy - all the ‘set-up’ is identical across computers, because it’s browser based. So fear not!

Speaker

Wendi Bacon

The Open University


Supported Servers

Supporting Materials

Self-Study Tutorial

Single cell RNA-seq analysis is a cornerstone of developmental research and provides a great level of detail in understanding the underlying dynamic processes within tissues. In the context of plants, this highlights some of the key differentiation pathways that root cells undergo.

Speaker

Mehmet Tekman

Supported Servers

Supporting Materials

  • Tutorial: Analysis of plant scRNA-Seq Data with Scanpy
  • FAQ Document - Have a question about this training? Check here to see if it has already been answered
  • Slack channel - Have a question about the training? Did you run into a problem? Just wanna chat?
  • Finished the session? - Let us know on Slack that you've finished it, and what you thought of it! Thanks!
  • Enjoyed it? - Like the video on YouTube, Tweet (hashtag #usegalaxy), and follow the GTN on Twitter! @gxytraining

Proteomics

In this module we explore the world of proteomics! Today we have a mixture of lectures, hands-on tutorials, and workflow demonstrations. The FAQ document provides links to example histories if you would like to explore the outputs of the demos yourself.

Self-Study Tutorial

This lecture will provide an introduction to mass spectrometry (MS) based proteomics analysis. Slides created by Melanie Föll.

Speaker

These slides are narrated by AWS Polly.

This helps us keep the video slides up-to-date more easily.

Supporting Materials

Self-Study Tutorial

Modern mass spectrometry-based proteomics enables the identification and quantification of thousands of proteins. Therefore, quantitative mass spectrometry represents an indispensable technology for biological and clinical research. Statistical analyses are required for the unbiased answering of scientific questions and to uncover all important information in the proteomic data. In this training we will cover the full analysis workflow from label-free, data dependent acquisition (DDA) raw data to statistical results. We’ll use two popular quantitative proteomics software packages: MaxQuant and MSstats.


Supported Servers

Supporting Materials

Self-Study Tutorial

Data Independent Acquisition Mass Spectrometry (DIA-MS) provides reproducible quantitative information as an improvement over Data Dependent Acquisition (DDA). The EncyclopeDIA workflow tutorial will guide the user through the steps of a) conversion of the input DIA-MS RAW data files to mzML; b) generation of a Chromatogram Library using the gas-phase fractionation (GPF) method and a spectral library; and c) generation of peptide and protein quantitation outputs.

Speaker

Emma Leith

Speaker

James Johnson

University of Minnesota

Speaker

Pratik Jagtap

University of Minnesota


Supported Servers

Supporting Materials

Self-Study Tutorial

This tutorial covers an annotation pipeline for a protein list identified by LC-MS/MS experiments.

Speaker

Yves Vandenbrouck

Supported Servers

Supporting Materials

Self-Study Tutorial

A biomarker is a measurable biological component that can be routinely detected in clinical practice and reflects a disease state, response to therapeutic treatment, or other relevant biological state. In this tutorial we introduce the tools of this pipeline one by one, and guide you through executing them to complete the entire pipeline on a concrete example.


Supported Servers

Supporting Materials

Self-Study Tutorial

In this tutorial, the basic workflow for metaproteomics is described, which includes database search, seeking taxonomy information and functional analysis. Related metaproteomics and functional microbiome tools and GTN tutorials are also mentioned in this tutorial.

Speaker

Pratik Jagtap

University of Minnesota


Supported Servers

Supporting Materials

Self-Study Tutorial

The Coronavirus Disease 2019 (COVID-19) global pandemic has had a profound, lasting impact on the world’s population. Accurate and timely diagnosis of COVID-19 infection is an important step for providing care and containing its further spread. In this tutorial, attendees will be introduced to two workflows – a) database search workflow and b) peptide validation workflow. Attendees of the workshop will gain in-depth knowledge of the Galaxy workflows that detect SARS-CoV-2 peptides (10.1186/s12014-021-09321-1) and co-infecting pathogen peptides (10.1021/acs.jproteome.0c00822).

Speaker

Timothy J. Griffin

University of Minnesota

Speaker

Subina Mehta

Speaker

Andrew Rajczewski

University of Minnesota

Speaker

Pratik Jagtap

University of Minnesota

Supporting Materials

Proteogenomics

Proteogenomics utilizes a combination of proteomics, genomics, and transcriptomics to aid in the discovery and identification of peptides.

Self-Study Tutorial

In this opening presentation, the basic components of proteogenomics are described, including the main steps in the bioinformatics analysis workflow that make up this approach and will be detailed in the following tutorials. Some examples of research questions that benefit from a proteogenomics approach are also highlighted.

Speaker

Timothy J. Griffin

University of Minnesota

Supporting Materials

Self-Study Tutorial

In this tutorial, we will provide a walkthrough on how to generate a customized protein sequence database using RNA-Seq data. The resultant FASTA database contains sequences with single amino acid variants (SAVs), insertions and deletions (indels) and transcript assemblies (splicing variants). This database can then be used to identify protein sequence variants from the mass spectrometry data.
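
To illustrate what an entry with a single amino acid variant looks like in such a database, here is a small Python sketch. It is not the tutorial's workflow: the protein sequence, the variant, the header text and the helper names are all hypothetical. It only shows a variant being applied to a sequence and both versions being written in FASTA format.

```python
def apply_sav(protein, position, ref_aa, alt_aa):
    """Apply a single amino acid variant; position is 1-based."""
    if protein[position - 1] != ref_aa:
        raise ValueError("reference amino acid does not match the sequence")
    return protein[:position - 1] + alt_aa + protein[position:]


def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up protein and variant (alanine to valine at position 4).
reference_protein = "MKTAYIAK"
variant_protein = apply_sav(reference_protein, 4, "A", "V")
print(to_fasta([
    ("example_protein", reference_protein),
    ("example_protein_A4V", variant_protein),
]))
```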

Speaker

James Johnson

University of Minnesota


Supported Servers

Supporting Materials

Self-Study Tutorial

In this demonstration, we’re continuing the process of proteogenomic analysis. Here, we search mass spectrometry data against a custom proteogenomics FASTA database to create peptide spectral matches (PSMs) for each dataset; we also isolate non-canonical peptides from the resulting data for further analysis. If you would like to view the results yourself, we provide example histories in the FAQ document.

Speaker

Andrew Rajczewski

University of Minnesota


Supported Servers

Supporting Materials

Self-Study Tutorial

In this demonstration, we will investigate the peptides that were identified during proteogenomics tutorial 2. We will identify the presence of novel proteoforms that are absent in the reference database, annotate the novel peptides and visualize them using the Multi-omics Visualization Platform (MVP). If you would like to view the results yourself, we provide example histories in the FAQ document.

Speaker

Subina Mehta

Supported Servers

Supporting Materials

Microbial Analysis using Galaxy

Self-Study Tutorial

Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis. According to the WHO, in 2018 there were 10.0 million new cases of TB worldwide and 1.4 million deaths due to the disease, making TB the world’s most deadly infectious disease.

Speaker

Peter van Heusden

SANBI/UWC


Supported Servers

Supporting Materials

Self-Study Tutorial

This lecture briefly introduces 16S sequencing, a popular technique used for taxonomic profiling of microbial communities.

Speaker

Saskia Hiltemann

Erasmus Medical Center


Supported Servers

Supporting Materials

Self-Study Tutorial

Antimicrobial resistance (AMR) poses a major threat to human health. Plasmids are able to transfer AMR genes among bacterial isolates. Long-read sequencing technologies aid in the reconstruction of bacterial genomes and plasmids, in order to determine the presence of antimicrobial resistance genes.


Supported Servers

Supporting Materials

Self-Study Tutorial

Metatranscriptomics analysis examines how the microbiome responds to the environment by studying the taxonomic composition and functional analysis of genes expressed by the microbiome, using microbial community RNA-Seq data and subsequent metatranscriptomics workflows. This workshop will introduce researchers to the basic concepts and tools from the ASaiM-MT workflow. ASaiM-MT provides a curated collection of tools to explore and visualize taxonomic and functional information from metatranscriptomic sequences.

Speaker

Pratik Jagtap

University of Minnesota

Speaker

Timothy J. Griffin

University of Minnesota

Speaker

Subina Mehta

Speaker

Saskia Hiltemann

Erasmus Medical Center


Supported Servers

Supporting Materials

Self-Study Tutorial

In this tutorial, the basic workflow for metaproteomics is described, which includes database search, seeking taxonomy information and functional analysis. Related metaproteomics and functional microbiome tools and GTN tutorials are also mentioned in this tutorial.

Speaker

Pratik Jagtap

University of Minnesota


Supported Servers

Supporting Materials

Machine Learning

Self-Study Tutorial

The lecture explains introductory concepts in machine learning such as supervised and unsupervised learning, classification and regression, hyperparameter optimisation, cross-validation, train, test and validation sets.
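
Of the concepts listed, cross-validation is perhaps the easiest to show in code. The Python sketch below splits sample indices into k folds, each used once for validation while the rest is used for training. It is a bare-bones illustration with a hypothetical function name and no shuffling; in practice a library implementation would normally be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Every sample appears in exactly one validation fold, so each one contributes once to the performance estimate.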

Supporting Materials

Self-Study Tutorial

The talk includes a lecture followed by a hands-on session to apply multiple classification algorithms on the Quantitative structure-activity relationship (QSAR) dataset to predict the biodegradable activity of chemical compounds. QSAR models attempt to predict the activity or property of chemicals based on their chemical structure. To achieve this, a database of compounds is collected for which the property of interest is known. For each compound, molecular descriptors are collected which describe the structure (for example - molecular weight, number of nitrogen atoms, number of carbon-carbon double bonds). Using these descriptors, a model is constructed which is capable of predicting the property of interest for a new, unknown molecule. In this tutorial, we will use a database assembled from experimental data of the Japanese Ministry of International Trade and Industry to create a classification model by applying simple and complex classifiers to learn the nature of biodegradation. We will use this model to classify new molecules into one of two classes - biodegradable or non-biodegradable. Different visualisations are used to analyze the results after applying each classification algorithm. Hyperparameters of one of the classifiers are also optimised.


Supported Servers

Supporting Materials

Self-Study Tutorial

The talk includes a lecture followed by a hands-on session to apply multiple regression algorithms on the DNA-methylation dataset to predict biological age. In this tutorial, we will build a regression model for chronological age prediction, based on DNA methylation. This is based on the work of Jana Naue et al. 2017, in which biomarkers are examined to predict the chronological age of humans by analyzing the DNA methylation patterns. Different machine learning algorithms are used in this study to make an age prediction. It has been recognized that within each individual, the level of DNA methylation changes with age. This knowledge is used to select useful biomarkers from DNA methylation datasets. The CpG sites with the highest correlation to age are selected as the biomarkers (and therefore features for building a regression model). In this tutorial, specific biomarkers are analyzed by machine learning algorithms to create an age prediction model. Multiple visualisations are also used to analyse the predictions made by simple and complex regressors, and hyperparameters of one of the regressors are also optimised.
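
The feature selection step described above, picking the CpG sites most strongly correlated with age, can be sketched as follows. This is an illustration only: the site names, methylation levels and ages are invented, and ranking by the absolute Pearson correlation is an assumption about how "highest correlation" is measured.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def top_sites(methylation, ages, n_top):
    """Rank sites by absolute correlation with age and keep the n_top best."""
    ranked = sorted(
        methylation,
        key=lambda site: abs(pearson(methylation[site], ages)),
        reverse=True,
    )
    return ranked[:n_top]


# Made-up ages and methylation levels for three hypothetical CpG sites.
ages = [20, 35, 50, 65]
methylation = {
    "cpg_a": [0.10, 0.25, 0.40, 0.55],  # rises steadily with age
    "cpg_b": [0.50, 0.48, 0.52, 0.49],  # no clear trend
    "cpg_c": [0.90, 0.70, 0.55, 0.30],  # falls with age
}
print(top_sites(methylation, ages, n_top=2))
```

Sites whose methylation falls with age are kept as well, since a strong negative correlation is just as informative for a regression model.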


Supported Servers

Supporting Materials

Self-Study Tutorial

This is an Introduction to Machine Learning in R, in which you’ll learn the basics of unsupervised learning for pattern recognition and supervised learning for prediction. At the end of this workshop, we hope that you will:

  • appreciate the importance of performing exploratory data analysis (or EDA) before starting to model your data.
  • understand the basics of unsupervised learning and know the examples of principal component analysis (PCA) and k-means clustering.
  • understand the basics of supervised learning for prediction and the differences between classification and regression.
  • understand modern machine learning techniques and principles, such as train/test split, k-fold cross-validation and regularization.
  • be able to write code to implement the above techniques and methodologies using R, caret and glmnet.

We will not be focusing on the mathematical foundation for each of the methods and approaches we’ll be discussing. There are many resources that can provide this context, but for the purposes of this workshop we believe that they are beyond the scope.

Speaker

Fotis E. Psomopoulos

INAB|CERTH


Supported Servers

Supporting Materials

Self-Study Tutorial

Artificial neural networks are a machine learning discipline roughly inspired by how neurons in a human brain work. In the past decade, there has been a huge resurgence of neural networks thanks to the vast availability of data and enormous increases in computing capacity (successfully training complex neural networks in some domains requires lots of data and compute capacity). There are various types of neural networks, 3 of which we will cover in this set of tutorials.

Speaker

Kaivan Kamali

Penn State University

Supporting Materials

SARS-CoV-2 analysis

Here we have collected all the training sessions that cover SARS-CoV-2 analysis. These sessions also appear in other modules and cover a range of topics, but one thing they have in common is that they use SARS-CoV-2 or COVID-19 data.

Self-Study Tutorial

In this demo video, we will show how to perform an NGS data analysis, using a SARS-CoV-2 example dataset. If you would like to run the full tutorial yourself, please find the link below (duration 1h-1.5h).

Speaker

Anton Nekrutenko

Penn State University


Supported Servers

Supporting Materials

Self-Study Tutorial

Traditionally, after a list of run accessions has been filtered on the NCBI website, the accessions are used to download and extract FASTQ files using the SRA toolkit to enter into the next steps of the workflow. A newer compressed data type generated from raw submission data for sequences containing hits to SARS-CoV-2 is also accessible to Galaxy users from SRA in the Cloud.

SRA Aligned Read Format (SARF) provides further output options beyond basic FASTQ format, for example:

  1. contigs created from the raw reads in the run (FASTA format)
  2. reads aligned back to the contigs (SAM format)
  3. VCF files generated for each record relative to the SARS-CoV-2 RefSeq record.

We will demonstrate how to access just the reads (FASTA or FASTQ format), just the contigs, the reads aligned to the contigs and VCF files for selected SARFs. These formats can speed up workflows such as assembly and variant calling, and this data format is still referenced by the run accession and accessed using the SRA toolkit.

This workshop describes the SARF data objects and demonstrates a few ways to filter them using the metadata before accessing them and entering them into Galaxy workflows.


Supported Servers

Supporting Materials

Self-Study Tutorial

This tutorial guides you through the preprocessing of sequencing data of bronchoalveolar lavage fluid (BALF) samples obtained from early COVID-19 patients in China. Since such samples are expected to be significantly contaminated with human sequencing reads, the goal is to enrich the data for SARS-CoV-2 reads by identifying and discarding reads of human origin before trying to assemble the viral genome sequence.

Speaker

Cristóbal Gallardo

Supported Servers

Supporting Materials


Self-Study Tutorial

The Coronavirus Disease 2019 (COVID-19) global pandemic has had a profound, lasting impact on the world’s population. Accurate and timely diagnosis of COVID-19 infection is an important step for providing care and containing its further spread. In this tutorial, attendees will be introduced to two workflows: (a) a database search workflow and (b) a peptide validation workflow. Attendees will gain in-depth knowledge of the Galaxy workflows that detect SARS-CoV-2 peptides (10.1186/s12014-021-09321-1) and co-infecting pathogen peptides (10.1021/acs.jproteome.0c00822).

Speaker

Timothy J. Griffin


University of Minnesota

Speaker

Subina Mehta

Speaker

Andrew Rajczewski


University of Minnesota

Speaker

Pratik Jagtap


University of Minnesota

Supporting Materials


Plant data analysis

Here we have collected all the training sessions that cover plant data analysis. These sessions also appear in other modules and cover a range of topics, but one thing they have in common is that they use plant datasets.

Self-Study Tutorial

In this tutorial, we will analyze RNA sequencing data to extract information about potential genes regulated in response to abiotic stress in plants. For this purpose, the chosen approach is the identification of genes with complementary regions to miRNAs upregulated in response to brassinosteroids.

Speaker

Cristóbal Gallardo

Supported Servers

Supporting Materials


Self-Study Tutorial

Single-cell RNA-seq analysis is a cornerstone of developmental research and provides a great level of detail for understanding the underlying dynamic processes within tissues. In plants, it can highlight some of the key differentiation pathways that root cells undergo.

Speaker

Mehmet Tekman

Supported Servers

Supporting Materials

  • Tutorial: Analysis of plant scRNA-Seq Data with Scanpy
  • FAQ Document - Have a question about this training? Check here to see if it has already been answered

Visualisation in Galaxy

This module covers various ways of visualizing your data in Galaxy.

Self-Study Tutorial

Circos is a popular tool for creating circular graphs to display genomic data. This video introduces the tool and shows how to use it within Galaxy.

Speaker

Helena Rasche


Avans Hogeschool


Supported Servers

Supporting Materials


Self-Study Tutorial

JBrowse is a popular Genome Browser. It can be used directly within Galaxy. You may have seen it in our Mapping tutorial on Day 1. This tutorial will go into more depth about JBrowse and how to use it to explore your genome.


Supported Servers

Supporting Materials


Self-Study Tutorial

Volcano plots are commonly used to display the results of RNA-seq or other omics experiments. A volcano plot is a type of scatterplot that shows statistical significance (P value) versus magnitude of change (fold change). It enables quick visual identification of genes with large fold changes that are also statistically significant. These tutorials will first teach you how to create such plots from your RNA-Seq results in Galaxy, and then how you can further customize the plots using R directly within Galaxy.
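To make the two axes concrete, here is a minimal Python sketch of how volcano plot coordinates can be derived from a results table. It is only an illustration and not the Galaxy tool or the R code used in the tutorials: the function name `volcano_points`, the cutoffs (P value below 0.05, absolute log2 fold change of at least 1) and the example values are all assumptions made up for this sketch.

```python
import math

def volcano_points(results, p_cutoff=0.05, lfc_cutoff=1.0):
    """Turn (gene, log2 fold change, P value) rows into volcano plot points.

    The x value is the log2 fold change, the y value is -log10(P value).
    The cutoffs are illustrative defaults, not values from the tutorial.
    """
    points = []
    for gene, log2fc, p_value in results:
        # Clamp to avoid log10(0) when a tool reports a P value of exactly 0
        neg_log_p = -math.log10(max(p_value, 1e-300))
        if p_value < p_cutoff and log2fc >= lfc_cutoff:
            label = "up"
        elif p_value < p_cutoff and log2fc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"  # not significant, or fold change too small
        points.append((gene, log2fc, neg_log_p, label))
    return points

# Made-up example rows, not real results
example = [
    ("geneA", 2.5, 0.0001),
    ("geneB", -1.8, 0.001),
    ("geneC", 0.2, 0.5),
    ("geneD", 3.0, 0.2),
]

points = volcano_points(example)
for gene, x, y, label in points:
    print(f"{gene}\tx={x:+.2f}\ty={y:.2f}\t{label}")
```

Genes that end up far from zero on the x axis and high on the y axis are the ones with large, statistically significant changes. A gene like the hypothetical geneD has a large fold change but is not significant, so it stays low in the plot.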

Speaker

Maria Doyle

Supporting Materials


Galaxy for Non-Genomics

Galaxy is widely used for analysis of genomics data, but is not limited to any scientific domain. This module covers some non-genomics topics such as climate research and cheminformatics.

Self-Study Tutorial

Terrestrial ecosystem models have been widely used in the climate modelling community to study the impact of climate change on vegetation and terrestrial biogeochemical cycles. They are also increasingly applied in ecological studies to help ecologists better understand these processes. However, the technical barriers are still too high for most ecologists to use them. This practical aims to familiarize you (especially ecologists) with running a terrestrial ecosystem model (CLM-FATES) at site level in Galaxy and analyzing the model results.


Supported Servers

Supporting Materials


Self-Study Tutorial

Molecular dynamics (MD) is a method to simulate molecular motion by iterative application of Newton’s laws of motion. This tutorial provides an introduction to using high-throughput molecular dynamics to study protein-ligand interaction, as applied to the N-terminal domain of Hsp90 (heat shock protein 90).
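As a toy illustration of "iterative application of Newton’s laws of motion", the Python sketch below integrates a single particle on a spring in one dimension using the velocity Verlet scheme, a commonly used MD integrator. This is an assumption-laden sketch, not what the tutorial or any MD package does internally: the function names, the harmonic force, the unit mass and the time step are all chosen only for this example.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation F = m * a for one particle in 1D.

    Returns the list of (position, velocity) pairs, including the start.
    """
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        # Update the position from the current velocity and acceleration
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the acceleration at the new position
        a_new = force(x) / mass
        # Update the velocity with the average of old and new acceleration
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append((x, v))
    return trajectory

# Hypothetical system: unit mass on a spring with unit force constant
K = 1.0
MASS = 1.0

def spring_force(x):
    return -K * x

def total_energy(x, v):
    # Kinetic plus potential energy of the harmonic oscillator
    return 0.5 * MASS * v * v + 0.5 * K * x * x

trajectory = velocity_verlet(1.0, 0.0, spring_force, MASS, dt=0.01, n_steps=1000)
e_start = total_energy(*trajectory[0])
e_end = total_energy(*trajectory[-1])
print(f"energy drift after {len(trajectory) - 1} steps: {abs(e_end - e_start):.2e}")
```

Real simulations apply the same idea to many thousands of atoms with force fields describing bonded and non-bonded interactions. A small energy drift over the run is a quick sanity check that the time step is reasonable.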

Speaker

Chris Barnett


University of Cape Town


Supported Servers

Supporting Materials


Self-Study Tutorial

Cheminformatics is the use of computational techniques and information about molecules to solve problems in chemistry. This involves a number of steps: retrieving data on chemical compounds, sorting data for properties which are of interest, and extracting new information. This tutorial will provide a brief overview of all of these, centered around protein-ligand docking, a molecular modelling technique.


Supported Servers

Supporting Materials


Self-Study Tutorial

Molecular dynamics (MD) is a method to simulate molecular motion by iterative application of Newton’s laws of motion. It is often applied to large biomolecules such as proteins or nucleic acids. This is an introductory guide to using GROMACS (Abraham et al. 2015) in Galaxy to prepare and perform molecular dynamics on a small protein. For the tutorial, we will perform our simulations on hen egg white lysozyme.


Supported Servers

Supporting Materials


Self-Study Tutorial

This tutorial presents the PAMPA Galaxy workflow and shows how to use it to compute common biodiversity metrics from species abundance data and analyse them with generalized linear (mixed) models (GLM and GLMM). The workflow, made up of 5 tools, lets you process time series data that include at least the year, location and species sampled along with an abundance value, and finally generate article-ready data products.
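To give a feel for what computing biodiversity metrics from abundance data involves, here is a small Python sketch that groups records by year and location and computes species richness and the Shannon index. It is not the PAMPA implementation: the choice of these two metrics, the record layout, the function name `community_metrics` and the example values are assumptions made for illustration.

```python
import math
from collections import defaultdict

def community_metrics(records):
    """Compute per-(year, location) species richness and Shannon index.

    `records` is an iterable of (year, location, species, abundance) tuples,
    an assumed layout chosen for this sketch.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for year, location, species, abundance in records:
        counts[(year, location)][species] += abundance

    metrics = {}
    for key, species_counts in counts.items():
        # Only species that were actually observed count towards the metrics
        present = [n for n in species_counts.values() if n > 0]
        total = sum(present)
        richness = len(present)
        # Shannon index: sum over species of p * ln(1 / p), with p = n / total
        shannon = sum((n / total) * math.log(total / n) for n in present)
        metrics[key] = {"richness": richness, "shannon": shannon}
    return metrics

# Made-up observations, not real survey data
records = [
    (2019, "site1", "sp_a", 10),
    (2019, "site1", "sp_b", 10),
    (2020, "site1", "sp_a", 5),
    (2020, "site1", "sp_b", 0),
]

metrics = community_metrics(records)
for (year, location), values in sorted(metrics.items()):
    print(year, location, values["richness"], round(values["shannon"], 3))
```

Per-group values like these are the kind of response variable that can then be modelled against year or site in a GLM or GLMM.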


Supported Servers

Supporting Materials


All other Tutorials

Please also feel free to explore all the other tutorials on the GTN website, and do any that sound interesting to you. Our instructors will do their best to support you in any of these tutorials.



All done?

Please feel free to hang around in Slack and talk to us and the rest of the Galaxy community! Thanks for joining!!

Give us Feedback


Let us know what you thought about today! What did you like? Suggestions for improvements? Spot a typo? Tell us about it!

  • About the materials (slides, training manuals)? - Use the feedback forms at the end of the tutorial
  • About other things? - Let us know in Slack (channel #feedback)



Socialize with each other!


Join one of the social sessions! These include games, quizzes, discussions and more. Check out all channels in Slack starting with #social-. Want to organize something yourself? Tell us your idea and we will make a channel for it!



Meet the Galaxy Community!


Enjoying the course so far? Learn more about the Global Galaxy Community and how YOU can become part of it, by watching this video!

Video Tutorial

Supporting Materials


Feedback Survey


Please take a moment to fill out this feedback survey. This helps us improve this event in the future.

Survey (~5 minutes): Click here!



Course Certificates


Do you require a certificate for this course? Please fill out this Certificate Request Form to obtain a certificate of attendance.

You do not need to complete everything to receive a certificate. In the request form you will be able to indicate which parts you followed. We will also ask you to provide links to your Galaxy histories.



After the Course

All these materials will remain online, so you can continue working on them for as long as you want. The only difference will be that you should ask your questions on the GTN Gitter channel, instead of Slack.




Acknowledgements

This Global Galaxy course is only possible thanks to a global network of instructors and institutes.

Presenters & Instructors & Facilitators & Community Caption Contributors

  • Julien Van Braekel (jvanbraekel)
  • Daniel Blankenberg (blankenberg)
  • Saskia Hiltemann (shiltemann)
  • Peter van Heusden (pvanheus)
  • The Global GTN Community (galaxycommunity)
  • Raf Winand (rwinand)
  • Anthony Bretaudeau (abretaud)
  • Anne Fouilloux (annefou)
  • Sergey Golitsynskiy (ic4f)
  • Marius van den Beek (mvdbeek)
  • Nicola Soranzo (nsoranzo)
  • Helena Rasche (hexylena)
  • Delphine Lariviere (delphine-l)
  • Estelle Ancelet (eancelet)
  • Anton Nekrutenko (nekrut)
  • Maria Tsagiopoulou (MariaTsayo)
  • Assunta DeSanto (assuntad23)
  • Jennifer Hillman-Jackson (jennaj)
  • Mallory Freeberg (malloryfreeberg)
  • Petr Novák (petrnovak)
  • Timothy J. Griffin (timothygriffin)
  • Fotis E. Psomopoulos (fpsom)
  • Filippos Zacharopoulos (FilipposZ)
  • Andrew Rajczewski (andrewr)
  • Sébastien Fouilloux (s3by01)
  • Katarzyna Kamieniecka (kmurat1)
  • Ross Lazarus (fubar2)
  • Catherine Bromhead (cat-bro)
  • Krzysztof Poterlowicz (kpoterlowicz)
  • Dinh Duy An Nguyen (ddan)
  • David Lopez (davelopez)
  • Miaomiao Zhou (miaomiaozhou88)
  • Yvonne Hiltemann (ennovytje)
  • Willem de Koning (willemdek11)

Institutions
