Training Day


GCC2014 Training Day

The 2014 Galaxy Community Conference (GCC2014) started on June 30 with a Training Day featuring five parallel tracks, each with several two-and-a-half-hour workshops. Tracks cover using Galaxy for biological research, deploying and managing Galaxy instances, and everything in between.

Topics were nominated by the Galaxy Community in December and voted on in January.

Workshops will be hands-on and participants are strongly encouraged to bring a laptop and follow along.

Meeting participants select which Training Day topics to attend when they register.

Day 0: Learning Galaxy

Time | Barber Room 302 | Salon A Room 303 | Salon B Room 303 | Salon C Room 303 | Multipurpose Room 324
8:00 | Registration Opens and Catered Breakfast
9:00 | Visualization of NGS data | Raisins & Rabbit Turds: NGS Quality Control with Galaxy | Galaxy Internals: Flow control within Galaxy | Galaxy installation and administration | Training with Galaxy: a Genome Assembly Example
11:30 | Lunch (sponsored by EMC Isilon)
12:30 | Galaxy on a Cluster - User and Project Management | Galaxy Automation: Using the API | Tool Development from bright idea to toolshed - Designing a Galaxy Tool | RNA-Seq Analysis with Galaxy and the Tuxedo Suite | 3D Genome Analysis with Galaxy
3:00 | Break
3:30 | RNA-Seq Analysis with Galaxy and Alternative Tools | Tool Development from bright idea to toolshed - Data Managers | Visualization of NGS data | Scriptable Bioinformatics Cloud Infrastructures with Cloud BioLinux, CloudMan & Galaxy | Galaxy on a Cluster - User and Project Management
6:00 | Break
6:15 | Dinner (on your own) / Birds-of-a-Feather Flock 1
10:00 | Finish



Prerequisites and Technology

Each topic lists its prerequisites: what you should know, and what hardware and software you will need.

Hardware

In almost every workshop, participants will need a wi-fi enabled laptop.

Software

In general:

All Workshops: Require a web browser such as Chrome, Firefox, or Safari. The latest version of Internet Explorer should also work.

Deployment and Development Workshops: Require the VirtualBox virtual machine (VM) software on your laptop. See the Training Day VMs page for more. This should be installed before you arrive at the conference.

See each individual workshop's list of prerequisites for specifics.

Topics

Using Galaxy Topics

Raisins and Rabbit Turds: NGS Quality Control with Galaxy

Instructors Tom Bair, University of Iowa
Jennifer Jackson, Penn State University
Content Often the first step in next generation sequencing data analysis is quality control. How reliable is the data? Does it have GC bias, inaccuracies at the read ends, contamination, barcode corruption, or any number of other conditions that need to be detected and dealt with before the science begins? This workshop will provide hands-on experience performing quality control checks and getting your data analysis-ready using Galaxy.

This workshop is also a good introduction to Galaxy for those who are not familiar with it.

Slides, Page with Histories on UseGalaxy.org for continued learning
This workshop uses AWS-based compute infrastructure
Prerequisites A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

This title was inspired by Richard Smith's talk on "Experimental design: the importance of filtering" at the Iowa Institute for Human Genetics' Bioinformatics Short Course.

Training with Galaxy: a Genome Assembly Example

Instructors Simon Gladman, VLSCI
Andrew Lonie, University of Melbourne
Content The Australian Genomics Virtual Laboratory (GVL) has developed a range of online tutorials based on Galaxy to aid in training and dissemination of bioinformatics expertise. The tutorials are completely self-contained (data, workflows, rationale and background) and cover a range of introductory and advanced topics including genome assembly, variant detection and RNA-seq. This workshop will provide an overview of the available tutorials followed by a hands-on session based on a microbial genome assembly tutorial. To perform the analysis, participants will use cloud instances of the GVL platform.

Slides
This workshop uses GVL-based compute infrastructure
Prerequisites
• A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101), or attendance at the "Raisins and Rabbit Turds: NGS Quality Control with Galaxy" session.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

RNA-Seq Analysis with Galaxy and the Tuxedo Suite

Instructors Saskia Hiltemann, Erasmus Medical Center
Youri Hoogstrate, Erasmus Medical Center
Hailiang (Leon) Mei, Leiden University Medical Center
Content This hands-on workshop will demonstrate basic RNA-Seq transcript-level comparison analysis using the TopHat (Bowtie), Cufflinks, Cuffmerge and Cuffdiff tools in Galaxy. We will compare the expression of genes under two conditions.

We will demonstrate this analysis both with an installed reference genome and with a non-installed organism.

Sample datasets small enough to be successfully processed during the course of the seminar will be provided. Participants will perform the analyses themselves on the provided cloud instance of Galaxy.

Handouts, Slides
This workshop uses AWS-based compute infrastructure
Prerequisites
• A general knowledge of Galaxy and NGS quality control issues and tools, or attendance at the "Raisins and Rabbit Turds: NGS Quality Control with Galaxy" session.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

RNA-Seq Analysis with Galaxy and Alternative Tools

Instructors Saskia Hiltemann, Erasmus Medical Center
Youri Hoogstrate, Erasmus Medical Center
Hailiang (Leon) Mei, Leiden University Medical Center
Content The Tuxedo suite of RNA-Seq tools (Cuff, TopHat, ...) is installed on many Galaxy instances, including Main and CloudMan installs. However, many other options are available. For example, HTSeq, edgeR and DESeq are also widely used, take a different approach to RNA-Seq analysis, and return different results from the Tuxedo suite.

This workshop would introduce alternative methods for RNA-Seq analysis, and cover how to install them from the Tool Shed and test that they are properly installed. The workshop could finish by comparing results from these tools with those from the Tuxedo suite.

Handouts, Slides, Handout Answers
This workshop uses AWS-based compute infrastructure
Prerequisites
• A general knowledge of Galaxy and NGS quality control issues and tools, or attendance at the "Raisins and Rabbit Turds: NGS Quality Control with Galaxy" session.
• Familiarity with the Tuxedo suite, or attendance at the "RNA-Seq Analysis with Galaxy and the Tuxedo Suite" session.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Visualization of NGS data

Instructors Jeremy Goecks, George Washington University
Aysam Guerler, Johns Hopkins University
Content This workshop presents different ways of visualizing NGS data, including visualizations for downstream analysis such as heat maps, pathway networks, and R-based charts and graphs. It will cover both primary NGS analyses (alignments, variants, annotations) and downstream options.

Slides
Video: Create Trackster (genome browser) visualization and explore data
Video: Visual Analysis in Trackster and Sweepster
This workshop uses AWS-based compute infrastructure
Prerequisites
• A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101), or attendance at the "Raisins and Rabbit Turds: NGS Quality Control with Galaxy" session.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

3D Genome Analysis with Galaxy

Instructors Jonas Paulsen, University of Oslo
Tonje Lien Gulbrandsen, University of Oslo
Morten Johansen, University of Oslo
Karen Reddy, Johns Hopkins University
Content The session will introduce the basics of tracks and track types, and how these relate to hypothesis formulation and statistical analysis, using the Galaxy-based Genomic HyperBrowser. The emphasis will be on analysing, interpreting and integrating 3D genomic data (such as Hi-C), using the HiBrowse system. In addition to introducing the general concepts, the session will show examples of how 3D genome analyses can be combined with other HyperBrowser and Galaxy tools, in order to go from initial hypotheses to final results.

Slides
Prerequisites
• A general knowledge of Galaxy (for example, you should be familiar with the material in Galaxy 101), or attendance at the "Raisins and Rabbit Turds: NGS Quality Control with Galaxy" session.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Deployment and Development Topics

Galaxy Installation and Administration

Instructors Nate Coraor, Penn State University
John Chilton, Penn State University
Content Topics:
• Installing Galaxy on a standalone system
• Installing Galaxy in a cluster environment
• Common administrative tasks
• Tool installation (using Tool Shed and manually)
• Reference genome installation and configuration
• Misc. (user authentication, data libraries, other...)
• Upgrading
• Troubleshooting

Workshop Walkthrough
Virtual Machine Images
This workshop will require that you have the VirtualBox player installed on your laptop.
Prerequisites
• Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
• Secure Shell (SSH) client software such as PuTTY for Windows, or the Terminal Application that comes with Mac OS.
• Virtual machine (VM) player software. VirtualBox is recommended and has been tested with the conference virtual machine images.
• The virtual machine image for this workshop.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Galaxy on a Cluster - User and Project Management

Instructors Nikolay Vazov, University of Oslo
Katerina Michalickova, University of Oslo
Content Galaxy is increasingly used as a front-end to huge HPC resources. At the same time, HPC facilities require solid user authentication procedures and accounting mechanisms that control the use of HPC resources. We will provide an overview of the issues and several possible approaches to the problem. Participants will then install a specific third party solution (GOLD) into a test Galaxy.

Slides
Virtual Machine Images
This workshop will require that you have the VirtualBox player installed on your laptop.
Prerequisites
• Experience maintaining a production Galaxy server (recommended).
• Secure Shell (SSH) client software such as PuTTY for Windows, or the Terminal Application that comes with Mac OS.
• Virtual machine (VM) player software. VirtualBox is recommended and has been tested with the conference virtual machine images.
• The virtual machine image for this workshop.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Galaxy Automation: Using the API

Instructors Dannon Baker, Johns Hopkins University
Carl Eberhard, Johns Hopkins University
Content Galaxy has a growing API that allows external programs to control the system, search the resources, and issue work requests. The session would cover programmatic access of the API either by direct REST web calls or by using the BioBlend/blend4j APIs.
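
To give a feel for what a "direct REST web call" looks like, here is a minimal sketch using only the Python standard library. It assumes a Galaxy server that lists histories at /api/histories and accepts the user's API key as a key query parameter; the server address, the key value, and the helper names build_api_url and list_histories are placeholders invented for this illustration, not part of the workshop materials. BioBlend and blend4j wrap calls of this kind in a higher-level interface.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def build_api_url(base_url, resource, api_key, **params):
    """Build a Galaxy REST URL such as <base>/api/histories?key=...

    base_url and api_key are supplied by the caller; this helper is
    illustrative and not part of Galaxy or BioBlend.
    """
    query = urlencode({"key": api_key, **params})
    root = base_url.rstrip("/") + "/"
    return urljoin(root, "api/" + resource.lstrip("/")) + "?" + query


def list_histories(base_url, api_key):
    """Fetch the histories visible to the key's owner (needs a live server)."""
    with urlopen(build_api_url(base_url, "histories", api_key)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values; substitute your own server address and API key.
    print(build_api_url("http://localhost:8080", "histories", "YOUR_API_KEY"))
```

Only build_api_url runs without a server; list_histories performs the actual HTTP request and returns the decoded JSON response.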

Slides and Scripts
Virtual Machine Images
This workshop will require that you have the VirtualBox player installed on your laptop.
Prerequisites
• Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
• A knowledge of Python programming.
• Secure Shell (SSH) client software such as PuTTY for Windows, or the Terminal Application that comes with Mac OS.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Tool Development from bright idea to toolshed - Designing a Galaxy Tool

Instructors Greg Von Kuster, Penn State University
Björn Grüning, University of Freiburg
Peter Cock, James Hutton Institute
Content Galaxy provides an easy way to create reproducible, sharable, easy-to-use analytical workflows… if every step of the analysis has a Galaxy tool available to perform it.

The Galaxy Toolshed offers a place to share tools that can be imported into a Galaxy server to complete an analysis workflow. Installation of a well-designed tool can be as simple as a couple of button clicks by a Galaxy administrator.

This session covers the development process and the design considerations for stocking the toolshed with well-designed, easy-to-install tools. We will design a couple of tools, determining how to lay out the inputs and parameters, generate the command line with the Cheetah template, and add test cases. Then we’ll submit them to a toolshed, and install them in our Galaxy server.

Slides: PDF, SlideShare
Virtual Machine Images
This workshop will require that you have the VirtualBox player installed on your laptop.
Prerequisites
• Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
• Secure Shell (SSH) client software such as PuTTY for Windows, or the Terminal Application that comes with Mac OS.
• Virtual machine (VM) player software. VirtualBox is recommended and has been tested with the conference virtual machine images.
• The virtual machine image for this workshop.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Tool Development from bright idea to toolshed - Data Managers

Instructors JJ Johnson, University of Minnesota
Dan Blankenberg, Penn State University
Content Galaxy tools can require installed reference data in order to be used effectively. For example, Bowtie requires prebuilt indexes in order to efficiently map sequences to a genome.

Data Managers enable a Galaxy administrator to add reference data to a Galaxy server via the admin webpage.

This session covers the tool and toolshed requirements for using reference data within Galaxy tools, and the design and development of tool data managers to install reference data on a Galaxy server.

Tutorial
Virtual Machine Images
This workshop will require that you have the VirtualBox player installed on your laptop.
Prerequisites
• Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
• Secure Shell (SSH) client software such as PuTTY for Windows, or the Terminal Application that comes with Mac OS.
• Virtual machine (VM) player software. VirtualBox is recommended and has been tested with the conference virtual machine images.
• The virtual machine image for this workshop.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Galaxy Internals: Flow control within Galaxy

Instructors James Taylor, Johns Hopkins University
Content Galaxy deployers often face problems in customizing a Galaxy instance because little documentation describes how control flows within Galaxy when a job is run. This workshop will help deployers understand Galaxy's internals.

Slides, SlideShare
Prerequisites Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.

Scriptable Bioinformatics Cloud Infrastructures with Cloud BioLinux, CloudMan & Galaxy

Instructors Ntino Krampis, JCVI
Enis Afgan, Ruđer Bošković Institute (RBI)
Ravi Sanka, JCVI
Brad Chapman, Harvard University
Content This workshop will provide instruction on building bioinformatics infrastructures with Galaxy as the front-end, combined with Cloud BioLinux for standardization and CloudMan for scalability in the back-end. It will be a technically-oriented workshop targeted at software developers, and will provide a tutorial on how to jointly leverage the three systems for building bioinformatics applications on various cloud platforms including Amazon, OpenStack and Eucalyptus.

The basics of deploying bioinformatics tools and pipelines on Galaxy running pre-configured on a Virtual Machine will be demonstrated. We will then move onto methods for standardizing deployment of complex bioinformatics pipelines through Galaxy by leveraging the Python Fabric scripts of Cloud BioLinux, in order to achieve interoperability and easy deployment across the various cloud platforms. The software blueprint of CloudMan for instantiating and using virtualized clusters connected to the Galaxy back-end will be presented, in addition to best practices for designing bioinformatics applications that leverage the distributed computing capabilities offered by the CloudMan framework.

All concepts will be demonstrated through hands-on sessions where users will deploy tools through Galaxy, build VMs through Cloud BioLinux, instantiate clusters and data volumes and run distributed computing through CloudMan, using Amazon or Eucalyptus clouds.

CloudMan Slides, CloudBioLinux Slides, Viral Cloud Slides
This workshop uses AWS-based compute infrastructure
Prerequisites
• Knowledge and comfort with the Unix/Linux command line interface and a text editor. If you don't know what cd, mv, rm, mkdir, chmod, grep and so on can do then you will struggle in this workshop.
• Secure Shell (SSH) client software such as PuTTY for Windows, or the Terminal Application that comes with Mac OS.
• An account on Amazon Web Services. These can be set up for free, but doing so requires a credit card.
• A wi-fi enabled laptop with a modern web browser. Google Chrome, Firefox and Safari will work best.

Nomination, Voting and Topic Selection

Training Day topics were selected by the Galaxy Community. Topics were first nominated and then voted on by the community. The schedule above is the direct result of that process.

Feedback from the GCC2014 Training Day

"Have attended many training days/tutorials in my 15yr career. This was for me the most fruitful (and thus the best) so far."
"I could definitely tell a lot of time and energy went into planning this training day/conference, and for that I want to say thank you."
"Intense but great"
"All the instructors were amazing."
"Overall a great experience"

Training Day Sponsor

Amazon Web Services

Questions? Contact the Organizers.