October 2013 Galaxy Update

Welcome to the October 2013 Galaxy Update, a monthly summary of what is going on in the Galaxy community. Galaxy Updates complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.

New Public Servers

A record six new servers joined the list of over 40 publicly accessible Galaxy servers in September.

CNIC.DarwinTree

CNIC.DarwinTree is a molecular data analysis and application environment for DarwinTree, GSQCS, iDNAbar, NANNO, IPIP, and many other tools for phylogenetic analysis.

CNIC.DarwinTree is supported by the Chinese Academy of Sciences Computer Network Information Center (中国科学院计算机网络信息中心) and also offers [email support](mailto:support AT cnic DOT cn).

Fast UniFrac

Fast UniFrac provides a suite of tools for the comparison of microbial communities using phylogenetic information. See "Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data" by M. Hamady, C. Lozupone, and R. Knight, The ISME Journal (2010) 4, 17–27; doi:10.1038/ismej.2009.97.

No login is required to use Fast UniFrac and the site has [email support](mailto:MicrobiomeHelp AT colorado DOT edu). Fast UniFrac is sponsored by the Knight Lab at the University of Colorado at Boulder.

kmer-SVM

kmer-SVM is "a tool suite designed to aid in analysis of next-generation sequencing (NGS) data. Our suite uses a support vector machine (SVM) with kmer sequence features to identify predictive combinations of short transcription factor binding sites which determine the tissue specificity of the original NGS assay. Information gained from kmer-SVM can be used as an additional source of confidence in genomic experiments by recovering known binding sites, and can also reveal novel sequence features and possible cooperative mechanisms to be tested experimentally."

The kmer-SVM server is described in "kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets" by Christopher Fletez-Brant, Dongwon Lee, Andrew S. McCallion, and Michael A. Beer, Nucleic Acids Research (2013); doi:10.1093/nar/gkt519.

A tutorial on using the web server and a Galaxy Tool Shed repository are also available, and [email support](mailto:kmersvm DOT team AT gmail DOT com) is provided. The project is a collaboration between Christopher Fletez-Brant of the McCallion Lab at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, and Dongwon Lee of the Beer Lab in the Johns Hopkins University Department of Biomedical Engineering.

NGS-QC Generator

This server evaluates the quality of ChIP-seq and other enrichment-related NGS data using NGS-QC Generator. It also includes a database of preprocessed profiles and a tutorial on how to analyze your own sequencing profiles.

See "A quality control system for profiles obtained by ChIP sequencing" by Mendoza-Parra et al., Nucleic Acids Research (2013).

The server is hosted by the Gronemeyer Lab, and email support is available. You must have an account to use the server, but anyone can create one. "Due to storage space constraints, uploaded datasets into the Galaxy instance may not be available for more than 24hours, thus we strongly suggest users to download their processed files as early as possible."


Laboratory of Biological Physics at USP-FCFRP

The Laboratory of Biological Physics at USP-FCFRP Galaxy server has these tools:

  • ProtPred-GROMACS: ab initio protein structure prediction framework which uses evolutionary algorithms to optimize objective functions such as potential energy, number of hydrogen bonds, and solvent accessible surface area.
      • 2PG: faster and simpler algorithm in which one objective function is optimized
      • 3PG: a more robust approach in which multiple objective functions are optimized at the same time
  • Protein Validation: check the integrity of a PDB file
  • Protein Desolvation: remove a solvation layer around your protein

The Laboratory of Biological Physics is part of the Faculty of Pharmaceutical Sciences of Ribeirão Preto (FCFRP) at the University of São Paulo (USP). "You can use tools developed at our laboratory and by our collaborators for free. You just need to sign up to the server using your academic e-mail."


ValidatorMAX at The University of Chicago

This server implements ValidatorMAX: confident identification of stable isotope-labeled peptide pairs and associated abundance ratios. The server is sponsored by The University of Chicago's Department of Molecular Genetics and Cell Biology, Ludwig Center for Metastasis Research, Department of Pediatrics, Center for Research Informatics, and Computation Institute.

New Papers

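| # | Tag | # | Tag | # | Tag |
|---|---|---|---|---|---|
| 2 | Cloud | 2 | RefPublic | - | UseLocal |
| 2 | HowTo | - | Reproducibility | 10 | UseMain |
| 4 | IsGalaxy | 1 | Shared | 1 | UsePublic |
| 21 | Methods | 8 | Tools | 1 | Visualization |
| - | Project | 2 | UseCloud | 14 | Workbench |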

Forty-four new papers were added to the Galaxy CiteULike Group in September. In addition to the papers featuring kmer-SVM and NGS-QC Generator (see above), these papers may be particularly interesting to the Galaxy community:

And the Galaxy Project now has its own CITATION file. CITATION files "tell readers how best to cite that software." Also see the Citing Galaxy wiki page.

Who's Hiring

The Galaxy is expanding! Please help it grow.

Got a Galaxy-related opening? Send it to outreach@galaxyproject.org and we'll put it in the Galaxy News feed and include it in next month's update.

Upcoming Events

There is a lot going on in the next three months, *with half of it happening in the Southern Hemisphere.* Also see the [Galaxy Events Google Calendar](http://bit.ly/gxycal) for details on other events of interest to the community.

## Australia!

In the next three months there are no fewer than 13 talks and workshops at at least 5 different venues on at least 7 distinct dates. Those 13 known events include 8 workshops. If you don't find some Galaxy training this spring, then you aren't trying.

| Date | Topic/Event | Venue/Location | Contact |
|---|---|---|---|
| September 28 - October 1 | Galaxy Workshop | The Genomic Bioinformatics Workshop, Sydney, Australia | Ross Lazarus, Dan Blankenberg |
| October 13-16 | AMATA early career researcher workshop | AMATA 2013, Gold Coast, Queensland, Australia | Mark Crowe, Annette McGrath |
| October 20-25 | Genomics Virtual Laboratory Workshop | eResearch Australasia 2013, Brisbane, Australia | Mark Crowe, Ron Horst, Andrew Lonie |
| October 20-25 | Image Analysis and Processing in the Clouds using Scalable eResearch Workflow-Based System | eResearch Australasia 2013, Brisbane, Australia | Project Team |
| October 20-25 | MILXCloud: A faster, smarter way to process Medical imaging data in the cloud | eResearch Australasia 2013, Brisbane, Australia | Neil Burdett |
| October 20-25 | Lessons from developing the Genomics Virtual Lab | eResearch Australasia 2013, Brisbane, Australia | Ron Horst |
| October 20-25 | Early Outcomes of the Characterisation Virtual Laboratory | eResearch Australasia 2013, Brisbane, Australia | Wojtek Goscinski, Anitha Kannan |
| October 20-25 | The Human Communication Science Virtual Lab | eResearch Australasia 2013, Brisbane, Australia | Peter Sefton |
| November 4 | RNA-Seq Analysis using Galaxy workshop | The University of Queensland, St Lucia, Queensland, Australia | Mark Crowe |
| November 6 | Workshop: An Introduction to Variant Detection using Galaxy | Children’s Medical Research Institute, Westmead, New South Wales, Australia | Mark Crowe |
| November 7 | RNA-Seq Analysis using Galaxy workshop | Children’s Medical Research Institute, Westmead, New South Wales, Australia | Mark Crowe |
| November 22 | Variant Detection using Galaxy workshop | The University of Queensland, St Lucia, Queensland, Australia | Mark Crowe |
| December 9 | Workshop: An Introduction to De Novo Genome Assembly using Galaxy | The University of Queensland, St Lucia, Queensland, Australia | Mark Crowe |

## All Those Other Continents

And don't worry, you won't have to travel to Australia during its spring just to learn about Galaxy. There are also upcoming events in North America, Europe, and Africa.


| Date | Topic/Event | Venue/Location | Contact |
|---|---|---|---|
| September 30 - October 2 | Analysis of Genomic Sequence Data With Galaxy, part of the Cancer Care session | Individualizing Medicine Conference, Mayo Clinic, Rochester, Minnesota, United States | James Taylor |
| October 1-3 | Galaxy as a platform for High-throughput Genomics | Beyond the Genome 2013, San Francisco, California, United States | Jeremy Goecks |
| October 1-3 | Development of a genomic region database and analysis tool for the Galaxy platform | Beyond the Genome 2013, San Francisco, California, United States | Matloob Khushi |
| October 7-8 | Next-Generation Sequencing Data Interpretation: Enhancing Reproducibility and Accessibility | NGS & Bioinformatics Summit Europe, Berlin, Germany | Anton Nekrutenko |
| October 7-8 | Using Galaxy to Provide a NGS Analysis Platform | NGS & Bioinformatics Summit Europe, Berlin, Germany | Hans-Rudolf Hotz |
| October 8-11 | Analisi dati Next Generation Sequencing in Galaxy: exome, RNA-Seq, metagenomica (Next Generation Sequencing data analysis in Galaxy: exome, RNA-Seq, metagenomics) | CRS4, Loc. Pixinamanna, Pula CA, Italy | Paolo Uva |
| October 9-11 | Galaxy Training Days | GenoToul bioinformatics facility, INRA, Toulouse Auzeville, France | Sarah Maman |
| October 14-18 | NGS Data Analysis and Galaxy Workshop, part of the 2013 South Africa Galaxy Workshop Tour (application deadline is 17 September) | University of Pretoria, South Africa | Dave Clements |
| October 21-25 | NGS Data Analysis and Galaxy Workshop, part of the 2013 South Africa Galaxy Workshop Tour | University of Cape Town, South Africa | Dave Clements |
| October 22 | Analyzing NGS Data with Galaxy | Dana-Farber Cancer Institute, Boston, Massachusetts, United States | Anton Nekrutenko, Jennifer Jackson, Anushka Brownley |
| October 22-26 | Introduction to Integrative Analysis with GenomeSpace | ASHG 2013, Boston, Massachusetts, United States | Michael Reich |
| October 22-26 | High Throughput Data Analysis and Visualization with Galaxy (this workshop is sold out) | ASHG 2013, Boston, Massachusetts, United States | Anton Nekrutenko, Jennifer Jackson |
| October 22-26 | Poster 1633T: Globus Genomics: Enabling high-throughput analysis and management of NGS data for neurodevelopmental disorders (also see the Globus Genomics booth) | ASHG 2013, Boston, Massachusetts, United States | Ravi Madduri, Dinanath Sulakhe, Alex Rodriguez, Paul Dave |
| October 22-26 | Poster 1530W: Implementing a High Performance, Reusable Consensus Calling Pipeline for Next Generation Sequencing using Globus Genomics | ASHG 2013, Boston, Massachusetts, United States | |
| October 22-26 | Poster 1510T: Consensus Genotyper for Exome Sequencing: Improving the Quality of Exome Variant Genotypes | ASHG 2013, Boston, Massachusetts, United States | |
| October 31 - November 1 | Advanced NGS Course: RNA-Seq Data Analysis | Leiden, The Netherlands | Celia van Gelder, Peter-Bram 't Hoen |
| November 2-6 | A zebrafish cloud computational resource for environmental health science research | 141st American Public Health Association (APHA) Annual Meeting, Boston, Massachusetts, United States | Peter Tonellato |
| November 6-12 | Computational and Comparative Genomics Course (application deadline: July 15, 2013) | Cold Spring Harbor Laboratory, New York, United States | James Taylor |
| January 11-15 | Plant and Animal Genome XXII (PAG 2014) | San Diego, California, United States | Dave Clements |
| January 16-17 | 2014 GMOD Meeting | San Diego, California, United States | Dave Clements |

Online Materials from Past Events

2013 Galaxy Community Conference (GCC2013)

Introduction to Galaxy Boot Camp @ UC Davis Bioinformatics Core

First, all videos from GCC2013 are now also available on Vimeo. They are still available on the Galaxy website as well, but hosting them on Vimeo lets you do everything an online video provider supports, such as having the videos automatically embedded when they are linked from popular websites.

Two other resources (at least) became available in September:

See the list of other tutorials on the Learn hub page for more.

Galaxy Distributions

The most recent Galaxy distribution was released on August 12.

A new version of CloudMan was released in July.

Tool Shed Contributions

There were many new contributions, including:

  • naive_variant_caller: process aligned reads, produce VCF file containing per position variant calls
  • barrnap: ribosomal RNA finder for Bacterial genome sequences
  • prokka (and a second wrapper): rapid annotation of prokaryotic genomes
  • coverage_report: Generate Detailed Coverage Report from BAM file
  • mugsy: multiple whole genome aligner
  • edena: a de novo short reads assembler
  • edge_pro: efficient gene expression level estimation in prokaryotic genomes from RNA-seq
  • glimmer: find genes in microbial DNA, especially bacteria, archaea, and viruses
  • get_fasta_from_taxon: Get FASTA from NCBI taxonomy ID
  • sniploid2: compare SNPs detected from a polyploid to SNPs derived from its parental genomes
  • ssake: de novo assembly of millions of very short DNA sequences
  • sspace: scaffolding pre-assembled contigs using paired-read data
  • annovar: Functional annotation of genetic variants from high-throughput sequencing data
  • merge_fna_qual: Merges 454 Fasta and Quality files into a Fastq file.
  • ncbi_sra_toolkit: NCBI Sequence Read Archive toolkit utilities
  • sql_tools and sparql_tools: In-memory SQL and SPARQL runners and related tools
  • compute_motifs_frequency, ctd_batch, compute_motif_frequencies_for_all_motifs, indels_3way, t_test_two_samples, xy_plot, divide_pg_snp, draw_stacked_barplots, split_paired_reads, mutate_snp_codon, hgv_hilbertvis, hgv_fundo, categorize_elements_satisfying_criteria, microsatellite_birthdeath, multispecies_orthologous_microsats, fasta_nucleotide_changer, fasta_clipping_histogram, fasta_formatter, fastq_quality_boxplot, fastq_quality_converter, fastq_quality_filter, fastq_to_fasta, and the fastx_toolkit, all from devteam.