January 2015 Galaxy Update
Welcome to the January Galaxy Update, a summary of what is going on in the Galaxy community. Galaxy Updates complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.
63 papers referencing, using, extending, and implementing Galaxy were added to the Galaxy CiteULike Group in December, bringing the total to over 2000 papers. Some of those new papers are:
Workflow4Metabolomics: A collaborative research infrastructure for computational metabolomics, Bioinformatics (19 December 2014), btu813, doi:10.1093/bioinformatics/btu813, by Franck Giacomoni, Gildas Le Corguillé, Misharl Monsoor, et al.
MetaNET - a web-accessible interactive platform for biological metabolic network analysis, BMC Systems Biology, Vol. 8, No. 1. (5 December 2014), doi:10.1186/s12918-014-0130-2, by Pankaj Narang, Shawez Khan, Anmol J. Hemrom, Andrew M. Lynn
ImmunoGlobulin galaxy (IGGalaxy) for simple determination and quantitation of immunoglobulin heavy chain rearrangements from NGS, BMC Immunology, Vol. 15, No. 1. (13 December 2014), 59, doi:10.1186/s12865-014-0059-7, by Michael J. Moorhouse, David van Zessen, Hanna IJspeert, et al.
Also see the Immunoglobulin Galaxy VM
VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases, Nucleic Acids Research (15 December 2014), doi:10.1093/nar/gku1117, by Gloria I. Giraldo-Calderón, Scott J. Emrich, Robert M. MacCallum, et al.
The new papers covered these topics:
|Nominate a topic now!|
GCC2015 Training Day topics are nominated by you, the Galaxy Community. Please take a few minutes to nominate a topic. Any topic of interest to the Galaxy Community can be nominated and you are encouraged to nominate more than one topic. If you are looking for ideas, see what topics were nominated in 2013 and 2014, and the Events and the Events Archives.
Nominated topics will be published on the Training Day page as they come in. Nominations close 6 January. Topics will be compiled into a uniform list by the GCC2015 Organising Committee, and topics will be voted on by the Galaxy Community 12-23 January.
Topics will then be selected and scheduled based on topic interest, and the organisers' ability to confirm instructors for each session. Some very popular sessions may be scheduled more than once. The final schedule will be posted before registration opens.
The 2015 Galaxy Community Conference (GCC2015) is now accepting sponsorships. Your organisation can play a prominent part in the Galaxy community by sponsoring GCC2015. Sponsorship is an excellent way to raise your organisation’s visibility.
Several sponsorship levels are available, including two levels of premier sponsorships that include presentations. Premier sponsorships are limited, however, so you are encouraged to act soon.
Please let the GCC2015 Organising Committee (gcc2015-org AT lists DOT galaxyproject DOT org) know if you are interested in helping make this event a success.
|January 10-14||Galaxy for SNP and Variant Data Analysis||Plant and Animal Genome XXIII (PAG2015), San Diego, California, United States||Dave Clements|
|January 15||Galaxy Workshops||San Diego State University, San Diego, California, United States||Dave Clements|
|January 15-16||Accessible and Reproducible Genomics at Scale with Galaxy||Revolutionizing Next-Generation Sequencing: Tools and Technologies, Leuven, Belgium||James Taylor|
|January 19-20||NGS pipelines with Galaxy||e-Infrastructures for Massively Parallel Sequencing, SciLifeLab, Uppsala, Sweden||Luca Pireddu|
|February 9-13||Analyse bioinformatique de séquences sous Galaxy||Montpellier, France||J.F. Dufayard|
|February 16-18||Accessible and Reproducible Large-Scale Analysis with Galaxy||Genome and Transcriptome Analysis, part of Molecular Medicine Tri-Conference, San Francisco, California, United States||James Taylor|
|Large-Scale NGS data Analysis on Amazon Web Services Using Globus Genomics||Genomics & Sequencing Data Integration, Analysis and Visualization, part of Molecular Medicine Tri-Conference, San Francisco, California, United States||Ravi Madduri|
|iReport: An Integrative “omics” Reporting and Visualisation Platform||Andrew Stubbs|
|May 25-29||MIPRO||Opatija, Croatia||Enis Afgan|
|July 6-8||2015 Galaxy Community Conference (GCC2015)||The Sainsbury Lab, Norwich, United Kingdom||Galaxy Outreach|
We just don't know exactly when, yet ...
Except for a meetup at GCC2014, the GalaxyAdmins group has been on hiatus for a while. Thanks to the efforts of Hans-Rudolf Hotz, the group is back. Starting this month, we will resume our bi-monthly conference calls.
GalaxyAdmins is a discussion group for Galaxy community members who are responsible for large Galaxy installations.
To better serve the global community, we will rotate the time of the call from month to month so that it is convenient for different parts of the world. To help us pick the best set of times for the calls, please take a few minutes to fill out this Doodle poll. It covers every hour in a 72-hour period. Please don't forget to select your timezone before you start filling it out.
The likely presentation for the January meetup will be a summary of the Galaxy Community Survey last fall, presented by Dave Clements. The survey covered both Galaxy usage and deployment. The survey results will be published online before the call.
|Take the Doodle poll|
The Galaxy is expanding! Please help it grow.
- Systems Administrator / Information Technologist, McArthur Lab in the McMaster University Department of Biochemistry & Biomedical Sciences
- Senior Development Engineer - Bioinformatics, and Bioinformatician II, University of Massachusetts Medical School
- Searching for bioinformaticians, post-docs, PhD students and software engineers in Freiburg, Germany at Max Planck Institute of Immunobiology and Epigenetics, and the Bioinformatics Group at the University of Freiburg
- Statistical Genomics Postdoc opening in the Makova lab at Penn State
- The Galaxy Project is hiring software engineers and post-docs
2 new public Galaxy servers were added in December:
## MetaNET server
- MetaNET - a web-accessible interactive platform for biological metabolic network analysis, by Pankaj Narang, Shawez Khan, Anmol J. Hemrom, Andrew M. Lynn, BMC Systems Biology, Vol. 8, No. 1. (5 December 2014), doi:10.1186/s12918-014-0130-2
- Domain/Purpose: MetaNET is a web-accessible interactive platform for biological metabolic network analysis.
- Comments: From the User Manual: MetaNET is designed to provide a user-friendly rich interface for the analysis of genome-scale metabolic networks under various genetic and environmental conditions. The framework is built with a set of tools for data management including data upload/download, file format conversion, file operations and data extraction capabilities from SBML files, optimizing network using flux balance analysis, flux variability analysis, perturbation analysis via single or pairwise genes/reactions/catalysts knock-out. The tools can also be interconnected through workflows to perform simulations of higher order.
- User Support: User Manual, Metanet User Group, MetaNET Team
- Sponsor(s): School of Computational and Integrative Sciences, Jawaharlal Nehru University
## VectorBase Galaxy
- A full Galaxy server that includes reference information and workflows focusing on invertebrate vectors of human pathogens.
- From VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases, Gloria I. Giraldo-Calderón et al., Nucleic Acids Research (2014) doi: 10.1093/nar/gku1117: "VectorBase has also made available the latest relevant canonical data in this Galaxy instance. Examples of workflow analyses include alignment of Next Generation Sequence (NGS) data sets, calculation of expression values, or predicting single nucleotide polymorphisms (SNPs). Registered users can store their raw data, intermediate files and final analysis results for download or direct sharing via the BRC, which greatly improves the ability to collaborate with colleagues and VectorBase developers."
- User Support:
|Share your training resources and experience now||Share your experience now||Describe your instance now|
Several new Training Resources were added in December:
- UC Davis RNA-Seq and ChIP-Seq Analysis with Galaxy Workshop from the UC Davis Bioinformatics Core
- Running your own Galaxy instance (in the cloud) from the Center for Health Bioinformatics at the Harvard School of Public Health
- Introduction to Galaxy from the Center for Health Bioinformatics at the Harvard School of Public Health
Here are new contributions for the past month.
In no particular order:
- rnabob: Fast pattern searching for RNA structural motifs RNABOB is an implementation of D. Gautheret's RNAMOT, but with a different underlying algorithm using a nondeterministic finite state machine with node rewriting rules.
- frp_tool: Scripts to create a fragment recruitment plot. Python scripts using matplotlib to create scatter plots of metagenomic reads aligned against a reference genome.
- ngsaligners: NGS aligners Aligners for NGS sequence analysis
trtr: Version 1.0. TRTR Trim Reads of Tandem Repeats. Recommended before calling SNPs. This tool removes tandem repeats from ends of unaligned sequencing reads (leaving one copy). This prevents reads that don't span the repeated region from overlapping, leading to inaccurate SNP calls.
The maximum repeat length is adjustable (use 1 to trim only homopolymers).
The "aggressive" option should not be touched in general. Setting to 0 will prevent the program from trimming to exactly 1 copy of the repeat, instead leaving between 1 and 2 copies.
This could also be a useful first step before assembly. More testing needs to be done.
- mimodd_fileinfo: use MiModD to explore metadata in various NGS file formats install this tool from the suite_mimodd_0_1_5 repository
- mimodd_bamsort: use MiModD to sort a BAM file by coordinates (or names) of the mapped reads install this tool from the suite_mimodd_0_1_5 repository
- mimodd_deletion_prediction: use MiModD to predict deletions in one or more samples of aligned paired-end reads install this tool from the suite_mimodd_0_1_3 repository
- mimodd_convert: use MiModD to convert between NGS reads sequence formats install this tool from the suite_mimodd_0_1_5 repository
- mimodd_reheader: use MiModD to reheader a BAM file install this tool from the suite_mimodd_0_1_5 repository
- mimodd_vcf_filter: use MiModD to extract lines from a vcf variant file based on sample- and field-specific filters install this tool from the suite_mimodd_0_1_5 repository
- mimodd_snpeff_genomes: use MiModD to list installed SnpEff genomes install this tool from the suite_mimodd_0_1_5 repository
- mimodd_extract_variants: use MiModD to extract variant sites from BCF input generated with mimodd_variant_calling and report them in VCF install this tool from the suite_mimodd_0_1_5 repository
- mimodd_ngs_run_annotation: use MiModD to generate a SAM format header from an NGS run description install this tool from the suite_mimodd_0_1_5 repository
- mimodd_variant_calling: use MiModD to call variants from an aligned reads BAM file install this tool from the suite_mimodd_0_1_5 repository
- mimodd_coverage_stats: use MiModD to obtain a coverage report for a bcf file generated with mimodd_variant_calling install this tool from the suite_mimodd_0_1_5 repository
- mimodd_cloudmap_prepare: use MiModD to generate CloudMap-compatible output from a vcf file install this tool from the suite_mimodd_0_1_5 repository
- mimodd_read_alignment: use MiModD to align NGS reads to a reference genome install this tool from the suite_mimodd_0_1_5 repository
- mimodd_annotate_variants: use MiModD to annotate a vcf variant file with information about the affected genes install this tool from the suite_mimodd_0_1_5 repository
- cuffquant: Cuffquant is part of Cufflinks. Cuffquant allows precalculation of gene expression levels. Output can be used in cuffdiff and cuffnorm.
- bwa: Wrapper for bwa mem, aln, sampe, and samse. A collection of Galaxy bwa wrappers based on version 0.7.10 (039ea206392ada2542bc41ff2581c53fa2fe2bf2).
cuffnorm: Cuffnorm is part of Cufflinks. Cuffnorm is similar to cuffdiff, but does not perform differential expression testing. It provides normalized gene expression tables for use in downstream tools (R/matlab/...).
Please cite: Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms. Nature Biotechnology doi:10.1038/nbt.1621
- cummerbund: Initial commit with version 1.0.0 of the cummeRbund wrapper. Wrapper for the Bioconductor cummeRbund library. Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations.
cummerbund_to_tabular: Initial commit with version 1.0.0 of the tool. Regenerate the tabular files generated by cuffdiff from a cummeRbund SQLite database. This tool extracts one or more of the original tabular data files from a cummeRbund SQLite database.
- pal_finder: Find microsatellite repeat elements from sequencing reads and design PCR primers to amplify them. Runs the pal_finder Perl script and PRIMER3.
- trimmomatic: A flexible read trimming tool for Illumina NGS data. Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.
- gsaf_downloader: Download data from GSAF Easy download utility for fastq.gz files provided by GSAF (Genomic Sequencing and Analysis Facility)
- ceas: CEAS - Cis-regulatory Element Annotation System A tool designed to characterize genome-wide protein-DNA interaction patterns from ChIP-chip and ChIP-Seq of both sharp and broad binding factors.
- voom_rnaseq: Perform RNA-Seq analysis using limma voom pipeline
- From wolma:
- suite_mimodd_0_1_5: This metapackage should be used to install the MiModD suite of tools for the analysis of genome-wide sequencing data from model organisms along with its Galaxy tool wrappers.
- package_mimodd_0_1_5: dependency package for the MiModD suite of tools
- package_python3_zlib_dependent_1_0: a lean build of Python3.4.1 including the zlib module, based on package_python_3_4 by jankanis. Several modules of the Python standard library depend on external libraries being installed. Of these modules, this package forces only the installation of the zlib module, so the zlib library version 1.2.8 is its only requirement.
- package_zlib_1_2_8: zlib library dependency definition
- package_cummerbund_2_8_2: Contains a tool dependency definition that downloads and installs version 2.8.2 of the cummeRbund R library. Allows for persistent storage, access, exploration, and manipulation of Cufflinks high-throughput sequencing data. In addition, provides numerous plotting functions for commonly used visualizations.
- package_r_3_1_2: Contains a tool dependency definition that downloads and compiles version 3.1.2 of the R package.
- package_bowtie_2_2_4: Contains a tool dependency definition that downloads and compiles version 2.2.4 of the Bowtie package.
- package_cufflinks_2_2_1: tool dependency definition that downloads and compiles version 2.2.1 of the cufflinks RNA-Seq suite. This repository is intended to be defined as a complex repository dependency within a separate repository.
- package_picard_1_126_0: tool dependency definition that downloads and compiles version 1.126.0 of the Picard package. This repository is intended to be defined as a complex repository dependency within a separate repository.
- package_freebayes_0_9_18_0059bdf: tool dependency definition that downloads and compiles version 0.9.18 of FreeBayes. Program: freebayes (Bayesian haplotype-based polymorphism discovery and genotyping.) Version: 0.9.18 (0059bdf)
- package_trtr_0_1: Trim Reads of Tandem Repeats in a fastq file. This tool removes tandem repeats from ends of unaligned sequencing reads (leaving one copy). This prevents reads that don't span the repeated region from overlapping and leading to inaccurate SNP calls.
- From peterjc:
- ncbi_blast_plus: Uploaded v0.1.01 - Requires blastdbd datatype (blast_datatypes v0.0.19). Support for makeprofiledb to create protein domain databases and use them in RPS-BLAST and RPS-TBLASTN. Tools now support GI and SeqID filters, and embed the citations.
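The end-trimming idea described in the trtr entries above (collapse a tandem repeat at a read end down to one copy, with an adjustable maximum repeat length) can be illustrated with a small Python sketch. This is not the trtr implementation: the function names and the `max_unit` parameter are hypothetical, the "aggressive" option is not modelled, and FASTQ quality strings are ignored.

```python
def trim_end_repeats(read: str, max_unit: int = 3) -> str:
    """Collapse a tandem repeat at the 3' end of `read` to a single copy.

    `max_unit` is a hypothetical stand-in for the adjustable maximum
    repeat length; 1 trims only homopolymers.
    """
    for unit_len in range(1, max_unit + 1):
        if len(read) < 2 * unit_len:
            break  # too short to hold two copies of a unit this long
        unit = read[-unit_len:]
        copies = 1
        # Count how many back-to-back copies of `unit` end the read.
        while read.endswith(unit * (copies + 1)):
            copies += 1
        if copies > 1:
            # Drop every copy except one.
            return read[: len(read) - unit_len * (copies - 1)]
    return read


def trim_both_ends(read: str, max_unit: int = 3) -> str:
    """Apply the same trimming to the 3' end, then to the 5' end."""
    trimmed = trim_end_repeats(read, max_unit)
    # Reversing the read lets the 3' logic handle the 5' end.
    return trim_end_repeats(trimmed[::-1], max_unit)[::-1]


if __name__ == "__main__":
    print(trim_end_repeats("ACGTAAAA", max_unit=1))
    print(trim_both_ends("ACACACGTTGCACACA"))
```

The sketch tries the shortest repeat unit first, so a homopolymer run is treated as a repeat of length 1 rather than of length 2 or 3.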
- Bioinformatics WikiBook Collaborative Workshop 24-26 February at TGAC. And we encourage contributions about Galaxy to the NGS WikiBook.
- PDACS: a portal for data analysis services for cosmological simulations, Chard et al. PDACS is a Galaxy implementation for cosmology. And PDACS slides are here.
- Just scheduled: Intro to variant calling for pathologists and laboratory managers, Sydney, 22 June. Ross Lazarus & Andrew Lonie
- ASH: A Galaxy VM from Erasmus Medical Centre for Automated Selection of Hotspots in cancer.