April 2014 Galaxy Update
Welcome to the April 2014 Galaxy Update, a monthly summary of what is going on in the Galaxy community. Galaxy Updates complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.
63 papers (a new monthly record) were added to the Galaxy CiteULike Group in March. Some papers that may be particularly interesting to the Galaxy community:
"Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach," Benjamin Dickins, Boris Rebolledo-Jaramillo, Marcia Shu-Wei S. Su, Ian M. Paul, Daniel Blankenberg, Nicholas Stoler, Kateryna D. Makova, Anton Nekrutenko, BioTechniques, Vol. 56, No. 3. (2014)
"Orione, a web-based framework for NGS analysis in microbiology," Gianmauro Cuccuru, Massimiliano Orsini, Andrea Pinna, Andrea Sbardellati, Nicola Soranzo, Antonella Travaglione, Paolo Uva, Gianluigi Zanetti, Giorgio Fotia, Bioinformatics (Oxford, England) (10 March 2014), doi:10.1093/bioinformatics/btu135
"Galaxy as a Platform for Identifying Candidate Pathogen Effectors," Peter J. Cock, Leighton Pritchard, In Plant-Pathogen Interactions, Vol. 1127 (2014), pp. 3-15, doi:10.1007/978-1-62703-986-4_1
"GigaDB: promoting data dissemination and reproducibility," Tam P. Sneddon, Xiao S. Zhe, Scott C. Edmunds, Peter Li, Laurie Goodman, Christopher I. Hunter, Database, Vol. 2014 (01 January 2014), bau018, doi:10.1093/database/bau018
"Prediction of Gene Activity in Early B Cell Development Based on an Integrative Multi-Omics Analysis," Mohammad Heydarian, Teresa Romeo Luperchio, Jevon Cutler, Christopher J. Mitchell1, Min-Sik Kim, Akhilesh Pandey, Barbara Sollner-Webb, Karen Reddy, Journal of Proteomics & Bioinformatics, Vol. 07, No. 02. (2014), doi:10.4172/jpb.1000302
The new papers covered:
GCC2014 abstract submission for oral presentations closes April 4, which is this Friday. Poster submission closes April 25. Poster authors will be notified of acceptance status within two weeks of submission, while presentation authors will be notified no later than May 2. Please consider presenting your work. If you are dealing with big biological data, then this meeting wants to hear about it.
Accepted talks and selected posters from GCC2014 are also eligible for consideration to appear in the GigaScience "Galaxy: Data Intensive and Reproducible Research" series.
Early registration is now open. Early registration saves more than 70% on registration costs, and Training Day registration is an additional 55% off if you register for both at the same time. This is by far the most affordable option, with early registration fees starting at less than $50 per day. When you register you can also reserve lodging at Charles Commons, a very affordable housing option in the same building as the conference.
Training Day is an opportunity to learn about all things Galaxy, including using Galaxy, deploying and managing Galaxy, extending Galaxy, and Galaxy internals. There are 5 parallel tracks, each with 3 sessions, and each session is two and a half hours long. That's 15 sessions and over 37 hours of workshop material.
In 2014 we are adding non-sponsor exhibit spaces alongside the sponsor exhibits. This will significantly increase the size of the exhibit floor. Please contact the Organizers if your organization would like to have an exhibit space at GCC2014.
GlobusWorld is this year’s biggest gathering of all things Globus. GlobusWorld 2014 features a "Using Globus Genomics to Accelerate Analysis" tutorial and a full half day on Globus Genomics in the main meeting, including a keynote by Nancy Cox and these accepted talks:
- Globus Genomics: Enabling high-throughput cloud-based analysis and management of NGS data for Translational Genomics research at Georgetown, by Yuriy Gusev
- Improving next-generation sequencing variants identification in cancer genes using Globus Genomics, by Toshio Yoshimatsu
- Globus Genomics: A Medical Center's Bioinformatics Core Perspective, by Anoop Mayampurath
- Building a Low-budget Public Resource for Large-scale Proteomic Analyses, by Rama Raghavan
Registration is now open for the Using Galaxy for Analysis of High Throughput Sequence Data Workshop being held at UC Davis, June 16-20, 2014 from 9-5 each day. The workshop will cover modern high throughput sequencing technologies, applications, and ancillary topics, including:
- Illumina HiSeq / MiSeq, and PacBio RS technologies
- Read Quality Assessment & Improvement
- Genome assembly
- SNP and indel discovery
- RNA-Seq differential expression analysis
- Experimental design
- Hardware and software considerations
- Cloud Computing
The workshop will include a rich collection of lectures and hands-on sessions, covering both theory and tools. We will cover the basics of several high throughput sequencing technologies, but will focus on Illumina and PacBio data for hands-on exercises. Participants will explore software and protocols, create and modify workflows, and diagnose/treat problematic data. Workshop exercises will be performed using the popular Galaxy platform (http://usegalaxy.org) on the Amazon Cloud which allows for powerful web-based data analyses. There are no prerequisites other than basic familiarity with genomic concepts.
A similar workshop, using command line interfaces to perform the analysis, is being offered September 15-19, 2014.
Also see the Galaxy Events Google Calendar for details on other events of interest to the community in the next few months.
The Galaxy is expanding! Please help it grow.
- Statistical Genomics Postdoc opening in the Makova lab at Penn State
- The Galaxy Project is hiring software engineers and post-docs
Three public Galaxy servers were added to the published list in March:
- Link: Biomina Galaxy
- Domain/Purpose: A general purpose Galaxy instance that includes most "standard" tools for DNA/RNA sequencing, plus extra tools for panel resequencing, variant annotation and some tools for Illumina SNP array analysis.
- Includes a number of workflows, including the workflow from "A SWI/SNF-related autism syndrome caused by de novo mutations in ADNP," by Helsmoortel, et al., Nature Genetics (2014) doi:10.1038/ng.2899
- User Support: Email support
- Registered users: 50 GB of storage, which can be increased to up to 3 TB in collaborative projects.
- There is NO backup of data on this Galaxy server.
- Collaboration partner jobs have higher priority on the system.
- Domain/Purpose: Phylogenetics
- Comments: "This server aims to demonstrate Osiris, a set of phylogenetics tools for the Galaxy Bioinformatics platform. Because it is only a demo, some computationally intensive tools are disabled. Other tools will be slow because this is a public, shared resource."
- User Support:
- Sponsor(s): Oakley Lab at UC Santa Barbara
The most recent release of Galaxy was February 10, 2014.
The most recent version of CloudMan was released in January 2014.
The Community Log Board and Deployment Catalog Galaxy community hubs were launched in December. If you have a deployment or experience you want to share, please publish it. There was one new Community Log Board entry in March:
- Basic Galaxy Puppet Module (work by Olivier Inizan and Mikael Loaec of INRA-URGI)
ToolShed Contributions
- regex_find_replace: Use Python regular expressions to find and replace text
- samtools_phase: Call and phase heterozygous SNPs
- sample_seqs: Sub-sample sequence files (e.g. to reduce coverage)
- transpose: Transposes tab-delimited data
- proteomics_rnaseq_reduced_db_workflow: Filter Proteomics Search DB by RNA-seq transcript expression analysis
- proteomics_rnaseq_sap_db_workflow: Create Proteomics Search DB from RNA-seq single amino acid polymorphism detection
- proteomics_novel_peptide_filter_workflow: Filter a Proteomics Search DB for novel peptides
- proteomics_rnaseq_splice_db_workflow: Create Proteomics Search DB from RNA-seq novel splice detection
- rsem_datatypes: Custom Galaxy datatype definitions for use with RSEM
- varscan_wrapper: Fork of the fcaramia package, correcting errors and adding options
- align_back_trans: Thread nucleotides onto a protein alignment (back-translation)
- dna_visualizer: Convert a DNA sequence into a PNG image by representing each base with one colored pixel
- bwa_mem: A software package for mapping low-divergent sequences against a large reference genome
- samtool_filter2: Filter BAM/SAM on FLAG, MAPQ, RG, LB or by region and produce a BAM/SAM on demand
- Galaxy reached a milestone of 100 contributors to our codebase! Thank you all!
- Poster: "ChemicalToolBoX and its application on the study of the drug like and purchasable space," by Lucas et al., Journal of Cheminformatics
- New tools available in Galaxy @ URGI: SnpEff, Mapsembler2, BLAST+, Blast2GO, Peak predictor, ...