June 2014 Galaxy Update
Welcome to the June 2014 Galaxy Update, a monthly summary of what is going on in the Galaxy community. Galaxy Updates complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.
Join the conversation! Learn how here.
Note: Galaxy's support forum has moved to help.galaxyproject.org.
The Galaxy Biostar online forum was launched April 23 as a replacement for the Galaxy-User mailing list.
During the past 5 weeks, Galaxy Biostar has been wildly successful with over 125 active threads, more than 5 times the number of active threads on Galaxy-user in the 5 weeks before the switch. Galaxy-User has remained open during the transition, but now it's time to retire it. All new posting to Galaxy-User will be stopped on Friday, June 6, some 101 months, and 8,100 postings after it was launched. All those postings will remain available both in Galaxy Biostar (where they have been imported), and in the online list archives. Thanks for making Galaxy Biostar, and Galaxy-User before it, such a great resource.
# Events
## GCC2014: June 30 - July 2, Baltimore
The 2014 Galaxy Community Conference (GCC2014) will start one month from today, June 30, and run through July 2, at the Homewood Campus of Johns Hopkins University, in Baltimore, Maryland, United States.
Registration closes this month. Registering now saves 100% of late registration fees. When you register you can also reserve lodging at Charles Commons, a very affordable housing option in the same building as the conference.
Training Day is an opportunity to learn about all things Galaxy, including using Galaxy, deploying and managing Galaxy, extending Galaxy, and Galaxy internals. There are 5 parallel tracks, each with 3 sessions, and each session is two and a half hours long. That's 15 sessions and over 37 hours of workshop material.
The conference is still accepting late abstract submissions. These will be considered as cancellations occur and space frees up (and we have always had a few cancellations).
There are still Silver and Bronze sponsorships available for GCC2014 and Giga sponsorships for the Hackathon. Please contact the Organizers if your organization would like to help sponsor these events.
In 2014 we are also adding non-sponsor exhibit spaces in addition to the sponsor exhibits. This will significantly increase the size of the exhibit floor. Please contact the Organizers if your organization would like to have an exhibit space at GCC2014.
Do you have a feature you've always wanted to implement? Just want to hack on Galaxy (or CloudMan!) with other folks? The Galaxy Hackathon will be a great opportunity to meet and work closely with other community and Galaxy Team members over the course of three days, culminating in some really great improvements and new features to show off at the Galaxy Community Conference afterward.
Participation in the hackathon itself is completely free, but there's limited space so if you're interested and would like to participate please go ahead and book both your lodging and hackathon seat at EventBrite. Register now. As of this writing there are only 11 spots left.
To help organize ideas and people into more concrete projects, we've also set up a hackathon-specific Trello board that we'd love for everyone to start using. The board is public and open to commentary and voting, but to create new cards you’ll need to be added as a member, so please note the instructions on the board for that.
Finally, we are happy to have Amazon Web Services on board as the Cloud Infrastructure sponsor for the GCC2014 Hackathon!
There are at least 13 other Galaxy-related events in the next two months in Thailand, Canada, France, the United States, Italy, the Netherlands, Australia, and Brazil. Also see the Galaxy Events Google Calendar for details on other events of interest to the community.
49 papers were added to the Galaxy CiteULike Group in May. Some papers that may be particularly interesting to the Galaxy community:
Implementation of Cloud based Next Generation Sequencing data analysis in a clinical laboratory, by Getiria Onsongo, Jesse Erdmann, Michael D Spears, John Chilton, Kenneth B Beckman, Adam Hauge, Sophia Yohe, Matthew Schomaker, Matthew Bower, Kevin A T Silverstein and Bharat Thyagarajan, BMC Research Notes, Vol. 7, No. 1. (2014), 314, doi:10.1186/1756-0500-7-314
deepTools: a flexible platform for exploring deep-sequencing data, by Fidel Ramírez, Friederike Dündar, Sarah Diehl, Björn A. Grüning, Thomas Manke, Nucleic Acids Research (05 May 2014), gku365, doi:10.1093/nar/gku365
Galaxy + Hadoop: Toward a Collaborative and Scalable Image Processing Toolbox in Cloud, by Shiping Chen, Tomasz Bednarz, Piotr Szul, Dadong Wang, Yulia Arzhaeva, Neil Burdett, Alex Khassapov, John Zic, Surya Nepal, Tim Gurevey, John Taylor, In "Service-Oriented Computing – ICSOC 2013 Workshops", Vol. 8377 (2014), pp. 339-351, doi:10.1007/978-3-319-06859-6_30
The new papers were tagged in many different areas:
The Galaxy is expanding! Please help it grow.
- Two postdoc positions in integrative genomics available in Oslo, Norway
- Statistical Genomics Postdoc opening in the Makova lab at Penn State
- The Galaxy Project is hiring software engineers and post-docs
One new public Galaxy server was added to the published list in May:
- deepTools server
- deepTools home page at GitHub
- Fidel Ramírez, Friederike Dündar, Sarah Diehl, Björn A. Grüning, and Thomas Manke. deepTools: a flexible platform for exploring deep-sequencing data Nucl. Acids Res. first published online May 5, 2014 doi:10.1093/nar/gku365
- deepTools: a flexible platform for exploring deep-sequencing data presentation by Sarah Diehl at GCC2014
- deepTools deployment description
- Domain/Purpose: deepTools is a suite of user-friendly tools for the visualization, quality control and normalization of data from high-throughput DNA sequencing experiments.
- Comments: deepTools offers multiple methods for highly-customizable data visualization that immensely aid hypothesis generation and data interpretation. It also offers all the tools needed to create coverage files in standard bedGraph and bigWig file formats allowing various normalization procedures and comparisons between two files (for example, treatment and control).
- User Support:
- Quotas: 20 GB for unregistered users, 30 GB for registered users
- Sponsor(s): Bioinformatics and Deep-Sequencing Units at the Max Planck Institute for Immunobiology and Epigenetics.
The most recent Galaxy Distribution was released on April 14, 2014.
BioBlend 0.4.3 was released on April 11, 2014.
The most recent version of CloudMan was released in January 2014.
One new deployment description was added in May: deepTools.
The Community Log Board and Deployment Catalog Galaxy community hubs were launched last year. If you have a Galaxy deployment or experience you want to share, please publish it.
In no particular order:
- crispr_recognition_tool: automatic detection of clustered regularly interspaced palindromic repeats
- minced: MinCED is a program to find Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such as metagenomes, in which sequence size can be anywhere from 100 to 800 bp.
- primo_multiomics: Multi-omics module of Plant Research International's Mass Spectrometry (PRIMS) toolsuite
- blockclust: Non-coding RNA clustering from deep sequencing read profiles
- ged_bowtie: Bowtie wrapper for small RNA sequencing analysis
- metaphlan: MetaPhlAn (Metagenomic Phylogenetic Analysis) is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data
- lefse: LDA Effect Size (LEfSe) (Segata et al. 2010) is an algorithm for high-dimensional biomarker discovery and explanation that identifies genomic features (genes, pathways, or taxa) characterizing the differences between two or more biological conditions.
- micropita: microPITA is a computational tool enabling sample selection in tiered studies.
- maaslin: MaAsLin is a multivariate statistical framework that finds associations between clinical metadata and microbial community abundance or function
- fasta_filter_by_length: Outputs sequences between Minimal length and Maximum length.
- rmap: RMAP for Solexa Short Reads Alignment
- convert_solid_color2nuc: Convert Color Space to Nucleotides
- megablast_xml_parser: processes the XML output of any NCBI blast tool
- fastqsolexa_to_fasta_qual: extracts sequences and quality scores from FASTQ data (Solexa variant), producing a FASTA dataset and a QUAL dataset.
- fasta_concatenate_by_species: attempts to parse FASTA headers to determine the species for each sequence in a multiple FASTA alignment. It then linearly concatenates the sequences for each species in the file, creating one sequence per determined species.
- blat_mapping: takes BLAT pslx output and returns a wig-like file showing the number of reads (coverage) mapped at each chromosome location.
- annotation_profiler: Profile Annotations for a set of genomic intervals
- megablast_wrapper: Megablast compare short reads against htgs, nt, and wgs databases
- blat_coverage_report: Calculate the percentage of reads supporting each nucleotide at each location
- mapping_to_ucsc: Format mapping data as UCSC custom track
- tabular_to_fasta: Converts tab delimited data into FASTA formatted sequences.
- fasta_compute_length: counts the length of each fasta sequence in the file. The output file has two columns per line (separated by tab): fasta titles and lengths of the sequences.
- short_reads_trim_seq: Select high quality segments
- fasta_to_tabular: converts FASTA formatted sequences to TAB-delimited format.
- rmapq: runs rmapq, searching against a genome build with sequence qualities.
- kernel_principal_component_analysis: uses functions from 'kernlab' library from R statistical package to perform Kernel Principal Component Analysis (kPCA)
- short_reads_figure_high_quality_length: Histogram of high quality score reads
- principal_component_analysis: performs Principal Component Analysis on the given numeric input data using functions from R statistical package
- canonical_correlation_analysis: uses functions from 'yacca' library from R statistical package to perform Canonical Correlation Analysis (CCA)
- kernel_canonical_correlation_analysis: uses functions from 'kernlab' library from R statistical package to perform Kernel Canonical Correlation Analysis (kCCA)
- short_reads_figure_score: Build base quality distribution
- nepenthes_3dpca: Tools for Principal Component Analysis
- abyss: abyss de novo assembler
- protein_funcional_analysis_similarities: Provides common characteristics among similar proteins.
- package_python3_4: The Python language version 3.4.1
- archer: Gene Mutation Identification
- usearch_dereplication: Removal of duplicate sequences - GVL
- usearch_cluster_otus: OTU clustering using the UPARSE-OTU algorithm - GVL
- usearch_uchime: Detecting chimeric sequences - GVL
- usearch_map_reads_to_otus: Maps read sequences to OTUs - GVL
- rdp_multiclassifier: Rapid assignment of rRNA sequences into the new bacterial taxonomy - GVL
- softsearch_tool: Sensitive Structural Variant (SV) detection
- package_eden_1_1: tool dependency definition that downloads and compiles version 1.1.x of the EDeN package.
- package_blast_plus_2_2_26: tool dependency definition that downloads and installs version 2.2.26+ of the NCBI BLAST+ package.
- package_rmap_2_05: Tool dependency definition that downloads and compiles version 2.05 of the rmap package
- package_cran_kernlab_0_1_4: tool dependency definition that downloads and installs version 0.1-4 of the kernlab R library.
- package_fontconfig_2_11_1: tool dependency definition that downloads and compiles version 2.11.1 of the fontconfig package
- package_cran_yacca_1_0: tool dependency definition that downloads and installs version 1.0 of the yacca R library.
- package_pydoop_0_11: tool dependency definition that downloads and compiles version 0.11.1 of the pydoop package
- mira_assembler: updated to v0.0.10
- macs2: updated to include option to call broad peaks
Access to all TACC-based resources and services, including usegalaxy.org and the Galaxy Project Test Server, will be unavailable from 9 a.m. to 5 p.m., central US time, on Saturday, May 31, 2014. TACC staff will be performing an upgrade to the networking infrastructure. During this time, jobs will continue to run.
During this time you are encouraged to use any of the 60+ public Galaxy servers.