June 2014 Galaxy Update
Welcome to the June 2014 Galaxy Update, a monthly summary of what is going on in the Galaxy community. Galaxy Updates complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.
Galaxy-User Being Retired June 6
Note: Galaxy's support forum has moved to help.galaxyproject.org.
The Galaxy Biostar online forum was launched April 23 as a replacement for the Galaxy-User mailing list.
During the past 5 weeks, Galaxy Biostar has been wildly successful, with over 125 active threads, more than 5 times the number of active threads on Galaxy-User in the 5 weeks before the switch.
Galaxy-User has remained open during the transition, but now it's time to retire it. All new postings to Galaxy-User will be stopped on Friday, June 6, some 101 months and 8,100 postings after it was launched. All those postings will remain available both in Galaxy Biostar (where they have been imported) and in the online list archives.
Thanks for making Galaxy Biostar, and Galaxy-User before it, such a great resource.
Events
GCC2014: June 30 - July 2, Baltimore
The 2014 Galaxy Community Conference (GCC2014) will start one month from today, June 30, and run through July 2, at the Homewood Campus of Johns Hopkins University, in Baltimore, Maryland, United States.
Registration Closes June 6
Registration closes this month. Registering now saves 100% of late registration fees. When you register you can also reserve lodging at Charles Commons, a very affordable housing option in the same building as the conference.
Training Day is an opportunity to learn about all things Galaxy, including using Galaxy, deploying and managing Galaxy, extending Galaxy, and Galaxy internals. There are 5 parallel tracks, each with 3 sessions, and each session is two and a half hours long. That's 15 sessions and over 37 hours of workshop material.
Abstracts and Program
The program has been published and all titles and abstracts for accepted talks are now online. Accepted poster abstracts will be posted within a week.
The conference is still accepting late abstract submissions. These will be considered as cancellations occur and space frees up (and we have always had a few cancellations).
Sponsorships and Exhibitors
We are happy to have Penguin Computing as a GCC2014 Silver Sponsor. Penguin, like all Platinum, Gold, and Silver Sponsors, will have a table at GCC2014.
There are still Silver and Bronze sponsorships available for GCC2014 and Giga sponsorships for the Hackathon. Please contact the Organizers if your organization would like to help sponsor these events.
In 2014 we are also adding non-sponsor exhibit spaces in addition to the sponsor exhibits. This will significantly increase the size of the exhibit floor. Please contact the Organizers if your organization would like to have an exhibit space at GCC2014.
Galaxy Hackathon at GCC2014
The very first Galaxy Project Hackathon will take place at Johns Hopkins immediately preceding GCC2014, starting June 28.
Do you have a feature you've always wanted to implement? Just want to hack on Galaxy (or CloudMan!) with other folks? The Galaxy Hackathon will be a great opportunity to meet and work closely with other community and Galaxy Team members over the course of three days, culminating in some really great improvements and new features to show off at the Galaxy Community Conference afterward.
Participation in the hackathon itself is completely free, but there's limited space so if you're interested and would like to participate please go ahead and book both your lodging and hackathon seat at EventBrite. Register now. As of this writing there are only 11 spots left.
To help organize ideas and people into more concrete projects, we've also set up a hackathon-specific Trello board that we'd love for everyone to go ahead and start using. The board is public and open to commentary and voting, but to create new cards you’ll need to be added as a member, so please note the instructions on the board for that.
Finally, we are happy to have Amazon Web Services on board as the Cloud Infrastructure sponsor for the GCC2014 Hackathon!
Other Events
There are at least 13 other Galaxy-related events in the next two months in Thailand, Canada, France, the United States, Italy, the Netherlands, Australia, and Brazil. Also see the Galaxy Events Google Calendar for details on other events of interest to the community.
New Papers
49 papers were added to the Galaxy CiteULike Group in May. Some papers that may be particularly interesting to the Galaxy community:
- Implementation of Cloud based Next Generation Sequencing data analysis in a clinical laboratory, by Getiria Onsongo, Jesse Erdmann, Michael D Spears, John Chilton, Kenneth B Beckman, Adam Hauge, Sophia Yohe, Matthew Schomaker, Matthew Bower, Kevin A T Silverstein and Bharat Thyagarajan, BMC Research Notes, Vol. 7, No. 1. (2014), 314, doi:10.1186/1756-0500-7-314
- deepTools: a flexible platform for exploring deep-sequencing data, by Fidel Ramírez, Friederike Dündar, Sarah Diehl, Björn A. Grüning, Thomas Manke, Nucleic Acids Research (05 May 2014), gku365, doi:10.1093/nar/gku365
- Galaxy + Hadoop: Toward a Collaborative and Scalable Image Processing Toolbox in Cloud, by Shiping Chen, Tomasz Bednarz, Piotr Szul, Dadong Wang, Yulia Arzhaeva, Neil Burdett, Alex Khassapov, John Zic, Surya Nepal, Tim Gurevey, John Taylor, In "Service-Oriented Computing – ICSOC 2013 Workshops", Vol. 8377 (2014), pp. 339-351, doi:10.1007/978-3-319-06859-6_30
The new papers were tagged in many different areas:
# | Tag | # | Tag | # | Tag | # | Tag
---|---|---|---|---|---|---|---
4 | Cloud | - | Project | 7 | Tools | 2 | UsePublic
1 | HowTo | 1 | RefPublic | - | UseCloud | 1 | Visualization
2 | IsGalaxy | 3 | Reproducibility | 2 | UseLocal | 14 | Workbench
21 | Methods | 1 | Shared | 10 | UseMain | |
Who's Hiring
The Galaxy is expanding! Please help it grow.
- Two postdoc positions in integrative genomics available in Oslo, Norway
- Statistical Genomics Postdoc opening in the Makova lab at Penn State
- The Galaxy Project is hiring software engineers and post-docs
Got a Galaxy-related opening? Send it to outreach@galaxyproject.org and we'll put it in the Galaxy News feed and include it in next month's update.
New Public Servers
One new public Galaxy server was added to the published list in May:
deepTools
Links:
- deepTools server
- deepTools home page at GitHub
- Fidel Ramírez, Friederike Dündar, Sarah Diehl, Björn A. Grüning, and Thomas Manke. deepTools: a flexible platform for exploring deep-sequencing data. Nucl. Acids Res., first published online May 5, 2014, doi:10.1093/nar/gku365
- deepTools: a flexible platform for exploring deep-sequencing data presentation by Sarah Diehl at GCC2014
- deepTools deployment description
- Domain/Purpose: deepTools is a suite of user-friendly tools for the visualization, quality control and normalization of data from high-throughput DNA sequencing experiments.
- Comments: deepTools offers multiple methods for highly-customizable data visualization that immensely aid hypothesis generation and data interpretation. It also offers all the tools needed to create coverage files in standard bedGraph and bigWig file formats allowing various normalization procedures and comparisons between two files (for example, treatment and control).
User Support:
- Quotas: 20 GB for unregistered users, 30 GB for registered users
- Sponsor(s): Bioinformatics and Deep-Sequencing Units at the Max Planck Institute for Immunobiology and Epigenetics.
Galaxy Distributions
The most recent Galaxy Distribution was released on April 14, 2014.
BioBlend 0.4.3 was released on April 11, 2014.
The most recent version of CloudMan was released in January 2014.
Galaxy Community Hubs
Share your experience now
One new deployment description was added in May:
- deepTools
The Community Log Board and Deployment Catalog Galaxy community hubs were launched last year. If you have a Galaxy deployment or experience you want to share, then please publish it.
ToolShed Contributions
Galaxy Project ToolShed Repos
In no particular order:
Tools
From bgruening
- crispr_recognition_tool: automatic detection of clustered regularly interspaced palindromic repeats
- minced: MinCED is a program to find Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) in full genomes or environmental datasets such as metagenomes, in which sequence size can be anywhere from 100 to 800 bp.
From pieterlukasse
- primo_multiomics: Multi-omics module of Plant Research International's Mass Spectrometry (PRIMS) toolsuite
From rnateam
- blockclust: Non-coding RNA clustering from deep sequencing read profiles
From drosofff
- ged_bowtie: Bowtie wrapper for small RNA sequencing analysis
From george-weingart
- metaphlan: MetaPhlAn (Metagenomic Phylogenetic Analysis) is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data
- lefse: LDA Effect Size (LEfSe) (Segata et al. 2010) is an algorithm for high-dimensional biomarker discovery and explanation that identifies genomic features (genes, pathways, or taxa) characterizing the differences between two or more biological conditions.
- micropita: microPITA is a computational tool enabling sample selection in tiered studies.
- maaslin: MaAsLin is a multivariate statistical framework that finds associations between clinical metadata and microbial community abundance or function
From devteam
- fasta_filter_by_length: Outputs sequences between Minimal length and Maximum length.
- rmap: RMAP for Solexa Short Reads Alignment
- convert_solid_color2nuc: Convert Color Space to Nucleotides
- megablast_xml_parser: processes the XML output of any NCBI blast tool
- fastqsolexa_to_fasta_qual: extracts sequences and quality scores from FASTQ data (Solexa variant), producing a FASTA dataset and a QUAL dataset.
- fasta_concatenate_by_species: attempts to parse FASTA headers to determine the species for each sequence in a multiple FASTA alignment. It then linearly concatenates the sequences for each species in the file, creating one sequence per determined species.
- blat_mapping: takes BLAT pslx output and returns a wig-like file showing the number of reads (coverage) mapped at each chromosome location.
- annotation_profiler: Profile Annotations for a set of genomic intervals
- megablast_wrapper: Megablast compare short reads against htgs, nt, and wgs databases
- blat_coverage_report: Calculate the percentage of reads supporting each nucleotide at each location
- mapping_to_ucsc: Format mapping data as UCSC custom track
- tabular_to_fasta: Converts tab delimited data into FASTA formatted sequences.
- fasta_compute_length: counts the length of each fasta sequence in the file. The output file has two columns per line (separated by tab): fasta titles and lengths of the sequences.
- short_reads_trim_seq: Select high quality segments
- fasta_to_tabular: converts FASTA formatted sequences to TAB-delimited format.
- rmapq: runs rmapq, searching against a genome build with sequence qualities.
- kernel_principal_component_analysis: uses functions from 'kernlab' library from R statistical package to perform Kernel Principal Component Analysis (kPCA)
- short_reads_figure_high_quality_length: Histogram of high quality score reads
- principal_component_analysis: performs Principal Component Analysis on the given numeric input data using functions from R statistical package
- canonical_correlation_analysis: uses functions from 'yacca' library from R statistical package to perform Canonical Correlation Analysis (CCA)
- kernel_canonical_correlation_analysis: uses functions from 'kernlab' library from R statistical package to perform Kernel Canonical Correlation Analysis (kCCA)
- short_reads_figure_score: Build base quality distribution
From mb2013
- nepenthes_3dpca: Tools for Principal Component Analysis
From jade
- abyss: abyss de novo assembler
From fernando
- protein_funcional_analysis_similarities: Provides common characteristics among similar proteins.
From jankanis
- package_python3_4: The Python language version 3.4.1
From plus
- archer: Gene Mutation Identification
From qfab
- usearch_dereplication: Removal of duplicate sequences - GVL
- usearch_cluster_otus: OTU clustering using the UPARSE-OTU algorithm - GVL
- usearch_uchime: Detecting chimeric sequences - GVL
- usearch_map_reads_to_otus: Maps read sequences to OTUs - GVL
- rdp_multiclassifier: Rapid assignment of rRNA sequences into the new bacterial taxonomy - GVL
From plus91-technologies
- softsearch_tool: Sensitive Structural Variant (SV) detection
Datatypes
From qfab
- metagenomics_datatypes: Galaxy datatypes required by the Metagenomics Workflow - GVL
Workflows
From rnateam
- blockclust_workflow: a workflow for BlockClust.
Packages
From iuc
- package_minced_0_1_5: tool dependency definition that downloads version 0.1.5 of minced, a CRISPR finder.
- package_mummer_3_23: tool dependency definition for MUMmer, a system for rapidly aligning entire genomes
From rnateam
- package_eden_1_1: tool dependency definition that downloads and compiles version 1.1.x of the EDeN package.
From devteam
- package_blast_plus_2_2_26: tool dependency definition that downloads and installs version 2.2.26+ of the NCBI BLAST+ package.
- package_rmap_2_05: Tool dependency definition that downloads and compiles version 2.05 of the rmap package
- package_cran_kernlab_0_1_4: tool dependency definition that downloads and installs version 0.1-4 of the kernlab R library.
- package_fontconfig_2_11_1: tool dependency definition that downloads and compiles version 2.11.1 of the fontconfig package
- package_cran_yacca_1_0: tool dependency definition that downloads and installs version 1.0 of the yacca R library.
From crs4
- package_pydoop_0_11: tool dependency definition that downloads and compiles version 0.11.1 of the pydoop package
Tool Updates
From peterjc
- mira_assembler: updated to v0.0.10
From stemcellcommons
- macs2: updated to include option to call broad peaks
Other News
usegalaxy.org offline May 31, 2014
Access to all TACC-based resources and services, including usegalaxy.org and the Galaxy Project Test Server, will be unavailable from 9 a.m. to 5 p.m., central US time, on Saturday, May 31, 2014. TACC staff will be performing an upgrade to the networking infrastructure. During this time, jobs will continue to run.
During this time you are encouraged to use any of the 60+ public Galaxy servers.