May 2014 Galaxy Update
Welcome to the May 2014 Galaxy Update, a monthly summary of what is going on in the Galaxy community. Galaxy Updates complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.
Note: Galaxy's support forum has moved to help.galaxyproject.org.
Galaxy Biostar is a space where researchers using Galaxy come together and share both scientific advice and practical tool help. Whether on usegalaxy.org, a CloudMan instance, or any other Galaxy (public or local), if you have something to say about Using Galaxy, this is the place to do it.
Current integration with usegalaxy.org
- The whole history of the `email@example.com` mailing list was imported into Galaxy Biostar. Your prior posts are automatically claimed when you log in!
- If you access Galaxy Biostar from usegalaxy.org (Menu: Help → Galaxy Biostar) you will be automatically logged in. A Galaxy Biostar account will be created for you if it did not previously exist. To obtain this account’s password please use the password reset feature of Galaxy Biostar.
- When you have a question, search Galaxy Biostar directly from any Galaxy tool page.
- Galaxy Biostar is available at biostar.usegalaxy.org and will be our primary avenue for end-user support.
- The firstname.lastname@example.org mailing list will continue to be supported during the transition, but starting now, please use the Galaxy Biostar forum to ask all questions about using Galaxy.
- Please do not double post to both Galaxy Biostar and email@example.com.
- Send us feedback in this Biostar post to tell us what you think. We care.
- Notice will be given when the firstname.lastname@example.org mailing list is retired.
- Archives of email@example.com will remain accessible.
Galaxy Biostar was launched on April 23. We hope you like the change and look forward to any feedback you may have.
47 papers were added to the Galaxy CiteULike Group in April. Some papers that may be particularly interesting to the Galaxy community:
"Analysis of Next-Generation Sequencing Data Using Galaxy," Daniel Blankenberg and Jennifer Hillman-Jackson, Stem Cell Transcriptional Networks, Vol. 1150 (2014), pp. 21-43, doi:10.1007/978-1-4939-0512-6_2
"Support for data-intensive computing with CloudMan," Yousef Kowsar and Enis Afgan, Information & Communication Technology Electronics & Microelectronics (MIPRO), 2013 36th International Convention on, (May 2013), pp. 243-248
"Chromatin states reveal functional associations for globally defined transcription start sites in four human cell lines," Morten Rye, Geir Kjetil Sandve, Carsten O Daub, Hideya Kawaji, Piero Carninci, Alistair RR Forrest, Finn Drabløs and the FANTOM consortium, BMC Genomics, Vol. 15, No. 1. (26 March 2014), 120, doi:10.1186/1471-2164-15-120
"A Model-Based Approach to Identify Binding Sites in CLIP-Seq Data," Tao Wang, Beibei Chen, Minsoo Kim, Yang Xie, and Guanghua Xiao, PLoS ONE, Vol. 9, No. 4. (2014)
The new papers were tagged in 14 different areas (the most diverse month we've had):
Early registration closes this month. Early registration saves more than 70% on registration costs, and Training Day registration is an additional 55% off if you register for both at the same time. This is by far the most affordable option, with early registration fees starting at less than $50 per day. When you register you can also reserve lodging at Charles Commons, a very affordable housing option in the same building as the conference.
Training Day is an opportunity to learn about all things Galaxy, including using Galaxy, deploying and managing Galaxy, extending Galaxy, and Galaxy internals. There are 5 parallel tracks, each with 3 sessions, and each session is two and a half hours long. That's 15 sessions and over 37 hours of workshop material.
We are pleased to announce that Steven Salzberg will be the keynote speaker at GCC2014. Steven is a Professor of Medicine, Biostatistics, and Computer Science at the Johns Hopkins University School of Medicine where he is also Director of the Center for Computational Biology at the McKusick-Nathans Institute of Genetic Medicine. Steven has made many prominent contributions to open source software, including several of the most popular tools used on Galaxy Platforms. Recently he was awarded the 2013 Benjamin Franklin Award for Open Access in the Life Sciences, and the 2012 Balles Prize in Critical Thinking for his science column at Forbes.
Steve's GCC2014 talk will be on "Transcriptomes and Exomes: Computational Challenges of NGS Data."
Do you have a feature you've always wanted to implement? Just want to hack on Galaxy (or CloudMan!) with other folks? The Galaxy Hackathon will be a great opportunity to meet and work closely with other community and Galaxy Team members over the course of three days, culminating in some really great improvements and new features to show off at the Galaxy Community Conference afterward.
Participation in the hackathon itself is completely free, but there's limited space so if you're interested and would like to participate please go ahead and book both your lodging and hackathon seat at EventBrite.
To help organize ideas and people into more concrete projects, we've also set up a hackathon-specific Trello board that we'd love for everyone to go ahead and start using. The board is public and open to commentary and voting, but to create new cards you’ll need to be added as a member, so please note the instructions on the board for that.
Finally, we are very happy to have Curoverse on board as the exclusive Peta level sponsor of the hackathon. If you know of any other group that might be interested in sponsoring at the Giga level please let us know.
The deadlines for both oral and poster presentations were in April. Oral presentation submitters have been contacted and we've heard back from over half of them. We will continue to update the Talk Abstracts page as we hear from the rest. If you submitted a poster abstract, you will be notified at the end of this week if your poster was accepted. We'll start posting those abstracts online then too.
The conference is still accepting late abstract submissions. These will not be considered for constructing the initial list of accepted abstracts, but will be reviewed as cancellations occur and space frees up (and we have always had a few cancellations).
Look for a more detailed draft program to be posted later this month, once we have heard from all presenters.
There are still Silver and Bronze sponsorships available for GCC2014 and Giga sponsorships for the Hackathon. Please contact the Organizers if your organization would like to help sponsor these events.
In 2014 we are also adding non-sponsor exhibit spaces in addition to the sponsor exhibits. This will significantly increase the size of the exhibit floor. Please contact the Organizers if your organization would like to have an exhibit space at GCC2014.
A Galaxy Tour is happening in the United Kingdom in early May 2014. If you are anywhere close to Norwich or Edinburgh, then it might be worth your while to attend an event.
First, there will be a talk on Scaling Galaxy for Big Data at the NGS Data after the Gold Rush meeting, being held 6-7 May at The Genome Analysis Centre (TGAC) in Norwich. This will be followed by a hands-on Introduction to Galaxy Workshop on 9 May, also at TGAC.
After that, there will be 3 events in Edinburgh the following week, starting on Monday with a hands-on Introduction to Galaxy Workshop at the University of Edinburgh in the morning and a Galaxy Project Update talk at the 5th Edinburgh Bioinformatics Meeting, also at the University, in the afternoon. Finally, on Tuesday 13 May, there will be an all-day hands-on Galaxy Workshop at the Institute of Genetics and Molecular Medicine (IGMM) at Western General Hospital.
You must be affiliated with the University of Edinburgh or the IGMM to register for either of those workshops, but all other events are open to anyone.
And don't worry if you are not near Norwich or Edinburgh in May. There are at least 17 other Galaxy-related events in the next 70 days in Norway, France, online, Croatia, Thailand, Canada, the US, the Netherlands, and Australia. Also see the Galaxy Events Google Calendar for details on other events of interest to the community.
The Galaxy is expanding! Please help it grow.
- Post-doctoral position in computational mass spectrometry at CEA Saclay, France, Paris area
- Bioinformatics specialist/Scientific programmer, Ettema Lab, Uppsala University
- Bioinformatician / Computational Biologist, EMBL Heidelberg, Germany
- Bioinformatics Technician, Bioinformatics Unit, Core Facilities, CRG, Barcelona, Spain
- Experimental Officer in Bioinformatics, NERC Metabolomics Facility, University of Birmingham, UK
- Statistical Genomics Postdoc opening in the Makova lab at Penn State
- The Galaxy Project is hiring software engineers and post-docs
Two public Galaxy servers were added to the published list in April:
- Domain/Purpose: The Globus Genomics Project demonstration Galaxy server. It has a strong emphasis on proteomics
- Comments: Includes Globus tools and over 20 proteomics tools, as well as many NGS analysis tools and the usual data manipulation tool set.
- Sponsor(s): Globus Genomics, The Computation Institute, The University of Chicago, Argonne National Lab, and Amazon Web Services.
- Links: SunLab Galaxy Server
- Domain/Purpose: Provides access to computational tools developed by Fengzhu Sun's group at University of Southern California, notably tools for local similarity analysis (LSA).
- User Support: Email
- Quotas: "Due to the limited computational resources, we refer users not using the tools developed by SunLab to the main public Galaxy site. We also encourage user applying SunLab tools to large data sets to install their standalone version of the specific tools, or install this version of Galaxy server with SunLab tools integrated."
- Sponsor(s): The SunLab at the University of Southern California.
News Brief Highlights:
- Visualization framework and Trackster display enhancements
- Tool Shed upgrades for repos, installs, tests, and docs
- Over 100 genomes with new content on our rsync server
- UI unification of design plus expanded dataset action access
- API functionality additions including new job control/admin abilities
- More features for admin functions, config options, and job controls
- 18 new community contributions incorporated (big thanks!)
CloudMan and BioBlend
BioBlend 0.4.3 was released on April 11, 2014.
The most recent version of CloudMan was released in January 2014.
Galaxy Community Hubs
The Community Log Board and Deployment Catalog Galaxy community hubs were launched in December. If you have a deployment or experience you want to share, please publish it.
The Dutch Techcentre for Life Sciences (DTL) has made its Galaxy ToolShed publicly available. The DTL ToolShed has almost 70 tools in it, from ANNOVAR to VCF-2-VariantList. This ToolShed was originally started at NBIC.
- infernal: Inference of RNA Alignments search DNA sequence DBs for RNA structure/sequence similarities
- msa_datatypes: Galaxy applicable data formats for Multiple Sequence Alignments
- taxonomy_krona_chart: convert metagenomic profiling results into zoomable pie chart using Krona
- deeptools_workflows: deepTools workflows to visualize large datasets in a meaningful way
- mosaik2: reference-guided aligner for next-generation sequencing technologies.
- suite_gops_1_0: Metarepository for the gops tool suite - will install the gops tool suite
- suite_gatk_1_4: A suite of Galaxy utilities associated with version 1.4 of the GATK package.
- join: Join the intervals of two datasets side-by-side
- compute_q_values: Compute q-values based on multiple simultaneous tests p-values
- charts: Enable advanced visualization options in Galaxy Charts, a visualization plugin for Galaxy
- concat: Concatenate two datasets into one dataset
- merge: Merge the overlapping intervals of a dataset
- coverage: Coverage of a set of intervals on second set of intervals
- basecoverage: count total bases covered by a set of intervals
- intersect: Intersect the intervals of two datasets
- flanking_features: Fetch closest non-overlapping feature for every interval
- subtract: Subtract the intervals of two datasets
- quality_filter: filter nucleotides in every alignment block of MAF file based on quality/PHRED scores
- rcve: Compute RCVE (Relative Contribution to Variance) for all possible variable subsets
- microsats_mutability: Estimate microsatellite mutability by specified attributes
- partialr_square: Compute partial R square
- linear_regression: uses R 'lm' function to perform linear regression
- getindels_2way: Fetch Indels from pairwise alignments
- getindelrates_3way: Estimate Indel Rates for 3-way alignments
- cluster: Cluster the intervals of a dataset
- complement: Complement intervals of a dataset
- subtract_query: Subtract Whole Dataset from another dataset
- featurecounter: find the coverage of intervals in the first dataset on intervals in the second dataset
- logistic_regression_vif: Perform Logistic Regression with vif
- maf_cpg_filter: Mask CpG/non-CpG sites from MAF file
- get_flanks: find the upstream and/or downstream flanking region(s)
- count_covariates: Count Covariates on BAM files
- depth_of_coverage: Depth of Coverage on BAM files at different levels of partitioning and aggregation
- substitutions: Fetch substitutions from pairwise alignments
- microsats_alignment_level: Extract Orthologous Microsatellites from pair-wise alignments
- variant_combine: Combines VCF records from different sources; supports full merges & set unions
- best_regression_subsets: use regsubsets R function for regression subset selection
- variant_filtration: Filter variant calls using user-selectable, parameterizable criteria
- windowsplitter: splits intervals into smaller intervals based on the specified window-size and type
- variants_validate: Validates a variants file.
- variant_select: Select Variants from VCF files
- table_recalibration: Second pass in a two-pass BAM processing step, doing a by-read traversal
- realigner_target_creator: Realigner Target Creator for use in local realignment
- variant_eval: General tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, ...)
- variant_recalibrator: learns a Gaussian mixture model over variant annotations and evaluates the variant
- unified_genotyper: Variant caller which unifies approaches of several disparate callers
- substitution_rates: Estimate substitution rates for non-coding regions using Jukes-Cantor JC69 model
- variant_annotator: Annotate variant calls with context information.
- weightedaverage: Assign weighted-average of the values of features overlapping an interval
- tables_arithmetic_operations: Arithmetic Operations on tables
- print_reads: Dynamically merge multiple BAM files, resulting in merged output sorted in coordinate order
- variant_apply_recalibration: Cut vcf to get novel FDR levels specified during Variant Recalibration
- indel_realigner: local realignment of reads based on misalignments due to the presence of indels
- analyze_covariates: Create collapsed recal csv files, call R to plot residual error vs covariates
- data_manager_gatk_picard_index_builder: Generate GATK-sorted Picard indexes
- chartskit: Enables advanced visualization options in Galaxy Charts
- column_join: Join tabular files
- The meeting report and slides for the Galaxy Australasia Workshop are now online.
- Galaxy has a new status interface. Check status.galaxyproject.org if you suspect downtime.