April 2017 Galaxy News
Welcome to the April 2017 Galactic News, a summary of what is going on in the Galaxy community. If you have anything to add to next month's newsletter then please send it to firstname.lastname@example.org.
Thanks for using Galaxy, all 100,000 of you.
Note: The submission deadline for oral presentations is 15 April.
GCC2017 will be in Montpellier, France, 26-30 June and will feature two days of presentations, discussions, poster sessions, lightning talks, computer demos, keynotes, and birds-of-a-feather meetups, all about data-intensive biology and the tools and infrastructure that support it. GCC2017 also includes data and coding hacks, and two days of training covering 16 different topics.
GCC2017 will be held at Le Corum Conference Centre in the heart of Montpellier, just 10km from the Mediterranean. This event will gather several hundred researchers addressing diverse questions and facing common challenges in data-intensive life science research. GCC participants work across the tree of life, come from around the world, and work at universities, research organizations, industry, medical schools and research hospitals.
GCC2017 is a great opportunity to discuss your research with others who are working in, and facing similar challenges in, data-intensive life science research. But only if you can get there. The Galaxy Community Fund is offering fellowships to early career researchers who would be coming from afar to attend GCC2017.
See the application for full details on what's covered and what we ask you to provide.
We hope to see you at GCC2017!
Galaxy Community Fund Board
We are pleased to welcome ELIXIR and GigaScience as returning GCC sponsors.
ELIXIR is a Gold level sponsor of the conference and a Giga sponsor of the GCC2017 Hackathons.
ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources across its member states and enables users in academia and industry to access services that are vital for their research.
GigaScience is a GCC Sponsor for the 5th year in a row. GigaScience is an online open access, open data, open peer-review journal. Our focus covers ‘big data’ research from the life and biomedical sciences, including the growing range of work that uses difficult-to-access large-scale data, such as imaging, neuroscience, ecology, cohort, systems biology, and other new types of sharable data.
GigaScience is also offering 15% off the article-processing charge for papers from the conference that are submitted to and published in GigaScience.
A collaboration between the Genomics Education Partnership (GEP) and the Galaxy project, the G-OnRamp Project aims to enable educators and researchers to create genome browsers for collaborative annotations of eukaryotic genomes. G-OnRamp uses Galaxy workflows to construct evidence tracks (e.g., protein sequence similarity, gene predictions, RNA-Seq, repeats) and display the results on the UCSC Genome Browser or JBrowse. Educators can use G-OnRamp for hands-on learning in data-intensive biology; researchers can use G-OnRamp for best-practice annotation of novel genomes.
We will hold two beta tester workshops this summer, on June 20-22 and July 25-27, 2017, at Washington University in St. Louis. The workshops will demonstrate how you can use G-OnRamp to create genome browsers for your favorite genomes. To receive future announcements regarding registration for these G-OnRamp workshops, please register your interest. Travel and local costs are supported by NIH BD2K grant 1R25GM119157. Questions can be directed to Jeremy Goecks (Galaxy) or Sarah Elgin (GEP).
There are a plenitude of Galaxy-related events coming up in the next few months:
See the Galaxy Events Google Calendar for details on other events of interest to the community.
275 new publications referencing, using, extending, and implementing Galaxy were added to the Galaxy CiteULike Group in March. This includes papers that were added as part of the 2016 year-end review of alerts. (Often a paper isn't added when an alert first comes out because the DOI isn't working yet, or because the paper is embargoed.)
Some highlights from the papers added in March:
High-resolution TADs reveal DNA sequences underlying genome organization in flies Fidel Ramirez, Vivek Bhardwaj, Jose Villaveces, et al. bioRxiv (08 March 2017), 115063, doi:10.1101/115063
Enhancing Knowledge Discovery from Cancer Genomics Data with Galaxy Marco A. Albuquerque, Bruno M. Grande, Elie J. Ritch, et al. GigaScience (09 March 2017), doi:10.1093/gigascience/gix015
Tools for cluster analysis of data from genome-wide association studies Johanne H. Horn (2016)
Secure Genomic Data Processing on the Cloud using TrustStore C. Wise, C. Friedrich, S. Nepal, S. Kanwal, R. Sinnott. In 21st International Congress on Modelling and Simulation (MODSIM2015) (December 2015)
Utilisation de Docker en bioinformatique dans le cloud de IFB Sandrine Perrin, Bryan Branco, Jonathan Lorenzo, et al. In Journées Ouvertes en Biologie, Informatique et Mathématiques (JOBIM) (2016)
I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets K. Chard, M. D'Arcy, B. Heavner, et al. In 2016 IEEE International Conference on Big Data (Big Data) (Dec 2016), pp. 319-328, doi:10.1109/BigData.2016.7840618
Functional and Evolutionary Genomics in Aphids Denis Tagu, Federica Calevro, Stefano Colella, Toni Gabaldón, Akiko Sugio. In Biology and Ecology of Aphids (2016), pp. 52-88
Enhancing Access to Media Collections and Archives Using Computational Linguistic Tools James Pustejovsky, Marc Verhagen, Nancy Ide, Keith Suderman. In Corpora in the Digital Humanities (CDH), Bloomington, Indiana (2017)
DNApod: DNA polymorphism annotation database from next-generation sequence read archives Takako Mochizuki, Yasuhiro Tanizawa, Takatomo Fujisawa, et al. PLOS ONE, Vol. 12, No. 2. (24 February 2017), e0172269, doi:10.1371/journal.pone.0172269
Deep and Surface Causality: Global Teaching and Access to HPC Social Science Douglas W. Uci, Paul Rodriguez, Eric Blau, et al. (2015)
Blood-based omic profiling supports female susceptibility to tobacco smoke-induced cardiovascular diseases Aristotelis Chatziioannou, Panagiotis Georgiadis, Dennie G. Hebels, et al. Scientific Reports, Vol. 7 (22 February 2017), 42870, doi:10.1038/srep42870
Architectural models for deploying and running virtual laboratories in the cloud E. Afgan, A. Lonie, J. Taylor, K. Skala, N. Goonasekera. In 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (May 2016), pp. 282-286, doi:10.1109/MIPRO.2016.7522153
The Galaxy is expanding! Please help it grow.
- Automated deployment of Galaxy environments using Ansible, Institut de Biologie Paris Seine, France
- BioInformaticien, HM.Clause, Business Unit du Groupe Limagrain, Maine et Loire, La Bohalle, France
- Scientific Research Programmer, Sethuraman Lab, California State University San Marcos. Develop model-based population genomics pipelines.
- Development Scientist (Bioinformatics), New England Biolabs, Ipswich, Massachusetts, United States.
The Language Applications (LAPPS) Grid team intends to seek funding for a project that would create customizable NLP applications for mining scientific literature. The project responds to requests from scientists in several disciplines who want to extract entities, relations, networks, and ontologies from scientific publications, and to identify articles in the scientific literature that have treated particular topics or entities.
Interested in participating? See the full call for details.
LAPPS Grid provides an infrastructure for rapid development of natural language processing (NLP) applications that uses the Galaxy platform as its workflow engine. The LAPPS Grid has integrated a wide range of NLP tools and resources into Galaxy and provided for using them interoperably in a “plug-and-play” environment.
New online training for the PhenoMeNal Galaxy server is available from EMBL-EBI. "PhenoMeNal (Phenome and Metabolome aNalysis) is a standardised e-infrastructure that supports the data processing and analysis pipelines for molecular phenotype data generated by metabolomics applications. This course will give you an overview of PhenoMeNal, how to create your cloud research environment, and how to access Galaxy workflows for metabolomics data."
- User Support:
- You must be part of an academic research lab and create an account on GenOuest.
- The default quota for new users is 50GB.
- Insect genomics (aphids, parasitoid wasps, lepidopterans)
- User Support:
- Default quota is low (<10 GB), but can be increased on request.
Technically, the all-new Galaxy CloudLaunch service has been in public beta since February. Keep in mind that it will eventually replace the current CloudLaunch service, so give it a try and let us know how it performs for you.
A new image of Galaxy with Galaxy release 17.01 is available on the NSF-sponsored academic cloud Jetstream. The updated image comes with an updated list of tools and all the reference genomes available on Galaxy Main. Remember that the Jetstream cloud is free to use. Instructions on how to get started are available here.
CloudBridge aims to provide a simple layer of abstraction over different cloud providers, reducing or eliminating the need to write conditional code for each cloud. It is currently under development and is in an Alpha state. Release 0.2.0 includes several fixes and enhancements.
See GitHub for details.
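As a rough illustration of what "a simple layer of abstraction over different cloud providers" means in practice, here is a minimal Python sketch. It does not use CloudBridge's actual API: the class names, the `create_provider` factory, and the `launch_instance` method are all hypothetical and exist only to show the pattern of choosing the cloud in one place so that the calling code needs no per-cloud conditionals.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical provider-neutral interface (NOT CloudBridge's real API)."""

    @abstractmethod
    def launch_instance(self, name: str) -> str:
        """Start an instance and return a provider-specific identifier."""


class FakeAWSProvider(CloudProvider):
    def launch_instance(self, name: str) -> str:
        # A real implementation would call the AWS API here.
        return f"aws:i-{name}"


class FakeOpenStackProvider(CloudProvider):
    def launch_instance(self, name: str) -> str:
        # A real implementation would call the OpenStack API here.
        return f"openstack:server-{name}"


# Hypothetical registry mapping a configuration value to an implementation.
PROVIDERS = {"aws": FakeAWSProvider, "openstack": FakeOpenStackProvider}


def create_provider(kind: str) -> CloudProvider:
    """Factory: the only place where a specific cloud is selected."""
    return PROVIDERS[kind]()


def launch_galaxy_node(provider: CloudProvider) -> str:
    # The calling code is identical for every cloud.
    return provider.launch_instance("galaxy")
```

With this shape, switching clouds means changing the string passed to `create_provider`, while `launch_galaxy_node` stays untouched.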
Other packages that have been released in the prior 4 months.
- Conda auto initialization is enabled by default
- New interface for user preferences
- Support for compressed FASTQ formats
For full details on all of the enhancements and fixes in this release, please see the full release notes.
Starforge is a collection of scripts that supports the building of components for Galaxy. Specifically, with Starforge you can:
- Build Galaxy Tool Shed dependencies
- Build Python Wheels (e.g. for the Galaxy Wheels Server)
- Rebuild Debian or Ubuntu source packages (for modifications)
These things will be built in Docker. Additionally, wheels can be built in QEMU/KVM virtualized systems.
Documentation can be found at starforge.readthedocs.org.
A Pulsar update was released in February. Pulsar is a Python server application that allows a Galaxy server to run jobs on remote systems (including Windows) without requiring a shared mounted file system. Unlike traditional Galaxy job runners, Pulsar can transfer input files, scripts, and config files to the remote system, execute the job there, and transfer the results back to the Galaxy server, eliminating the need for a shared file system.
This release contains Conda and flake8 updates.
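The stage-in, execute, stage-out cycle that Pulsar performs can be sketched in a few lines of Python. This is only an illustration of the idea, not Pulsar's code or API: a temporary directory stands in for the remote system, and the function name `run_staged_job` and the example `JOB_SCRIPT` are hypothetical.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical job script: upper-cases the input file into the output file.
JOB_SCRIPT = """
import sys
src, dst = sys.argv[1], sys.argv[2]
with open(src) as fin, open(dst, "w") as fout:
    fout.write(fin.read().upper())
"""


def run_staged_job(input_file: Path, script: str, output_name: str,
                   results_dir: Path) -> Path:
    """Copy inputs to an isolated work dir, run the job there, copy results back."""
    # The temporary directory stands in for the remote system (an assumption).
    with tempfile.TemporaryDirectory() as remote:
        remote_dir = Path(remote)

        # Stage in: transfer the input file and the script.
        shutil.copy(input_file, remote_dir / input_file.name)
        script_path = remote_dir / "job.py"
        script_path.write_text(script)

        # Execute: the job only sees files inside its own work directory.
        subprocess.run(
            [sys.executable, str(script_path), input_file.name, output_name],
            cwd=remote_dir,
            check=True,
        )

        # Stage out: transfer the result back before the work dir is removed.
        results_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy(remote_dir / output_name, results_dir / output_name))
```

Because every file the job needs is copied in and every result is copied out, the two sides never have to share a mounted file system, which is the property the description above highlights.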
Galaxy's sequence utilities are a set of Python modules for reading, analyzing, and converting sequence formats.
Other Galaxy packages that haven't had a release in the past four months can be found on GitHub.
- A new in-depth RNAseq tutorial is available.
- Greatly improved Cytoscape visualizations are now available in Galaxy, thanks to Anup Kumar.
- EtherCalc is now integrated into Galaxy as an interactive environment, thanks to Saskia Hiltemann.
- From Rob Davey
- Over 400,000 jobs have been run on our Galaxy server at Earlham Institute since 2015. Infrastructure win.
- Getting started with Galaxy on the cloud
- New step-by-step guide on how to upgrade a Galaxy Docker image from Rafa Hernández.