Galaxy: the first 5,000 pubs
The Galaxy Publication Library hits a milestone
By Dave Clements
October 16th 2017
We reached 5,000 publications in the Galaxy Publication Library last week. The library tracks publications that use, extend, implement or reference Galaxy or a Galaxy server. It includes journal articles, theses, and a couple of odds and ends. This milestone is a good opportunity to look at what the library tells us about where the Galaxy project has been, and maybe where it's going too.
The library was started in December 2011, when the first 168 Galaxy-related publications were added and classified using 8 tags. These included all project publications plus every pub that ad hoc literature searches could find at the time. The library started on CiteULike and stayed there until September 2017, when we moved it to Zotero. The library grew to 4,500 papers during that time.
The library uses tags to indicate how the publication relates to Galaxy. See below for an explanation and history of the tags.
Trends in the publication library reflect the trajectory of the Galaxy Project over the last 6 years.
The most obvious "trend" is that there are a lot of pubs using Galaxy in their methods. Just over half of all the publications mention Galaxy in their methods section. This trend doesn't show any sign of slowing down.
Not all Methods papers say which Galaxy instance(s) they used. But starting in 2013, papers that do mention this are also tagged with UseMain, UsePublic, UseLocal, and/or UseCloud (see the tag descriptions below for an explanation of all tags).
The relative number of UseMain and UsePublic pubs highlights the increasing availability of publicly accessible Galaxy servers. In 2013-2014, there were two and a half times as many UseMain pubs as UsePublic pubs. In 2015 the two were about even, and in 2016-2017 there have been nearly twice as many UsePublic pubs as UseMain pubs. This shift reflects the growth in public Galaxy resources, from 21 servers at the start of 2012 to over 90 servers and 6 services today.
The last trend I want to highlight is Reproducibility, which has been a core value of Galaxy since at least 2011. The Reproducibility topic has seen a nearly threefold increase since then: its share of papers has gone from 2.1% in 2011-2013 to 5.7% in 2017 so far. In raw numbers, there were 21 pubs in all of 2011-2013, compared to 53 pubs in 2017 thus far.
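As a quick check, the "nearly threefold" figure matches the change in Reproducibility's share of papers. The raw counts are not directly comparable, since 21 pubs covers three full years while 53 covers only part of 2017:

```latex
\frac{5.7\%}{2.1\%} \approx 2.71
```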
The number of publications that reference Galaxy each year has increased every year since the project started. It took over three and a half years to reach 2,500 publications but only a little over two more years to add the next 2,500 publications.
The library can also tell us which journals are most popular. Here's the top 20:
| Rank | Journal | Pubs |
|---|---|---|
| 3 | Nucleic Acids Research | 181 |
| 12 | Briefings in Bioinformatics | 46 |
| 13 | Proceedings of the National Academy of Sciences | 44 |
| 16 | Future Generation Computer Systems | 34 |
| 17 | PLOS Computational Biology | 32 |
| 20 | Concurrency and Computation: Practice and Experience | 29 |
Eight of the top ten journals are open access. If you are curious about the remaining 1,045 journals, they can be found here.
The 5,000th pub is:
Variations in oral microbiota associated with oral cancer, Hongsen Zhao, Min Chu, Zhengwei Huang, Xi Yang, Shujun Ran, Bin Hu, Chenping Zhang & Jingping Liang. Scientific Reports 7, Article number: 11773 (2017) doi:10.1038/s41598-017-11779-9
It's an exemplary 5,000th publication: a Methods paper, by far the most popular topic tag; a UsePublic paper, an ascendant topic tag; and a >Huttenhower paper, the most frequently referenced public server tag. It's open access too. See the paper's Zotero entry for more.
If current trends continue we'll hit 10,000 publications sometime in 2021. Look for the update.
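As a rough sanity check on that projection, here is a minimal sketch that assumes a constant publication rate equal to the average over the most recent 2,500 pubs. It rounds "a little over two more years" down to 2.0 years and takes mid-October 2017 as the starting point; the function name `projected_year` and these constants are illustrative assumptions, not how the projection above was actually made.

```python
# Back-of-the-envelope check of the "10,000 pubs sometime in 2021" projection.
# Assumption: pubs keep arriving at the average rate of the most recent 2,500,
# which took "a little over two" years (rounded to 2.0 here).

RECENT_PUBS = 2_500
RECENT_YEARS = 2.0            # assumption: "a little over two years", rounded down
CURRENT_TOTAL = 5_000
TARGET_TOTAL = 10_000
NOW = 2017 + 9.5 / 12         # assumption: mid-October 2017 as a decimal year


def projected_year(current, target, pubs, years, now):
    """Return the decimal year at which `target` is reached at a constant rate."""
    rate = pubs / years                      # publications per year
    years_needed = (target - current) / rate
    return now + years_needed


year = projected_year(CURRENT_TOTAL, TARGET_TOTAL, RECENT_PUBS, RECENT_YEARS, NOW)
print(f"{year:.1f}")  # prints 2021.8
```

The estimate is sensitive to that rounding: a slightly longer duration lowers the rate and pushes the date later, while continued year-over-year growth would pull it earlier.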
Thanks for using Galaxy,
We've used Topic Tags since the beginning of the library to track how publications relate to Galaxy. Since the move to Zotero, we've also added Galaxy Featured Tags, Public Server Tags, and Publisher Tags. They are all explained here.
Topic tags indicate how the publication relates to Galaxy. Here's the current set and when each tag was added:
| Tag | Description | Added |
|---|---|---|
| +HowTo | Papers about how to use Galaxy for specific analyses. These are tutorials. | 2011 |
| +IsGalaxy | Publications about Galaxy itself or installations of Galaxy. | 2011 |
| +Methods | Uses Galaxy in their methods. | 2011 |
| +Other | Publications that don't fit well under any other tag. | 2011 |
| +Project | Publications with a Galaxy team member as an author. | 2011 |
| +Reproducibility | Reproducibility and persistence in science. | 2011 |
| +Shared | Publications that have published workflows, histories, datasets, pages, or visualizations in a Galaxy instance. | 2011 |
| +Workbench | Publication mentions Galaxy as a platform. | 2011 |
| +Tools | Tools that run in, have been ported to, or interact with Galaxy. | 2012 |
| +Cloud | Publications referencing / extending / discussing Galaxy in a cloud context. | 2013 |
| +RefPublic | References a publicly accessible Galaxy instance or a Galaxy service. This is distinct from the +UsePublic tag. | 2013 |
| +Unknown | Publications that we know refer to Galaxy, but we aren't sure how because they are behind a paywall we don't have access to. These are revisited periodically. | 2013 |
| +UseCloud | Uses a custom-built, cloud-based instance of Galaxy in its methods. | 2013 |
| +UseLocal | Uses a local installation of Galaxy in its methods. | 2013 |
| +UseMain | Uses the project's public server, usegalaxy.org (a.k.a. Main), in its methods. | 2013 |
| +UsePublic | Uses a publicly accessible Galaxy instance or a Galaxy service in its methods. | 2013 |
| +Visualization | Publications referencing Galaxy in a visualization and/or visual analytics context. | 2013 |
With the move to Zotero we added two new sets of tags. The first set is used to highlight publications that feature Galaxy prominently:
| Tag | Description |
|---|---|
| +Galactic | Publication is about Galaxy. |
| +Stellar | Publication features Galaxy prominently. |
The second set of new tags shows which public Galaxy server or service is used or discussed in a publication. These tags use the server's name, preceded by a ">". For example, the >RepeatExplorer tag lists all papers that use or reference the RepeatExplorer public server.
Zotero is configured to also add any keywords it can detect automatically when the publication is added. These tags are not rationalized in any way, and tend to describe the research topic or domain. Prosapip1 and Genome evolution are examples.
These tags were added over a 6 year period. Are older papers back-tagged when new tags are added? Mostly not, but there are some exceptions:
- Galaxy Featured Tags exist back to the beginning of time. (These were converted from CiteULike's priority feature.)
- Topic and Public Server/Service tags have been applied to older publications selectively.
Therefore, don't look for a lot of +UseMain or +Cloud tagged papers from before 2013.