Members of the Galaxy team and the Language Applications (LAPPS) Grid are planning to submit a proposal to NIH that would focus on fully integrating LAPPS Grid natural language processing (NLP) tools into Galaxy. The aim is to augment Galaxy to provide a "one-stop" platform for searching scientific publications, importing the results into Galaxy, and using NLP technologies to extract information from those publications. We will also develop tools to transduce the extracted results into formats suitable for ingestion by Galaxy analytic tools.
A key feature is interoperability among tools at each step, from the publication database query, through NLP analysis, to any subsequent analysis. Imagine that from within Galaxy, the user can query a publication database and select relevant articles, then effectively "push a button" to import the texts (extracted from PDF) and extract information from them, apply any of a variety of NLP tools and/or predefined workflows, transform the output of the NLP tools into formats usable by Galaxy analytic tools, and so on. We recognize that some of the individual capabilities we hope to provide are available from other sources; our goal is to enable Galaxy users from any scientific domain to easily perform any or all of these steps without wrestling with different applications, tools, formats, and so on.
Galaxy already includes many LAPPS Grid NLP tools and has a set of generic machine learning functions, but the proposed work would significantly expand these capabilities. Additions would include creation of custom corpora addressing a specific topic; support for creation of derived data such as term and relation databases; transduction of the output of NLP tools to formats common to Galaxy analytic tools; and, crucially, the ability to store and share corpora, extracted data, and similar resources within Galaxy.
To help prepare the proposal, please complete this survey if you are interested in applying NLP to your own work in Galaxy.
Thanks for participating; your ideas will help us make Galaxy better.
Nancy Ide and Dan Blankenberg