
The Language Applications (LAPPS) Grid Seeks Input from Scientists Who Want to Mine Scientific Literature


The Language Applications (LAPPS) Grid provides an infrastructure for rapid development of natural language processing (NLP) applications, using the Galaxy platform as its workflow engine. The LAPPS Grid has integrated a wide range of NLP tools and resources into Galaxy, including popular public tools such as StanfordNLP, OpenNLP, NLTK, and LingPipe, and makes them interoperable in a “plug-and-play” environment.

The LAPPS Grid team intends to seek funding for a project that would create customizable NLP applications for mining scientific literature. The project responds to requests from scientists in several disciplines who want to extract entities, relations, networks, and ontologies from scientific publications and to identify articles that have treated particular topics or entities. While some text mining facilities for the biomedical domain have been developed over the past few years, there is far less support for text mining in the other sciences. Furthermore, most available software requires considerable skill to use effectively, especially where customization to a particular task or topic is required.

The project we plan to propose will develop several out-of-the-box workflows for information extraction and provide facilities to adapt them to data for specific disciplines. We will also upload major scientific publication databases (PubMed, BioMed, PLoS, etc., as permissions allow) to Jetstream on a daily or weekly basis and make them searchable with Apache Solr. Finally, we will provide online tutorials as well as one-day workshops to bring scientists up to speed on which tools are best suited to their tasks and how to use them within Galaxy. We hope to include members of the scientific community as (funded) partners in the project, to suggest and test use cases and ultimately to demonstrate the use of the LAPPS Grid in their own research.

To support our effort, the LAPPS Grid project seeks input from members of the scientific community to ascertain the extent of the need for the capabilities we hope to provide and, eventually, to justify our funding request. To that end, we ask anyone who is interested in the capabilities described here, or in being a partner in the project, to send an email indicating their support and, if possible, their reasons for considering this a worthy project. Please send the email to ide@cs.vassar.edu.

Many thanks for your help!

Nancy Ide
Professor of Computer Science
Vassar College