August 2017 Tool Shed contributions

[Galaxy ToolShed](http://toolshed.g2.bx.psu.edu/)

Tools contributed to the Galaxy Project ToolShed in August 2017.

New Tools

  • From lecorguille:

  • From galaxyp:

    • openms_featurelinkerunlabeledkd: Wrapper for the OpenMS suite tool: FeatureLinkerUnlabeledKD. OpenMS is an open-source software C++ library for LC/MS data management and analyses. It offers an infrastructure for the rapid development of mass spectrometry related software. https://www.openms.de/.
    • openms_spectrastsearchadapter: Wrapper for the OpenMS suite tool: SpectraSTSearchAdapter. OpenMS is an open-source software C++ library for LC/MS data management and analyses. It offers an infrastructure for the rapid development of mass spectrometry related software. https://www.openms.de/.
    • openms_databasefilter: Wrapper for the OpenMS suite tool: DatabaseFilter. OpenMS is an open-source software C++ library for LC/MS data management and analyses. It offers an infrastructure for the rapid development of mass spectrometry related software. https://www.openms.de/.
    • openms_targetedfileconverter: Wrapper for the OpenMS suite tool: TargetedFileConverter. OpenMS is an open-source software C++ library for LC/MS data management and analyses. It offers an infrastructure for the rapid development of mass spectrometry related software. https://www.openms.de/.
    • openms_rnpxlsearch: Wrapper for the OpenMS suite tool: RNPxlSearch. OpenMS is an open-source software C++ library for LC/MS data management and analyses. It offers an infrastructure for the rapid development of mass spectrometry related software. https://www.openms.de/.
    • idpquery: Bumbershoot IDPicker idpQuery. idpQuery creates customizable text reports from idpDB files.
  • From saskia-hiltemann:

    • ireport: Create HTML reports for Galaxy workflows. Create interactive HTML reports from Galaxy outputs.
  • From davidvanzessen:

  • From mingchen0919:

    • rmarkdown_mirdeep2: R Markdown based tool wrapper for miRDeep2.
    • rmarkdown_i_adhore: Generate an i-ADHoRe configuration file. R Markdown based tool wrapper for generating an i-ADHoRe configuration file.
    • rmarkdown_fastqc_report: R Markdown based FastQC wrapper. Implements FastQC analysis and displays results in R Markdown HTML.
    • rmarkdown_deseq2: R Markdown based DESeq2 wrapper. Implements DESeq2 analysis and displays results in R Markdown HTML.
    • rmarkdown_fastqc_site: R Markdown based FastQC wrapper. Implements FastQC analysis and displays results as an R Markdown website.
    • rmarkdown_wgcna: R Markdown based WGCNA wrapper. Implements WGCNA analysis and displays results in R Markdown HTML.
  • From greg:

    • insect_phenology_model: Contains a tool that provides an agent-based stochastic model expressing stage-specific phenology and population dynamics for an insect species across geographic regions.
  • From artbio:

    • repenrich: Repeat element profiling. RepEnrich is a method to estimate repetitive element enrichment using high-throughput sequencing data.
    • small_rna_signatures: Computes the tendency of small RNAs to overlap with each other. For detailed information, see C. Antoniewski, “Computing siRNA and piRNA Overlap Signatures,” Methods Mol. Biol., vol. 1173, no. 12, pp. 135–146, 2014.
    • fishertest: Fisher's exact test on two-column hit lists.
    • fetch_fasta_from_ncbi: Fetch FASTA sequences from NCBI using eutils wrappers.
  • From md-anderson-bioinformatics:

    • heat_map_creation_advanced: Initial version of the Advanced Heat Map Tool. Advanced tool for creating Next-Generation Clustered Heat Maps (NG-CHM). This tool is similar to our heat_map_creation tool but has many advanced options not provided in the basic tool. Like the basic tool, heat_map_creation_advanced generates a clustered heat map from a data matrix, with many options for clustering. The input matrix is required to have labels in the first column and the first row. For example, column headers could be patient IDs and row headers (first column) could contain gene symbols. Covariate files add additional information bars to the heat map. For example, a patient's smoking status could be provided as a covariate file. Any input covariate bar files must have the same row or column labels as the input matrix, so that the covariate information is associated with the appropriate row or column.

      The output is a compressed ngchm file that can be displayed in the NG-CHM viewer. To access the viewer in Galaxy, expand the NG-CHM tool output file in the Galaxy History. At the bottom are several icons, in this order: save, information ("I"), rerun, then visualize (a chart-like icon). Hover over the visualize icon and select the 'NG-CHM Heat Map Viewer' option. The heat map will display in the Galaxy middle pane.

      Additionally, the advanced tool allows for definition of row/column data types (e.g. Hugo Gene Symbol) and map attributes that enable the viewer to provide link outs from row/column labels to external resources with specific information about the selected rows/columns (e.g. Gene Cards). The advanced tool is also capable of generating a covariate bar that identifies primary groupings based on the clustering dendrograms.

      Please see our YouTube tutorial video for an overview of creating NG-CHM heat maps in Galaxy: https://www.youtube.com/watch?v=v1VCJJti8GM&list=PLIBaINv-Qmd05G3Kj7SbBbSAPZrG-H5bq&index=3
      Full Documentation: http://bioinformatics.mdanderson.org/main/NG-CHM-V2:Overview

      The Galaxy visualization component does not install automatically. If you have not already installed the NG-CHM viewer with the heat_map_creation tool, you will need to run the following commands in terminal mode. NOTE: The following assumes /galaxy-central is the home directory for your Galaxy instance; otherwise replace /galaxy-central with your Galaxy instance's root directory.

      1) mv /galaxy-central/../shed_tools/toolshed.g2.bx.psu.edu/repos/md-anderson-bioinformatics/heat_map_creation/*/heat_map_creation/mda_heatmap_viz.zip  /galaxy-central/config/plugins/visualizations/
      2) cd /galaxy-central/config/plugins/visualizations/
      3) unzip mda_heatmap_viz.zip

      Then you must restart Galaxy for the visualization portion to take effect.

      4) cd /galaxy-central
      5) sh run.sh

      Or, if using a docker instance:

      4b) docker stop  \
      5b) docker start  \

      Remember, you MUST be logged in to be able to see the heat map visualization component icon. The NG-CHM Generator will run even if you are not logged into Galaxy.
  • From eschen42:

    • w4mkmeans: [W4M][Metabolomics] Calculate k-means for samples or features from data matrix. Using the intensities in the dataMatrix, calculate the k-means clusters for samples or features and add them as columns to sampleMetadata or featureMetadata, respectively.
  • From clifinder:

    • clifinder: Identification of L1 Chimeric Transcripts in RNA-seq data. L1 Chimeric Transcripts (LCTs) are transcribed from LINE 1 antisense promoter and include the L1 5’UTR sequence in antisense orientation followed by the adjacent genomic region. CLIFinder is a Galaxy tool, specifically designed to identify potential LCTs from one or several oriented RNA-seq paired-end reads in the human genome. CLIFinder is customizable to detect transcripts initiated by different types of repeat elements.
  • From dktanwar:

    • dktanwar: Scripts converted to tools to be used for Sperm Histone ChIP-Seq data analysis.
  • From iuc:

    • iwtomics_testandplot: Interval-Wise Testing for Omics Data. Implementation of the Interval-Wise Testing (IWT) for omics data. This inferential procedure tests for differences in "Omics" data between two groups of genomic regions (or between a group of genomic regions and a reference center of symmetry), and does not require fixing location and scale at the outset.
    • humann2_rna_dna_norm: Wrapper for the humann2 tool suite: Normalize. HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?".
    • enasearch_search_data: Wrapper for the ENASearch tool suite: Search ENA data. ENASearch is a Python library for interacting with ENA's API.
    • idr: Galaxy wrappers for the IDR package from Nathan Boley. The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility.
    • iwtomics_loadandplot: Interval-Wise Testing for Omics Data. Implementation of the Interval-Wise Testing (IWT) for omics data. This inferential procedure tests for differences in "Omics" data between two groups of genomic regions (or between a group of genomic regions and a reference center of symmetry), and does not require fixing location and scale at the outset.
    • ggplot2_point: Scatterplot with ggplot2 tool from the ggplot2 package. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • kallisto_quant: Kallisto quant function from the kallisto package. kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
    • kallisto_pseudo: Kallisto pseudo function from the kallisto package. kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
    • umi_tools_group: Wrapper for the UMI-tools suite tool: UMI-tools group. Extract UMI barcode from a read and add it to the read name, leaving any sample barcode in place. Can deal with paired end reads and UMIs split across the paired ends.
    • enasearch_retrieve_data: Wrapper for the ENASearch tool suite: Retrieve ENA data. ENASearch is a Python library for interacting with ENA's API.
    • reshape2_cast: cast tool from the reshape2 package. Flexibly restructure and aggregate data using just the two functions melt and dcast.
    • humann2_unpack_pathways: Wrapper for the humann2 tool suite: Unpack pathway abundances to show genes included. HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?".
    • humann2_barplot: Wrapper for the humann2 tool suite: Barplot. HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?".
    • ggplot2_histogram: Histogram with ggplot2 tool from the ggplot2 package. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • tsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation.
    • circos: Build Circos Plots in Galaxy. Allows for creation of circos plots from a Galaxy tool, offering the ability to create novel circos plots depending on the needs of the researcher and their workflow.
    • iwtomics_plotwithscale: Interval-Wise Testing for Omics Data. Implementation of the Interval-Wise Testing (IWT) for omics data. This inferential procedure tests for differences in "Omics" data between two groups of genomic regions (or between a group of genomic regions and a reference center of symmetry), and does not require fixing location and scale at the outset.
    • humann2_genefamilies_genus_level: Wrapper for the humann2 tool suite: Create a genus level gene families file. HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?".
    • humann2_strain_profiler: Wrapper for the humann2 tool suite: Make strain profiles. HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?".
    • codeml: Detects positive selection. codeml is from the paml package which gathers tools for phylogenetic analysis by maximum likelihood.
    • join_files_by_id: This tool will join datasets according to a column with identifier. This tool can be used to join count tables of different libraries by using an identifier column which could be a sequence or GeneID.
    • enasearch_retrieve_analysis_report: Wrapper for the ENASearch tool suite: Retrieve an analysis report. ENASearch is a Python library for interacting with ENA's API.
    • enasearch_retrieve_taxons: Wrapper for the ENASearch tool suite: Retrieve ENA taxon data. ENASearch is a Python library for interacting with ENA's API.
    • humann2_split_stratified_table: Wrapper for the humann2 tool suite: Split stratified table. HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?".
    • humann2_associate: Wrapper for the humann2 tool suite: Associate. HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?".
    • ggplot2_violin: Violin plot with ggplot2 tool from the ggplot2 package. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • qiime_summarize_taxa: Wrapper for the qiime tool suite: Summarize taxa. QIIME (Quantitative Insights Into Microbial Ecology) is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data.
    • picrust_metagenome_contributions: Wrapper for picrust application: Metagenome Contributions. PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
    • umi_tools_extract: Wrapper for the UMI-tools suite tool: UMI-tools extract. Extract UMI barcode from a read and add it to the read name, leaving any sample barcode in place. Can deal with paired end reads and UMIs split across the paired ends.
    • enasearch_retrieve_run_report: Wrapper for the ENASearch tool suite: Retrieve a run report. ENASearch is a Python library for interacting with ENA's API.
    • data_manager_kallisto_index_builder: Pre-generate indexes for kallisto. kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
    • reshape2_melt: melt tool from the reshape2 package. Flexibly restructure and aggregate data using just the two functions melt and dcast.
    • meme_psp_gen: MEME psp-gen tool from the meme package. The MEME Suite supports motif-based analysis of DNA, RNA and protein sequences. It provides motif discovery algorithms using both probabilistic (MEME) and discrete models (DREME), which have complementary strengths. It also allows discovery of motifs with arbitrary insertions and deletions (GLAM2). In addition to motif discovery, the MEME Suite provides tools for scanning sequences for matches to motifs (FIMO, MAST and GLAM2Scan), scanning for clusters of motifs (MCAST), comparing motifs to known motifs (Tomtom), finding preferred spacings between motifs (SpaMo), predicting the biological roles of motifs (GOMo), measuring the positional enrichment of sequences for known motifs (CentriMo), and analyzing ChIP-seq and other large datasets (MEME-ChIP). The MEME Suite is comprised of a collection of tools that work together.
    • barrnap: Contains the Barrnap tool for finding ribosomal RNAs in FASTA sequences. Barrnap predicts the location of 5S, 16S and 23S ribosomal RNA genes in bacterial genome sequences. Barrnap now supports Archaea, Eukaryota and Mitochondria.
    • qiime_extract_barcodes: Wrapper for the qiime tool suite: Format Fastq sequences and barcode data. QIIME (Quantitative Insights Into Microbial Ecology) is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data.
    • ggplot2_heatmap2: heatmap2 tool from the ggplot package. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • From pravs:

    • remove_fasta_subsequences: Removes sequences that are subsequences of entries in a reference FASTA file. This program removes the sequences from the query FASTA file that are present as subsequences in a reference FASTA file. Input: reference and query FASTA files. Output: FASTA file with sequences that are unique to the query FASTA file.
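
The filtering described for remove_fasta_subsequences can be illustrated with a minimal Python sketch. This is not the tool's actual code. It assumes that "subsequence" means a contiguous substring and that matching ignores case; neither is confirmed by the description above. The function names parse_fasta and remove_subsequences and the sample records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def remove_subsequences(query_records, reference_records):
    """Keep query records whose sequence is not a substring of any reference sequence.

    Assumption: contiguous, case-insensitive matching.
    """
    reference_seqs = [seq.upper() for _, seq in reference_records]
    kept = []
    for header, seq in query_records:
        needle = seq.upper()
        if not any(needle in ref for ref in reference_seqs):
            kept.append((header, seq))
    return kept


# Hypothetical example data, not taken from the tool's documentation.
reference_text = ">ref1\nACGTACGTTT\n>ref2\nGGGCCC\n"
query_text = ">q1\nCGTAC\n>q2\nTTTT\n>q3\nggcc\n"

unique = remove_subsequences(parse_fasta(query_text), parse_fasta(reference_text))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

In this example only q2 survives, because q1 and q3 both occur inside a reference sequence. The linear scan over all reference sequences keeps the sketch short; a large reference would likely call for an indexed search instead.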