November 2017 Tool Shed contributions

Galaxy ToolShed

Tools contributed to the Galaxy Project ToolShed in November 2017.

New Tools

  • From dktanwar:

  • From proteore:

  • From ylebrascnrs:

    • test: All tools to be tested.
  • From nml:

    • combine_assemblystats: Combine multiple Assemblystats datasets into a single tabular report. Combines multiple reports from Assemblystats and converts them into a single tabular result in which each row represents a strain.
    • assemblystats: Summarise an assembly (e.g. N50 metrics). This version of assemblystats shows the input file name in the output and lets the user choose how many files to output.
    • getmlst: Download MLST datasets by species from pubmlst.org.
  • From sandeep:

    • cluster: cluster. clustering directory.
  • From iuc:

    • onto_tk_get_parent_terms_by_relationship_type: Wrapper for the ONTO-toolkit suite: Get the terms. ONTO-Toolkit is a collection of Galaxy tools for managing ontologies that are represented in the OBO flat file format (spec 1.2 and 1.4).
    • onto_tk_get_ancestor_terms: Wrapper for the ONTO-toolkit suite: Get the ancestor terms of a given OBO term.
    • onto_tk_obo2rdf: Wrapper for the ONTO-toolkit suite: Convert OBO to RDF.
    • onto_tk_get_terms_by_relationship_type: Wrapper for the ONTO-toolkit suite: Get the terms that are related.
    • onto_tk_get_subontology_from: Wrapper for the ONTO-toolkit suite: Get subontology.
    • onto_tk_get_child_terms: Wrapper for the ONTO-toolkit suite: Get child terms.
    • onto_tk_get_relationship_id_vs_relationship_namespace: Wrapper for the ONTO-toolkit suite: Get all the relationship IDs and namespaces.
    • onto_tk_get_relationship_types: Wrapper for the ONTO-toolkit suite: Get all the relationship types.
    • onto_tk_get_descendent_terms: Wrapper for the ONTO-toolkit suite: Get the descendant terms.
    • onto_tk_term_id_vs_term_name: Wrapper for the ONTO-toolkit suite: Get all the term IDs and term names.
    • onto_tk_get_terms: Wrapper for the ONTO-toolkit suite: Get all terms.
    • onto_tk_get_term_synonyms: Wrapper for the ONTO-toolkit suite: Get all term synonyms.
    • onto_tk_get_relationship_id_vs_relationship_def: Wrapper for the ONTO-toolkit suite: Get all the relationship IDs and definitions.
    • onto_tk_get_root_terms: Wrapper for the ONTO-toolkit suite: Get the root terms.
    • onto_tk_get_relationship_id_vs_relationship_name: Wrapper for the ONTO-toolkit suite: Get all the relationship IDs and names.
    • onto_tk_term_id_vs_term_def: Wrapper for the ONTO-toolkit suite: Get all the term IDs and term definitions.
    • onto_tk_obo2owl: Wrapper for the ONTO-toolkit suite: Convert OBO to OWL.
    • onto_tk_get_parent_terms: Wrapper for the ONTO-toolkit suite: Get the parent terms.
    • data_manager_mothur_toolsuite: Data Manager for mothur reference data. Download reference data for Mothur tools.
    • minimap2: A fast pairwise aligner for genomic and spliced nucleotide sequences. Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%. For ~10 kb noisy read sequences, minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP. It is more accurate on simulated long reads and produces biologically meaningful alignments ready for downstream analyses. For >100 bp Illumina short reads, minimap2 is three times as fast as BWA-MEM and Bowtie2, and as accurate on simulated data.
    • bctools_merge_pcr_duplicates: Merge PCR duplicates tool from the bctools package. bctools is a set of tools for handling barcodes and UMIs in NGS data. bctools can be used to merge PCR duplicates according to unique molecular barcodes (UMIs), to extract barcodes from arbitrary positions relative to the read starts, to clean up readthroughs into UMIs with paired-end sequencing, and to handle binary barcodes as used with uvCLAP and FLASH. License: Apache License 2.0.
    • bctools_remove_tail: Remove 3'-end nts tool from the bctools package. bctools is a set of tools for handling barcodes and UMIs in NGS data.
    • bctools_extract_crosslinked_nucleotides: Get crosslinked nucleotides tool from the bctools package. bctools is a set of tools for handling barcodes and UMIs in NGS data.
    • bctools_extract_barcodes: Extract barcodes tool from the bctools package. bctools is a set of tools for handling barcodes and UMIs in NGS data.
    • bctools_convert_to_binary_barcode: Create binary barcodes tool from the bctools package. bctools is a set of tools for handling barcodes and UMIs in NGS data.
    • bctools_remove_spurious_events: Remove spurious tool from the bctools package. bctools is a set of tools for handling barcodes and UMIs in NGS data.
    • bctools_extract_alignment_ends: Extract alignment ends tool from the bctools package. bctools is a set of tools for handling barcodes and UMIs in NGS data.
    • structure: Use multi-locus genotype data to investigate population structure. The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly-used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs. https://web.stanford.edu/group/pritchardlab/structure.html.
    • valet: A pipeline for detecting mis-assemblies in metagenomic assemblies. VALET is a de novo pipeline for detecting all types of mis-assemblies in metagenomic data sets.
    • iqtree: Efficient phylogenomic software by maximum likelihood.
    • gdcwebapp: GDCWebApp automatically filters, extracts, and converts genomic data from the Genomic Data Commons portal to BED format. GDCWebApp is a web service to automatically query, filter, extract and convert genomic data and clinical information from the Genomic Data Commons portal (GDC) to BED format. It is able to operate on all data types for each program (TCGA and TARGET) available on GDC. The service is available at http://bioinf.iasi.cnr.it/gdcwebapp/.
    • disco: DISCO is an overlap-layout-consensus (OLC) metagenome assembler. DISCO, Distributed Co-assembly of Overlap graphs, is a multi-threaded and multi-process distributed-memory overlap-layout-consensus (OLC) metagenome assembler capable of assembling various types of Illumina reads.
    • edger: Perform RNA-Seq differential expression analysis using the edgeR pipeline. Applies the edgeR pipeline to a table of tab-separated count data to generate an HTML report for differential expression analysis. The report includes BCV, MDS and smear plots as well as a summary table of statistics for each gene.
    • colibread_discosnp_rad: DiscoSnpRAD (from the Colib'read tool suite). Kissplice, DiscoSNP and TakeABreak perform de novo variant identification and quantification. For these tools the general approach consists in 1) defining a model for the sought elements; 2) detecting in one or several NGS datasets those elements that fit the model; 3) outputting those together with a score and their genomic neighborhood. Mapsembler focuses on sequences of interest within a micro targeted assembly, LorDec uses short reads for correcting third generation long reads, and finally Commet is dedicated to the comparison of numerous metagenomic read sets.
    • nonpareil: Estimate average coverage in metagenomic datasets. Nonpareil uses the redundancy of the reads in metagenomic datasets to estimate the average coverage and predict the amount of sequencing that will be required to achieve "nearly complete coverage".
  • From romaingred:

  • From pjbriggs:

    • amplicon_analysis_pipeline: Analyse paired-end 16S rRNA data from Illumina MiSeq. A Galaxy tool wrapper to Mauro Tutino's Amplicon_analysis pipeline at https://github.com/MTutino/Amplicon_analysis. The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq (Casava >= 1.8) and performs QC and clean-up of input data; removal of singletons and chimeras and building of an OTU table and phylogenetic tree; and beta and alpha diversity analysis.
  • From wolma:

    • mimodd_snpeff: SnpEff-dependent functionality of MiModD. These tools require SnpEff installed and MiModD configured to use it. They enable the annotation of variants identified with the MiModD core tools with functional effects.
    • mimodd_core: The core tools of the MiModD suite of tools. These tools provide the core mapping-by-sequencing functionality of MiModD. Note that sequence reads have to be aligned to the corresponding reference genome before they can be analyzed. This can be done with any modern aligner of your choice or through the MiModD Read Alignment tool available from the separate repository mimodd_aln. Functional annotation of identified variants can be performed using SnpEff. MiModD-specific wrappers for SnpEff are available from the separate repository mimodd_snpeff though more general wrappers should be compatible, too.
    • mimodd_aln: The MiModD Read Alignment tool. This tool provides access to the SNAP read aligner integrated into MiModD.
  • From galaxyp:

    • msi_qualitycontrol: MSI Qualitycontrol. Creates a PDF file with quality control plots for mass spectrometry imaging data.
    • eggnog_mapper: eggnog-mapper for fast functional annotation of novel sequences. eggnog-mapper is a tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments: https://github.com/jhcepas/eggnog-mapper.
    • msi_spectra_plot: MSI spectra plot. Creates mass-spectra plots for pixels of interest and provides an optional zoom-in function for mass ranges of interest on mass spectrometry imaging data.
    • msi_preprocessing: MSI Preprocessing. Creates a Cardinal MSImageSet saved as RData with preprocessed mass spectra for mass spectrometry imaging data.
    • msi_ion_images: msi ion images. Creates intensity heatmaps for the distribution of ions/masses of interest in mass spectrometry imaging data.
  • From bioitcore:

    • transcriptomics_easy_for_discovery_toolkit: A comprehensive and standardized approach for transcriptomic profiling as a clinically-oriented application.
  • From lijing:

    • bubio: dnapars. Customized tools for CETI projects.
  • From richfrommich:

  • From bgruening:

    • glimmer_build_icm: Glimmer ICM builder (from the Glimmer tool suite). Glimmer makes gene predictions based on an interpolated context model (ICM).
    • glimmer_glimmer_to_gff: Convert Glimmer to GFF (from the Glimmer tool suite). Glimmer makes gene predictions based on an interpolated context model (ICM).
    • glimmer_extract: Extract sequence regions (from the Glimmer tool suite). Glimmer makes gene predictions based on an interpolated context model (ICM).
    • protein_properties: Calculation of various properties from given protein sequences. This tool will calculate several properties for each given input sequence.
    • hicexplorer_hicmergematrixbins: Wrapper for HiCExplorer: hicMergeMatrixBins. Sequencing techniques that probe the 3D organization of the genome generate large amounts of data whose processing, analysis and visualization are challenging. Here, we present Hi-C Explorer, a set of tools for the analysis and visualization of chromosome conformation data. Hi-C Explorer facilitates the creation of contact matrices, correction of contacts, TAD detection, merging, reordering of chromosomes, conversion from different formats and detection of long-range contacts. Moreover, it allows the visualization of multiple contact matrices along with other types of data like genes, compartments, ChIP-seq coverage tracks (and in general any type of genomic scores) and long range contacts. doi: 10.5281/zenodo.159780 Repository-Maintainer: Björn Grüning https://github.com/maxplanck-ie/HiCExplorer.
    • glimmer_not_knowledge_based: Glimmer3 (from the Glimmer tool suite). Glimmer makes gene predictions based on an interpolated context model (ICM).
    • glimmer_knowledge_based: Glimmer3 (from the Glimmer tool suite). Glimmer makes gene predictions based on an interpolated context model (ICM).
    • glimmer_acgt_content: ACGT Content (from the Glimmer tool suite). Glimmer makes gene predictions based on an interpolated context model (ICM).
    • join_files_on_column_fuzzy: Join two files on a common column, allowing a certain difference. Join two files on a common column. You can provide the allowed difference between both values (currently numbers only) as an absolute difference or in PPM.
    • glimmer_gbk_to_orf: Extract ORF (from the Glimmer tool suite). Glimmer makes gene predictions based on an interpolated context model (ICM).
    • glimmer_long_orfs: Glimmer long ORFs (from the Glimmer tool suite). Glimmer makes gene predictions based on an interpolated context model (ICM).
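The assemblystats entry mentions N50 metrics. As a rough illustration of what that number measures, here is a minimal Python sketch of the common N50 definition: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. This is not taken from the assemblystats wrapper, and the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (illustrative sketch).

    N50 is the length L such that contigs of length >= L together cover
    at least half of the total assembly length. This follows the common
    textbook definition and is not the assemblystats source code.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid floating-point division.
        if 2 * running >= total:
            return length


print(n50([100, 200, 300, 400]))  # 400 + 300 = 700 >= 1000 / 2, so 300
```

Integer comparison against the doubled running total keeps the sketch exact for odd assembly sizes.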
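The join_files_on_column_fuzzy entry says the allowed difference between key values can be given as an absolute difference or in PPM. A minimal Python sketch of that matching rule follows. It assumes PPM is measured relative to the value from the first file and uses a naive pairwise scan; both are assumptions for illustration, not the wrapper's documented behaviour, and `within_tolerance` and `fuzzy_join` are hypothetical names.

```python
def within_tolerance(a, b, tolerance, mode="absolute"):
    """Return True if numbers a and b differ by at most the tolerance.

    mode="absolute": |a - b| <= tolerance
    mode="ppm":      |a - b| <= tolerance * |a| / 1e6
    (Measuring PPM relative to `a` is an assumption of this sketch.)
    """
    diff = abs(a - b)
    if mode == "absolute":
        return diff <= tolerance
    if mode == "ppm":
        return diff <= tolerance * abs(a) / 1e6
    raise ValueError(f"unknown mode: {mode!r}")


def fuzzy_join(left, right, left_col, right_col, tolerance, mode="absolute"):
    """Join two lists of rows on numeric columns, allowing a difference.

    Every pair of rows whose key values match within the tolerance is
    emitted as one concatenated row. This is an O(len(left) * len(right))
    scan, chosen for clarity rather than speed.
    """
    joined = []
    for lrow in left:
        a = float(lrow[left_col])
        for rrow in right:
            b = float(rrow[right_col])
            if within_tolerance(a, b, tolerance, mode):
                joined.append(list(lrow) + list(rrow))
    return joined


left = [["pepA", "500.000"], ["pepB", "1000.000"]]
right = [["500.002", "x"], ["1000.5", "y"]]
print(fuzzy_join(left, right, 1, 0, 10, mode="ppm"))  # only pepA matches
```

With a 10 PPM tolerance, 500.000 and 500.002 match (allowed difference 0.005), while 1000.000 and 1000.5 do not (allowed difference 0.01). A real implementation over large files would more likely sort one side and binary-search the tolerance window.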