September 2017 Tool Shed contributions

Galaxy ToolShed

Tools contributed to the Galaxy Project ToolShed in September 2017.

New Tools

  • From cristian:

    • notos: Notos suite. Notos calculates CpN o/e ratios (e.g., the commonly used CpG o/e ratios) for a set of nucleotide sequences and uses Kernel Density Estimation (KDE) to model the resulting distribution. It consists of two programs: CpGoe.pl calculates the CpN o/e ratios and KDEanalysis.r estimates the model.
  • From ylebrascnrs:

    • structure: A model-based clustering method for inferring population structure using genotype data.
  • From lnguyen:

    • topgo: A tool for enrichment analysis based on topGO R package.
    • venn_diagram_plotter: A tool for comparing up to 6 ID lists with Venn Diagrams (based on Jvenn).
    • filter_keywords_values: ProteoRE - A tool that filters a file by keywords or values.
    • id_converter: A tool that converts identifiers from one type/source to another type of identifier and creates the identifier lists.
    • link2reactome: A tool that maps your ID list (UniProt, gene name) using analysis tools from the Reactome database and directly visualizes the pathways in which your proteins are involved.
    • sort_by_tissue: A tool for selecting/discarding proteins according to their expression profiles (absence/presence) in a list of tissues/organs using the Human Protein Atlas.
    • proteore_goprofiles: A tool for identifying enriched biological themes and GO terms from your protein list.
  • From rhpvorderman:

    • data_manager_select_index_by_path: A data manager to register already built indexes in Galaxy's data tables. Provide a link to an already built index and register it in Galaxy. This tool aims to provide the same functionality for all indexes that have a data table that follows the value,dbkey,name,path layout. This is a fork of the data_manager_all_fasta_by_path data manager by Christian-B. The all_fasta_by_path data manager was forked on 2017-09-07 from Christian-B's galaxy_shedtools repository at commit d9f5343.
  • From saskia-hiltemann:

    • ireport: Create HTML reports for Galaxy workflows. Creates interactive HTML reports from Galaxy outputs.
  • From bioitcore:

    • splicetrap: SpliceTrap. A statistical tool for quantifying exon inclusion ratios in paired-end RNA-seq data, with broad applications for the study of alternative splicing.
    • chimerascan: A tool for identifying chimeric transcription in sequencing data.
  • From caleb-easterly:

    • prepare_revigo: Cut columns from tabular data and define the result as the revigo datatype.
  • From mingchen0919:

    • rmarkdown_collection_builder: R Markdown based dataset collection builder. Create different types of dataset collections from files in a Galaxy history.
    • rmarkdown_fastq_dump: R Markdown based fastq-dump wrapper. Download and extract reads in fastq/fasta format from NCBI SRA and output a list or list:paired dataset collection.
  • From nml:

    • smalt: SMALT aligns DNA sequencing reads with a reference genome. SMALT employs a hash index of short words up to 20 nucleotides long and sampled at equidistant steps along the reference genome. For each sequencing read, potentially matching segments in the reference genome are identified from seed matches in the index and subsequently aligned with the read using dynamic programming.
    • smalt_index: SMALT aligns DNA sequencing reads with a reference genome.
    • smalt_map: SMALT aligns DNA sequencing reads with a reference genome.
    • seqtk_nml: Tool to downsample fastq reads to a given fold coverage based on a given reference fasta file. The tool uses Seqtk sample for the actual downsampling.
    • bio_hansel: Heidelberg and Enteritidis SNP Elucidation. bio_hansel - Heidelberg And eNteritidis Snp ELucidation - Subtype Salmonella enterica subsp. enterica serovar Heidelberg and Enteritidis genomes using in-silico 33 bp k-mer SNP subtyping schemes developed by Genevieve Labbe et al.
  • From galaxyp:

    • validate_fasta_database: Runs the Compomics database identification tool on any FASTA database and separates valid and invalid entries based on a series of checks.
  • From artbio:

    • probecoverage: Computes and plots read coverage of genomic regions by sequencing datasets. Read coverage is computed using bedtools multicov and plotted as the cumulative distribution of regions with coverage > x. This tool is suited to quality control of sequencing libraries enriched with capture probes.
    • sr_bowtie: Bowtie wrapper tool to align small RNA sequencing reads. This tool belongs to the mississippi tool suite.
    • sequence_format_converter: Various fasta to tabular conversions. sequence_format_converter performs all pairwise conversions between the sequence formats fasta, fastaw and tabular. It can also convert fastq to any of the formats fasta, fastaw and tabular.
    • sr_bowtie_dataset_annotation: Iteratively maps small RNA sequencing datasets to reference sequences in order to generate an annotation (i.e. the number of aligned reads for each reference) of these datasets.
    • cap3: cap3 wrapper. CAP3: A DNA sequence assembly program. Huang, X. and Madan, A. (1999) CAP3: A DNA sequence assembly program. Genome Res., 9, 868-877. http://seq.cs.iastate.edu/.
    • justdiff: Unix diff. Returns the output of the unix diff command between two files.
  • From iuc:

    • ucsc_fasplit: faSplit is a tool to split a single FASTA file into several files.
    • flash: Fast Length Adjustment of SHort reads. FLASH is an accurate and fast tool to merge paired-end reads that were generated from DNA fragments whose lengths are shorter than twice the length of reads.
    • crossmap_gff: Wrapper for the CrossMap tool suite: CrossMap GFF. CrossMap is a versatile tool to convert genome coordinates or annotation files between genome assemblies. It supports most commonly used file types, including BAM, BED, BigWig, GFF, GTF, SAM, Wiggle, and VCF formats. For large plain text file types, such as BED, GFF, GTF and VCF, reading from remote servers and file compression are supported.
    • fraggenescan: Tool for finding (fragmented) genes in short reads. FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.
    • pizzly: Pizzly is a program for detecting gene fusions from RNA-Seq data of cancer samples. It uses pseudoalignment and requires running Kallisto with the --fusion parameter (available in version 0.43.1 or later) on paired-end reads. Pizzly also requires the reference transcriptome in FASTA format as well as a GTF file describing the transcriptome. The Ensembl transcriptomes are recommended.
    • megahit: Galaxy wrapper for Megahit version 1.1.2. MEGAHIT is a single node assembler for large and complex metagenomics NGS reads, such as soil. It makes use of a succinct de Bruijn graph (SdBG) to achieve low memory assembly. MEGAHIT can optionally utilize a CUDA-enabled GPU to accelerate its SdBG construction. The GPU-accelerated version of MEGAHIT has been tested on NVIDIA GTX680 (4G memory) and Tesla K40c (12G memory) with CUDA 5.5, 6.0 and 6.5. MEGAHIT v1.0 or greater also supports IBM Power PC and has been tested on IBM POWER8.
  • From rlegendre:

    • ribo_tools: A Galaxy toolbox for the analysis of ribosome profiling (Ribo-seq) data. Ribosome profiling provides genome-wide information about translational regulation. However, there is currently no standard tool for the qualitative analysis of Ribo-seq data. We present here RiboTools, a Galaxy toolbox for the analysis of ribosome profiling (Ribo-seq) data. It can be used to detect translational ambiguities, stop codon readthrough events and codon occupancy. It provides a large number of plots for the visualisation of these events.
  • From gga:

tool_dependency_definition

  • From iuc:

    • package_blast_plus_2_6_0: via website. NCBI BLAST+ 2.6.0 (binaries only). This Tool Shed package is intended to be used as a dependency of the Galaxy wrappers for NCBI BLAST+ and any other tools which call the BLAST+ binaries internally. Note that for compatibility with BioConda, internally this is now called "blast" rather than "blast+" as in the older Galaxy BLAST+ packages.
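
As a rough illustration of the CpN o/e ratio mentioned in the notos entry above, the sketch below computes an observed/expected ratio for one dinucleotide in a single sequence. It assumes one commonly used formulation, (dinucleotide count × sequence length) / (count of first base × count of second base); this is not necessarily the formula implemented by CpGoe.pl, and the function name cpn_oe is hypothetical.

```python
def cpn_oe(seq: str, first: str = "C", second: str = "G") -> float:
    """Observed/expected ratio of the dinucleotide first+second in seq.

    Assumed formula (one common formulation, not taken from Notos itself):
        (n_pair * L) / (n_first * n_second)
    where L is the sequence length.
    """
    seq = seq.upper()
    length = len(seq)
    n_first = seq.count(first)
    n_second = seq.count(second)
    # Count dinucleotide occurrences position by position, so that
    # overlapping cases such as CpC in "CCC" are counted correctly.
    n_pair = sum(
        1
        for i in range(length - 1)
        if seq[i] == first and seq[i + 1] == second
    )
    # The ratio is undefined when either base is absent.
    if n_first == 0 or n_second == 0:
        return float("nan")
    return n_pair * length / (n_first * n_second)


if __name__ == "__main__":
    for s in ("ACGT", "CGCG", "AAAA"):
        print(s, cpn_oe(s))
```

Applying such a function to every sequence in a set yields the distribution of ratios that a KDE step would then model.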