March 2017 Tool Shed contributions

Tools contributed to the Galaxy Project Tool Shed in March 2017.

New Tools

unrestricted

  • From bgruening:

  • From mvdbeek:

    • damidseq_core: An automated pipeline for processing DamID sequencing datasets. Processing DamID-seq data involves extending single-end reads, aligning the reads to the genome and determining the coverage, similar to processing regular ChIP-seq datasets. However, as DamID data is represented as a log2 ratio of (Dam-fusion/Dam), normalisation of the sample and Dam-only control is necessary and adding pseudocounts to mitigate the effect of background counts is highly recommended. damidseq_pipeline is a single script that automatically handles sequence alignment, read extension, binned counts, normalisation, pseudocount addition and final ratio file generation. The script uses FASTQ or BAM files as input, and outputs the final log2 ratio files in bedGraph (or optionally GFF) format. The output ratio files can easily be converted to TDF for viewing in IGV using igvtools. The files can be processed for peak calling using find_peaks or, if using RNA pol II DamID, transcribed genes can be determined using polii.gene.call.
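The normalisation and pseudocount steps described for damidseq_core can be illustrated with a small Python sketch. This is not the damidseq_pipeline implementation: the function name, the scale-to-equal-totals normalisation, and the default pseudocount of 1 are assumptions chosen for illustration only.

```python
import math


def log2_ratio_bins(fusion_counts, dam_counts, pseudocount=1.0):
    """Return per-bin log2(Dam-fusion / Dam-only) ratios.

    Hypothetical helper (not part of damidseq_pipeline): the Dam-only
    control is scaled to the same total read count as the Dam-fusion
    sample, then a pseudocount is added to both before taking the ratio.
    """
    if len(fusion_counts) != len(dam_counts):
        raise ValueError("both samples must use the same bins")
    fusion_total = sum(fusion_counts)
    dam_total = sum(dam_counts)
    if fusion_total == 0 or dam_total == 0:
        raise ValueError("each sample needs at least one read")

    # Assumed normalisation: match the control's depth to the sample's.
    scale = fusion_total / dam_total

    ratios = []
    for fusion, dam in zip(fusion_counts, dam_counts):
        numerator = fusion + pseudocount
        denominator = dam * scale + pseudocount
        ratios.append(math.log2(numerator / denominator))
    return ratios


# Three bins; the last bin is empty in both samples.
ratios = log2_ratio_bins([3, 1, 0], [0, 2, 0])
```

Without the pseudocount, the first bin would divide by zero and the last would be 0/0; with it, empty bins settle at a ratio of 0.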
  • From rnateam:

    • rnalien: RNAlien unsupervised RNA family model construction. Determining the function of a non-coding RNA requires costly and time-consuming wet-lab experiments. For this reason, computational methods that ascertain the homology of a sequence, and thereby deduce functionality and family membership, are often exploited. In this fashion, newly sequenced genomes can be annotated in a completely computational way. Covariance models are commonly used to assign novel RNA sequences to a known RNA family. However, to construct such models, several examples of the family have to be already known. Moreover, model building is the work of experts who manually edit the necessary RNA alignment and consensus structure. Starting from a single input sequence, our method, RNAlien, collects potential family member sequences through multiple iterations of homology search. RNA family models are then constructed fully automatically from the sequences found. We have tested our method on a subset of the Rfam RNA family database. RNAlien models are a starting point to construct models of comparable sensitivity and specificity to manually curated ones from the Rfam database. RNAlien Tool and web server are available at.
    • selectsequencesfrommsa: SelectSequences - selects representative entries from a multiple sequence alignment in Clustal format. Useful before running RNAz, RNAcode, or RNAalifold on alignments with many entries.
  • From earlhaminst:

    • ensembl_longest_cds_per_gene: Select longest CDS per gene from Ensembl CDS FASTA.
    • gstf_preparation: GeneSeqToFamily preparation converts data for the workflow. Converts a set of GFF3 and/or JSON gene feature information datasets into SQLite format and modifies the header lines of a corresponding CDS FASTA to be used with the GeneSeqToFamily workflow.
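The selection performed by ensembl_longest_cds_per_gene can be sketched in a few lines of Python. This is an illustration, not the tool's code: it assumes each FASTA header carries a space-separated `gene:<ID>` field, as Ensembl CDS headers generally do, and the function names and record IDs below are made up for the example.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_cds_per_gene(text):
    """Map each gene ID to the (header, sequence) of its longest CDS.

    Assumes a 'gene:<ID>' field in the header; records without one are
    skipped. On ties, the first record encountered is kept.
    """
    best = {}
    for header, seq in read_fasta(text):
        gene = next(
            (field[len("gene:"):] for field in header.split()
             if field.startswith("gene:")),
            None,
        )
        if gene is None:
            continue
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best


# Made-up records: two transcripts for gene G1, one for G2.
example = """>T1 cds gene:G1
ATGAAA
>T2 cds gene:G1
ATGAAATTT
TAA
>T3 cds gene:G2
ATGTAA
"""
selected = longest_cds_per_gene(example)
```

Here `selected["G1"]` holds transcript T2, whose sequence spans two lines and is the longer of the two G1 records.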
  • From ethevenot:

    • profia: [W4M][Metabolomics][FIA-HRMS] Preprocessing of Flow Injection Analysis coupled to High-Resolution Mass Spectrometry (FIA-HRMS) data. "Flow Injection Analysis coupled to High-Resolution Mass Spectrometry (FIA-HRMS)" is a promising approach for "high-throughput metabolomics". FIA-HRMS data, however, cannot be preprocessed with current software tools, which either rely on liquid chromatography separation or handle low-resolution data only. The "proFIA module" is a workflow that preprocesses FIA-HRMS raw data in "centroid" mode and open format (netCDF, mzData, mzXML, and mzML) and generates the table of peak intensities ("peak table"). The workflow consists of "peak detection and quantification" within individual sample files, followed by "alignment" between files in the mz dimension, and "imputation" of the missing values in the final peak table (Delabriere et al., submitted). For each ion, the graph representing the intensity as a function of time is called a "flowgram". A flowgram can be modeled as I = kP + ME(P) + B + e, where k is the response factor (corresponding to the ionization properties of the analyte), P is the "sample peak" (normalized profile which is common to all analytes from a sample and depends on the flow injection conditions only), ME is the "matrix effect", B is the "solvent baseline", and e is the heteroscedastic noise. The generated peak table is available in the "3 table" W4M tabular format ("dataMatrix", "sampleMetadata", and "variableMetadata") for downstream statistical analysis and annotation with W4M modules. A figure provides "diagnostics" and visualization of the preprocessed data set.
  • From xuebing:

    • kplogo: kpLogo. kpLogo: k-mer probability logo for positional k-mer analysis.
  • From nml:

    • sistr_cmd: SISTR in silico serotyping tool.
  • From iuc:

    • metaphlan_hclust_heatmap: Wrapper for the metaphlan2 tool suite: Generate heatmap. MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species level resolution.
    • humann2_reduce_table: Wrapper for the humann2 tool suite: Reduce. HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). This process, referred to as functional profiling, aims to describe the metabolic potential of a microbial community and its members. More generally, functional profiling answers the question "What are the microbes in my community-of-interest doing (or capable of doing)?".
    • humann2: Wrapper for the humann2 tool suite: HUMAnN2.
    • export2graphlan: export2graphlan is a conversion software tool for producing both annotation and tree file for GraPhlAn.
    • metaphlan2krona: Wrapper for the metaphlan2 tool suite: Format MetaPhlAn2. MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species level resolution.
    • graphlan_annotate: Wrapper for the GraPhlAn tool suite: Generation, personalization and annotation of tree.
    • bayescan: Detecting natural selection from population-based genetic data.
    • data_manager_metaphlan2_database_downloader: MetaPhlAn for Metagenomic Phylogenetic Analysis. MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species level resolution.
    • data_manager_humann2_database_downloader: HUMAnN2 for functionally profiling metagenomes and metatranscriptomes at species-level resolution.
    • humann2_split_table: Wrapper for the humann2 tool suite: Split.
    • humann2_regroup_table: Wrapper for the humann2 tool suite: Regroup.
    • multigps: MultiGPS is a framework for analyzing collections of multi-condition ChIP-seq datasets and characterizing differential binding events between conditions. In analyzing multiple-condition ChIP-seq datasets, MultiGPS encourages consistency in the reported binding event locations across conditions and provides accurate estimation of ChIP enrichment levels at each event.
    • humann2_rename_table: Wrapper for the humann2 tool suite: Rename.
    • humann2_renorm_table: Wrapper for the humann2 tool suite: Renormalize.
    • graphlan: Wrapper for the GraPhlAn tool suite: GraPhlAn.
    • humann2_join_tables: Wrapper for the humann2 tool suite: Join.
    • trinity_contig_exn50_statistic: Compute contig Ex90N50 statistic and Ex90 transcript count (from the Trinity tool suite). Trinity represents a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data (https://github.com/trinityrnaseq/trinityrnaseq).
    • metaphlan2: Wrapper for the metaphlan2 tool suite: MetaPhlAn2. MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species level resolution.
    • merge_metaphlan_tables: Wrapper for the metaphlan2 tool suite: Merge. MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species level resolution.
  • From antmarge:

    • regionfitness: Calculates fitness over a region for Tn-Seq data.
    • dataoverview: Tn-Seq analysis tool for getting data overview.
    • singlefitness: Single fitness aggregate for Tn-Seq data. Calculates fitness for a single mutation aggregated across several libraries. Developed for Tn-Seq data.
    • compgenes: Tn-Seq tool to compare gene fitness from two aggregate fitness files.
    • compstrains: Tn-Seq tool for comparing strains from aggregate fitness.
    • compregions: Tn-Seq analysis for comparing regions created by regionfitness.
  • From jamille:

  • From mingchen0919:

  • From marie-tremblay-metatoul:

  • From drosofff:

    • fishertest: Fisher's exact test on two-column hit lists.
  • From chaimae_eljaouhari:

    • basicplot: Graphics. Takes one tabular file of numerical data as input and produces pairwise plots of the numerical data on a log-log scale.
  • From jjohnson:

    • split_tabular_columns: Split list columns to normalize tabular files. Normalize tabular files which have columns with lists. Rows with lists in a column will be duplicated for each item in the list. The target use case is for proteomics Peptide Spectrum Match search outputs with a list of protein accessions in a column.
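The row duplication described for split_tabular_columns can be sketched as follows. This is a minimal illustration rather than the tool's code: the function name, the comma separator, and the example peptide and accession values are all assumptions.

```python
def split_list_column(rows, column, separator=","):
    """Return a new table with one row per item in the list column.

    Hypothetical helper: `rows` is a list of lists of strings, `column`
    is the zero-based index of the column that holds separator-joined
    lists. All other columns are copied unchanged into each new row.
    """
    normalized = []
    for row in rows:
        items = [item.strip() for item in row[column].split(separator)]
        for item in items:
            new_row = list(row)  # copy so the input rows are not modified
            new_row[column] = item
            normalized.append(new_row)
    return normalized


# A PSM-like table: peptide sequence, then a list of protein accessions.
psms = [
    ["PEPTIDEK", "P1, P2"],
    ["ANOTHERR", "P3"],
]
flat = split_list_column(psms, column=1)
# flat == [["PEPTIDEK", "P1"], ["PEPTIDEK", "P2"], ["ANOTHERR", "P3"]]
```

After this step every row refers to exactly one accession, which makes the table straightforward to join or group by protein.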
  • From galaxyp:

tool_dependency_definition