September 2018 Tool Shed contributions

Tools contributed to the Galaxy Project ToolShed in September 2018.

New Tools

From greg:
- multigps: MultiGPS is a framework for analyzing collections of multi-condition ChIP-seq datasets and characterizing differential binding events between conditions. In analyzing multiple-condition ChIP-seq datasets, MultiGPS encourages consistency in the reported binding event locations across conditions and provides accurate estimation of ChIP enrichment levels at each event.
From blankenberg:
- column_regex_substitution: Substitute columns using regular expressions (regex).
From galaxyp:
- thermo_raw_file_converter: Thermo RAW file converter, based on the great ThermoRawFileParser project.
- quantp: Correlation between protein and transcript abundance. This tool evaluates the correlation between protein and transcript abundances and presents the results in HTML format.
From waqas:
- macaron: A Python framework to identify and re-annotate multi-base affected codons in whole genome/exome sequence data. Predicted deleteriousness of coding variants is a frequently used criterion to filter out variants detected in next-generation sequencing projects and to select candidates impacting the risk of human diseases. Most available dedicated tools implement a base-to-base annotation approach that could be biased in the presence of several variants in the same genetic codon. We here propose the MACARON program that, from a standard VCF file, identifies, re-annotates and predicts the amino acid change resulting from multiple single nucleotide variants (SNVs) within the same genetic codon.
From lecorguille:
- xcms_plot_chromatogram: [Metabolomics][W4M][LC-MS] XCMS R Package - Preprocessing - Plot base peak intensity chromatogram (BPI) and total ion chromatogram (TIC). Part of the W4M project: http://workflow4metabolomics.org XCMS: http://www.bioconductor.org/packages/release/bioc/html/xcms.html Filtration and Peak Identification using xcmsSet function from xcms R package to preprocess LC/MS data for relative quantification and statistical analysis.
- msnbase_readmsdata: [Metabolomics][W4M][LC-MS] MSnbase R Package - Preprocessing - Imports Mass-Spectrometry Data Files. Part of the W4M project: http://workflow4metabolomics.org MSnbase: https://bioconductor.org/packages/release/bioc/html/MSnbase.html Reads XML-based mass-spectrometry data files. Can be chained with the W4M xcms.findChromPeaks tool.
From stemcellcommons:
- qualimap_bamqc_workflow: Quality control of mapped reads.
- deeptools_bamqc_workflow: Compare aligned samples via pairwise correlation heatmap and PCA plot.
- kallisto_rnaseq_workflows: Rapid transcript quantification via pseudoalignment.
From bgruening:
- flye: Assembly of long and error-prone reads. Flye supports data produced by both Pacific Biosciences and Oxford Nanopore Technologies.
- plotly_parallel_coordinates_plot: Parallel coordinates plot produced with plotly. Produce a parallel coordinates plot from a tabular file. Multiple columns are chosen for dimensions and a single column for coloring. The plot is embedded in an HTML file which provides rich interactive features. The image can be saved in various formats, such as ‘png’, ‘svg’ and ‘jpeg’.
- graphmap_align: graphmap suite: Mapper. GraphMap is a novel mapper targeted at aligning long, error-prone third-generation sequencing data. It is designed to handle Oxford Nanopore MinION 1d and 2d reads with very high sensitivity and accuracy, and also presents a significant improvement over the state-of-the-art for PacBio read mappers.
- graphmap_overlap: graphmap suite: Owler. GraphMap is a novel mapper targeted at aligning long, error-prone third-generation sequencing data. It is designed to handle Oxford Nanopore MinION 1d and 2d reads with very high sensitivity and accuracy, and also presents a significant improvement over the state-of-the-art for PacBio read mappers.
From artbio:
- bigwig_to_wig: Converts a bigWig file to Wiggle (WIG) format.
From martasampaio:
- phage_promoters: Get promoters of phage genomes.
From matnguyen:
- ngsweep: Primary version. NGSweep, a preprocessing pipeline for bacterial Next-Generation Sequencing Data. This repository contains the in-house developed tools for NGSweep. These tools detect outliers from entire NGS datasets and clean the remaining samples. NGSweep removes contaminated reads from sequences in order to provide clean, high quality data for downstream analysis.
From iuc:
- samtools_view: Convert and filter alignments. Convert between SAM, BAM, and CRAM formats and optionally filter alignments by various criteria (e.g. flags, quality).
- filtlong: Filtlong - Filtering long reads by quality. Filtlong can take a set of long reads and produce a smaller, better subset. It uses both read length (longer is better) and read identity (higher is better) when choosing which reads pass the filter.
- bandage: Bandage - A Bioinformatics Application for Navigating De novo Assembly Graphs Easily. Bandage is a program for visualising de novo assembly graphs. By displaying connections which are not present in the contigs file, Bandage opens up new possibilities for analysing de novo assemblies.
- rcorrector: Rcorrector (RNA-seq error CORRECTOR) is a kmer-based error correction method for RNA-seq data. Rcorrector can also be applied to other types of sequencing data where the read coverage is non-uniform, such as single-cell sequencing.
- zerone: ChIP-seq discretization and quality control. Zerone discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. After the job is done, Zerone checks the results and tells you whether it passes the quality control.
- idba_ud: Wrapper for the idba_ud assembler. IDBA-UD is an iterative De Bruijn Graph De Novo Assembler for Short Reads Sequencing data with Highly Uneven Sequencing Depth. It is an extension of the IDBA algorithm. IDBA-UD also iterates from a small k to a large k. In each iteration, short and low-depth contigs are removed iteratively with a cutoff threshold from low to high to reduce the errors in low-depth and high-depth regions. Paired-end reads are aligned to contigs and assembled locally to generate some missing k-mers in low-depth regions. With these techniques, IDBA-UD can iterate the k value of the de Bruijn graph to a very large value with fewer gaps and fewer branches to form long contigs in both low-depth and high-depth regions.
- plasflow: PlasFlow - Prediction of plasmid sequences in metagenomic contigs. PlasFlow is a set of scripts used for prediction of plasmid sequences in metagenomic contigs. It relies on neural network models trained on full genome and plasmid sequences and is able to differentiate between plasmids and chromosomes with accuracy reaching 96%. It outperforms other available solutions for plasmid recovery from metagenomes and incorporates thresholding, which allows for exclusion of uncertain predictions.
- porechop: Porechop - Finding and removing adapters from Oxford Nanopore reads. Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity. Porechop also supports demultiplexing of Nanopore reads that were barcoded with the Native Barcoding Kit, PCR Barcoding Kit or Rapid Barcoding Kit.
- ruvseq: Remove Unwanted Variation from RNA-Seq Data. Remove unwanted variation (RUV) methods of Risso et al. (2014) for the normalization of RNA-Seq read counts between samples.
- intervene: Create pairwise and upset plots. Intervene provides three types of plots to visualize intersections of genomic regions and list sets. These are pairwise heatmaps of N genomic region sets, classic Venn diagrams of up to 6 genomic region or list sets, and UpSet plots.
From genouest:
- get_pairs: Separate paired and unpaired reads from two fastq files.
From marie-tremblay-metatoul:
- asca: [Metabolomics][W4M][LC-MS][GC-MS][NMR] A-SCA - Splitting of the total variance into independent blocks according to the experimental factors and multivariate analysis (SCA) of each block. Part of the W4M project: http://workflow4metabolomics.org.
From jowong:
- add_sample_as_first_line: add name of the sample as the first line.
From proteore:
- proteore_kegg_pathways_coverage: Give the KEGG pathway(s) covered by a given list of proteins or genes.
- proteore_ms_observation_pepatlas: Retrieve number of MS/MS observations in a tissue from Peptide Atlas.
- proteore_filter_keywords_values: ProteoRE - Filter a file by keywords or values.