May, June, and July 2018 Tool Shed contributions

Galaxy ToolShed

Tools contributed to the Galaxy Project ToolShed in May, June, and July 2018.

New Tools

  • From in_silico:
    • cravat_vcf_convert: A tool to convert a VCF format file into a Cravat format file. This tool takes a VCF-formatted line as its input. It processes each row of VCF data and converts it to a row of cravat values. The tool automatically adjusts the position, reference, and alternate attributes of the VCF-encoded information while processing, including breaking a single VCF row into as many cravat rows as needed. The output is then written to the output file. Processing is done sequentially and should adequately support large VCF file inputs.
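The row-splitting step described for cravat_vcf_convert can be illustrated with a short sketch. This is not the tool's implementation: the function names `trim_alleles` and `split_vcf_row` and the output tuple layout are hypothetical, and trimming bases shared by the reference and alternate alleles is only one plausible reading of "adjust the position, reference, and alternate attributes".

```python
def trim_alleles(pos, ref, alt):
    """Strip bases shared by ref and alt; shift pos for each trimmed leading base.

    Assumption: an empty allele is written as "-" after trimming.
    """
    # Trim the shared suffix (does not change the position).
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim the shared prefix (each trimmed base moves the position by one).
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref or "-", alt or "-"


def split_vcf_row(line):
    """Yield one (chrom, pos, ref, alt) tuple per alternate allele of a VCF row."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
    for alt in alts.split(","):
        yield (chrom, *trim_alleles(pos, ref, alt))


if __name__ == "__main__":
    # A multi-allelic row: one deletion and one substitution.
    for row in split_vcf_row("chr1\t100\t.\tAT\tA,GT\t.\t.\t."):
        print(row)
```

Because each input row is handled independently, a file can be streamed line by line, which matches the sequential processing the description mentions.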
  • From rnateam:
    • atactk_trim_adapters: Trim adapters from paired-end HTS reads. The trim_adapters utility aligns paired reads to each other and trims off sequence outside the alignment.
    • sshmm: ssHMM is an RNA sequence-structure motif finder for RNA-binding protein data, such as CLIP-Seq data. ssHMM is an RNA motif finder that recovers sequence-structure motifs from RNA-binding protein data, such as CLIP-Seq data. The tool input consists of a BED file with genomic binding regions and the corresponding genome reference FASTA file. For structure prediction, the user can select between RNAshapes and RNAstructures. Advanced parameters can be set for both the preprocessing and the training stage. The output consists of a graph showing the found sequence motifs for the 5 structural contexts multiloop, hairpin, stem, internal loop, and exterior loop (output in .png format). The height of the nucleotides corresponds to their emission probabilities, while the thickness of the arrows corresponds to their transition probabilities. Additional files (intermediate, logo, model, raw) can be selected for output in the "Output options" section.
    • chipseeker: A tool for ChIP peak annotation and visualization. This wrapper implements the ChIPseeker functions to retrieve the nearest genes around peaks and annotate with genomic region information (Promoter, 5’ UTR, 3’ UTR, Exon, Intron, Downstream, Intergenic, distance to nearest TSS). It also generates some visualisations of the results. It requires a peaks file in BED or interval format and a GTF file to use as the annotation source.
    • graphprot_predict_profile: GraphProt predict profile from GraphProt framework. GraphProt is a computational framework for learning sequence- and structure-binding preferences of RNA-binding proteins (RBPs) from high-throughput experimental data such as CLIP-seq data. After model training, the learned sequence or structure models can be applied to predict RBP binding profiles on FASTA sequences.
  • From artbio:
    • cpm_tpm_rpk: Generate CPM, TPM, or RPK from raw counts. Normalizes a raw-count expression matrix in one of three ways: CPM (library-size normalization), TPM (gene-length and library-size normalization), or RPK (gene-length normalization).
    • justgzip: Compress fastq sequence files. Returns the compressed version of a fastq file using the unix gzip command.
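The three normalizations named in the cpm_tpm_rpk entry above follow standard definitions, sketched here for a single sample. The function names and the use of plain Python lists are illustrative assumptions; the tool itself works on a full expression matrix and may differ in details such as rounding or handling of zero totals.

```python
def rpk(counts, lengths_bp):
    """Reads per kilobase: divide each count by the gene length in kb."""
    return [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]


def cpm(counts):
    """Counts per million: scale counts by the library size."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to one million."""
    rates = rpk(counts, lengths_bp)
    total = sum(rates)
    return [r * 1e6 / total for r in rates]


if __name__ == "__main__":
    counts = [10, 20, 70]          # raw counts for three genes in one sample
    lengths = [1000, 2000, 500]    # gene lengths in base pairs
    print(cpm(counts))
    print(rpk(counts, lengths))
    print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than RPK values.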
  • From pravs:
    • protein_rna_correlation: Correlation between protein and RNA expression (single sample).
  • From bgruening:
    • racon: Consensus module for raw de novo DNA assembly of long uncorrected reads. Racon is intended as a standalone consensus module to correct raw contigs generated by rapid assembly methods which do not include a consensus step. The goal of Racon is to generate genomic consensus which is of similar or better quality compared to the output generated by assembly methods which employ both error correction and consensus steps, while providing a speedup of several times compared to those methods. It supports data produced by both Pacific Biosciences and Oxford Nanopore Technologies.
    • split_file_to_collection: Split tabular, MGF, FASTA, or FASTQ files to a dataset collection.
    • nanopolish_eventalign: Nanopolish tool: Nanopolish eventalign. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.
    • nanopolish_variants: Nanopolish tool: Nanopolish variants. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.
    • nanopolish_methylation: Nanopolish tool: Nanopolish methylation. Nanopolish can calculate an improved consensus sequence for a draft genome assembly, detect base modifications, call SNPs and indels with respect to a reference genome and more.
    • canu: Canu is a hierarchical assembly pipeline designed for high-noise single-molecule sequencing (such as the PacBio RS II/Sequel or Oxford Nanopore MinION). Canu is a fork of the Celera Assembler and runs in four steps:
      1. Detect overlaps in high-noise sequences using MHAP.
      2. Generate corrected sequence consensus.
      3. Trim corrected sequences.
      4. Assemble trimmed corrected sequences.
    • wtdbg: WTDBG tool: WTDBG. WTDBG is a fuzzy Bruijn graph (FBG) approach to long noisy reads assembly. A challenge in assembling long noisy reads from third generation sequencing (TGS) is reducing its requirement of computing resource, especially for large genomes. To address this issue, I developed a novel sequence alignment algorithm and a new assembly graph for efficiently assembling large genomes using TGS data.
  • From yating-l:
    • ucsc_trix_index_generator: Create UCSC Trix index from a Tabular file or a Fasta file.
    • rename_tracks: Wrapper for the program that renames the scaffolds in custom track files. This tool renames scaffolds in custom track files so that the tracks use the same scaffold names as the reference genome renamed by the "rename the scaffolds" tool.
  • From eslerm:
    • vkmz: metabolomics formula prediction and van Krevelen diagram generation.
  • From mheinzl:
    • fsd_bvsa: Tool that plots a histogram of sizes of read families from various stages of the dunovo pipeline.
    • fsd_regions: Tool that plots a histogram of sizes of read families that were aligned to regions of the reference genome.
    • hd: Tool that calculates the Hamming distances of the tags and plots them as a histogram.
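As a rough illustration of the Hamming-distance calculation mentioned for the hd tool, the sketch below counts mismatching positions between equal-length tags and tallies, for each tag, the distance to its nearest neighbour. The function names are hypothetical, and using the minimum distance per tag is an assumption; the actual tool may summarize distances differently.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_distance_histogram(tags):
    """Map each minimum Hamming distance to the number of tags that have it.

    Requires at least two tags, since each tag is compared with all others.
    """
    hist = Counter()
    for i, tag in enumerate(tags):
        others = tags[:i] + tags[i + 1:]
        hist[min(hamming(tag, other) for other in others)] += 1
    return hist


if __name__ == "__main__":
    tags = ["AAAA", "AAAT", "GGGG"]
    for distance, n_tags in sorted(min_distance_histogram(tags).items()):
        print(distance, n_tags)
```

The all-against-all comparison is quadratic in the number of tags, so a real analysis of a large tag set would likely need sampling or a faster neighbour search.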
  • From si-datascience:
    • interps_test: interps test. test.
    • genome_tools: SI-datascience test repository. Useful tools for SI-datascience (under test).
  • From gga:
  • From crs4:
  • From peterjc:
    • seq_length: Initial release v0.0.1. Compute sequence length (from FASTA, QUAL, FASTQ, SFF, etc). Using Biopython's SeqIO library, this generates a table of sequence lengths (one line per sequence), from which you can then compute a histogram, or filter by length, etc.
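The table of sequence lengths that seq_length produces can be pictured with the following sketch. It is deliberately not the tool's code: the tool uses Biopython's SeqIO and supports several formats, whereas this dependency-free example handles FASTA only, and the function name `fasta_lengths` is made up for illustration.

```python
import io


def fasta_lengths(handle):
    """Yield (identifier, length) for each record in a FASTA file handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # The identifier is the first word of the header line.
            name, length = (parts[0] if parts else ""), 0
        elif line and name is not None:
            length += len(line)
    if name is not None:
        yield name, length


if __name__ == "__main__":
    example = io.StringIO(">s1 first sequence\nACGT\nAC\n>s2\nGG\n")
    for identifier, n in fasta_lengths(example):
        # One line per sequence, tab-separated.
        print(f"{identifier}\t{n}")
```

The resulting two-column table is the kind of output that can then be filtered by length or summarized as a histogram.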
  • From nml:
    • mob_suite: MOB-suite is a set of software tools for clustering, reconstruction and typing of plasmids from draft assemblies. The MOB-suite is designed to be a modular set of tools for the typing and reconstruction of plasmid sequences from WGS assemblies designed by Robertson James et al.
    • staramr: Scan genome contigs against the ResFinder and PointFinder antimicrobial resistance databases. staramr scans genome contigs (in FASTA format) against the ResFinder and PointFinder databases to search for antimicrobial resistance genes, and predicts the drugs these genes give resistance to.
  • From johnheap:

    • vapper: VAPPER. The Trypanosoma congolense variant antigen repertoire is divided into 15 clades or phylotypes. These phylotypes are present in any T. congolense isolate, but their relative abundance varies between strains. The purpose of the VAPPER is to accurately quantify antigen diversity in any T. congolense isolate by calculating the relative frequency of each phylotype. The Galaxy VAPPER Tool has three modes.

      1. T.congolense Genomic: This takes raw NGS reads (or pre-assembled contigs) as input, assembles them de novo, searches for evidence of each phylotype based on hidden Markov models (HMM), and calculates their relative abundances. The results are visualized in three different ways: a table with each phylotype and their relative frequencies as proportions of the full repertoire in the given genome; a heat map with dendrogram showing either absolute VAP variation or deviation from the mean, using our pilot dataset; and a Principal Component Analysis (PCA) plot showing variation distribution in the given sample compared to our pilot dataset.
      2. T.congolense Transcriptomic: This requires NGS paired reads and uses bowtie2 and samtools for read mapping and processing, cufflinks for transcript abundance estimation, and hmmer for sequence identification. The output is a stacked bar chart and a table of frequencies based on the transcript abundances.
      3. T.vivax clusters of orthologs: The approach for T. vivax relies on the presence/absence of clusters of orthologs (COGs). It requires velvet for the genome assembly and blast. It receives paired sequencing reads in fastq format (or a contig file if already assembled) and the output is a binary matrix of the presence/absence of each COG/gene for a given sample. Within the tool there is a database of 28 isolates that are used as a comparison, producing a heatmap and dendrogram.
  • From mingchen0919:

  • From ubi.igc:

    • goenrichment: GOEnrichment is a Java application that can be used to analyze gene product sets (e.g., from microarray or RNAseq experiments) for enriched GO terms. GOEnrichment is a tool for performing GO Enrichment Analysis of a set of gene products. It requires as input:

      • A Gene Ontology file in either OBO or OWL, and either the full GO or a GOSlim
      • An Annotation file, which can be in GAF format (from the Gene Ontology website), BLAST2GO format, or in tabular format (with gene ids in the first column and GO term ids in the second one)
      • A Study Set file listing the gene ids in the study (one gene product per line) [NOTE: the gene ids in the Study Set file must match the gene ids in the Annotation file]
      • Optionally, a Population Set listing the gene ids in the population (one gene product per line) [NOTE: if no Population Set file is provided, the population is assumed to consist of all genes listed in the Annotation file]
      • A multiple test correction strategy ("Bonferroni", "Bonferroni-Holm", "Sidak", "SDA", or "Benjamini-Hochberg")

      It produces as output, for each GO category (Molecular Function, Biological Process, and Cellular Component):

      • A tabular Result file listing all non-redundant GO terms present in the study set, their frequencies and p-values
      • A graph file in either PNG, SVG or TXT (list of relations)

      Command Line Usage

      To run the GOEnrichment.jar file from the command line, you need to have Java installed on your computer. You can run it by typing:

      "java -jar GOEnrichment.jar [OPTIONS]"

      The options are:

      • "-g,--go FILE_PATH" => Path to the Gene Ontology OBO or OWL file [Mandatory]
      • "-a,--annotation FILE_PATH" => Path to the tabular annotation file in GAF, BLAST2GO or 2-column table format [Mandatory]
      • "-s,--study FILE_PATH" => Path to the file listing the study set gene products [Mandatory]
      • "-p,--population FILE_PATH" => Path to the file listing the population set gene products [Optional] (Default: all the genes in the annotation file)
      • "-c,--correction OPTION" => Multiple test correction strategy; Options: "Bonferroni", "Bonferroni-Holm", "Sidak", "SDA", "Benjamini-Hochberg" [Optional] (Default: "Benjamini-Hochberg")
      • "-gf,--graph_format OPTION" => Output graph format; Options: "PNG", "SVG", "TXT" [Optional] (Default: "PNG")
      • "-so,--summarize_output" => Summarizes the list of enriched GO terms by removing closely related terms [Optional] (Default: FALSE)
      • "-e,--exclude_singletons" => Exclude GO terms that are annotated to a single gene product in the study set [Optional] (Default: FALSE)
      • "-o,--cut_off" => q-value or corrected p-value cut-off to apply [Optional] (Default: 0.01)
      • "-r,--use_all_relations" => Infer annotations through 'part_of' and other non-hierarchical relations [Optional] (Default: FALSE)
      • "-mfr,--mf_result FILE_PATH" => Path to the output MF result file [Optional] (Default: "MF_Result.txt")
      • "-bpr,--bp_result FILE_PATH" => Path to the output BP result file [Optional] (Default: "BP_Result.txt")
      • "-ccr,--cc_result FILE_PATH" => Path to the output CC result file [Optional] (Default: "CC_Result.txt")
      • "-mfg,--mf_graph FILE_PATH" => Path to the output MF graph file [Optional] (Default: "MF_Graph")
      • "-bpg,--bp_graph FILE_PATH" => Path to the output BP graph file [Optional] (Default: "BP_Graph")
      • "-ccg,--cc_graph FILE_PATH" => Path to the output CC graph file [Optional] (Default: "CC_Graph")
      • "-h,--help" => Display command line usage instructions.
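To show how the options listed above combine into one invocation, here is a small sketch that assembles the argument list for GOEnrichment.jar. Only flags from the option list are used; the helper name `build_goenrichment_command` and the example file names (go.obo, annotation.gaf, study.txt) are hypothetical, and the sketch only builds the command without running it.

```python
# Allowed values copied from the option list above.
CORRECTIONS = {"Bonferroni", "Bonferroni-Holm", "Sidak", "SDA", "Benjamini-Hochberg"}
GRAPH_FORMATS = {"PNG", "SVG", "TXT"}


def build_goenrichment_command(go, annotation, study, population=None,
                               correction="Benjamini-Hochberg",
                               graph_format="PNG", jar="GOEnrichment.jar"):
    """Return the argument list for a GOEnrichment run (nothing is executed)."""
    if correction not in CORRECTIONS:
        raise ValueError(f"unknown correction strategy: {correction}")
    if graph_format not in GRAPH_FORMATS:
        raise ValueError(f"unknown graph format: {graph_format}")
    cmd = ["java", "-jar", jar,
           "--go", go,
           "--annotation", annotation,
           "--study", study]
    # The population set is optional; when omitted, the tool falls back to
    # all genes in the annotation file.
    if population is not None:
        cmd += ["--population", population]
    cmd += ["--correction", correction, "--graph_format", graph_format]
    return cmd


if __name__ == "__main__":
    # Hypothetical input file names, for illustration only.
    print(" ".join(build_goenrichment_command("go.obo", "annotation.gaf", "study.txt")))
```

The returned list can be passed to `subprocess.run` on a machine where Java and the jar file are available.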
  • From gpovysil:

    • range2tag: Tool that extracts tags of reads that are within user-specified regions. The tool takes a SAM file and start and stop positions as input and prints all tags of reads that overlap with the regions to a user-specified output file.
  • From trinity_ctat:

    • ctat_fusion_inspector: FusionInspector performs a supervised analysis of fusion predictions, attempting to recover and re-score evidence for such predictions. FusionInspector is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). FusionInspector assists in fusion transcript discovery by performing a supervised analysis of fusion predictions, attempting to recover and re-score evidence for such predictions. Given a list of candidate fusion genes (as derived from running any fusion transcript prediction tool, such as Prada, FusionCatcher, SoapFuse, TophatFusion, DISCASM/GMAP-Fusion, STAR-Fusion, or other), FusionInspector extracts the genomic regions for the fusion partners and constructs mini-fusion-contigs containing the pairs of genes in their proposed fused orientation. - https://github.com/FusionInspector/FusionInspector/wiki.
    • ctat_concatenate: Concatenates multiple files to one file. A concatenate tool is included in this suite for completeness purposes.
    • ctat_lncrna: lncRNA discovery from RNA-Seq data. This tool uses slncky for lncRNA discovery from RNA-Seq data. slncky filters a high-quality set of noncoding transcripts, discovers lncRNA orthologs, and characterizes conserved lncRNA evolution.
    • ctat_metagenomics: Classifier for metagenomic sequences (RNA-Seq). For foreign transcript detection, we leverage Centrifuge and Kraken, using RNA-Seq reads and Trinity-reconstructed transcripts. Our efforts here are being carried out in collaboration with the group of Steven Salzberg at JHU.
    • ctat_analyze_differential_expression: Analyze differential expression creates analysis files from the output of EdgeR differential expression.
    • ctat_discasm: DISCASM extracts reads that map to reference genomes in a discordant fashion and performs a de novo transcriptome assembly of these reads. DISCASM aims to extract reads that map to reference genomes in a discordant fashion and optionally include reads that do not map to the genome at all, and perform a de novo transcriptome assembly of these reads. DISCASM relies on the output from STAR (as run via STAR-Fusion), and supports de novo transcriptome assembly using Trinity or Oases. - https://github.com/DISCASM/DISCASM/wiki.
    • ctat_clean_headers: Removes whitespace from the header of each selected fastq file. If your Trinity run gives you errors with dying threads during the normalization step, try this tool on each input first.
    • ctat_rsem_align_and_estimate_abundance: Align and Estimate Abundance generates transcript quantification for genes and isoforms using RSEM. RSEM enables accurate transcript quantification for species without sequenced genomes.
    • ctat_genome_resource_libs_data_manager: Download, build, or set the location of a CTAT Genome Resource Library. The CTAT Genome Resource Library Data Manager facilitates the download, creation, and/or use of, within a Galaxy instance, CTAT Genome Resource Libraries. Such libraries are used by the ctat_fusion_suite of tools. There are three basic ways to use this tool.

      1. Download and Build the CTAT Genome Resource Library from the CTAT archive.
      2. Build the library from source data files that are already downloaded.
      3. Specify the location of an already built library.

      Any of these methods can incorporate or be followed by a gmap build on the library.

      More information about these libraries and how to build them is at: https://github.com/FusionFilter/FusionFilter/wiki. The libraries downloaded by this tool are at: https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB.

    • ctat_lncrna_annotations_data_manager: Retrieve and/or specify the location of CTAT lncrna annotations files. The CTAT lncrna annotations Data Manager facilitates the download and/or specification of the location of CTAT lncrna annotations files. The following file is downloaded and unpacked by the tool: https://data.broadinstitute.org/Trinity/CTAT/lncrna/annotations.tar.gz. Currently, annotation files for mm9, mm10, hg19, and hg38 are within this archive.
    • ctat_gmap_fusion: GMAP-fusion is a utility for identifying candidate fusion transcripts. GMAP-fusion is a utility for identifying candidate fusion transcripts based on transcript sequences reconstructed via RNA-Seq de novo transcriptome assembly. - https://github.com/GMAP-fusion/GMAP-fusion/wiki.
    • ctat_centrifuge_indexes_data_manager: Facilitates the download and/or specification of the location of a centrifuge index. The CTAT Centrifuge Indexes Data Manager facilitates the download and/or specification of the location of a centrifuge index. A centrifuge index is one of the input parameters of the ctat_metagenomics tool. At the moment only one index is supported by the ctat_metagenomics tool: ftp://ftp.ccb.jhu.edu/pub/infphilo/centrifuge/data/p_compressed+h+v.tar.gz.
    • ctat_edger_differential_expression: EdgeR differential expression identifies differentially expressed transcripts. EdgeR differential expression uses the counts_matrix from abundance_estimation_to_matrix to identify differentially expressed transcripts.
    • ctat_abundance_estimation_to_matrix: Abundance estimation to matrix joins RSEM-computed gene or isoform fragment counts into a matrix file.
    • ctat_star_fusion: STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). STAR-Fusion uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads. STAR-Fusion further processes the output generated by the STAR aligner to map junction reads and spanning reads to a reference annotation set. - https://github.com/STAR-Fusion/STAR-Fusion/wiki.
    • ctat_trinity_rnaseq: Trinity assembles transcript sequences from Illumina RNA-Seq data. Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem (http://www.cs.huji.ac.il), represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
  • From mbernt:
  • From erasmus-medical-center:
    • miniasm: Miniasm - Ultrafast de novo assembly for long noisy reads (though having no consensus step). Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to the raw input reads.
    • hla_dq: HLA-DQ typing. Determine possible associated types given BLAST IMGT/HLA annotation.
    • gfa_to_fa: gfa_to_fa - Converting GFA format to Fasta format. Convert GFA files to Fasta.
    • split_pairs: Split a dataset pair into two regular files.
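The GFA-to-FASTA conversion mentioned for gfa_to_fa can be sketched in a few lines, since GFA segment records are tab-separated lines of the form S, name, sequence. This is an illustrative sketch rather than the tool's code: the function name `gfa_to_fasta` is hypothetical, sequences are written on a single unwrapped line, and segments whose sequence is the placeholder `*` are skipped.

```python
def gfa_to_fasta(gfa_lines):
    """Yield one FASTA record (header plus sequence) per GFA segment line."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Segment records look like: S <name> <sequence> [optional tags...]
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            yield f">{fields[1]}\n{fields[2]}\n"


if __name__ == "__main__":
    gfa = [
        "H\tVN:Z:1.0\n",
        "S\tutg1\tACGT\tLN:i:4\n",
        "S\tutg2\t*\n",
        "L\tutg1\t+\tutg2\t-\t0M\n",
    ]
    print("".join(gfa_to_fasta(gfa)), end="")
```

Applied to miniasm output, the records produced this way are the unitig sequences, which, as noted in the miniasm entry, carry roughly the error rate of the raw reads.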
  • From galaxyp:
  • From labis-app:
  • From ntino:
    • mgsfast: MGS-Fast: MetaGenomic Shotgun annotation using microbial gene catalogs. Current methods for metagenomic sequencing data analysis to identify function in the large number of reads in a high-throughput sequence data file rely on the computationally intensive and low stringency approach of mapping each read to a generic database of proteins or reference microbial genomes. We have developed MGS-Fast, an alternative analysis approach for shotgun metagenomic sequence data utilizing Bowtie2 DNA-DNA alignment of the reads to a database of well annotated genes compiled from human microbiome data. This method is rapid and provides high stringency matches (>90% DNA sequence identity) of shotgun metagenomics reads to genes with annotated functions. We demonstrate the use of this method with synthetic data, Human Microbiome Project shotgun metagenomic data sets, and data from a study of liver disease, and detect differentially abundant KEGG gene functions in these experiments.
  • From earlhaminst:
  • From martasampaio:
  • From biocomp-ibens:
    • alfa: Plot the distribution of the genomic features captured by aligned reads. ALFA provides a global overview of the distribution of features in next-generation sequencing (NGS) dataset(s). Given a set of aligned reads (BAM files) and an annotation file (GTF format), the tool produces plots of the raw and normalized distributions of those reads among genomic categories (stop codon, 5'-UTR, CDS, intergenic, etc.) and biotypes (protein coding genes, miRNA, tRNA, etc.). The tool works regardless of the sequencing technique or organism.
    • data_manager_build_alfa_indexes: Build ALFA indexes from automatically downloaded gtf annotation file.
      1. The tool asks the admin to enter a 'species_name' and automatically downloads the latest release of the corresponding gtf annotation file from Ensembl.
      2. The tool calls ALFA to generate the alfa indexes from this gtf file.
      3. The resulting indexes are stored in the child directory 'alfa_indexes' of the directory <galaxy_data_manager_data_path> defined in config/galaxy.ini.
      4. Finally, the tool adds the new entry to the table 'alfa_indexes.loc'. This .loc file is where the data table 'alfa_indexes' points, as defined in config/shed_tool_data_table.conf.xml.
      5. At the end of the process, when a user runs alfa, the built-in indexes corresponding to the 'species_name' will be available.
  • From kyu:
    • vampire: Vampire Morphology Analysis. Vampire Morphology Analysis integrated with CellProfiler.
  • From genouest:
    • meneco: Meneco computes minimal completions to your draft network with reactions from a repair network. Large-scale metabolic networks as well as measured data sets suffer from substantial incompleteness. meneco is a tool for metabolic network completion. It can be used to check whether a network provides the synthesis routes to comply with the required functionality described by the producibility of metabolites. In particular, it tests whether it is possible to synthesize so-called target metabolites from a set of seed metabolites. For networks that fail this test, meneco can attempt to complete the network by importing reactions from a metabolic reference database such that the resulting network provides the required functionality. meneco can identify unproducible target metabolites and computes minimal extensions to the network that satisfy the producibility constraints. Additionally, it can compute the union and intersection of all minimal network extensions without enumerating them. meneco builds upon a formal method for analyzing large-scale metabolic networks. This qualitative approach describes the bio-synthetic capacities of metabolic networks. Implementing this approach, meneco maps its principles into Answer Set Programming to express the producibility constraints for a set of metabolites.
    • logol: Logol is a pattern matching grammar language and a set of tools to search a pattern in a sequence.
  • From jjohnson:
    • ensembl_variant_report: Report variant peptides for snpEff variants. Report variant peptides for snpEff variants for epitope analysis. Replacement for snpeff_cds_report.
  • From jfb:
    • commonality_finder: Finds commonality. This tool finds the commonly shared substrates between three KALIP runs.
    • st_kinamine: Serine/Threonine KinaMine. This tool should only be used to extract significant pS/pT motifs from a Distinct Peptide Report that was created by Protein Pilot.
  • From iuc:
    • bioext_bealign: Galaxy wrapper for BioExt operation Align sequences. A suite of Galaxy tools designed around the BioExt extension to BioPython. Align sequences, merge duplicate sequences into one, and more!
    • spotyping: SpoTyping allows fast and accurate in silico Mycobacterium spoligotyping from sequence reads. SpoTyping is software for predicting spoligotype from sequencing reads, complete genomic sequences, and assembled contigs.
    • presto_parseheaders: pRESTO ParseHeaders (from the presto tool suite). The REpertoire Sequencing TOolkit (pRESTO) is composed of a suite of utilities to handle all stages of sequence processing prior to germline segment assignment. pRESTO is designed to handle either single reads or paired-end reads. It includes features for quality control, primer masking, annotation of reads with sequence embedded barcodes, generation of unique molecular identifier (UMI) consensus sequences, assembly of paired-end reads and identification of duplicate sequences. Numerous options for sequence sorting, sampling and conversion operations are also included.
    • presto_pairseq: pRESTO PairSeq (from the presto tool suite).
    • presto_filterseq: pRESTO FilterSeq (from the presto tool suite).
    • presto_collapseseq: pRESTO CollapseSeq (from the presto tool suite).
    • prestor_abseq3: pRESTOr AbSeq3 Report (from the presto tool suite).
    • presto_partition: pRESTO Partition (from the presto tool suite).
    • presto_maskprimers: pRESTO MaskPrimers (from the presto tool suite).
    • presto_buildconsensus: pRESTO BuildConsensus (from the presto tool suite).
    • presto_alignsets: pRESTO AlignSets (from the presto tool suite).
    • presto_parselog: pRESTO ParseLog (from the presto tool suite).
    • presto_assemblepairs: pRESTO AssemblePairs (from the presto tool suite).
    • pureclip: PureCLIP is an HMM-based peak caller specifically designed for eCLIP/iCLIP data. PureCLIP is a tool to detect protein-RNA interaction footprints from single-nucleotide CLIP-seq data, such as iCLIP and eCLIP. It accepts mapped eCLIP/iCLIP reads in BAM format as input and also supports control library and crosslink-associated (CL) motifs input for bias correction. PureCLIP outputs two BED files, containing the found crosslink sites (first file) and binding regions (second file) that merge nearby crosslink sites to contiguous regions (region width controlled by -dm parameter). By default, the tool parameters are set to values optimized for proteins binding to short defined binding regions, e.g. proteins binding to short specific motifs such as PUM2 and RBFOX2. This behaviour can be changed with the -bc option. The default setting -bc 0 is equivalent to manually setting -bdwn 50 -ntp 10 -ntp2 0 -b1p 0.01 -b2p 0.15. The second setting -bc 1 is designed for RBPs that produce larger clusters (proteins causing larger crosslink clusters with relatively lower read start counts, e.g. proteins binding to low complexity motifs). -bc 1 corresponds to the manual setting -bdwn 100 -antp -b1p 0.01 -b2p 0.1.
    • intermine_galaxy_exchange: InterMine Exporter. Export feature IDs to InterMine.
    • trinity_super_transcripts: Generate SuperTranscripts (from the Trinity tool suite). Trinity represents a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data https://github.com/trinityrnaseq/trinityrnaseq.
    • gtftobed12: Convert GTF files to BED12 format. Convert GTF files to BED12 format using the UCSC Kent Bioinformatics utilities.
    • qfilt: Filter sequencing data. This simple program is meant to filter sequencing data, optionally removing or splitting reads with poor quality scores and to optionally only retain fragments from reads that are tagged with a given 5' sequence.
    • bioext_bam2msa: Galaxy wrapper for BioExt operation Convert BAM. A suite of Galaxy tools designed around the BioExt extension to BioPython. Align sequences, merge duplicate sequences into one, and more!
    • umi_tools_count: Wrapper for the UMI-tools suite tool: UMI-tools count. Extract UMI barcode from a read and add it to the read name, leaving any sample barcode in place. Can deal with paired end reads and UMIs split across the paired ends.
    • pygenometracks: pyGenomeTracks: Standalone program and library to plot beautiful genome browser tracks. pyGenomeTracks aims to produce high-quality genome browser tracks that are highly customizable. Currently, it is possible to plot: bigwig, bed (many options), bedgraph, links (represented as arcs) and Hi-C matrices. pyGenomeTracks can make plots with or without Hi-C data.
    • lorikeet_spoligotype: Tools for M. tuberculosis DNA fingerprinting (spoligotyping). Lorikeet implements digital spoligotyping (spacer oligonucleotide typing) of M. tuberculosis strains from Illumina sequencing data.
    • varscan_somatic: Wrapper for VarScan somatic. VarScan is a variant caller for high-throughput sequencing data.
    • varscan_copynumber: Wrapper for VarScan copynumber. VarScan is a variant caller for high-throughput sequencing data.
    • varscan_mpileup: Wrapper for VarScan mpileup. VarScan is a variant caller for high-throughput sequencing data.
    • hivclustering: Infers transmission networks from pairwise distances inferred by tn93.
    • collection_element_identifiers: Extract element identifiers of a collection.
    • ggplot2_heatmap: Heatmap with ggplot2 tool from the ggplot2 package. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • ggplot2_pca: PCA plot with ggplot2 tool from the ggplot2 package. ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
  • From daumsoft:
  • From anmoljh:
  • From thondeboer:
    • neat_genreads: NEAT_genReads. Tool that simulates FASTQ reads from error models or VCF file. Provides truth sets for evaluation.
  • From florianbegusch:

  • From ieguinoa:

    • data_manager_fetch_gff: Data manager to handle GFF files associated with genome builds. Links GFF files obtained from URLs/history/etc to specific data tables. Can handle different types of GFF files, including files with only a subset of the genes/transcripts or a different set of attributes as required by different tools.
  • From geco-team:
    • gmql_download: Imports the selected dataset from the GMQL Repository into the current Galaxy history.
    • gmql_queries_composer: Create, Compile and Run GMQL queries step by step.
    • gmql_auth: Manage the registration, login and logout of users to the GMQL system.
    • gmql_queries_monitor: List the user's jobs and their status.
    • gmql_queries_editor: Compile and run GMQL queries (Advanced Mode).
    • gmql_upload: Uploads a new dataset to the user’s private space in the GMQL Repository.
    • gmql_datatypes: Custom datatypes for the GMQL for Galaxy tool suite.
    • gmql_repository: View, browse, rename or delete datasets in the user's space on the GMQL system.