Galaxy Hub

Tools

From davidvanzessen:
- imgt_concatenate: Concatenating IMGT zip files. Concatenates 1 or more IMGT zip files into a new IMGT zip file.
From yating-l:
- hubarchivecreator: A tool to create a UCSC track hub.
From kaymccoy:
- calculate_fitness: Calculate fitness of transposon insertion locations.
- aggregate_fitness: Aggregate fitness of transposon insertion locations.
From insilico-bob:
- ngchm: Generate clustered Heatmaps with optional co-variate bars. Generate a clustered Heatmap from NGCHM data, or other data matrices, with many methods to choose from for clustering. Multiple category/co-variate bars may also be added to either the columns or the rows. The output is a zip file that can be displayed in Galaxy via the visualize icon at the bottom of the output file in the History (it follows the save, information “I”, and rerun icons). Click the icon and the heatmap displays in the Galaxy middle region. The input matrix is assumed to have labels in both the first column and the first row. Any input co-variate bar file must have the same number of labels as the input matrix’s row or column labels (whichever the co-variate bar maps to).
From bgruening:
- salmon: Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data. Salmon achieves its accuracy and speed via a number of different innovations, including the use of quasi-mapping (accurate but fast-to-compute proxies for traditional read alignments), and massively-parallel stochastic collapsed variational inference. The result is a versatile tool that fits nicely into many different pipelines. For example, you can choose to make use of our quasi-mapping algorithm by providing Salmon with raw sequencing reads, or, if it is more convenient, you can provide Salmon with regular alignments (e.g. an unsorted BAM file produced with your favorite aligner), and it will use the same wicked-fast, state-of-the-art inference algorithm to estimate transcript-level abundances for your experiment.
- deeptools_compute_matrix_operations: Wrapper for the deepTools: computeMatrixOperations. deepTools address the challenge of visualizing the large amounts of data that are now routinely generated from sequencing centers in a meaningful way. To do so, deepTools contain useful routines to process the mapped reads data through removal of duplicates and different filtering options to create coverage files in standard bedGraph and bigWig file formats. deepTools allow the creation of normalized coverage files or the comparison between two files (for example, treatment and control). Finally, using such normalized and standardized files, multiple visualizations can be created to identify enrichments with functional annotations of the genome. For a gallery of images that can be produced and a description of the tools see http://f1000.com/posters/browse/summary/1094053 https://github.com/fidelram/deepTools doi: 10.1093/nar/gku365
  - Wikipage: https://github.com/fidelram/deepTools/wiki
  - Repository-Maintainer: Björn Grüning https://github.com/fidelram/deepTools.
From pdeford:
- dotplot: Creates a dot plot of the contents of a LASTZ tabular file resulting from the alignment of one or more sequences to a single reference sequence. If multiple query sequences are present, they will be sorted by size, offset from one another in the dot plot, and separated by a gray line.
From galaxyp:
- idpassemble: Bumbershoot IDPicker idpAssemble. idpAssemble merges multiple idpDB files into a single idpDB. It can also filter the result at PSM/spectrum/peptide/protein/gene levels.
- percolator: Percolator. Uses semi-supervised machine learning to discriminate correct from incorrect peptide-spectrum matches, and calculates accurate statistics such as q-value (FDR) and posterior error probabilities. http://per-colator.com/.
From gdroc:
- scaffhunter: Scaffhunter. Scaffhunter groups several programs whose principal aim is to work with and visualize genetic mapping data. In addition to 2D visualization of linkage between markers, these tools can be used to order scaffolds without going through the tedious step of genetic map construction and reconciliation between marker order in the scaffold and marker order in the genetic map.
- scaffremodler: Scaffremodler can be used to improve scaffold assemblies or to detect large structural variations between a reference sequence and a re-sequenced genome.
From marpiech:
- rnaseq_pro_workflow_tools: Additional tools for EDGE-PRO.
From charles-bernard:
- cytosine_report_to_bedgraph: Converts genome-wide cytosine methylation report to bedGraphs. This tool takes as input a genome-wide cytosine methylation report (generated by the tool Bismark Meth. Extractor) and converts it into a bedGraph for each cytosine context (CpG, CHG and CHH). These bedGraphs display, for a given context, the ratio of methylation of each covered cytosine in the genome. It also produces a bedGraph displaying the coverage count of each cytosine in the genome (non-covered cytosines are ignored). The tool outputs make it possible to visualize the methylation signal of covered cytosines with software such as IGV (Integrative Genomics Viewer). To that end, the tool can optionally generate a TDF (Tiled Data Format) binary file from each converted bedGraph, since IGV handles the TDF format better than bedGraph.
From jjohnson:
- seq2hla: Precision HLA typing and expression from RNAseq data. seq2HLA is an in-silico method, written in Python and R, which takes standard RNA-Seq sequence reads in fastq format as input, uses a bowtie index comprising all HLA alleles, and outputs the most likely HLA class I and class II genotypes (in 4 digit resolution), a p-value for each call, and the expression of each class.
From mvdbeek:
- dedup_hash: This is a commandline utility to remove exact duplicate reads from paired-end fastq files. Reads are assumed to be in 2 separate files. Read sequences are then concatenated and a short hash is calculated on the concatenated sequence. If the hash has been previously seen the read will be dropped from the output file. This means that reads that have the same start and end coordinate, but differ in lengths will not be removed (but those will be “flattened” to at most 1 occurrence). This algorithm is very simple and fast, and saves memory as compared to reading the whole fastq file into memory, such as fastuniq does.
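The hashing approach described for dedup_hash can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code: the function name `dedup_pairs`, the in-memory list of sequence pairs, and the choice of an 8-byte BLAKE2b digest as the "short hash" are all assumptions made for the example.

```python
import hashlib


def dedup_pairs(pairs):
    """Yield (seq1, seq2) pairs whose concatenated sequence has not been seen yet.

    Hypothetical helper: only the short digest of each pair is kept in memory,
    not the reads themselves.
    """
    seen = set()
    for seq1, seq2 in pairs:
        # Concatenate both mates and compute a short (8-byte) hash of the result.
        digest = hashlib.blake2b((seq1 + seq2).encode("ascii"), digest_size=8).digest()
        if digest in seen:
            continue  # exact duplicate pair: drop it
        seen.add(digest)
        yield seq1, seq2


# Toy input: the second pair is an exact duplicate of the first.
pairs = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "TTGC"), ("GGCA", "TTGA")]
unique = list(dedup_pairs(pairs))
```

Because only fixed-size digests are stored, memory use grows with the number of distinct pairs rather than with the size of the fastq files. A short hash can in principle collide, so a real implementation has to weigh digest size against memory.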
From hathkul:
- rapidcluster: Tool for fast clustering of aptamer sequences based on Levenshtein distance. This tool is up to 9x faster than FASTAptamer_cluster when low threshold values are used. It uses the same input (FASTAptamer_count output file) and generates output in the same format as FASTAptamer_cluster.
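For readers unfamiliar with the distance measure named in the rapidcluster entry, the following Python sketch computes the Levenshtein (edit) distance and uses it in a simple greedy, seed-based grouping. The listing does not describe rapidcluster's internal algorithm, so the clustering strategy, the function names, and the toy sequences here are assumptions for illustration only.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def greedy_cluster(sequences, threshold):
    """Assign each sequence to the first seed within `threshold` edits, else start a new cluster.

    Hypothetical strategy: sequences are assumed to be pre-sorted by abundance,
    so each cluster is seeded by its most abundant member.
    """
    clusters = []  # list of (seed, members)
    for seq in sequences:
        for seed, members in clusters:
            if levenshtein(seed, seq) <= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


clusters = greedy_cluster(
    ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACGAACGT", "TTTTGGGC"], threshold=1
)
```

With a threshold of 1, the example yields two clusters seeded by `ACGTACGT` and `TTTTGGGG`.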
From earlhaminst:
- blast_parser: Convert 12- or 24-column BLAST output into 3-column hcluster_sg input.
- ete: Analyse phylogenetic trees using the ETE Toolkit. Generate a species tree from a list of species using the ETE Toolkit.
- miranda: Finds potential target sites for miRNAs in genomic sequences. miRanda is an algorithm for the detection of potential microRNA target sites in genomic sequences.
- treebest_best: TreeBeST best. Generate a phylogenetic tree using CDS alignment and species tree.
- hcluster_sg_parser: Converts hcluster_sg 3-column output into lists of ids.
- gafa: Gene Align and Family Aggregator (GAFA) generates an SQLite database that can be visualised with Aequatus, an open-source homology browser developed with novel rendering approaches to visualise homologous, orthologous and paralogous gene structures.
- gff3_to_json: GFF3 to JSON converter. Converts a set of GFF3 datasets into JSON format.
- t_coffee: T-Coffee. A suite of Galaxy tools designed to run T-Coffee from input FASTA and an optional list of sequence IDs; it can also generate CIGAR alignments.
- hcluster_sg: Hierarchical clustering on a sparse graph.
From drosofff:
- lumpy: Find structural variations. This tool takes as an input a sorted bam alignment of paired-end sequencing reads. It extracts discordant paired-end alignments and split-read alignments, and generates a vcf file containing structural variation calls.
From chrisd:
- snpfinder: A simple naive metagenomics variant caller.
From devteam:
- vcftools_consensus: Apply VCF variants to a fasta file to create consensus sequence.
From rnateam:
- graphclust_preprocessing: Preprocessing input for GraphClust. The tool takes as input a file of sequences in FASTA format and creates the final input for GraphClust based on the given parameters.
- viennarna_rnalfold: Wrapper for ViennaRNA application RNALfold. RNA secondary structure prediction through energy minimization is the most used function in the package. There are three kinds of dynamic programming algorithms for structure prediction provided: the minimum free energy algorithm of (Zuker & Stiegler 1981) which yields a single optimal structure, the partition function algorithm of (McCaskill 1990) which calculates base pair probabilities in the thermodynamic ensemble, and the suboptimal folding algorithm of (Wuchty et al. 1999) which generates all suboptimal structures within a given energy range of the optimal energy. For secondary structure comparison, the package contains several measures of distance (dissimilarities) using either string alignment or tree-editing (Shapiro & Zhang 1990). Finally, we provide an algorithm to design sequences with a predefined structure (inverse folding).
- mlocarna: LocARNA - Multiple Alignment and Folding of RNAs. The LocARNA package comprises tools for fast, high-quality pairwise and multiple alignment of RNA sequences, while inferring unknown structure; this is accomplished by simultaneous folding and alignment based on sequence and structure features of the RNAs.
- rnasnp: RNAsnp. Efficient detection of local RNA secondary structure changes induced by SNPs. RNAsnp requires an RNA sequence and optionally a list of SNPs to be analyzed. The effect of SNPs on local RNA secondary structure can be detected in three possible modes: Mode 1: The first mode is designed to compute the effect of SNPs by using global folding. This option should be used only for short input sequences, since the base pair probabilities are calculated using RNAfold. Mode 2: The second mode is designed to compute the effect of SNPs on large sequences. Here the local base pair probabilities are calculated using RNAplfold (with the parameters -W 200 and -L 120). Mode 3: The third mode is the combination of the above two. It is intended to determine the positions of putative structure-disruptive SNPs using either transcript or genome sequence.
- graphclust_postprocessing: Post-processing. Redundant clusters are merged and instances that belong to multiple clusters are assigned unambiguously. For every pair of clusters, the relative overlap (i.e. the fraction of instances that occur in both clusters) is computed and clusters are merged if the overlap exceeds 50%.
- viennarna_rnainverse: Wrapper for ViennaRNA application RNAinverse.
- viennarna_rnaheat: Wrapper for ViennaRNA application RNAheat.
- locarna_multiple: Wrapper for application LocARNA Multiple Aligner (mlocarna) of the LocARNA suite. The LocARNA package comprises tools for fast, high-quality pairwise and multiple alignment of RNA sequences, while inferring unknown structure; this is accomplished by simultaneous folding and alignment based on sequence and structure features of the RNAs.
- locarna_pairwise: Wrapper for application LocARNA Pairwise Aligner of the LocARNA suite. The LocARNA package comprises tools for fast, high-quality pairwise and multiple alignment of RNA sequences, while inferring unknown structure; this is accomplished by simultaneous folding and alignment based on sequence and structure features of the RNAs.
- viennarna_rnadpdist: Wrapper for ViennaRNA application RNApdist.
- viennarna_rnapkplex: Wrapper for ViennaRNA application RNAPKplex.
- viennarna_rna2dfold: Wrapper for ViennaRNA application RNA2Dfold.
- viennarna_rnaplot: Wrapper for ViennaRNA application RNAplot.
- viennarna_rnaaliduplex: Wrapper for ViennaRNA application RNAaliduplex.
- viennarna_rnafold: Wrapper for ViennaRNA application RNAfold.
- viennarna_rnasubopt: Wrapper for ViennaRNA application RNAsubopt.
- viennarna_rnaeval: Wrapper for ViennaRNA application RNAeval.
- viennarna_rnaplex: Wrapper for ViennaRNA application RNAplex.
- locarna_reliability_profile: Wrapper for application LocARNA reliability-profile of the LocARNA suite. The LocARNA package comprises tools for fast, high-quality pairwise and multiple alignment of RNA sequences, while inferring unknown structure; this is accomplished by simultaneous folding and alignment based on sequence and structure features of the RNAs.
- graphclust_prepocessing_for_mlocarna: This tool prepares files for the LocARNA step.
- paralyzer: A method to generate a high resolution map of interaction sites between RNA-binding proteins and their targets. We developed the PARalyzer algorithm to generate a high resolution map of interaction sites between RNA-binding proteins and their targets. The algorithm utilizes the deep sequencing reads generated by the PAR-CLIP (Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation) protocol. The use of photoactivatable nucleotides in the PAR-CLIP protocol results in more efficient crosslinking between the RNA-binding protein and its target relative to other CLIP methods; in addition, a nucleotide substitution occurs at the site of crosslinking, providing single-nucleotide resolution binding information. PARalyzer utilizes this nucleotide substitution in a kernel density estimate classifier to generate the high resolution set of Protein-RNA interaction sites.
- graphclust_cmfinder: Determines consensus motifs for sequences. It first converts CLUSTAL format files to STOCKHOLM format, then uses CMFinder to determine consensus motifs for the sequences.
- viennarna_rnasnoop: Wrapper for ViennaRNA application RNAsnoop.
- viennarna_rnacofold: Wrapper for ViennaRNA application RNAcofold.
- pipmir: A method to identify novel plant miRNA. We developed the PIPmiR algorithm to identify novel plant miRNA genes from a combination of deep sequencing data and genomic features.
- viennarna_rnadistance: Wrapper for ViennaRNA application RNAdistance.
- remurna: remuRNA - Measurement of Single Nucleotide Polymorphism induced Changes of RNA Conformation. Single-nucleotide polymorphisms (SNPs) are often linked to critical phenotypes such as diseases or responses to vaccines, medications and environmental factors. However, the specific molecular mechanism by which a causal SNP acts is usually not obvious. Changes in RNA secondary structure emerge as a possible explanation, necessitating the development of methods to measure the impact of single-nucleotide variation on RNA structure. To answer this need, remuRNA computes the relative entropy between the Boltzmann ensembles of the native and a mutant structure.
- viennarna_rnaduplex: Wrapper for ViennaRNA application RNAduplex.
- viennarna_rnaalifold: Wrapper for ViennaRNA application RNAalifold.
- viennarna_rnalalifold: Wrapper for ViennaRNA application RNALalifold.
- targetfinder: Plant small RNA target prediction tool. TargetFinder will computationally predict small RNA binding sites on target transcripts from a sequence database. This is done by aligning the input small RNA sequence against all transcripts, followed by site scoring using a position-weighted scoring matrix.
- viennarna_kinfold: Wrapper for ViennaRNA application Kinfold.
- viennarna_rnaup: Wrapper for ViennaRNA application RNAup.
- methylkit: A method for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Perl is needed to read SAM files only.
- graphclust_fasta_to_gspan: Second step of GraphClust. For each fragment of the input sequence, RNAshapes is used to create a set of structures. The default parameters, for example, consider for each input fragment windows of size 40 nt and 150 nt with a window shift of 30%. This allows local as well as global structures to be considered for a fragment. From each such RNAshapes window we take the top 5 shreps (suboptimal structures for the top 5 shapes) within 20% of the minimum free energy (mfe) of that window and convert them into graphs. As shape level (abstraction level) we use 3 for short sequences and 5 for sequences >= 80 nt. Please also see the RNAshapes documentation for all these terms.
- graphclust_mlocarna: MLocARNA computes a multiple sequence-structure alignment of RNA sequences. It uses a treefile, a file with a guide tree in NEWICK format. The given tree is used as the guide tree for the progressive alignment. This saves the calculation of pairwise all-vs-all similarities and the construction of the guide tree.
- graphclust_nspdk: Produces an explicit sparse feature encoding, computes a global feature index, and returns the top dense sets. The integer code for the invariant graph encoding is used as a feature indicator. In this way, the integer associated with each feature (i.e. each pair of neighborhood subgraphs of radius r whose roots are at distance d) can be interpreted as the feature key and the (normalized) count of occurrences as its value. The candidate clusters are chosen as the top-ranking neighborhoods, provided that the size of their overlap is below a specified threshold.
- viennarna_rnapaln: Wrapper for ViennaRNA application RNApaln.
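The merging rule described in the graphclust_postprocessing entry above (merge two clusters when their relative overlap exceeds 50%) can be illustrated with a short Python sketch. The listing does not say which cluster size the overlap is measured against, so this example assumes the smaller of the two clusters; the function names and the toy clusters are likewise hypothetical and not taken from GraphClust.

```python
def relative_overlap(a, b):
    """Fraction of shared instances, measured against the smaller cluster (an assumption)."""
    return len(a & b) / min(len(a), len(b))


def merge_clusters(clusters, threshold=0.5):
    """Repeatedly merge any pair of clusters whose relative overlap exceeds `threshold`."""
    merged = [set(c) for c in clusters if c]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if relative_overlap(merged[i], merged[j]) > threshold:
                    merged[i] |= merged.pop(j)  # fold cluster j into cluster i
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the merged cluster may now overlap others
    return merged


clusters = [{"s1", "s2", "s3", "s4"}, {"s3", "s4", "s5"}, {"s8", "s9"}]
result = merge_clusters(clusters)
```

In the example, the first two clusters share two of the three instances of the smaller one (overlap of about 0.67), so they are merged, while the third cluster is left untouched.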
From jnavarro:
- rapsosnp_v1_5_with_dubious: SNP detection for Brassica napus with dubious, on Rapsodyn project. 4 inputs: Reads 1, Reads 2, Reference, and a list of known error positions on the reference (format: Chrom\tPos\n; can be empty).
From marie-tremblay-metatoul:
- spectral_normalization: [Metabolomics][W4M][ALL] Spectral Normalization - Normalization (operation applied on each individual spectrum) of spectral data. Part of the W4M project: http://workflow4metabolomics.org.
- nmr_alignment: [Metabolomics][W4M][NMR] NMR Alignment - Alignment of NMR spectra based on the Cluster-based Peak Alignment (CluPA) algorithm. Part of the W4M project: http://workflow4metabolomics.org.
From ycogne:
- bridger: De novo assembly tools.
From iuc:
- length_and_gc_content: Gets gene length and GC content from a FASTA and a GTF file.
- data_manager_bwameth_index_builder: bwa-meth is a fast and accurate aligner for BS-seq data. A data manager for the bwameth aligner.
- picrust_normalize_by_copy_number: Wrapper for picrust application: Normalize. PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
- blastxml_to_gapped_gff3: BlastXML to gapped GFF3. Convert BlastXML results into GFF3 format.
- trinotate: Trinotate is a comprehensive annotation suite designed for automatic functional annotation of de novo transcriptomes. Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. It will summarize, in a tabular report and SQLite database, different upstream outputs:
  - Trinity: iuc/trinity https://toolshed.g2.bx.psu.edu/repository?repository_id=faf6028922d9220a
  - TransDecoder: iuc/transdecoder https://toolshed.g2.bx.psu.edu/repository?repository_id=7a2a8151a50f8099
  - Generate gene to transcript map for Trinity assembly: iuc/trinity_gene_to_trans_map https://toolshed.g2.bx.psu.edu/repository?repository_id=faf6028922d9220a
  - NCBI BLAST+ blastp: devteam/ncbi_blast_plus https://toolshed.g2.bx.psu.edu/repository?repository_id=1d92ebdf7e8d466c
  - NCBI BLAST+ blastx: devteam/ncbi_blast_plus https://toolshed.g2.bx.psu.edu/repository?repository_id=1d92ebdf7e8d466c
  - HMMER hmmscan: iuc/hmmer_hmmscan https://toolshed.g2.bx.psu.edu/repository?repository_id=a2cc4683090b1800
  - TMHMM 2.0: peterjc/tmhmm_and_signalp https://toolshed.g2.bx.psu.edu/repository?repository_id=292389a45f1a238a
  - SignalP 3.0: peterjc/tmhmm_and_signalp https://toolshed.g2.bx.psu.edu/repository?repository_id=292389a45f1a238a
- kobas: KOBAS KEGG Orthology Based Annotation System. KOBAS is a KEGG Orthology Based Annotation System. Its purpose is to identify statistically enriched pathways, diseases, and GO terms for a set of genes or proteins, using pathway, disease, and GO knowledge from multiple well-known databases.
- ebi_metagenomics_run_downloader: Wrapper for EBI tool application: Download run data. The European Bioinformatics Institute (EMBL-EBI) maintains the world’s most comprehensive range of freely available and up-to-date molecular databases. This tool queries or downloads data from EMBL-EBI databases.
- nugen_nudup: Marks/removes PCR introduced duplicate molecules based on the molecular tagging technology used in NuGEN products. For SINGLE END reads, duplicates are marked if they fulfill the following criteria: a) start at the same genomic coordinate b) have the same strand orientation c) have the same molecular tag sequence. The read with the highest mapping quality is kept as the non-duplicate read. For PAIRED END reads, duplicates are marked if they fulfill the following criteria: a) start at the same genomic coordinate b) have the same template length c) have the same molecular tag sequence. The read pair with the highest mapping quality is kept as the non-duplicate read.
- describe_samples: Describe samples (from the Trinity tool suite). Trinity represents a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data https://github.com/trinityrnaseq/trinityrnaseq.
- abricate: Mass screening of contigs for antibiotic resistance genes.
- mlst: Scan contig files against PubMLST typing schemes. MLST will scan fasta or genbank files against PubMLST (Public databases for molecular typing) schemes. This repository contains both MLST, and MLST List, which will list the PubMLST databases currently supported by MLST.
- rnaspades: rnaSPAdes is an assembler for RNA-Seq data based on SPAdes genome assembler. rnaSPAdes is a de novo assembler for RNA-Seq data based on the SPAdes genome assembler. http://bioinf.spbau.ru/en/spades_3_9.
- picrust_categorize: Wrapper for picrust application: Categorize by function. PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
- trinity_analyze_diff_expr: Extract and cluster differentially expressed transcripts (from the Trinity tool suite). Trinity represents a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data https://github.com/trinityrnaseq/trinityrnaseq.
- trinity_define_clusters_by_cutting_tree: Partition genes into expression clusters (from the Trinity tool suite). Trinity represents a method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data https://github.com/trinityrnaseq/trinityrnaseq.
- data_manager_star_index_builder: RNA STAR is an ultrafast universal RNA-seq aligner. Spliced Transcripts Alignment to a Reference. This is the data manager that builds the indices. https://www.ncbi.nlm.nih.gov/pubmed/23104886.
- picrust_predict_metagenomes: Wrapper for picrust application: Predict Metagenome. PICRUSt (pronounced “pie crust”) is a bioinformatics software package designed to predict metagenome functional content from marker gene (e.g., 16S rRNA) surveys and full genomes.
- ebi_search_rest_results: Wrapper for EBI tool application: EBI Search. The European Bioinformatics Institute (EMBL-EBI) maintains the world’s most comprehensive range of freely available and up-to-date molecular databases. This tool queries or downloads data from EMBL-EBI databases.
- goseq: Gene Ontology analyser. goseq does selection-unbiased testing for category enrichment amongst differentially expressed (DE) genes for RNA-seq data.
- raxml: RAxML - A Maximum Likelihood based phylogenetic inference. Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. http://sco.h-its.org/exelixis/web/software/raxml/.
- samblaster: samblaster marks duplicates and can output split and discordant alignments from SAM/BAM files. samblaster is a fast and flexible program for marking duplicates in read-id grouped paired-end SAM files. It can also optionally output discordant read pairs and/or split read mappings to separate SAM files, and/or unmapped/clipped reads to a separate FASTQ file. When marking duplicates, samblaster will require approximately 20MB of memory per 1M read pairs.
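The single-end duplicate criteria listed in the nugen_nudup entry above (same start coordinate, same strand, same molecular tag, keep the read with the highest mapping quality) can be sketched as follows. This is an illustration of the rule as stated in the description, not NuDup's implementation: the `Read` record, its field names, and the tie-breaking choice (the first read seen wins) are assumptions.

```python
from collections import namedtuple

# Hypothetical minimal read record; real data would come from a SAM/BAM file.
Read = namedtuple("Read", "name chrom start strand tag mapq")


def mark_duplicates(reads):
    """Return {read name: True if duplicate} for single-end reads."""
    best = {}
    for read in reads:
        # Reads sharing coordinate, strand and molecular tag are candidate duplicates.
        key = (read.chrom, read.start, read.strand, read.tag)
        # Keep the read with the highest mapping quality; on ties the first one wins.
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return {
        read.name: best[(read.chrom, read.start, read.strand, read.tag)].name != read.name
        for read in reads
    }


reads = [
    Read("r1", "chr1", 100, "+", "ACGTAC", 30),
    Read("r2", "chr1", 100, "+", "ACGTAC", 42),  # same key as r1, higher MAPQ
    Read("r3", "chr1", 100, "+", "GGTTAA", 42),  # different molecular tag
    Read("r4", "chr1", 100, "-", "ACGTAC", 10),  # different strand
]
flags = mark_duplicates(reads)
```

Here only `r1` is flagged as a duplicate, because `r2` shares its coordinate, strand and tag but has the higher mapping quality. The paired-end rule would use the template length in place of the strand in the grouping key.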
From matteoc:
- agame_custom_tools: agame tools. Tools of general utility for the processing of eDNA assembly data.

Dependency Definitions

From insilico-bob:
- ngchm_dependencies: Visualization directories needed for ngchm to display Heatmap in Galaxy.
From iuc:
- package_blast_plus_2_5_0: initial version. NCBI BLAST+ 2.5.0 (binaries only). This Tool Shed package is intended to be used as a dependency of the Galaxy wrappers for NCBI BLAST+ and any other tools which call the BLAST+ binaries internally. Note that for compatibility with BioConda, internally this is now called “blast” rather than “blast+” as in the older Galaxy BLAST+ packages.