Tools contributed to the Galaxy Project Tool Shed in [January and February 2017](/galaxy-updates/2017-03/).

Featured Updates

Tools

Dependency Definitions

Suites

Tools

  • From tomnl:

    • mzml2isa: Tool to generate ISA-Tab files using metadata derived from mzML files. mzML2ISA automatically generates the ISA-Tab document structure and metadata files from raw XML metabolomics data files (mzML, an open-access data format). The mzml2isa tool provides the backbone of an ISA-Tab metabolomics study, which can then be edited with an ISA editing tool such as ISAcreator (see the MetaboLights pre-packaged ISA Creator).
    • nmrml2isa: A Galaxy wrapper for the nmrml2isa Python package. Full documentation: http://2isa.readthedocs.io/en/latest/ Python PyPI package: https://pypi.python.org/pypi/mzml2isa/ GitHub code: https://github.com/ISA-tools/mzml2isa nmrml2isa converts metabolomics studies in .nmrML format to the open ISA-Tab standard supported by the MetaboLights database. The recommended installation is through the Tool Shed (https://toolshed.g2.bx.psu.edu/). Dependencies are handled by Bioconda and should be installed automatically with Galaxy version >= 16.10. To make sure Bioconda is working, check that the following settings are in the config/galaxy.ini file: conda_auto_install = True (install missing tool dependencies before each job runs) and conda_auto_init = True (install Conda from the web automatically if Galaxy cannot find a local copy and conda_exec is not configured).
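The nmrml2isa entry asks administrators to confirm two Conda settings in config/galaxy.ini. A minimal Python sketch of such a check follows, assuming the options live in an [app:main] section (the usual galaxy.ini layout of that era, but verify against your own file) and that values carry no trailing inline comments. The helper conda_settings_ok is hypothetical and not part of Galaxy.

```python
import configparser


def conda_settings_ok(ini_text: str, section: str = "app:main") -> bool:
    """Return True if conda_auto_install and conda_auto_init are both enabled.

    Hypothetical helper; the [app:main] section name is an assumption.
    """
    # interpolation=None keeps Paste-style values such as %(here)s from
    # raising interpolation errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return False
    # getboolean accepts True/False, yes/no, on/off, 1/0 (case-insensitive);
    # a missing option falls back to False.
    return (
        parser.getboolean(section, "conda_auto_install", fallback=False)
        and parser.getboolean(section, "conda_auto_init", fallback=False)
    )


sample = """
[app:main]
# Install missing tool dependencies before each job runs.
conda_auto_install = True
# Install Conda from the web if no local copy is found.
conda_auto_init = True
"""
print(conda_settings_ok(sample))  # True
```

Reading the file with a real INI parser, rather than searching for the strings, means commented-out settings are correctly ignored.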
  • From pravs:

    • msms_extractor: Filters scans from mzML using a PSM report. This program removes the scans that have been assigned a peptide-spectrum match (PSM) in the PSM report file. Input: an mzML file and a PSM report file. Output: an mzML file containing only the scans that were not assigned any PSM.
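The filtering rule behind msms_extractor boils down to a set difference on scan numbers. A minimal Python sketch follows, assuming the mzML spectra and the PSM report have already been parsed into plain Python values. The dict layout and the function unassigned_scans are hypothetical, and real mzML parsing is left out.

```python
def unassigned_scans(scans, psm_scan_numbers):
    """Return the scans whose scan number was not assigned a PSM.

    scans: iterable of dicts with a "scan" key (hypothetical representation
    of mzML spectra); psm_scan_numbers: scan numbers taken from the PSM report.
    """
    assigned = set(psm_scan_numbers)  # a set gives O(1) membership tests
    return [s for s in scans if s["scan"] not in assigned]


spectra = [{"scan": 101}, {"scan": 102}, {"scan": 103}]
print(unassigned_scans(spectra, [102]))  # [{'scan': 101}, {'scan': 103}]
```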
  • From bgruening:

    • replace_column_by_key_value_file: A tool to replace all entries of a column with values from a key-value file. Each entry in the chosen column is looked up as a key and replaced by its corresponding value. This is useful, for example, to convert chromosome names from Ensembl notation to UCSC notation.
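The key-value principle used by replace_column_by_key_value_file can be sketched with a dictionary lookup. This minimal Python version assumes tab-separated files, a two-column mapping file and a 0-based column index. Leaving unmatched keys unchanged is this sketch's own choice, since the entry does not say how the tool handles them, and replace_column is a hypothetical name.

```python
import csv


def replace_column(table_lines, mapping_lines, column):
    """Replace entries in a 0-based column using a tab-separated key/value file.

    Keys missing from the mapping are left unchanged (an assumption).
    """
    mapping = {
        row[0]: row[1]
        for row in csv.reader(mapping_lines, delimiter="\t")
        if len(row) >= 2
    }
    result = []
    for row in csv.reader(table_lines, delimiter="\t"):
        if column < len(row):
            row[column] = mapping.get(row[column], row[column])
        result.append(row)
    return result


ensembl_to_ucsc = ["1\tchr1", "MT\tchrM"]
bed = ["1\t100\t200", "MT\t5\t9", "GL000192.1\t0\t10"]
for row in replace_column(bed, ensembl_to_ucsc, 0):
    print("\t".join(row))
```

Here Ensembl-style names such as "1" and "MT" become the UCSC-style "chr1" and "chrM".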
  • From iuc:

    • fermi2: Wrapper for the fermikit tool: fermi2. FermiKit is a de novo assembly-based variant calling pipeline for deep Illumina resequencing data. It assembles reads into unitigs, maps them to the reference genome, and then calls variants from the alignment with an accuracy comparable to conventional mapping-based pipelines.
    • fermikit_variants: Wrapper for the fermikit tool: fermikit-variants.
    • data_manager_plant_tribes_scaffolds_downloader: Downloads PlantTribes Scaffolds data hosted on the Huck website.
    • busco: BUSCO assesses genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs.
    • gvcftools_extract_variants: Wrapper for ngsutils tool: Extract Variants from gVCF files.
  • From mingchen0919:

  • From galaxyp:

    • regex_find_replace: Uses Python regular expressions to find and replace text, either in text lines or in columns of a tabular file.
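Both modes of regex_find_replace map directly onto Python's re.sub. A minimal sketch follows, assuming tab-separated input and 0-based column indices. The function find_replace and its parameters are hypothetical and do not reflect the tool's actual interface.

```python
import re


def find_replace(lines, pattern, replacement, column=None, sep="\t"):
    """Apply re.sub to whole lines, or only to one 0-based column if given."""
    regex = re.compile(pattern)
    out = []
    for line in lines:
        if column is None:
            out.append(regex.sub(replacement, line))
            continue
        fields = line.split(sep)
        if column < len(fields):
            fields[column] = regex.sub(replacement, fields[column])
        out.append(sep.join(fields))
    return out


rows = ["gene_1\tgene_2", "gene_10\tother"]
print(find_replace(rows, r"gene_(\d+)", r"G\1"))            # whole lines
print(find_replace(rows, r"gene_(\d+)", r"G\1", column=1))  # second column only
```

Restricting the substitution to one column avoids accidentally rewriting identifiers that happen to match the pattern elsewhere in the row.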
  • From immport-devteam:

    • run_flock: runs FLOCK using an FCS file that was converted to a text file. FLOCK (FLOw Clustering without K) is a computational approach to flow cytometry analysis that determines the number of unique populations in high-dimensional flow data using a rapid binning approach. FLOCK needs to be compiled after a fresh install; see the README.
    • extract_fcs_keywords: extracts the keywords from an FCS file. Input files: this tool uses FCS files as input. Output file: the list of FCS file headers.
    • flowclr_summary: generates summary statistics on FLOCK output. Input: any flowclr file output from FLOCK or Cross Sample, containing the fluorescence intensity value per marker and the assigned population. Output: this tool produces two reports. One gives the population distribution in the input file; the other gives descriptive summary statistics per population and marker.
    • profile_cl: uses flowCL to find a match for each of the populations defined by FLOCK. Input: this tool reads in the population score profiles from FLOCK. The marker names need to be in the Cell Ontology for this to work. Output: a page that allows visualization of the data.
    • convert_fcs_to_text: converts FCS files to text format with no transformation. Input files: this tool requires valid FCS files as input. Files are processed serially. Compensation can optionally be applied to FCS files that include a compensation matrix. Output file: tab-separated text containing the fluorescence intensity values for each marker.
    • fcs_summary: generates a summary of an FCS file and a list of its markers. Input file: this tool uses valid FCS files as input. Output file: the FCS file summary includes the number of events, the list of markers and parameters, and summary statistics for each.
    • flowtext_summary: generates a summary of a txt-converted FCS file and a list of its markers. Input file: this tool uses txt-converted FCS files as input. Output file: the file summary includes the number of events, the list of markers and parameters, and summary statistics for each.
    • flow_overview: generates an overview of the flow analysis results. Input: a tab-separated file containing the marker fluorescence intensities and the assigned population for each event, generated as part of FLOCK or CrossSample output. If the option is selected, flowCL is used to associate the populations defined by FLOCK with a Cell Ontology term. Output: a page with multiple tabs that allows visualization of the data.
    • cross_sample: runs CrossSample using the MFI from FLOCK and text-converted FCS files. FLOCK needs to be compiled upon a fresh install; see the README. Input: this tool compares text-converted FCS files from a data collection to the MFI generated by a FLOCK run. The same data collection, merged and run through FLOCK, should be used so that events are attributed to populations consistently. Output: each event within each file of a dataset collection is attributed to a population depending on its intensity profile. A table of the population composition of each file is generated, as well as MFI and population descriptive statistics.
    • generate_mfi: generates the Mean, Median or Geometric Mean Fluorescence Intensity of a FLOCK output file. Input file: this tool reads in a FLOCK output file. Output file: a table containing the mean, median or geometric mean fluorescence intensity values of each marker within each population defined by FLOCK.
    • flowtext_scatterplot: allows generation of density scatter plots using ggplot2. Input files: this tool takes txt-converted FCS files as input. Output files: this tool generates a scatter plot for each marker combination in a single PNG file. A PDF file can optionally be generated.
    • check_fcs_headers: returns a table of the headers of a set of FCS files. Input files: this tool requires collections of FCS files as input. Output file: a table listing the markers and channels for each file.
    • flowai: automatically performs quality control of flow cytometry data. Input files: one or more FCS files. Output files: a full HTML report; a new FCS file containing only high-quality events (default); a new FCS file containing only low-quality events (optional); and the original FCS file with an additional parameter in which low-quality events have a value higher than 10,000 (optional).
    • flowcl: uses flowCL to find the most likely match to a given set of markers. Input: type in the marker names and select the expression level in the drop-down menu. Output: a summary of the flowCL output is captured in a txt file; for more details, please refer to the flowCL documentation. Graphical output: flowCL generates a plot for the most likely matches to the ontology.
    • fcs_gate_trans: allows automated gating of debris using flowDensity and conversion of FCS files to text using the FCSTrans transformation. Input files: this tool uses FCS files as input, and files are processed serially. Users choose whether to automatically gate cellular debris and/or compensate the data. Output files: tab-separated text containing the transformed fluorescence intensity values for each marker. If the option is selected, an FCS file (FCS3.0 format) is also generated. Gating output: the automatically gated output includes a summary of the data pre- and post-gating, as well as density scatter plots pre- and post-gating for each marker pair. Compensation is applied according to the spillover matrix included in the FCS files (if available). Automated gating: gating is implemented with flowDensity. Cellular debris removal uses gate coordinates calculated from the density of the forward scatter (FSC) channel only. The calculated gate is vertical and is placed at the larger of two values: the 0.1 quantile of the FSC density, or the point of lowest density between the first and second density peaks. Events below this threshold are removed.
    • collapse_pop: collapses several populations into one. Input: FLOCK or Cross Sample output, a table of the fluorescence intensities for each event and the population associated with each. Output: the input file with the selected populations replaced by the indicated population.
    • check_headers: returns a table of the headers of a set of text files. Input files: this tool requires collections of txt, flowtext or tabular files as input. Output file: a table listing the headers for each file.
    • fcs_scatterplot: allows generation of density scatter plots using flowDensity. Input files: this tool takes valid FCS files as input. Output files: this tool generates a scatter plot for each marker combination in a single PNG file. A PDF file can optionally be generated.
    • txt_diagnosis: looks for potential errors in txt-converted FCS files. Input: this diagnosis tool reads in text files and checks that the data are all numeric. Output: a report listing the errors and the corresponding line numbers.
    • extract_pop: extracts the events belonging to the given populations from FLOCK or Cross Sample outputs. Input: FLOCK or Cross Sample output, a table of the fluorescence intensities for each event and the population associated with each. Output: the input file filtered for the selected populations.
    • auto_collapse_pop: automatically collapses populations together based on FLOCK score profiles. Input: FLOCK or Cross Sample output (a table of the fluorescence intensities for each event and the population associated with each), as well as the file containing the score profiles for each FLOCK population. Output: the input file with the selected populations replaced by the indicated population. This tool also generates a report.
    • cs_overview: generates an overview of the flow analysis results. Input: summary tables generated from a CrossSample analysis (a tab-separated file containing the counts of events in each population for each file run through CrossSample, and a tab-separated file containing the MFI for each marker in each population for each file). Output: a page that allows visualization of the data.
    • flow_datatypes: flow cytometry datatypes. Datatypes for flow cytometry tools.
    • merge_ds_flowtext: downsamples and merges multiple txt-converted FCS files into one text file. Input files: this tool requires collections of txt, flowtext or tabular files as input. Downsampling: by default, files are not downsampled. If a downsampling factor n is provided, each file in the input dataset collection is downsampled randomly without replacement as follows: if n is between 0 and 1, the output is n times the size of the input files; if n is between 1 and 100, the output is n% of the size of the input files. Up-sampling is not currently supported: if the number provided is greater than 100, the tool exits. Output file: the output flowtext file is a concatenation of the input files, provided all data after the header contain only numbers. By default, only columns present in all input files (as assessed by the header) are concatenated. The user can specify columns to merge, bypassing the header check. If a downsampling factor is provided, ONLY the corresponding proportion of each input file is read in (and checked for errors). Potential errors are logged to stderr. If the number of errors reaches 10, the run is aborted. If a file contains non-numeric data, the run is aborted. Tip: three tools in the Flow File Tools section can help prepare files for merging and/or downsampling: the Check headers tool lists the headers of all files in a collection of text, flowtext or tabular files; the Remove, rearrange and/or rename columns tool allows manipulation of the columns of a file or set of files; and the Check data tool identifies the lines in a file containing non-numeric data.
    • rearrange_columns: enables the removal, rearrangement and/or renaming of text file columns. Input files: this tool requires txt, flowtext or tabular files as input. Column order: indicate the columns to keep, in the order in which they should appear (comma-separated list). This field is optional. Column names: indicate the new column headings in the order in which they should appear in the output file (comma-separated list). The number of headings should match the number of columns in the output. This field is optional. Output file: a copy of the input file with rearranged and/or renamed columns.
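The per-population statistics produced by generate_mfi (mean, median or geometric mean intensity of each marker in each FLOCK population) can be sketched with Python's statistics module. This is a minimal illustration, assuming events are already parsed into dicts and that the population column is named "Population". That column name and the function population_mfi are hypothetical.

```python
import statistics
from collections import defaultdict


def population_mfi(events, markers, stat="mean"):
    """Summarise each marker's intensity per population.

    events: dicts holding marker intensities plus a "Population" key
    (hypothetical column name). stat: "mean", "median" or "geometric_mean".
    """
    summarise = {
        "mean": statistics.mean,
        "median": statistics.median,
        # geometric_mean (Python 3.8+) expects positive values
        "geometric_mean": statistics.geometric_mean,
    }[stat]
    values = defaultdict(lambda: defaultdict(list))
    for event in events:
        for marker in markers:
            values[event["Population"]][marker].append(event[marker])
    return {
        pop: {m: summarise(v) for m, v in per_marker.items()}
        for pop, per_marker in sorted(values.items())
    }


events = [
    {"Population": 1, "CD4": 10.0, "CD8": 2.0},
    {"Population": 1, "CD4": 40.0, "CD8": 8.0},
    {"Population": 2, "CD4": 5.0, "CD8": 5.0},
]
print(population_mfi(events, ["CD4", "CD8"], "mean"))
print(population_mfi(events, ["CD4", "CD8"], "geometric_mean"))
```

The geometric mean damps the influence of a few very bright events, which is one reason it is often offered alongside the arithmetic mean for fluorescence data.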
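The downsampling rule and default column handling described for merge_ds_flowtext can be sketched in a few lines of Python. The following is a minimal version under stated assumptions: rows are already loaded as lists, and the helper names keep_fraction, downsample and shared_columns are hypothetical. The entry does not say how a factor of exactly 1 is treated, so this sketch reads it as a fraction (keep everything).

```python
import random


def keep_fraction(n):
    """Translate the downsampling factor n into a fraction of rows to keep.

    0 < n <= 1 is read as a fraction and 1 < n <= 100 as a percentage;
    treating exactly 1 as "keep everything" is this sketch's assumption.
    """
    if 0 < n <= 1:
        return float(n)
    if 1 < n <= 100:
        return n / 100.0
    raise ValueError("factor must be in (0, 100]; up-sampling is not supported")


def downsample(rows, n, seed=None):
    """Randomly sample rows without replacement at the requested proportion."""
    k = int(len(rows) * keep_fraction(n))
    return random.Random(seed).sample(rows, k)


def shared_columns(headers):
    """Columns present in every header, in the order of the first header."""
    common = set(headers[0]).intersection(*headers[1:])
    return [col for col in headers[0] if col in common]


rows = list(range(200))
print(len(downsample(rows, 0.25, seed=1)), len(downsample(rows, 10, seed=1)))  # 50 20
print(shared_columns([["FSC", "SSC", "CD4"], ["SSC", "CD4", "CD8"]]))  # ['SSC', 'CD4']
```

random.Random(seed).sample draws without replacement, matching the description, and a fixed seed makes a run reproducible.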
  • From testtool:

    • find_dmr: Galaxy tool designed to identify differentially methylated regions. It identifies genomic regions of biological interest in large series of samples using the bumphunter method.
    • get_gsm: Gets sample accession numbers (GSM) and phenotypes for a GEO series (GSE) number. Galaxy tool designed to download the sample IDs and phenotype records of a Gene Expression Omnibus series. Example usage: GSE51547.
    • anno_peak_figure: Visualizes annotation data. Galaxy tool designed to visualize annotation data.
    • annotate_peak: Annotates the genomic regions of peaks (BED). Galaxy tool designed to annotate the genomic region of each peak.
    • geo_data: Gets GEO sample records from a table of GSM "ID" and "PHENOTYPE" values. Galaxy tool designed to download Gene Expression Omnibus sample records for further analysis. Example usage (CSV, one record per line):
        ID,Phenotype
        GSM1247787,melanoma
        GSM1247784,melanoma
        GSM1247733,healthy
  • From nick:

  • From md-anderson-bioinformatics:

    • heat_map_creation: Tools for creating Next-Generation Clustered Heat Maps (NG-CHM). Generates a clustered heat map from a data matrix, with many options for clustering. The input matrix must have labels in its first column and its first row. For example, column headers could be patient IDs and row headers (first column) could be gene symbols. Covariate files add information bars to the heat map; for example, a patient's smoking status could be provided as a covariate file. Any covariate bar file must use the same row or column labels as the input matrix so that the covariate information is associated with the appropriate row or column. The output is a compressed ngchm file that can be displayed in the NG-CHM viewer. To open the viewer in Galaxy, expand the NG-CHM tool output file in the Galaxy History. At the bottom are several icons, in this order: save, information ("i"), rerun, and Visualization (a chart-like icon). Hover over the Visualization icon and select the 'NG-CHM Heat Map Viewer' option; the heat map will be displayed in the Galaxy middle pane. Please see our YouTube tutorial video for an overview of creating NG-CHM heat maps in Galaxy: https://www.youtube.com/watch?v=UmP7HjFD-ns Full Documentation: http://bioinformatics.mdanderson.org/main/NG-CHM-V2:Overview
      The Galaxy visualization component does not install automatically. You will need to run the following commands in a terminal. NOTE: the following assumes /galaxy-central is the home directory of your Galaxy instance; otherwise, replace /galaxy-central with your Galaxy instance's root directory.
        1) mv /galaxy-central/../shed_tools/toolshed.g2.bx.psu.edu/repos/md-anderson-bioinformatics/heat_map_creation/*/heat_map_creation/mda_heatmap_viz.zip /galaxy-central/config/plugins/visualizations/
        2) cd /galaxy-central/config/plugins/visualizations/
        3) unzip mda_heatmap_viz.zip
      You must then restart Galaxy for the visualization component to take effect:
        4) cd /galaxy-central
        5) sh run.sh
      Or, if using a Docker instance:
        4b) docker stop
        5b) docker start
      Remember that you MUST be logged in to see the heat map visualization icon. The NG-CHM Generator will run even if you are not logged into Galaxy.
  • From mvdbeek:

    • bam_readtagger: Tags reads in a BAM file based on other BAM files. Useful when inspecting reads that may originate from different sources with different degrees of uncertainty.
  • From insilico-bob:

    • mean_center_matrix: Mean-centers a matrix that has a header row and a first column of labels. Labels are assumed to be in row 1 and column 1. Every value in each data row (rows 2 to N+1) is mean-centered: cell value = cell value - row mean value.
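The row-wise rule in mean_center_matrix (cell value minus row mean, with labels in row 1 and column 1) is short enough to show directly. This minimal Python sketch assumes tab-separated input with every data cell numeric. The function name mean_center is hypothetical.

```python
def mean_center(lines, sep="\t"):
    """Mean-center each data row of a labelled matrix.

    Row 1 holds column labels and column 1 holds row labels; every remaining
    cell becomes cell value - row mean. Assumes all data cells are numeric.
    """
    header, *body = (line.split(sep) for line in lines)
    centered = [header]
    for label, *cells in body:
        values = [float(c) for c in cells]
        row_mean = sum(values) / len(values)
        centered.append([label] + [v - row_mean for v in values])
    return centered


matrix = ["gene\ts1\ts2\ts3", "TP53\t1\t2\t3", "EGFR\t10\t10\t40"]
for row in mean_center(matrix):
    print(row)
```

After centering, each row sums to zero, so rows are compared by their relative pattern rather than their absolute level.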
  • From erasmus-medical-center:

    • dr_disco: Dr. Disco: fusion gene breakpoint detection in total RNA-seq data. Detects and classifies exon-to-exon and genomic breakpoints of fusion genes in total RNA-seq data using a Graph data structure.
    • voom_transform: Voom: Transform count data to fit linear modelling. Voom transforms count data to log2-counts per million (logCPM), estimates the mean-variance relationship and uses it to compute appropriate observation-level weights.
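The first step of voom, the logCPM transform, is a simple formula. As implemented in limma's voom, it adds 0.5 to each count and 1 to each library size before scaling to a million: logCPM = log2((count + 0.5) / (library_size + 1) × 10^6). The minimal Python sketch below assumes raw column totals as library sizes, with no normalization factors. It leaves out the mean-variance trend and the weights, and the function name voom_log_cpm is hypothetical.

```python
import math


def voom_log_cpm(counts):
    """log2-counts per million as used by voom, for a genes x samples matrix.

    logCPM = log2((count + 0.5) / (library_size + 1) * 1e6), with library
    size taken as the raw column total (no normalisation factors applied).
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1) * 1e6) for j in range(n_samples)]
        for row in counts
    ]


counts = [[0, 10], [999_999, 90]]  # 2 genes x 2 samples
for gene in voom_log_cpm(counts):
    print([round(x, 3) for x in gene])
```

The 0.5 offset keeps zero counts finite (a zero count in a library of 999,999 reads maps to exactly log2(0.5) = -1).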
  • From melpetera:

    • generic_filter: [W4M][Utils] Filtering according to specific variables. Part of the W4M project: http://workflow4metabolomics.org. The R script removes all samples and/or variables corresponding to specific values of designated factors or numerical variables, in datasets made up of dataMatrix, sampleMetadata and variableMetadata files.
    • tablemerge: [W4M][Utils] Merging dataMatrix with a metadata table. Part of the W4M project: http://workflow4metabolomics.org. The R script merges the data matrix with a selected metadata file (sample metadata or variable metadata) to obtain a single file.
  • From yating-l:

    • psltobigpsl: Transforms a PSL format file into a bigPsl format file.
    • rename_scaffolds: Wrapper for a program that renames scaffolds. This tool renames the scaffolds in a reference genome so that the sequence names are shorter than 31 characters: all scaffolds are renamed to scaffold_1, scaffold_2, ..., scaffold_N, and a name mapping file is provided.
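The renaming scheme in rename_scaffolds (scaffold_1 to scaffold_N plus a mapping file) can be sketched as a single pass over the FASTA headers. This minimal Python version assumes the whole header text after ">" is the name to record in the mapping. The wrapped program may keep only the first word. The function rename_scaffolds here is a hypothetical illustration, not the wrapped program.

```python
def rename_scaffolds(fasta_lines):
    """Rename each FASTA record to scaffold_1..scaffold_N.

    Returns the renamed lines and a list of (old_name, new_name) pairs that
    can be written out as the name mapping file.
    """
    renamed, mapping = [], []
    for line in fasta_lines:
        if line.startswith(">"):
            new_name = f"scaffold_{len(mapping) + 1}"
            mapping.append((line[1:].strip(), new_name))
            renamed.append(">" + new_name)
        else:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed, mapping


fasta = [">my_very_long_scaffold_name_from_the_assembler_v1 extra", "ACGT", ">chrUn_2", "GGCC"]
lines, name_map = rename_scaffolds(fasta)
print(lines)  # ['>scaffold_1', 'ACGT', '>scaffold_2', 'GGCC']
print(name_map)
```

Keeping the mapping lets results computed on the renamed genome be translated back to the original scaffold names.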
  • From earlhaminst:

    • geneseqtofamily: Workflow based on the Ensembl GeneTrees pipeline to identify homologous genes and generate gene trees. This is a complex workflow, based on the Ensembl GeneTrees pipeline, which uses NCBI BLAST+, hcluster_sg, T-Coffee and TreeBeST to identify homologous genes and generate gene trees. The output gene trees can be visualised using the Aequatus.js visualisation plugin.
  • From peterjc:

    • count_roi_variants: v0.0.4, previously only on the Test Tool Shed. Counts sequence variants in a region of interest in a BAM file. This tool runs the command samtools view (taking advantage of an indexed BAM file) to access only those reads mapped to the region of interest (ROI), and then counts the different sequence variants found.
  • From rnateam:

    • nastiseq: A method to identify cis-NATs using ssRNA-seq. To identify cis-NATs using ssRNA-seq, we developed a new computational method based on model comparison that incorporates the inherent variable efficiency of generating perfectly strand-specific libraries.
    • graphclust_motif_finder_plot: Plotting results for GraphClust.
    • locarna_exparnap: Wrapper for application LocARNA Exact Matcher of the LocARNA suite. The LocARNA package comprises tools for fast, high-quality pairwise and multiple alignment of RNA sequences, while inferring unknown structure; this is accomplished by simultaneous folding and alignment based on sequence and structure features of the RNAs.
    • aresite2: AREsite2 REST Interface. AREsite2 represents an update for AREsite, an online resource for the investigation of AU-rich elements (ARE) in human and mouse mRNA 3’UTR sequences. The new updated and enhanced version allows detailed investigation of AU-, GU- and U-rich elements (ARE, GRE, URE) in the transcriptome of Homo sapiens, Mus musculus, Danio rerio, Caenorhabditis elegans and Drosophila melanogaster. It contains information on genomic location, genic context, RNA secondary structure context and conservation of annotated motifs. Improvements include annotation of motifs not only in 3’UTRs but in the whole gene body, including introns, additional genomes, and locally stable secondary structures from genome-wide scans. Furthermore, we include data from CLIP-Seq experiments in order to highlight motifs with validated protein interaction. Additionally, we provide a REST interface for experienced users to interact with the database in a semi-automated manner. The database is publicly available at: http://rna.tbi.univie.ac.at/AREsite.
    • intarna: Efficient RNA-RNA interaction prediction incorporating accessibility and seeding of interaction sites. During the last few years, several new small regulatory RNAs (sRNAs) have been discovered in bacteria. Most of them act as post-transcriptional regulators by base pairing to a target mRNA, causing translational repression or activation, or mRNA degradation. Numerous sRNAs have already been identified, but the number of experimentally verified targets is considerably lower. Consequently, computational target prediction is in great demand. Many existing target prediction programs neglect the accessibility of target sites and the existence of a seed, while other approaches are either specialized to certain types of RNAs or too slow for genome-wide searches.
    • locarna_pairwise_p: Wrapper for application LocARNA Pairwise Probability Aligner of the LocARNA suite. The LocARNA package comprises tools for fast, high-quality pairwise and multiple alignment of RNA sequences, while inferring unknown structure; this is accomplished by simultaneous folding and alignment based on sequence and structure features of the RNAs.
  • From thomaswollmann:

    • anisotropic_diffusion: Anisotropic image diffusion. Edge-preserving anisotropic image diffusion.
    • mahotas_features: Compute image features using mahotas. Mahotas is a computer vision and image processing library for Python. This tool computes image features using mahotas.
    • binary2labelimage: Binary 2 label image. This tool converts a binary image to a label image (every object gets its own grey value).
    • color_deconvolution: Color-deconvolution methods. This tool performs color space transformation using preset transformation matrices, or color space decomposition.
    • visceral_evaluatesegmentation: Visceral Project - Evaluate Segmentation Tool. This tool calculates several measures to evaluate image segmentations.
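Converting a binary image to a label image, as binary2labelimage does, is connected-component labelling. A minimal pure-Python sketch follows, assuming 4-connectivity and a binary image given as nested lists. The wrapped tool may rely on a library routine and use a different connectivity, so treat the function label_binary_image as a hypothetical illustration.

```python
from collections import deque


def label_binary_image(image):
    """Give every 4-connected foreground object its own integer label (1..K)."""
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] and not labels[y][x]:
                next_label += 1  # unvisited foreground pixel starts a new object
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of this object
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


binary = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
for row in label_binary_image(binary):
    print(row)
```

With 4-connectivity, pixels that touch only diagonally end up in different objects. An 8-connected variant would add the four diagonal neighbours to the inner loop.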
  • From vmarcon:

    • repet_teannot: TEannot - REPET Lite. Genome annotation for masking transposable elements. Authors: Gwendoline Andres, Valentin Marcon, Veronique Jamilloux, Olivier Inizan.
    • giveinfofasta: GiveInfoFasta (a tool from the REPET pipeline). Gives information about your FASTA file. GiveInfoFasta is a tool from the REPET suite. Author: Valentin Marcon.
    • package_repet_2_5: Declaration of environment variables for REPET v2.5 (for REPET Lite). Author: Gwendoline Andres.
    • repet_tedenovo: TEdenovo - REPET Lite. Computes a library of transposable elements. Authors: Gwendoline Andres, Valentin Marcon, Veronique Jamilloux, Olivier Inizan.
  • From nml:

    • combine_tabular_collection: Combine Tabular Collection into a single file.
    • kat_sect: SEquence Coverage estimator Tool. Estimates the coverage of each sequence in a file using K-mers from another sequence file.
    • fasta_extract: Extracts single FASTA entries from a multi-FASTA file.
    • srst2: Short Read Sequence Typing for Bacterial Pathogens.
    • plasmid_profiler: Explores plasmid content in WGS data.
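Extracting entries from a multi-FASTA file, as fasta_extract does, starts with splitting it into records. The entry does not say whether the tool returns one chosen record or writes every record as a separate dataset. This minimal Python sketch therefore just splits the input into (header, sequence) pairs, from which either could be produced. The function split_fasta is hypothetical.

```python
def split_fasta(lines):
    """Split multi-FASTA text into (header, sequence) records."""
    records, header, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


multi = [">seq1 plasmid", "ACGT", "TTGA", ">seq2", "GGCC"]
for name, seq in split_fasta(multi):
    print(f">{name}\n{seq}")
```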
  • From pstew:

    • escape_excel: Escape Excel: a tool for preventing gene symbol and accession conversion errors.
  • From fgiacomoni: