October 2018 Tool Shed contributions

Tools contributed to the Galaxy Project ToolShed in October 2018.


New Tools

From galaxyp:
- cardinal_spectra_plots: Cardinal (mass spectrometry imaging) suite: MSI plot spectra. Cardinal is an R package that implements statistical and computational tools for analyzing mass spectrometry imaging datasets.
- cardinal_mz_images: Cardinal (mass spectrometry imaging) suite: MSI mz images.
- lfq_protein_quant: Enable protein summarisation and quantitation.
- cardinal_preprocessing: Cardinal (mass spectrometry imaging) suite: MSI preprocessing.
- cardinal_combine: Cardinal (mass spectrometry imaging) suite: MSI combine.
- cardinal_quality_report: Cardinal (mass spectrometry imaging) suite: MSI quality control.
- cardinal_segmentations: Cardinal (mass spectrometry imaging) suite: MSI segmentation.
- cardinal_data_exporter: Cardinal (mass spectrometry imaging) suite: MSI data exporter.
- cardinal_classification: Cardinal (mass spectrometry imaging) suite: MSI classification.
- cardinal_filtering: Cardinal (mass spectrometry imaging) suite: MSI filtering.
From chemteam:
- gmx_md: Wrapper for the gromacs package: GROMACS production simulation. GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is primarily designed for biochemical molecules like proteins, lipids and nucleic acids that have a lot of complicated bonded interactions, but since GROMACS is extremely fast at calculating the nonbonded interactions (that usually dominate simulations) many groups are also using it for research on non-biological systems, e.g. polymers. GROMACS supports all the usual algorithms you expect from a modern molecular dynamics implementation (check the online reference or manual for details).
- gmx_solvate: Wrapper for the gromacs package: GROMACS solvation and adding ions.
- gmx_npt: Wrapper for the gromacs package: GROMACS NPT equilibration.
- gmx_em: Wrapper for the gromacs package: GROMACS energy minimization.
- gmx_nvt: Wrapper for the gromacs package: GROMACS NVT equilibration.
- gmx_setup: Wrapper for the gromacs package: GROMACS initial setup.
- bio3d_rmsf: Wrapper for the Bio3D package: RMSF Analysis. Bio3D is an R package containing utilities for the analysis of protein structure, sequence and trajectory data.
- bio3d_rmsd: Wrapper for the Bio3D package: RMSD Analysis.
- bio3d_pca: Wrapper for the Bio3D package: PCA.
- mdanalysis_angle: Wrapper for the MDAnalysis package: Angle Analysis. MDAnalysis (https://www.mdanalysis.org) is a Python toolkit to analyze molecular dynamics trajectories generated by a wide range of popular simulation packages including DL_Poly, CHARMM, Amber, NAMD, LAMMPS, and Gromacs.
- mdanalysis_distance: Wrapper for the MDAnalysis package: Distance Analysis.
- mdanalysis_rdf: Wrapper for the MDAnalysis package: RDF Analysis.
- mdanalysis_dihedral: Wrapper for the MDAnalysis package: Dihedral Analysis.
- packmol: PACKMOL is a package for creating starting structures for molecular dynamics simulations. “PACKMOL creates an initial point for molecular dynamics simulations by packing molecules in defined regions of space. The packing guarantees that short range repulsive interactions do not disrupt the simulations. The great variety of types of spatial constraints that can be attributed to the molecules, or atoms within the molecules, makes it easy to create ordered systems, such as lamellar, spherical or tubular lipid layers. The user must provide only the coordinates of one molecule of each type, the number of molecules of each type and the spatial constraints that each type of molecule must satisfy.”
- md_converter: A tool for interconverting between different MD structure and trajectory file formats from GROMACS, CHARMM and NAMD. PDB, DCD, GRO, TRR and XTC formats are currently supported.
From hepcat72:
- vcfsamplecompare: sort variants by sample differences. This utility sorts and (optionally) filters the rows/variants of a VCF file (containing data for 2 or more samples) based on the differences in the variant data between samples or sample groups. Degree of “difference” is determined by either the best possible degree of separation of sample groups by genotype calls or the difference in average allelic frequency of each sample or sample group (with a gap size threshold). The pair of samples or sample groups used to represent the difference for a variant row is the one leading to the greatest difference in consistent genotype or average allelic frequencies (i.e. observation ratios, e.g. AO/DP) of the same variant state. If sample groups are not specified, the pair of samples leading to the greatest difference is greedily discovered and chosen to represent the variant/row.
- lumpyexpress: Structural Variant detection.
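The allele-frequency mode of vcfsamplecompare can be illustrated with a small Python sketch. It assumes per-sample AO and DP counts have already been parsed from the VCF, compares individual samples only (no sample groups, no genotype-call mode), and uses hypothetical function names; it is not the tool's own implementation.

```python
from typing import Dict, List, Optional, Tuple

# Per-sample counts for one variant row: (AO, DP) = (alternate observations, read depth).
Counts = Tuple[int, int]


def allele_frequency(ao: int, dp: int) -> Optional[float]:
    """Observation ratio AO/DP, or None when the sample has no read depth."""
    return ao / dp if dp > 0 else None


def max_frequency_difference(
    samples: Dict[str, Counts], min_gap: float = 0.0
) -> Optional[Tuple[str, str, float]]:
    """Return (low_sample, high_sample, difference) for one variant row,
    or None if fewer than two samples have depth or the gap is below min_gap."""
    freqs = {}
    for name, (ao, dp) in samples.items():
        f = allele_frequency(ao, dp)
        if f is not None:
            freqs[name] = f
    if len(freqs) < 2:
        return None
    # For individual samples, the most separated pair is the lowest and highest frequency.
    ordered = sorted(freqs, key=freqs.get)
    low, high = ordered[0], ordered[-1]
    diff = freqs[high] - freqs[low]
    return (low, high, diff) if diff >= min_gap else None


def sort_rows(
    rows: Dict[str, Dict[str, Counts]], min_gap: float = 0.0
) -> List[Tuple[str, str, str, float]]:
    """Sort variant rows by decreasing allele-frequency difference, dropping rows below min_gap."""
    scored = []
    for row_id, samples in rows.items():
        result = max_frequency_difference(samples, min_gap)
        if result is not None:
            scored.append((row_id, *result))
    scored.sort(key=lambda row: row[3], reverse=True)
    return scored
```

Because only individual samples are compared here, picking the extreme pair is trivial; the tool's greedy search matters once samples are pooled into groups, which this sketch deliberately leaves out.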
From bgruening:
- graphicsmagick_image_convert: Wrapper for the GraphicsMagick suite: Image Converter. GraphicsMagick is the Swiss army knife of image processing. Comprised of 265K physical lines (according to David A. Wheeler’s SLOCCount) of source code in the base package (or 1,220K including 3rd party libraries) it provides a robust and efficient collection of tools and libraries which support reading, writing, and manipulating an image in over 88 major formats including important formats like DPX, GIF, JPEG, JPEG-2000, PNG, PDF, PNM, and TIFF.
- graphicsmagick_image_montage: Wrapper for the GraphicsMagick suite: Image Montage.
- plotly_ml_performance_plots: performance plots for machine learning problems. The tool creates three plots that measure the performance of a trained machine learning model on multiple metrics: the confusion matrix; precision, recall and F-score; and the area under the ROC curve. http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html http://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html
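The precision, recall and F-score reported by plotly_ml_performance_plots follow the standard binary definitions, sketched below in plain Python. This is an illustration assuming 0/1 labels, not the tool's code, which relies on the scikit-learn functions linked in its description.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary 0/1 labels (same order as a flattened 2x2 confusion matrix)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean (0.0 when undefined)."""
    _, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Returning 0.0 for undefined ratios is a choice made for this sketch; libraries differ in how they handle that case.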
From melpetera:
- intensity_checks: [W4M][Utils] Adding information about intensities in the variable metadata file. Part of the W4M project: http://workflow4metabolomics.org / This tool computes two metrics: the mean fold change, and the number and proportion of missing values.
- idchoice: [W4M][Utils] Choosing new identifiers. Available for the W4M project: http://workflow4metabolomics.org / This tool lets you choose a particular column in your metadata file to use as identifiers for your W4M-3-tables-format data.
- corr_table: [W4M][Statistics] Correlation table between two tables. Part of the W4M project: http://workflow4metabolomics.org / Creation of a correlation table between the variables of two distinct tables, along with a colored graphical output.
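The intensity_checks description does not say how the mean fold change is defined. The sketch below assumes it is the ratio of the mean intensities of two sample groups for one variable, with missing values stored as None; the function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def missing_stats(values: Sequence[Optional[float]]) -> Tuple[int, float]:
    """Number and proportion of missing (None) intensities for one variable."""
    n_missing = sum(v is None for v in values)
    proportion = (n_missing / len(values)) if values else 0.0
    return n_missing, proportion


def mean_fold_change(
    group_a: Sequence[Optional[float]], group_b: Sequence[Optional[float]]
) -> Optional[float]:
    """Assumed definition: mean(group_a) / mean(group_b), ignoring missing values.
    Returns None when either group has no observed values or group_b's mean is zero."""
    a = [v for v in group_a if v is not None]
    b = [v for v in group_b if v is not None]
    if not a or not b or mean(b) == 0:
        return None
    return mean(a) / mean(b)
```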
From ieguinoa:
- data_manager_fetch_tx2gene: Load entries in tx2gene table with transcript to gene tables or GTF/GFF file.
From jaredgk:
- ppp_vcfphase: Phase VCF file.
- ppp_vcf_to_ima: Convert VCF file(s) to IMa input format.
From estrain:
- sum_fastqc: Basic Summary of FASTQC Raw Output. Extract basic summary statistics from raw FASTQC output: pass-fail, number of reads, number of poor reads, GC content, and percentage of reads > Q-score threshold.
- confindr: ConFindr is a pipeline that can detect contamination in bacterial NGS data. ConFindr works by looking at rMLST genes. These 53 genes are known to be single copy and conserved across all bacteria, making them excellent markers. As they are known to be single copy (with some caveats), any sample that has multiple alleles of one or more rMLST genes is likely to be contaminated.
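ConFindr's core rule, that more than one allele of any rMLST gene suggests contamination, fits in a few lines. This simplified sketch assumes alleles have already been called per gene, ignores the caveats mentioned in the description and everything upstream of allele calling, and uses placeholder gene and allele names.

```python
from typing import Dict, Iterable, List


def multi_allelic_genes(observed: Dict[str, Iterable[str]]) -> List[str]:
    """Genes for which more than one distinct allele was observed, sorted by name."""
    return sorted(gene for gene, alleles in observed.items() if len(set(alleles)) > 1)


def likely_contaminated(observed: Dict[str, Iterable[str]]) -> bool:
    """Flag a sample when at least one (normally single-copy) marker gene is multi-allelic."""
    return bool(multi_allelic_genes(observed))
```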
From frogs:
- frogs_2_0_0: Suite for metabarcoding. This is the FROGS version 2.0.0 suite of Galaxy wrappers (FROGS: Find Rapidly OTUs through Galaxy Solution). Authors: Valentin Marcon (valentin.marcon[a]inra.fr), Olivier Inizan (olivier.inizan[a]inra.fr), Laure Quintric (laure.quintric[a]ifremer.fr), Patrick Durand (pgdurand[a]ifremer.fr), Maria Bernard (maria.bernard[a]inra.fr), Géraldine Pascal (geraldine.pascal[a]inra.fr).
From jowong:
- data_path: Outputs the paths of datasets to a TXT file.
- prince_galaxy: VNTR copy number approximation.
From mvdbeek:
- damidseq_consecutive_peaks: Find consecutive peaks in DESeq2 output of DamID data.
From greg:
- affy2vcf: Contains a tool that converts Affymetrix genotype calls and intensity files to VCF format.
From fgiacomoni:
- bank_inhouse: [W4M][LC-MS] Bank in House - Annotation - Search by accurate mass on local bank. Part of the F.L.A.M.E. project. The process returns output files (CSV and HTML formats).
From iuc:
- cd_hit: Cluster or compare biological sequence datasets. CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT is very fast and can handle extremely large databases. CD-HIT helps to reduce sequence redundancy and improve the performance of other sequence analyses.
- newick_utils: Perform operations on Newick trees. The Newick Utilities are a set of tools for working with phylogenetic trees. They are not tools for making phylogenies. Rather, they are for processing existing ones, for example manipulating the tree or extracting information from it; displaying, rerooting, simplifying, extracting subtrees, printing branch lengths and distances, etc.
- samtools_merge: merge multiple sorted alignment files. Merge multiple sorted alignment files, producing a single sorted output file that contains all the input records and maintains the existing sort order.
- data_manager_fetch_refseq: Fetch FASTA data from NCBI RefSeq and update all_fasta data table. This data manager fetches FASTA format collections of proteins, nucleotides (genomic DNA) and RNA from NCBI’s RefSeq data collection.
- slamdunk: Slamdunk maps and quantifies SLAMseq reads. Slamdunk maps SLAMseq reads, resolves multimappers, quantifies the fraction of labeled and unlabeled reads within samples, and provides diagnostic plots.
- fgsea: Perform gene set testing using fgsea. fgsea implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it possible to run more permutations and obtain finer-grained p-values, which in turn allows accurate standard approaches to multiple hypothesis correction.
- volcanoplot: Tool to create a Volcano Plot. Create a Volcano Plot with ggplot2 where significantly upregulated and downregulated points are coloured (red/blue) and labels can be applied to points with ggrepel.
- samtools_depth: Computes the depth at each position or region. Computes the depth at each position or region of a sequence for a given alignment file.
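What samtools_merge is described as doing, combining already-sorted inputs into one output that keeps the sort order, is a k-way merge. The sketch below shows the idea with Python's heapq.merge on in-memory records assumed to be sorted by (reference index, position); the record layout is hypothetical and this is not how samtools itself is implemented.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record layout: (reference index, 0-based position, read name).
Record = Tuple[int, int, str]


def merge_sorted(*inputs: Iterable[Record]) -> List[Record]:
    """K-way merge of coordinate-sorted record streams.

    heapq.merge reads each input lazily and always emits the smallest pending
    record, so the output stays sorted; records with equal keys keep the
    order of the inputs they came from.
    """
    return list(heapq.merge(*inputs, key=lambda r: (r[0], r[1])))
```

Because each input is consumed lazily, the same approach works on streams much larger than memory if the final list() is replaced by writing records out as they are produced.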
From jfrancoismartin:
- mixmodel4repeated_measures: [Metabolomics][W4M][Statistics] Mixed models - Analysis of variance for repeated measures using mixed models. Part of the W4M project: http://workflow4metabolomics.org.
From lecorguille:
- xcms_export_samplemetadata: [Metabolomics][W4M][LC-MS] XCMS R Package - Preprocessing - Get a sampleMetadata file. Part of the W4M project: http://workflow4metabolomics.org XCMS: http://www.bioconductor.org/packages/release/bioc/html/xcms.html Filtration and Peak Identification using xcmsSet function from xcms R package to preprocess LC/MS data for relative quantification and statistical analysis.
From rnateam:
- graphclust_align_cluster: Align predicted clusters of the glob_report_no_align step with LocARNA, with conservation analysis and visualizations.
- graphclust_postprocessing_no_align: Post-processing of GraphClust results. Redundant clusters are merged and instances that belong to multiple clusters are assigned unambiguously. For every pair of clusters, the relative overlap (i.e. the fraction of instances that occur in both clusters) is computed, and clusters are merged if the overlap exceeds 50%.
- graphclust_aggregate_alignments: Aggregate and filter alignment metrics of individual clusters, like the output of graphclust_align_cluster.
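The cluster-merging rule of graphclust_postprocessing_no_align can be sketched as below. The description does not define the denominator of the relative overlap, so this version assumes shared instances divided by the size of the smaller cluster; it repeats pairwise merging until no pair exceeds the threshold and does not cover the step that assigns shared instances unambiguously.

```python
from itertools import combinations
from typing import Iterable, List, Set


def relative_overlap(a: Set[str], b: Set[str]) -> float:
    """Assumed definition: shared instances divided by the size of the smaller cluster."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def merge_redundant(clusters: Iterable[Set[str]], threshold: float = 0.5) -> List[Set[str]]:
    """Repeatedly merge any pair of clusters whose relative overlap exceeds threshold."""
    result = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(result)), 2):
            if relative_overlap(result[i], result[j]) > threshold:
                # j > i, so popping j leaves index i valid; restart the scan afterwards.
                result[i] |= result.pop(j)
                merged = True
                break
    return result
```

Restarting the scan after every merge is simple rather than fast; it guarantees that a newly enlarged cluster is compared again against all the others.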
From jfb:
- blosum_toolshed: Scores peptides for terbium binding propensity as part of the KALIP-KINATEST-ID process.