The April 2015 Galactic News!

Galaxy Updates

Welcome to the April 2015 Galactic News, a summary of what is going on in the Galaxy community. These newsletters complement the Galaxy Development News Briefs which accompany new Galaxy releases and focus on Galaxy code updates.

New Papers

68 new papers referencing, using, extending, and implementing Galaxy were added to the Galaxy CiteULike Group in March, bringing the total to 2200 publications.

The new papers were tagged with:

#   Tag        #   Tag              #   Tag        #   Tag
2   Cloud      -   Project          6   Tools      12  UsePublic
1   HowTo      5   RefPublic        -   UseCloud   1   Visualization
1   IsGalaxy   1   Reproducibility  6   UseLocal   19  Workbench
33  Methods    2   Shared           7   UseMain

Events

April GalaxyAdmins Meetup

The next GalaxyAdmins online meetup will be 16 April. Carrie Ganote from the National Center for Genome Analysis Support (NCGAS) and Pervasive Technology Institute at Indiana University will talk about her Galaxy work with Trinity, IU Galaxy, and the Open Science Grid.

See the meetup page for more.

GalaxyAdmins is a special interest group for Galaxy community members who are responsible for Galaxy installations.

Galaxy Workshop Tokyo, April 28

The Galaxy Workshop Tokyo 2015 is a full day of hands-on training, keynotes, lightning talks and discussions about ways of using Galaxy for high-throughput biology, especially for human genome sequencing. The workshop has two parts. The morning session is hands-on training on how to run Pitagora-Galaxy, our preconfigured Galaxy virtual machine, on your laptop or on the AWS cloud. The afternoon session includes keynotes and lightning sessions that explain actual workflows, which you can try immediately on Pitagora-Galaxy.

GCC2015: 6-8 July, Norwich UK

The 2015 Galaxy Community Conference (GCC2015) is the Galaxy community's annual gathering of users, developers, and administrators. Previous GCCs have drawn over 200 participants, and we expect that to happen again in 2015. GCC2015 is being hosted by The Sainsbury Lab in Norwich, UK, immediately before BOSC and ISMB/ECCB in Dublin.

There are a lot of events going on at GCC2015, including:

Code Hackathon

An intense two-day hands-on collaboration to develop working code that is useful to the Galaxy community. If you know how to code, and want to contribute to one of the most successful open source projects in the life sciences, then please consider attending. See the Code Hackathon home page for more.

Data Wrangling Hackathon

An intense two-day hands-on collaboration to develop cutting-edge analysis pipelines that are useful to the Galaxy community. If you know data analysis, we would love to have you here to help us beat back those seemingly insurmountable analysis challenges. See the Data Wrangling Hackathon home page for more.

Training SunDay

Something new for GCC2015 is Training SunDay, an additional day of training offered the day before its older sibling Training Day, and featuring a single track with the most in-demand topics. You can attend both Training Days, or just one. Training SunDay features these three topics:

These three topics are also offered on Monday. You can register for one or both Training Days.


Training Day Schedules set

Training (Mon)Day

The schedule for Training Day, Monday, 6 July, is available. Training Day features five parallel tracks, each with three two-and-a-half-hour workshops. There are topics on using Galaxy, interacting with it programmatically, and deploying, administering, and extending Galaxy. No matter what you do with Galaxy, there are workshops for you.

Early Registration opens ...

Early registration (save heaps) will open in April, we promise. Early registration is very affordable and starts at less than £40 per day for students and postdocs. If you work in data-intensive life science research, then it is hard to find a meeting more relevant than GCC2015. We look forward to seeing you there.

Paper Abstract Submission Extended to April 20

Abstract submission for Oral and Poster Presentations at the 2015 Galaxy Community Conference (GCC2015) is now open.

Abstract submission for oral presentations closes 20 April, while poster submission closes 1 May. Poster authors will be notified of acceptance status within two weeks of submission, while oral presentation authors will be notified no later than 4 May. Please consider presenting your work. If you are dealing with big biological data, then this meeting wants to hear about it.

GCC2015 Sponsorships

We are pleased to announce a joint GCC2015 Platinum Sponsorship from SGI, Intel, and Kelway. Please welcome Intel and Kelway to the community, and welcome SGI back!

Call for Sponsors

The 2015 Galaxy Community Conference (GCC2015) is still accepting Sponsorships. Your organisation can play a prominent part in the Galaxy community by sponsoring GCC2015. Sponsorship is an excellent way to raise your organization’s visibility.

Several sponsorship levels are available, including two levels of premier sponsorships that include presentations. Premium sponsorships are limited, however, so you are encouraged to act soon.

Please let the organisers (gcc2015-org AT lists DOT galaxyproject DOT org) know if you are interested in helping make this event a success.

Other Events

There are upcoming events in 8 countries on 4 continents. See the Galaxy Events Google Calendar for details on other events of interest to the community.

Date | Topic/Event | Venue/Location | Contact
April 7 | 5th GUGGO meeting (GTN) | Galaxy User Group Grand Ouest (GUGGO), Rennes, France | Cyril Monjeaud, Yvan Le Bras
April 14-15 | Swift Parallel Scripting for Science, Engineering and Data Analysis | GlobusWorld 2015, Chicago, Illinois, United States | Mike Wilde
April 16 | GalaxyAdmins April 2015 Meetup | Online | Hans-Rudolf Hotz, Dave Clements, Carrie Ganote
April 20-21 | Workshop: Extended RNA-Seq analysis (GTN) | QFAB, University of Queensland, St Lucia, Australia | Mark Crowe
April 21 | Large-Scale NGS Analysis Using Globus Genomics: Challenges and User Success Stories | Bio-IT World 2015, Boston, Massachusetts, United States | Ravi Madduri
(as above) | Workshop: Large Scale NGS Analysis Using Globus Genomics | (as above) | Paul Davé, Ravi Madduri, Alex Rodriguez, Dinanath Sulakhe
(as above) | Extending Galaxy with External Microbiome Databases | (as above) | Bob Brown
April 23 | Visualization Tools for the Refinery Platform | (as above) | Nils Gehlenborg
April 27 - May 3 | High-throughput Biology: From Sequence to Networks 2015 (GTN) | New York City, New York, United States | Francis Ouellette
April 28 | Galaxy Workshop Tokyo 2015 | RCAST, The University of Tokyo, Japan | Ryota Yamanaka
May 12 | Initiation à l’utilisation de Galaxy | Jouy-en-Josas, France | Veronique Martin, Sophie Schbath
May 13 | Analyse primaire de données issues de séquenceurs nouvelle génération sous Galaxy | (as above) | (as above)
May 13 | Workshop: Variant detection using Galaxy (GTN) | QFAB, University of Queensland, St Lucia, Australia | Mark Crowe
May 25-29 | MIPRO | Opatija, Croatia | Enis Afgan
May 26-28 | Workshop on the Application of Next Generation Sequencing to Repetitive DNA Analysis in Plants | České Budějovice, Czech Republic | Jiri Macas
June 2 | Initiation à l’utilisation de Galaxy | Jouy-en-Josas, France | Veronique Martin, Sophie Schbath
June 3 | Analyse primaire de données issues de séquenceurs nouvelle génération sous Galaxy | (as above) | (as above)
June 4 | Traitement bioinformatique des données RNA-Seq sous Galaxy | (as above) | (as above)
June 3 | Workshop: RNA-Seq analysis using Galaxy (GTN) | QFAB, University of Queensland, St Lucia, Australia | Mark Crowe
June 22 | Introduction to variant calling for pathologists and laboratory managers (GTN) | Part of Short Course in Medical Genetics and Genetic Pathology 2015, Sydney, Australia | Ross Lazarus, Andrew Lonie
July 6-8 | 2015 Galaxy Community Conference (GCC2015) (GTN) | The Sainsbury Lab, Norwich, United Kingdom | Galaxy Outreach
July 10-14 | Using Biological Cyberinfrastructure to Scale Science and People – Applications in Data Storage, HPC, Cloud Analysis, and Bioinformatics Training | ISMB / ECCB 2015, BOSC 2015, Dublin, Ireland | Dave Clements

(GTN) designates a training event offered by GTN member(s).

Who's Hiring

Please Help! Yes you!

The Galaxy is expanding! Please help it grow.

Got a Galaxy-related opening? Send it to outreach@galaxyproject.org and we'll put it in the Galaxy News feed and include it in next month's update.



New Public Servers

One new public Galaxy server was added in March:

Vinther Lab

Vinther Lab: User-Friendly Tools for Sequencing-Based RNA Structure Probing Data

Galaxy Community Hubs

Galaxy Training Network | Galaxy Community Log Board | Galaxy Deployment Catalog
Share your training resources and experience now | Share your experience now

One new Community Log Board entry was added in March:


Releases

March 2015 Galaxy Release (v 15.03)

Complete Release Notes

Highlights

Release Versioning

Starting with this distribution, Galaxy uses an updated release versioning system. The scheme is Ubuntu-style: releases are numbered by year and month, so 15.03 is the March 2015 release.
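
As a small illustration, a release number such as 15.03 can be split into its year and month parts. This is only a sketch: the helper name `parse_release` is hypothetical and not part of Galaxy, and it assumes that two-digit years fall in the 2000s.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, int]:
    """Split a YY.MM release string such as '15.03' into (year, month)."""
    # Hypothetical helper for illustration only; not part of the Galaxy code base.
    year_part, month_part = version.split(".")
    year = 2000 + int(year_part)  # assumption: two-digit years are in the 2000s
    month = int(month_part)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in release {version!r}")
    return year, month


print(parse_release("15.03"))  # prints (2015, 3)
```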

GitHub

Galaxy development has moved to GitHub, but stable/release changes are mirrored to Bitbucket. Deployers can continue to use Bitbucket as they have done in the past. Release branches are discussed in the full release notes.

Tool Redesign

Much of Galaxy’s core tool set has been redesigned. Several contain new functionality. These tools are included in the Tool Shed and many are ready for use on Galaxy Main.

Get the Galaxy Release

getgalaxy.org
galaxy-dist.readthedocs.org
bitbucket.org/galaxy/galaxy-dist
new: $ hg clone https://bitbucket.org/galaxy/galaxy-dist#stable
upgrade: $ hg pull
$ hg update latest_15.03



Thanks for using Galaxy!
The Galaxy Team

BioBlend v0.5.3 Released

BioBlend v0.5.3 has been released. BioBlend is a Python library for interacting with CloudMan and the Galaxy API. (CloudMan offers an easy way to get a personal and completely functional instance of Galaxy in the cloud in just a few minutes, without any manual configuration.)

This is mostly an incremental bug fix release with the following summary of changes:

  • Project source moved to new URL - https://github.com/galaxyproject/bioblend
  • Huge improvements to automated testing; tests now run against Galaxy release_14.02 and all later versions to ensure backward compatibility (see travis.yml for details).
  • Many documentation improvements (thanks to Helena Rasche).
  • Add Galaxy clients for the tool data tables, the roles, and library folders (thanks to Anthony Bretaudeau).
  • Add method to get the standard error and standard output for the job corresponding to a Galaxy dataset (thanks to Anthony Bretaudeau).
  • Add get_state() method to JobsClient.
  • Add copy_from_dataset() method to LibraryClient.
  • Add create_repository() method to ToolShedClient (thanks to Helena Rasche).
  • Fix DatasetClient.download_dataset() for certain proxied Galaxy deployments.
  • Make LibraryClient._get_root_folder_id() method safer and faster for Galaxy release_13.06 and later.
  • Deprecate and ignore invalid deleted parameter to WorkflowClient.get_workflows().
  • CloudMan: Add method to fetch instance types.
  • CloudMan: Update cluster options to reflect change to SLURM.
  • BioBlend.objects: Deprecate and ignore invalid deleted parameter to ObjWorkflowClient.list().
  • BioBlend.objects: Add paste_content() method to History objects.
  • BioBlend.objects: Add copy_from_dataset() method and root_folder property to Library objects.
  • BioBlend.objects: Add container and deleted attributes to Folder objects.
  • BioBlend.objects: Set the parent attribute of a Folder object to its parent folder object (thanks to John M. Eppley).
  • BioBlend.objects: Add deleted parameter to list() method of libraries and histories.
  • BioBlend.objects: Add state and state_details attributes to History objects (thanks to Gianmauro Cuccuru).
  • BioBlend.objects: Rename upload_dataset() method to upload_file() for History objects.
  • BioBlend.objects: Rename input_ids and output_ids attributes of Workflow objects to source_ids and sink_ids respectively.
  • Add run_bioblend_tests.sh script (useful for Continuous Integration testing).

Enjoy and please let us know what you think,

Enis & John & Nicola Soranzo & Simone Leo & Helena Rasche

Planemo 0.6.0

Planemo 0.6.0 was released in March. The Release Notes:

  • Many enhancements to the tool building documentation - descriptions of macros, collections, simple and conditional parameters, etc…
  • Fix tool_init to quote file names (thanks to Peter Cock). Pull Request 98.
  • Allow ignoring file patterns in .shed.yml (thanks to Björn Grüning). Pull Request 99
  • Add --macros flag to tool_init command to generate a macro file as part of tool generation. ec6e30f
  • Add linting of tag order for tool XML files. 4823c5e
  • Add linting of stdio tags in tool XML files. 8207026
  • More tests, much higher test coverage. 0bd4ff0

Planemo is a set of command-line utilities to assist in building tools for the Galaxy project.

CloudMan and blend4j

New versions of CloudMan and blend4j were released in August.


Other News



Galaxy ToolShed

ToolShed Contributions

Best practices for creating Galaxy tools are now available on this wiki. Thanks to the many contributors who created them.

Galaxy Project ToolShed Repos

Note: Starting with the May news, this list will be placed on a separate page and linked to from here. This section is just getting too long (which is the kind of problem we want to have :-).

Suites

Tools

  • From galaxyp:

    • ms_wiff_loader: Loads AB Sciex wiff files from URLs into a Galaxy Wiff Composite dataset
    • ms_data_converter: Converts WIFF format MS data to mzML or MGF using AB SCIEX MS Data Converter
  • From rnateam:

    • kinwalker: The Kinwalker algorithm performs cotranscriptional folding of RNAs, starting at a user-specified structure (default: open chain) and ending at the minimum free energy structure. Folding events are performed between transcription of additional bases and are regulated by barrier heights between the source and target structure.
    • vienna_rna: The ViennaRNA Package consists of several stand-alone programs for the prediction and comparison of RNA secondary structures. RNA secondary structure prediction through energy minimization is the most used function in the package. We provide three kinds of dynamic programming algorithms for structure prediction: the minimum free energy algorithm of (Zuker & Stiegler 1981) which yields a single optimal structure, the partition function algorithm of (McCaskill 1990) which calculates base pair probabilities in the thermodynamic ensemble, and the suboptimal folding algorithm of (Wuchty et al. 1999) which generates all suboptimal structures within a given energy range of the optimal energy. For secondary structure comparison, the package contains several measures of distance (dissimilarities) using either string alignment or tree-editing (Shapiro & Zhang 1990). Finally, we provide an algorithm to design sequences with a predefined structure (inverse folding). In case you are using our software for your publications you may want to cite:

    Lorenz, Ronny and Bernhart, Stephan H. and Höner zu Siederdissen, Christian and Tafer, Hakim and Flamm, Christoph and Stadler, Peter F. and Hofacker, Ivo L.
    ViennaRNA Package 2.0, Algorithms for Molecular Biology, 6:1 26, 2011, doi:10.1186/1748-7188-6-26

  • From izsam:

    • phylogeny_converter: Converts different file formats (FASTA, GenBank, phylip, nexus) to allow data-exchange from different phylogeny tools.
  • From biomonika:

  • From jjkoehorst:

    • sapp: GBK2RDF Semantic Annotation Platform for Prokaryotes. It might take a while, but I will try to make a Galaxy Tool Shed module for each module in the SAPP paper.
  • From fastaptamer:

    • fastaptamer_cluster: Cluster closely-related sequences using Levenshtein edit distance. FASTAptamer-Cluster uses the Levenshtein algorithm to cluster together closely-related sequences based on a user-defined edit distance (the minimum number of insertions, deletions, or substitutions required to transform one string into another).
    • fastaptamer_count: Count, rank, sort and normalize sequence reads in a selection population. FASTAptamer-Count serves as the gateway to the FASTAptamer suite of bioinformatics tools for combinatorial selections (aptamers, (deoxy)ribozymes, phage display, direct mutagenesis, etc.). For a given FASTQ input file, FASTAptamer-Count will determine the number of times each sequence was read, normalize sequence frequency to reads per million, and rank and sort sequences by decreasing total reads.
    • fastaptamer_compare: Compare sequence distribution between two populations. FASTAptamer-Compare facilitates statistical analysis of two populations by rapidly generating a tab-delimited output file that lists each unique sequence along with RPM (reads per million) in each population file (if available) and log(2) of the ratio of their RPM values in each population. RPM data for both populations can be utilized to generate an XY-scatter plot of sequence distribution across two populations. FASTAptamer-Compare also facilitates the generation of a histogram of the sequence distribution by creating 102 bins for the log(2) values. This histogram can provide a quick visual comparison of the two populations: distributions centered around 0 indicate similar populations, while distributions shifted to the left or right indicate overall enrichment or depletion.
    • fastaptamer_search: Degenerate nucleotide motif searching. FASTAptamer-Search searches for degenerate nucleotide motifs within a FASTA file.
    • fastaptamer_enrich: Calculate fold-enrichment of each sequence across populations. FASTAptamer-Enrich rapidly calculates "fold-enrichment" values for each sequence across two or three input files. Output is provided as a tab-delimited file and is formatted to include sequence composition, length, rank, reads, reads per million (RPM), cluster information (if available) and enrichment values for each sequence.
  • From bgruening:

    • diamond: DIAMOND is a new high-throughput program for aligning a file of short reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity

    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools

    • text_processing: High performance text processing tools using the GNU coreutils, sed, awk and friends. This repository contains all kinds of text processing tools:

      • awk - The AWK programming language ( http://www.gnu.org/software/gawk/ )
      • sed - Stream Editor ( http://sed.sf.net )
      • grep - Search files ( http://www.gnu.org/software/grep/ )
      • sort_columns - Sorting every line according to its columns
      • GNU Coreutils programs ( http://www.gnu.org/software/coreutils/ ):

        • sort - sort files
        • join - join two files, based on common key field.
        • cut - keep/discard fields from a file
        • unsorted_uniq - keep unique/duplicated lines in a file
        • sorted_uniq - keep unique/duplicated lines in a file
        • head - keep the first X lines in a file.
        • tail - keep the last X lines in a file.

    Originally known as "Unix Tools" and developed by Assaf Gordon in Greg Hannon's lab ( http://hannonlab.cshl.edu ) at Cold Spring Harbor Laboratory, it is now hosted at https://github.com/bgruening/galaxytools/tree/master/unix_tools and open for contributions. It will also replace several smaller sed, sort and uniq wrappers developed over time.
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools

    • data_manager_diamond_database_builder: Diamond data manager
    • find_genes_located_nearby_workflow: Galaxy workflow for the identification of candidate gene clusters. This approach screens two proteins against all nucleotide sequences from the NCBI nt database within hours on our cluster, leading to all organisms with an interesting gene structure for further investigation. As usual in Galaxy workflows, every parameter, including the proximity distance, can be changed and additional steps can be easily added, for example additional filtering to refine the initial BLAST hits, or inclusion of a third query sequence.
    • find_three_genes_located_nearby_workflow: Galaxy workflow for the identification of candidate gene clusters with three known genes. This approach screens three proteins against a given genome sequence, leading to a genome position where all three genes are located nearby. As usual in Galaxy workflows, every parameter, including the proximity distance, can be changed and additional steps can be easily added, for example additional filtering to refine the initial BLAST hits, or inclusion of a third query sequence.
    • find_subsequences: Searches for a subsequence in a larger sequence. For example, to get all restriction sites for BamH1.

    This tool is based on biopython: 10.1093/bioinformatics/btp163
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools

  • From iuc:

    • macs2: MACS - Model-based Analysis of ChIP-Seq. With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq) is getting popular to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis method, we present a novel algorithm, named Model-based Analysis of ChIP-Seq (MACS), for identifying transcript factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and MACS improves the spatial resolution of binding sites through combining the information of both sequencing tag position and orientation. MACS can be easily used for ChIP-Seq data alone, or with control sample with the increase of specificity.

    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/iuc/galaxytools

    • seqtk: toolkit for processing FASTA and FASTQ files. Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.

    https://github.com/lh3/seqtk/
    Repository-Maintainer: Helena Rasche
    Repository-Development: https://github.com/galaxy-iuc/tool_shed

  • From hogart:

    • unafold: Galaxy Tool wrapper for UNAFold. This is the Galaxy wrapper for UNAFold (http://mfold.rna.albany.edu/?q=DINAMelt/software). The UNAFold software was developed for nucleic acid folding and hybridization prediction (doi: 10.1007/978-1-60327-429-6_1, doi: 10.1093/nar/gki591).

    Note: UNAFold requires a large amount of RAM; for example, folding a 43 kb RNA uses nearly 30 GB of memory. After installing this wrapper you will therefore need to modify the job_conf.xml of your Galaxy instance accordingly. Also, please make sure that the zip datatype is registered as a binary datatype in your Galaxy instance.

  • From steffen:

    • covenntree: CoVennTree compares up to three rooted trees at the same time. CoVennTree (Comparative weighted Venn Tree) is a software to analyze and compare up to three datasets. Unlike other methods, CoVennTree correlates data on the leaf level and transfers this information to the root node. CoVennTree works with numbers to compute weighted Venn diagrams for each node in the graph (rooted tree). Therefore any kind of input data can be processed as long as the data structure will be taken into account.

    http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00043/abstract

  • From ngsplot:

    • ngsplot: ngs.plot is a program that allows you to easily visualize your next-generation sequencing (NGS) samples at functional genomic regions. This galaxy implementation of ngs.plot has been tested to work with ngs.plot v2.47.1. For instructions on the system installation of ngs.plot, please see https://github.com/shenlab-sinai/ngsplot.
  • From devteam:

    • vcfhethom: Count the number of heterozygotes and alleles, compute het/hom ratio. This tool performs three basic calculations:
      1. Computes the number of heterozygotes
      2. Computes the ratio between heterozygotes and homozygotes
      3. Computes the total number of alleles in the input dataset
    • vcfselectsamples: Select samples from a VCF file. Allows selecting samples from a VCF dataset. This tool combines vcfkeepsamples and vcfremovesamples from the VCFlib package into a single utility.
    • vcfleftalign: Left-align indels and complex variants in VCF dataset. Left-aligns variants in a VCF dataset. Window size is determined dynamically according to the entropy of the regions flanking the indel. These must have entropy > 1 bit/bp, or be shorter than ~5kb.
    • vcfannotategenotypes: Annotate genotypes in a VCF dataset using genotypes from another VCF dataset. Annotates genotypes in the first file with genotypes in the second, adding the genotype as another flag to each sample field in the first file. Annotation-tag is the name of the sample flag which is added to store the annotation. Also adds a 'has_variant' flag for sites where the second file has a variant.
    • vcfbreakcreatemulti: Break multiple alleles into multiple records, or combine overlapping alleles into a single record. This tool breaks or creates multiallelic VCF records based on user selection.

      • Breaking = If multiple alleles are specified in a single record, break the record into multiple lines, preserving allele-specific INFO fields.
      • Creation = If overlapping alleles are represented across multiple records, merge them into a single record.
    • vcfgenotypes: Convert numerical representation of genotypes to allelic. Converts numerical representation of genotypes (standard in GT field) to the alleles provided in the call's ALT/REF fields.
    • vcfaddinfo: Adds info fields from the second dataset which are not present in the first dataset.
    • vcfsort: Sort VCF dataset by coordinate. This tool uses the native UNIX sort command to order a VCF dataset in coordinate order.
    • vcfbedintersect: Intersect VCF and BED datasets. Computes intersection between a VCF dataset and a set of genomic intervals defined as either a BED dataset (http://genome.ucsc.edu/FAQ/FAQformat.html#format1) or a manually typed interval (in the form of chr:start-end).
    • vcf2tsv: Converts VCF files into tab-delimited format. Converts stdin or given VCF file to tab-delimited format, using null string to replace empty values in the table. Specifying -g will output one line per sample with genotype information. A part of the vcflib utilities developed by Erik Garrison (https://github.com/ekg/vcflib).
    • vcfcheck: Verify that the reference allele matches the reference genome. Verifies that the VCF REF field matches the reference as described.
    • vcffixup: Count the allele frequencies across alleles present in each record in the VCF file. Uses genotypes from the VCF file to correct AC (alternate allele count), AF (alternate allele frequency), NS (number of called), in the VCF records.
    • data_manager_bwa_mem_index_builder: Data Manager for building BWA (0.6+) indexes.
    • vcfrandomsample: Randomly sample sites from VCF dataset. Randomly sample sites from an input VCF file. Scale the sampling probability by the field specified by --scale-by (see advanced controls). This may be used to provide uniform sampling across allele frequencies, for instance (AF field in this case).
    • vcfgeno2haplo: Convert genotype-based phased alleles into haplotype alleles. Convert genotype-based phased alleles within a window size specified by -w option into haplotype alleles. Will break haplotype construction when encountering non-phased genotypes on input.
    • data_manager_fetch_genome_dbkeys_all_fasta: Allows optionally defining a new DBKEY, retrieves a FASTA file, and populates the all_fasta.loc data table.
    • samtools_bedcov: Calculate read depth on BAM files. This tool uses the SAMTools toolkit to produce read depth per BED region.
    • vcfcommonsamples: Output records belonging to samples common between two datasets. Outputs each record in the first file, removing samples not present in the second.
    • vcfvcfintersect: Intersect two VCF datasets. Computes intersections and unions for two VCF datasets. Unifies equivalent alleles within window-size bp.
    • vcfcombine: Combine multiple VCF datasets. Combines VCF files positionally, combining samples when sites and alleles are identical. Any number of VCF files may be combined. The INFO field and other columns are taken from one of the files which are combined when records in multiple files match. Alleles must have identical ordering to be combined into one record. If they do not, multiple records will be emitted.
    • vcfdistance: Calculate distance to the nearest variant. Adds a value to each VCF record indicating the distance to the nearest variant in the file. The dataset used as input to this tool must be coordinate sorted. This can be achieved by either using the VCFsort utility or Galaxy's general purpose sort tool (in this case sort on the first and the second column in ascending order).
    • vcfflatten: Removes multi-allelic sites by picking the most common alternate. Requires allele frequency specification 'AF' and use of 'G' and 'A' to specify the fields which vary according to the Allele or Genotype.
    • vcfprimers: Extract flanking sequences for each VCF record. For each VCF record, extract the flanking sequences, and write them to stdout as FASTA records suitable for alignment. This tool is intended for use in designing validation experiments. The extracted primers flank all of the alleles at multi-allelic sites.
    • vcffilter: Tool for filtering VCF files. A vcflib-based tool for flexible filtering of VCF datasets on a variety of tags. This is a Galaxy wrapper for the vcffilter utility from the vcflib package.
    • vcfallelicprimitives: Splits allelic primitives (gaps or mismatches) into multiple VCF lines. If multiple allelic primitives (gaps or mismatches) are specified in a single VCF record, this tool splits the record into multiple lines, but drops all INFO fields. "Pure" MNPs are split into multiple SNPs unless the -m flag is provided. Genotypes are phased where complex alleles have been decomposed, provided genotypes in the input.
    • vcfannotate: Intersect VCF records with BED annotations. Intersects the records in the VCF file with targets provided in a BED file. Intersections are done on the reference sequences in the VCF file.
  • From damion:

    • blast_reporting: Provides filtered, sorted HTML and tabular reports of Blast XML format search results. NCBI BLAST+ searches can output in a range of formats, but in the past only the XML format included fields like sequence description. This tool converts the NCBI BLAST XML report into 12, 24, 26 or custom column tabular and HTML reports. It is used as a command-line tool or via its Galaxy tool.

    The tool allows almost complete control over which fields are displayed and filtered, how columns are named, and how the HTML report on each query is sectioned. Search result records can be filtered out based on values in numeric or textual fields. Matches (by accession id) to a selection of reference databases can be shown, and this can include a description of the matched sequence.

  • ffp_phylogeny: Calculates Feature Frequency Profiles (FFP) from FASTA sequence and text data. FFP is an alignment-free comparison tool for phylogenetic analysis and text comparison. It can be applied to nucleotide sequences, complete genomes, and proteomes, as well as to text. This tool calculates FFP on one or more FASTA sequence or text datasets. It prepares a mini pipeline consisting of [ffpry | ffpaa | ffptxt] > [ ffpfilt | ffpcol > ffprwn] > ffpjsd > ffptree

    The original command line ffp-phylogeny code is at http://ffp-phylogeny.sourceforge.net/. This tool uses Aaron Petkau's modified version: https://github.com/apetkau/ffp-3.19-custom.

  • From yokofakun:

  • From kosrou:

    • ngs_plot: ngs plot. Novel tool to visualise next-generation sequencing data around TSS, gene bodies, etc.
  • From dereeper:

    • admixture: Fast ancestry estimation. ADMIXTURE is a software tool for maximum likelihood estimation of individual ancestries from multilocus SNP genotype datasets. It uses the same statistical model as STRUCTURE but calculates estimates much more rapidly using a fast numerical optimization algorithm.
    • snpeff_from_gff_vcf: snpeff v4.0 from VCF, FASTA reference and GFF files.
    • sniplay: SNiPlay3: a package for exploration and large scale analyses of SNP polymorphisms (filtering, density, vcftools, diversity, linkage disequilibrium, GWAS).
    • tassel5: Software to evaluate trait associations, evolutionary patterns, and linkage disequilibrium.
  • From okorol:

    • itsx: ITSx identifies ITS sequences and extracts the ITS region. ITSx is an open source software utility to extract the highly variable ITS1 and ITS2 subregions from ITS sequences, which are commonly used as a molecular barcode, e.g. for fungi.
  • From iracooke:

    • protk_proteogenomics: Docker support and update for protk 1.4. Tools for mapping peptides and proteins to genomic coordinates.
  • From wolma:

    • mimodd_workflows: Some example workflows for use with MiModD. The workflows defined here let you automate many of the tutorial analyses from the MiModD documentation (see http://mimodd.readthedocs.org/en/latest/tutorial.html). These example workflows should be easy to customize for your own needs.
    • mimodd: MiModD: Identify Mutations from Whole-Genome Sequencing Data. Installs the MiModD suite of tools for the analysis of genome-wide sequencing data from model organisms along with their Galaxy tool wrappers.

Tool Dependency Definitions

  • From dereeper:

  • From iuc:

    • package_cofold_0_0_1: Contains a tool dependency definition that downloads and compiles CoFold, a tool for prediction of RNA secondary structure that takes co-transcriptional folding into account.

    http://www.e-rna.org/cofold/
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools/

  • package_samtools_1_2: Contains a tool dependency definition that downloads and installs version 1.2 of the SAMTools package. samtools - Utilities for the Sequence Alignment/Map (SAM) format.

    Samtools is a set of utilities that manipulate alignments in the BAM format. It imports from and exports to the SAM (Sequence Alignment/Map) format, does sorting, merging and indexing, and allows reads in any region to be retrieved swiftly.

    Samtools is designed to work on a stream. It regards an input file '-' as the standard input (stdin) and an output file '-' as the standard output (stdout). Several commands can thus be combined with Unix pipes. Samtools always outputs warning and error messages to the standard error output (stderr).

    Samtools is also able to open a BAM (not SAM) file on a remote FTP or HTTP server if the BAM file name starts with 'ftp://' or 'http://'. Samtools checks the current working directory for the index file and will download the index if it is absent. Samtools does not retrieve the entire alignment file unless it is asked to do so.

    Repository-Maintainer: Bjoern Gruening

  • package_gengetopt_2_22_6: Contains a tool dependency definition that downloads and compiles version 2.22.6 of GNU gengetopt. Gengetopt is a tool to write command line option parsing code for C programs.

    http://www.gnu.org/software/gengetopt/gengetopt.html
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/galaxyproject/tools-iuc

  • package_rnastructure_5_7: Contains a tool dependency definition that downloads and compiles version 5.7 of RNAstructure. RNAstructure is a complete package for RNA and DNA secondary structure prediction and analysis. It includes algorithms for secondary structure prediction, including a facility to predict base pairing probabilities. It can also be used to predict bimolecular structures and can predict the equilibrium binding affinity of an oligonucleotide to a structured RNA target. This is useful for siRNA design. It can also predict secondary structures common to two unaligned sequences, which is much more accurate than single sequence secondary structure prediction. Finally, RNAstructure can take a number of different types of experimental mapping data to constrain or restrain structure prediction. These include chemical mapping, enzymatic mapping, NMR, and SHAPE data.

    http://rna.urmc.rochester.edu/RNAstructure.html

  • package_vcflib_8a5602bf07: Compiled vcflib binaries for x86_64. Binary files in this package are compiled from source code with SHA: 8a5602bf07.

    This is a package dependency for tools relying on the VCFlib toolkit developed by Erik Garrison (https://github.com/ekg/vcflib). This package is distributed as x86_64 binaries only, as it is difficult to compile on other platforms. These binaries should work on any of the supported Linux platforms other than RHEL/CentOS 5.

  • package_vienna_rna_2_1: Contains a tool dependency definition that downloads and compiles version 2.1 of the Vienna RNA package. The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.

    http://www.tbi.univie.ac.at/RNA/
    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools

  • package_diamond_0_6_13: Contains a tool dependency definition that downloads and compiles version 0.6.13 of DIAMOND. DIAMOND is a new high-throughput program for aligning a file of short reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity.

    Repository-Maintainer: Bjoern Gruening
    Repository-Development: https://github.com/bgruening/galaxytools

  • package_stringtie_1_0_1: Contains a tool dependency definition that downloads and installs version 1.0.1 of the StringTie RNA-seq assembler. StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. To identify differentially expressed genes between experiments, StringTie's output can be processed either by the Cuffdiff or Ballgown programs.
  • From biomonika:

  • From devteam:

    • package_freebayes_0_9_20_b040236: Contains a tool dependency definition that downloads and compiles version 0.9.20 of FreeBayes. Program: freebayes (Bayesian haplotype-based polymorphism discovery and genotyping.)

    Version: 0.9.20 (b040236)

  • From jjohnson:

    Requires perl compiled to use threads, bioperl, and perl modules: PerlIO::gzip and Bio::DB::Sam

Select Updates

Tools