
Downloading data from NCBI SRA

How to download sequencing data from NCBI SRA?

Once upon a time I saw that paper ...

… and the paper included sequencing data (for example, GSE244303). Suppose you want to download these data. The general strategy for doing this in Galaxy has traditionally been:

  1. Download the list of accessions from NCBI
  2. Upload that list into Galaxy
  3. Use Galaxy’s fasterq-dump to retrieve the data
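
Before uploading, it can help to tidy the accession list. The sketch below is only an illustration, not part of Galaxy: the helper name `clean_accession_list` is hypothetical, and it assumes the list has one run accession per line (SRR, ERR or DRR followed by digits), dropping blank lines, duplicates, and anything that does not look like a run accession.

```python
import re

# Run accessions start with SRR, ERR or DRR followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def clean_accession_list(text):
    """Return unique run accessions from `text`, one per line, in first-seen order."""
    cleaned = []
    for line in text.splitlines():
        acc = line.strip()
        # Keep only lines that are entirely a run accession and not yet seen.
        if RUN_ACCESSION.fullmatch(acc) and acc not in cleaned:
            cleaned.append(acc)
    return cleaned
```

For example, `clean_accession_list(open("SRR_Acc_List.txt").read())` would return a list ready to be written back out, where the file name is just a placeholder for whatever you downloaded from NCBI.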

... but there is a better way

fasterq-dump is a robust tool, but when a single accession in a batch fails, the entire job fails with it. To address this, we developed a simple workflow that splits your list of accessions into separate fasterq-dump jobs, one per accession, runs them independently, and then produces two output collections: one for paired-end data and one for single-end data.
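
The idea of isolating each accession can be sketched outside Galaxy as well. The code below is not the workflow itself, only a minimal illustration of the same pattern: the names `download_independently` and `fetch_with_fasterq_dump` are hypothetical, the loop runs sequentially rather than as separate Galaxy jobs, and it assumes that a run producing two or more FASTQ files is paired-end, which is a simplification.

```python
import glob
import os
import subprocess


def fetch_with_fasterq_dump(acc, outdir="fastq"):
    """Download one accession with a locally installed fasterq-dump (assumed on PATH)."""
    # check=True raises CalledProcessError if fasterq-dump exits with an error.
    subprocess.run(["fasterq-dump", acc, "--outdir", outdir], check=True)
    return sorted(glob.glob(os.path.join(outdir, acc + "*.fastq")))


def download_independently(accessions, fetch=fetch_with_fasterq_dump):
    """Fetch each accession on its own so one failure does not stop the rest."""
    paired, single, failed = {}, {}, {}
    for acc in accessions:
        try:
            files = fetch(acc)
        except Exception as err:
            # Record the failure and move on to the next accession.
            failed[acc] = str(err)
            continue
        # Assumption: two or more FASTQ files means paired-end reads.
        if len(files) >= 2:
            paired[acc] = files
        else:
            single[acc] = files
    return paired, single, failed
```

Passing `fetch` as a parameter keeps the isolation logic separate from the download itself, so failed accessions can be retried later by calling the function again with only the keys of `failed`.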

Intrigued?

Watch this video to see how this works:

The bottom line

Use the parallel-download workflow to reliably pull down tens to hundreds of datasets from NCBI SRA.