Downloading data from NCBI SRA

Once upon a time I saw that paper …

… and the paper included sequencing data (for example, GSE244303). Suppose you want to download these data. The general strategy for doing this in Galaxy has traditionally been:

1. Download the list of accessions from NCBI
2. Upload that list into Galaxy
3. Use Galaxy’s fasterq-dump to retrieve the data

… but there is a better way

fasterq-dump is a robust tool, but sometimes one accession fails, and when that happens the entire job fails. To address this, we developed a simple workflow that splits your list of accessions into separate fasterq-dump jobs, runs them independently, and then produces two output collections: one for paired-end data and one for single-end data.
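
The Galaxy workflow itself is not reproduced here, but the idea behind it (one independent job per accession, failures kept separate, results sorted into paired-end and single-end groups) can be sketched in a few lines of Python. This is a minimal illustration, not the workflow's actual implementation. It assumes the SRA Toolkit's fasterq-dump is installed and on your PATH, and the names `fetch_with_fasterq_dump` and `download_all` are made up for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch_with_fasterq_dump(accession, outdir):
    """Download one run into its own directory; return the FASTQ files written.

    Assumes the SRA Toolkit's fasterq-dump is on PATH.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    # --split-3 writes <acc>_1.fastq and <acc>_2.fastq for paired reads
    # and <acc>.fastq for single (or unpaired) reads.
    subprocess.run(
        ["fasterq-dump", "--split-3", "--outdir", str(outdir), accession],
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return sorted(outdir.glob("*.fastq"))


def download_all(accessions, outdir, fetch=fetch_with_fasterq_dump, workers=4):
    """Run one independent job per accession.

    Returns three dicts keyed by accession: paired, single, failed.
    """
    paired, single, failed = {}, {}, {}

    def one_job(acc):
        # Each accession is isolated: an error here is recorded for this
        # accession only and does not stop the other jobs.
        try:
            return acc, fetch(acc, Path(outdir) / acc), None
        except Exception as err:
            return acc, [], err

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for acc, files, err in pool.map(one_job, accessions):
            names = {Path(f).name for f in files}
            if err is not None:
                failed[acc] = str(err)
            elif {f"{acc}_1.fastq", f"{acc}_2.fastq"} <= names:
                paired[acc] = files
            elif files:
                single[acc] = files
            else:
                failed[acc] = "no FASTQ files produced"
    return paired, single, failed
```

Calling `download_all(accessions, "fastq")` with your own list of run accessions returns the paired-end runs, the single-end runs, and the failures as three separate dictionaries, so one bad accession costs you only that accession. The `fetch` parameter is there so the download step can be swapped out, for example with a stub when trying the logic without network access.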

Intrigued?

Watch this video to see how this works:

The bottom line

Use the parallel-download workflow to reliably pull down tens to hundreds of datasets from NCBI SRA.

❗ This approach works reliably for up to ~100 SRA datasets. For downloading thousands of accessions, you will need a slightly different strategy, which is coming soon in a follow-up blog post and video.