- Quick help: Manipulating NGS data with Galaxy: Getting Data In
- Full tutorial: Uploading data into Galaxy
- Dataset Collections, including creation during Upload: Processing many samples at once with collections
- FTP/FTPS tutorial: FTP Upload
Most tutorials from the Galaxy Training Network (GTN) include one or more data upload steps.
- Start with this tutorial, section "Getting Data In", for a quick overview: NGS Logistics
- Advanced Upload methods are covered in the tutorial topic group Data Manipulation
- All Tutorials: https://training.galaxyproject.org/
Data is loaded using the tools in the Get Data tool group. Some of these tools access specific data provider sites, which then send the data back into your Galaxy history. To directly load your own local data, or data from another source, use the tool Get Data → Upload File (also accessible from the top of the left tool panel, as seen in the graphics below). To practice import/export functions with small sample data, import the Upload sample data history here.
- Each file loaded creates one dataset in the history.
- The maximum size limit is 50G (uncompressed).
- Most individual file compression formats are supported, but multi-file archives are not.
- If a .tar archive contains multiple datasets, only the first dataset inside the archive will upload.
- If your compressed data does not load correctly, try loading an uncompressed version.
- How to format fastq data for tools that require .fastqsanger format?
- Understanding compressed fastq data (fastq.gz)
- More help for FTP can be found on Galaxy Help. Search with the keyword "ftp". Example post: See here
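The size and archive rules above can be checked locally before an upload. The following is a minimal Python sketch, not part of Galaxy: the function names `archive_member_count` and `preupload_warnings` are hypothetical, and the 50G limit is assumed here to mean 50 GiB. It measures the size on disk, so a compressed file that passes may still exceed the uncompressed limit.

```python
import os
import tarfile
import zipfile

# Assumption: the documented "50G" limit is read as 50 GiB.
MAX_BYTES = 50 * 1024**3


def archive_member_count(path):
    """Return the number of regular files in a .zip/.tar archive.

    Returns None when the file is not a zip or tar archive.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return sum(1 for info in zf.infolist() if not info.is_dir())
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as tf:
            return sum(1 for member in tf.getmembers() if member.isfile())
    return None


def preupload_warnings(path):
    """Collect human-readable warnings about a file before uploading it."""
    warnings = []
    # Size on disk only; the uncompressed size of a .gz/.bz2 file may be larger.
    if os.path.getsize(path) > MAX_BYTES:
        warnings.append("file is larger than the 50G limit")
    count = archive_member_count(path)
    if count is not None and count > 1:
        warnings.append(
            f"archive holds {count} files; only the first would load"
        )
    return warnings
```

An empty list from `preupload_warnings("reads.fastq.gz")` means neither check found a problem; it does not guarantee that the upload will succeed.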
Upload tool location
Upload tool option to move FTP datasets into a History
If you DO NOT see any files, load data using FTP first, then come back to the Upload tool.
Make sure that the FTP transfer is completed before moving files into a History.
If you already moved a partial or truncated dataset into a History, you will need to FTP upload the data again.
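An FTP transfer can also be scripted. The sketch below uses Python's standard `ftplib` over FTPS and compares the remote and local file sizes after the upload, so that a truncated file is caught before it is moved into a History. The host, user name, and password are placeholders you supply for your own Galaxy server, `ftps_upload` and `transfer_complete` are hypothetical helper names, and the sketch assumes the server supports the FTP `SIZE` command.

```python
import os
from ftplib import FTP_TLS


def transfer_complete(local_size, remote_size):
    """True only when the server reports exactly the local byte count."""
    return remote_size is not None and remote_size == local_size


def ftps_upload(host, user, password, local_path):
    """Upload one file over FTPS and report whether it arrived whole.

    host/user/password are placeholders for your Galaxy server's FTP login.
    """
    remote_name = os.path.basename(local_path)
    with FTP_TLS(host) as ftps:
        ftps.login(user, password)
        ftps.prot_p()  # encrypt the data channel as well as the login
        with open(local_path, "rb") as fh:
            ftps.storbinary(f"STOR {remote_name}", fh)
        # SIZE is only meaningful in binary mode; some servers reject it.
        ftps.voidcmd("TYPE I")
        remote_size = ftps.size(remote_name)
    return transfer_complete(os.path.getsize(local_path), remote_size)
```

If `ftps_upload(...)` returns `False`, upload the file again before moving it into a History.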
- Data quota is at limit, so no new data can be loaded. Disk usage and quotas are reported at User → Preferences when logged in.
- Password-protected data requires a special URL format; ask the data source for it. Otherwise, double check that the data is publicly accessible.
- Use FTP or FTPS, not SFTP. Check with your local admin if you are not sure.
- No HTML content. The loading error generated may state this. Remove HTML fields from your dataset before loading into Galaxy or omit HTML fields from the query if importing from a data source (such as Biomart).
- Compression types .gz/.gzip, .bz/.bzip, .bz2/.bzip2, and single-file .tar and .zip are (usually) supported. If your tar/zip data does not load, download the data locally, unpack the archive, and upload the data directly.
- Only the first file in any compressed archive will load as a dataset.
- Data must be < 50G (uncompressed) to be successfully uploaded and added as a dataset to a history, from any source.
- Is the problem the dataset format or the assigned datatype? Can this be corrected by editing the datatype or converting formats? See Learn/Managing Datasets for help or watch the screencast above for a how-to example.
- Problems in the first step working with your loaded data? It may not have uploaded completely. If you used an FTP client, the transfer message will indicate if a load was successful or not and can often restart interrupted loads. This makes FTP a great choice for slower connections, even when loading small files.
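Two of the checks in this list, HTML content and compression type, can be approximated locally by inspecting the first bytes of a file. The following is a rough Python sketch, not Galaxy's own datatype detection: `sniff_upload_problem` is a hypothetical helper, and it recognizes only the common gzip, bzip2, and zip signatures plus an obvious HTML header.

```python
def sniff_upload_problem(path, sample_bytes=4096):
    """Classify a file by its leading bytes.

    Returns one of "gzip", "bzip2", "zip", "html", or "plain".
    This is a heuristic on the first few KB, not a full format check.
    """
    with open(path, "rb") as fh:
        head = fh.read(sample_bytes)
    if head.startswith(b"\x1f\x8b"):      # gzip magic number
        return "gzip"
    if head.startswith(b"BZh"):           # bzip2 magic number
        return "bzip2"
    if head.startswith(b"PK\x03\x04"):    # zip local file header
        return "zip"
    lowered = head.lower()
    if b"<html" in lowered or b"<!doctype html" in lowered:
        return "html"                      # likely an error page, not data
    return "plain"
```

A result of `"html"` for a file that should hold tabular or sequence data usually means the download returned a web page instead of the dataset.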