Datatype Compressed Fastq
A compressed version of a fastq dataset.
How compressed fastq data loads
Uploaded gzip-compressed FASTQ data loads directly into the History in compressed form. Tools accept compressed datasets as input.
Why bother? Compressed data saves space in your account, which is a priority for anyone analyzing large datasets or experiments. As before, some tools accept fastq datatypes (for example, data preparation and QA tools) and others accept fastqsanger datatypes (for example, mapping and downstream analysis tools). Check the tool form to see which is expected. When in doubt, use fastqsanger.
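To illustrate why tools do not need an uncompressed copy, here is a minimal Python sketch (not Galaxy's own code) that streams records straight out of a .fastq.gz file. The function name read_fastq_gz is hypothetical, and it assumes the simple four-line FASTQ layout with no wrapped sequence lines.

```python
import gzip
from typing import Iterator, Tuple


def read_fastq_gz(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a gzip-compressed FASTQ file.

    Hypothetical helper; assumes four lines per record (no line wrapping).
    """
    # "rt" mode decompresses on the fly and decodes to text line by line,
    # so the whole uncompressed file is never held on disk or in memory.
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline().rstrip("\r\n")
            qual = handle.readline().rstrip("\r\n")
            # Basic sanity check on the record structure.
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield header[1:], seq, qual
```

Because records are yielded one at a time, memory use stays small no matter how large the compressed file is.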
Using compressed data as tool inputs
- If the tool accepts fastq input, then gz compressed data assigned the datatype
- If the tool accepts fastqsanger input, then gz compressed data assigned the datatype
- Using uncompressed fastq data is still an option with tools. The choice is yours.
TIP Avoid labeling compressed data with an uncompressed datatype, and the reverse. Jobs whose assigned datatype does not match the actual format will fail with an error.
- Example of the tool errors a datatype assignment problem can produce: https://github.com/galaxyproject/galaxy/issues/3511
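One way to catch a compressed/uncompressed mismatch before uploading is to compare the file name with its actual content. The sketch below is a local pre-upload check, not something Galaxy runs; the names is_gzip and check_label are hypothetical. It relies on the fact that every gzip file begins with the two magic bytes 0x1f 0x8b.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip file (RFC 1952)


def is_gzip(path: str) -> bool:
    """Return True if the file content starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def check_label(path: str) -> str:
    """Compare the name's .gz suffix with the actual compression (hypothetical helper)."""
    labelled_gz = path.endswith(".gz")
    actual_gz = is_gzip(path)
    if labelled_gz == actual_gz:
        return "ok"
    if labelled_gz:
        return "mismatch: named .gz but content is not gzip-compressed"
    return "mismatch: content is gzip-compressed but name lacks .gz"
```

A file reported as a mismatch should be renamed (or compressed/uncompressed) so that its label and content agree before it is used as a tool input.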
How do fastq.gz datasets relate to the .fastqsanger datatype metadata assignment?
Before assigning fastqsanger.gz, be sure to confirm the format.
TIP Quality values that are not scaled as fastqsanger (Sanger, Phred+33) will cause scientific problems with tools that expect fastqsanger input, even if the tool does not fail. Get the format right from the start to avoid problems. Incorrect format is still one of the most common causes of tool errors and unexpected results, within Galaxy or not.
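A quick way to test quality scaling is to look at the lowest quality character in a sample of reads. The sketch below is a common heuristic, not Galaxy's detection logic, and guess_quality_encoding is a hypothetical name. Characters below ';' (ASCII 59) cannot occur in Phred+64 or Solexa+64 data, so their presence points to Phred+33 (fastqsanger). The reverse is weaker: a very high-quality Phred+33 sample may contain no characters below '@' (ASCII 64), so that result is only "likely".

```python
from typing import Iterable


def guess_quality_encoding(qualities: Iterable[str]) -> str:
    """Heuristic guess of FASTQ quality offset from quality strings (hypothetical helper)."""
    codes = [ord(ch) for q in qualities for ch in q]
    if not codes:
        return "unknown"  # nothing to inspect
    lowest = min(codes)
    if lowest < 59:
        # Below ';' is only valid with the +33 offset, i.e. fastqsanger.
        return "phred33"
    if lowest >= 64:
        # Consistent with Phred+64, but high-quality Phred+33 data can look the same.
        return "likely phred64"
    return "ambiguous"  # 59-63 overlaps Solexa+64 and Phred+33
```

In practice the quality strings would come from the first few thousand records of the file; only a "phred33" result supports assigning fastqsanger without further checks.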
Best practises for loading fastq data into Galaxy
- If you are certain that the data is in fastqsanger format, name the file with the extension .fastqsanger.gz before loading to have the metadata datatype fastqsanger auto-detected. This saves time and is a smart choice when loading many datasets at once.
- If the data is close to or over 2 GB in size, be sure to use FTP.
- If the data was already loaded as fastq.gz, don't worry! Just test the data for correct format (as needed) and assign the metadata type as explained above. This is currently a one-dataset-at-a-time edit, but future plans include making these assignments a batch operation.
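The naming and size advice above can be applied to a batch of local files before upload. This is a minimal sketch with hypothetical names (upload_plan, DEFAULT_FTP_THRESHOLD). It assumes "close to 2 GB" means at least 90% of 2 GiB, which is an arbitrary choice, and it only renames files that already carry a gzip FASTQ suffix, so uncompressed data is never given a .gz label.

```python
import os

# Assumption: "close to or over 2 GB" interpreted as >= 90% of 2 GiB.
DEFAULT_FTP_THRESHOLD = int(0.9 * 2 * 1024**3)

# Gzip FASTQ suffixes that may be swapped for .fastqsanger.gz (assumed list).
FASTQ_GZ_SUFFIXES = (".fastqsanger.gz", ".fastq.gz", ".fq.gz")


def upload_plan(path: str, confirmed_sanger: bool,
                ftp_threshold: int = DEFAULT_FTP_THRESHOLD) -> dict:
    """Suggest an upload file name and whether to use FTP (hypothetical helper)."""
    base = os.path.basename(path)
    name = base
    if confirmed_sanger:
        # Only rename when the format has been confirmed and the file is
        # already named as gzip-compressed FASTQ.
        for suffix in FASTQ_GZ_SUFFIXES:
            if base.endswith(suffix):
                name = base[: -len(suffix)] + ".fastqsanger.gz"
                break
    size = os.path.getsize(path)
    return {"upload_name": name, "use_ftp": size >= ftp_threshold}
```

For example, a confirmed-Sanger file named sample.fastq.gz would be suggested as sample.fastqsanger.gz, while an unconfirmed file keeps its original name.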