Global framework for SARS-CoV-2 data analysis: Application to intrahost variation
Parts 1 and 2
By Wolfgang Maier, Marius Van Den Beek, Björn Grüning, Sergei Kosakovsky Pond, Anton Nekrutenko, and the Galaxy Team in US, EU, and AU
February 22nd, 2021
We describe a fully open, end-to-end analytic framework for standardized, reproducible, high-throughput analysis of SARS-CoV-2 data on public computing infrastructure. Using high-quality datasets from two studies, we describe patterns of variation detectable in SARS-CoV-2 intrahost data and analyze them in the context of N501Y lineages and sites under selection. In particular, we identify a subset of variants present in the N501Y lineages that were detectable at low frequencies in individual hosts before these lineages emerged. Our results suggest that intrahost dynamics, which have not received significant attention during this pandemic, should be an integral part of any serious pathogen surveillance effort.