This post is split into Part 1 and Part 2. From the introduction:
We describe a fully open end-to-end analytic framework for standardized reproducible high-throughput analysis of these data on public computing infrastructure. Using high quality datasets from two studies, we describe patterns of variation detectable in SARS-COV-2 intrahost data and analyze them in the context of N501Y lineages and sites under selection. In particular, we identify a subset of variants present in the N501Y lineages that were detectable at low frequencies in individual hosts prior to the emergence of these lineages. Our results suggest that intrahost dynamics, which did not receive significant attention during this pandemic, should be an integral part of any serious pathogen surveillance effort.