Our paper "Empowering the annotation and discovery of structured RNAs with scalable and accessible integrative clustering" is now available as preprint on bioRXiv. Congratulation to Milad and co-authors!
RNA plays essential regulatory roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 provides an integrative solution by incorporating diverse types of experimental and genomic data in an accessible fashion via the Galaxy framework. We demonstrate that the tasks of clustering and annotation of structured RNAs can be considerably improved, through a scalable methodology that also supports structure probing data. Based on this, we further introduce an off-the-shelf procedure to identify locally conserved structure candidates in long RNAs. In this way, we suggest the presence and the sparsity of phylogenetically conserved local structures in some long non-coding RNAs. Furthermore, we demonstrate the advantage of a scalable clustering for discovering structured motifs under inherent and experimental biases and uncover prominent targets of the double-stranded RNA binding protein Roquin-1 that are evolutionary conserved.