Publication: Developing Classifiers for tRNA Nanopore Sequencing: from DNA Barcodes to Amino Acid Identities
Abstract
Nanopore sequencing opens a new frontier in sequencing with its single-molecule resolution and its ability to handle diverse types of biomolecules without losing molecular information such as DNA/RNA modifications. However, it has so far been developed and validated primarily for long DNA/RNA, and only within the past year has direct nanopore sequencing of tRNAs become experimentally feasible with sufficiently high yield and read quality. Because this progress is so recent, computational tools for handling tRNA nanopore sequencing data are currently lacking.
A multiplexing strategy allows multiple barcoded experimental groups to be sequenced simultaneously, with the barcode labels recovered afterward by a computational demultiplexing tool. In this thesis, we develop a demultiplexing tool capable of handling both barcoded tRNAs and long RNAs. In a four-barcode classification task, this tool achieves an AUROC of 0.99 with tRNAs and 0.95 with long RNAs. It classifies 78.9% of tRNAs with 99.6% accuracy and 70.2% of long RNAs with 95.5% accuracy. Upon release, it would be the first open-source demultiplexing tool equipped for direct tRNA nanopore sequencing. We illustrate its use by incorporating it into a streamlined pipeline that identifies modifications by comparing experimental groups.
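Figures such as "78.9% of reads classified with 99.6% accuracy" are typical of a classifier that assigns a barcode only when it is sufficiently confident and leaves the remaining reads unclassified. The abstract does not say how the tool makes this decision, so the following Python sketch is only an illustration under that assumption. The function name `demultiplex_with_threshold`, the threshold value, and the toy probabilities are all hypothetical and are not taken from the thesis.

```python
def demultiplex_with_threshold(probabilities, true_labels, threshold):
    """Return (fraction_classified, accuracy_on_classified).

    probabilities: one list per read, holding one probability per barcode.
    true_labels:   the true barcode index of each read.
    threshold:     minimum top probability required to assign a barcode
                   (hypothetical parameter, assumed for illustration).
    """
    classified = 0
    correct = 0
    for probs, truth in zip(probabilities, true_labels):
        # Pick the barcode with the highest predicted probability.
        best = max(range(len(probs)), key=lambda i: probs[i])
        # Assign the barcode only if the classifier is confident enough.
        if probs[best] >= threshold:
            classified += 1
            if best == truth:
                correct += 1
    fraction = classified / len(true_labels) if true_labels else 0.0
    accuracy = correct / classified if classified else float("nan")
    return fraction, accuracy


# Toy four-barcode example (made-up numbers, not thesis data).
probs = [
    [0.97, 0.01, 0.01, 0.01],  # true barcode 0: confident and correct
    [0.40, 0.35, 0.15, 0.10],  # true barcode 1: ambiguous
    [0.02, 0.02, 0.95, 0.01],  # true barcode 2: confident and correct
    [0.05, 0.05, 0.05, 0.85],  # true barcode 3: correct but below 0.9
]
labels = [0, 1, 2, 3]
fraction, accuracy = demultiplex_with_threshold(probs, labels, 0.9)
print(f"classified {fraction:.0%} of reads, accuracy {accuracy:.0%}")
```

Raising the threshold in this sketch trades coverage for accuracy: fewer reads receive a barcode, but those that do are more likely to be labeled correctly.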
Finally, we demonstrate a proof of concept for direct nanopore sequencing of aminoacylated tRNAs. We optimize the reaction conditions and the choice of heterobifunctional linker in a bioconjugation strategy to capture and prepare aminoacylated tRNAs for direct nanopore sequencing. We then train a support vector machine for binary classification on a dataset containing the same type of tRNA aminoacylated with two different amino acids. This model achieves an AUROC of 0.72 (p < 0.05), indicating that nanopore sequencing can capture a statistically significant signal related to the amino acid identity of an aminoacylated tRNA. This thesis is, to our knowledge, the first demonstration that direct nanopore sequencing can be used to analyze aminoacylated tRNAs.
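To make the reported metric concrete, the sketch below computes an AUROC from classifier scores and estimates a p-value with a label permutation test. The abstract does not state how its p-value was obtained, so the permutation test is an assumption made here for illustration; it is one common choice, not necessarily the method used in the thesis. The function names, scores, and labels are hypothetical.

```python
import random


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as one half. labels must contain only 0 and 1.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_permutations=1000, seed=0):
    """Return (observed_auroc, one_sided_p_value).

    Assumed procedure: shuffle the labels many times and count how often
    the shuffled AUROC is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = auroc(scores, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auroc(scores, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Made-up decision scores for two amino acid classes (not thesis data).
example_scores = [0.9, 0.4, 0.7, 0.2, 0.6, 0.3]
example_labels = [1, 1, 1, 0, 0, 0]
observed_auroc, p_value = permutation_p_value(example_scores, example_labels)
print(f"AUROC = {observed_auroc:.2f}, permutation p = {p_value:.3f}")
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value of 0.72 with a small p-value indicates a modest but non-random signal.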