Publication:

Developing Classifiers for tRNA Nanopore Sequencing: from DNA Barcodes to Amino Acid Identities

Date

2025-03-17

Citation

Lu, Michelle. 2024. Developing Classifiers for tRNA Nanopore Sequencing: from DNA Barcodes to Amino Acid Identities. Bachelors Thesis, Harvard University Engineering and Applied Sciences.

Abstract

Nanopore sequencing expands the frontier of sequencing with its single-molecule resolution and its ability to handle diverse types of biomolecules without losing molecular information such as DNA/RNA modifications. However, it has so far been developed and validated primarily for long DNA/RNA, and only within the past year has direct nanopore sequencing of tRNAs become experimentally feasible with sufficiently high yield and read quality. Because this progress is so recent, computational tools for handling tRNA nanopore sequencing data are currently lacking.

A multiplexing strategy allows multiple barcoded experimental groups to be sequenced simultaneously, with the barcode labels recovered afterward by a computational demultiplexing tool. In this thesis, we develop a demultiplexing tool capable of handling both barcoded tRNAs and long RNAs. In a four-barcode classification task, this tool achieves an AUROC of 0.99 with tRNAs and 0.95 with long RNAs. It classifies 78.9% of tRNAs with 99.6% accuracy, and 70.2% of long RNAs with 95.5% accuracy. Once released, it would be the first open-source demultiplexing tool equipped for direct tRNA nanopore sequencing. We illustrate its usage by incorporating it into a streamlined pipeline that identifies modifications by comparing experimental groups.
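The paired figures above (a fraction of reads classified, and an accuracy on those reads) suggest that some reads are left unassigned. The abstract does not say how that decision is made; the sketch below assumes a simple confidence threshold on per-barcode probabilities, which is one common approach and not necessarily the thesis's method. The function names and the toy scores are hypothetical and are not thesis data.

```python
from typing import Optional, Sequence, Tuple


def assign_barcode(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the most probable barcode, or None if it is below threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def coverage_and_accuracy(
    all_probs: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Fraction of reads classified, and accuracy among the classified reads only."""
    calls = [assign_barcode(p, threshold) for p in all_probs]
    kept = [(c, t) for c, t in zip(calls, true_labels) if c is not None]
    coverage = len(kept) / len(calls)
    accuracy = sum(c == t for c, t in kept) / len(kept) if kept else float("nan")
    return coverage, accuracy


# Made-up probabilities for five reads over four barcodes (illustration only).
toy_probs = [
    [0.97, 0.01, 0.01, 0.01],
    [0.10, 0.85, 0.03, 0.02],
    [0.40, 0.35, 0.15, 0.10],
    [0.02, 0.03, 0.05, 0.90],
    [0.30, 0.10, 0.55, 0.05],
]
toy_labels = [0, 1, 1, 3, 0]

for thr in (0.0, 0.5, 0.8):
    cov, acc = coverage_and_accuracy(toy_probs, toy_labels, thr)
    print(f"threshold={thr:.1f} coverage={cov:.2f} accuracy={acc:.2f}")
```

On this toy input, raising the threshold lowers the fraction of reads classified while raising accuracy on the reads that remain, which is the kind of trade-off the reported percentages describe.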

Finally, we demonstrate a proof-of-concept for direct nanopore sequencing of aminoacylated tRNAs. We optimize the reaction conditions and the choice of heterobifunctional linker in a bioconjugation strategy to capture and prepare aminoacylated tRNAs for direct nanopore sequencing. We then train a support vector machine for binary classification on a dataset containing the same type of tRNA aminoacylated with two different amino acids. This model achieves an AUROC of 0.72 (p < 0.05). This result shows that nanopore sequencing can capture a statistically significant signal related to the amino acid identity of an aminoacylated tRNA. This thesis is, to our knowledge, the first demonstration that direct nanopore sequencing can be used to analyze aminoacylated tRNAs.
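The abstract reports an AUROC together with a p-value but does not state which significance test was used. As a minimal sketch, assuming a label-permutation test (one common choice, not confirmed as the thesis's method), the following computes an AUROC from classifier scores and estimates how often shuffled labels would do as well. The scores stand in for the decision values of a binary classifier such as a support vector machine; the function names and toy numbers are hypothetical.

```python
import random
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(
    scores: Sequence[float],
    labels: Sequence[int],
    n_permutations: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided p-value: how often shuffled labels reach an AUROC >= the observed one."""
    rng = random.Random(seed)
    observed = auroc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auroc(scores, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up decision scores for 16 reads, 8 per amino acid class (illustration only).
toy_scores = [0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
              0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
toy_classes = [0] * 8 + [1] * 8

print("AUROC:", auroc(toy_scores, toy_classes))
print("permutation p-value:", permutation_p_value(toy_scores, toy_classes))
```

An AUROC of 0.5 corresponds to chance-level ranking, so a value such as 0.72 is modest, and a test of this kind is what separates it from noise.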

Keywords

Bioinformatics

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
