Publication:
Development of Bioinformatics Solutions for Single-cell Transcriptomics and Novel Codec System for Robust DNA-based Data Storage

Date

2020-11-23

Citation

Zhou, Guangyu. 2020. Development of Bioinformatics Solutions for Single-cell Transcriptomics and Novel Codec System for Robust DNA-based Data Storage. Doctoral dissertation, Harvard University Graduate School of Arts and Sciences.

Abstract

With the unprecedented wealth of biological and biomedical data now being generated, bioinformatics has shaped our understanding of the life sciences through a wide range of applications: from de novo assembly to pathway network analysis of complex diseases, and from pan-genome construction to the identification of both DNA and RNA variations at single-cell resolution. In this thesis, I developed two bioinformatics solutions: one applies current state-of-the-art methods in single-cell transcriptomics to provide new insights into gastric cancer biology, and the other is a novel codec system for robust DNA-based data storage. Chapter 1 introduces the key topics in bioinformatics, with an emphasis on the implications and current status of the two solutions. Chapter 2 profiles the transcriptomes of 43,101 cells from nine patients with gastric cancer. We found that CTHRC1, which is specifically expressed in tumor endothelial cells and is associated with poor prognosis in the cohort data, may serve as a potential target for gastric cancer treatment. In addition, highly active myofibroblasts characterized by PDGFRB expression were identified; they are associated with the epithelial-mesenchymal transition (EMT) pathway and are involved in tumor progression. This single-cell atlas contains cellular and molecular data on gastric cancer progression and will serve as a valuable resource for the discovery of new targeted therapies for gastric cancer. Chapter 3 proposes a robust DNA-based data storage method based on a new codec algorithm named ‘Yin-Yang’. Using this strategy, we successfully stored files of different formats in a single synthetic DNA oligonucleotide pool, with enhanced robustness in transcoding different data structures and with practical feasibility. Three files totaling 2.02 megabits were successfully retrieved after sequencing and decoding, showing that the strategy achieves high storage capacity per nucleotide and high fidelity of data recovery.
Chapter 4 concludes with remarks on future prospects and presents a vision of what can be achieved as informatics and data science converge to enable further adoption of cellular-level anticancer targets and industrial-grade DNA-based storage.

Keywords

Bioinformatics, DNA Storage, Single-cell Transcriptomics, Molecular biology

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
