# Improved Analysis of Nanopore Sequence Data and Scanning Nanopore Techniques

 Title: Improved Analysis of Nanopore Sequence Data and Scanning Nanopore Techniques Author: Szalay, Tamas Citation: Szalay, Tamas. 2016. Improved Analysis of Nanopore Sequence Data and Scanning Nanopore Techniques. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences. Full Text & Related Files: SZALAY-DISSERTATION-2016.pdf (14.37Mb; PDF) Abstract: The field of nanopore research has been driven by the need to inexpensively and rapidly sequence DNA. In order to help realize this goal, this thesis describes the PoreSeq algorithm that identifies and corrects errors in real-world nanopore sequencing data and improves the accuracy of \textit{de novo} genome assembly with increasing coverage depth. The approach relies on modeling the possible sources of uncertainty that occur as DNA advances through the nanopore and then using this model to find the sequence that best explains multiple reads of the same region of DNA. PoreSeq increases nanopore sequencing read accuracy of M13 bacteriophage DNA from 85\% to 99\% at 100X coverage. We also use the algorithm to assemble \textit{E. coli} with 30X coverage and the $\lambda$ genome at a range of coverages from 3X to 50X. Additionally, we classify sequence variants at an order of magnitude lower coverage than is possible with existing methods. This thesis also reports preliminary progress towards controlling the motion of DNA using two nanopores instead of one. The speed at which the DNA travels through the nanopore needs to be carefully controlled to facilitate the detection of individual bases. A second nanopore in close proximity to the first could be used to slow or stop the motion of the DNA in order to enable a more accurate readout. The fabrication process for a new pyramidal nanopore geometry was developed in order to facilitate the positioning of the nanopores. This thesis demonstrates that two of them can be placed close enough to interact with a single molecule of DNA, which is a prerequisite for being able to use the driving force of the pores to exert fine control over the motion of the DNA. Another strategy for reading the DNA is to trap it completely with one pore and to move the second nanopore instead. To that end, this thesis also shows that a single strand of immobilized DNA can be captured in a scanning nanopore and examined for a full hour, with data from many scans at many different voltages obtained in order to detect a bound protein placed partway along the molecule. Terms of Use: This article is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:33493548 Downloads of this work: