Surveying Mount Improbable: Computational Challenges Facing Evolution

Date

2019-09-10

The Harvard community has made this article openly available.

Citation

Sinai, Sam. 2019. Surveying Mount Improbable: Computational Challenges Facing Evolution. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Abstract

Since its origins on Earth, life has overcome challenges that appear improbable. For life to have emerged, sets of molecules needed to cooperate, and to sustain that cooperation, in order to build evolvable populations. Such cooperators faced significant obstacles in early life. In the first two chapters of this thesis, we propose and analyze mechanisms that may have reduced the probabilistic burden of building evolvable systems. We present theoretical work on the role of population structure in early evolution. In chapter 1, we suggest that merging compartments can improve the odds of finding minimal evolvable cells. In chapter 2, we identify a mechanism for elongation of sequences within compartments that results in surprising asymmetric selection and cooperative dynamics. In the second half of this thesis, we study the structure of fitness landscapes. A fitness landscape is a map between biological sequences and their functionality (“fitness”) in a particular context. The structure of the fitness landscape determines how hard it would be for evolution to find or optimize particular functionalities. Understanding this structure can help us predict evolutionary outcomes, and can be exploited for engineering proteins. Because fitness landscapes are large and complex, building accurate models of them presents a major challenge. In chapter 3, we show that Variational Auto-Encoders can learn the effects of mutations on sequences (the “local landscape”) by training on the evolutionary sequence record. In chapter 4, we expand the scope of our investigation of fitness landscapes by experimentally assaying thousands of variants of adeno-associated virus (AAV) capsid proteins. We use this data to train tens of machine learning models and subsequently assay nearly half a million mutants of AAV2. By experimentally testing these designed sequences, we demonstrate that our approach can efficiently discover thousands of functional variants, and we propose state-of-the-art machine learning approaches and best practices. In chapter 5, we show that with the aid of computational models, fitness landscapes can be explored through experimental batches in a manner that is more efficient than evolution. We develop new exploration strategies that complement models to increase the efficiency of search on fitness landscapes. The results in this thesis are relevant to evolutionary biologists, machine learning scientists, protein engineers, and gene therapy researchers.

Keywords

Origins of life, Fitness landscapes, Machine Learning

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
