Publication: Data-Centric Connectomics Segmentation
Abstract
The field of connectomics aims to reconstruct comprehensive connectivity maps of neurons in animal brains to build a structural foundation for understanding neurodegenerative diseases and intelligence. High-resolution imaging techniques like electron microscopy (EM) enable neuroscientists to investigate neurons at the individual-synapse level and inspect sub-cellular structures like mitochondria. Image segmentation is the core task in the computing workflow that converts raw data into discriminative features for hypothesis verification and data-driven scientific discovery. However, modern imaging techniques have been accumulating data at unprecedented rates, which calls for accurate and efficient machine learning algorithms and systems to reduce costly manual annotation.
To tackle these challenges, we employ a data-centric methodology, which consists of novel algorithms to reduce human annotation for vast unlabeled data, new benchmark datasets to reveal unknown challenges and inspire capable segmentation algorithms, and new open-source software to assist researchers in adapting the segmentation approaches to data from different imaging modalities. Specifically, this dissertation presents the following five contributions. First, we introduce a novel active learning algorithm that suggests informative queries by combining the information from an unsupervised feature extractor. Second, we present two hybrid-representation learning models that simultaneously predict multiple representations calculated from the permutation-invariant instance masks, achieving promising performance with limited training data. Third, we extend the hybrid-representation learning models with image-translation functionality to segment instances in new imaging modalities without any annotation in the target domain. Fourth, we present several benchmark datasets covering mitochondria, neuronal nuclei, and synapses, serving as testbeds for novel segmentation algorithms. Finally, we introduce the PyTorch Connectomics open-source framework to overcome obstacles in scalability and flexibility, whose detailed documentation ensures reproducibility and usability for users with different levels of experience in deep learning methods. We expect that the algorithms, datasets, and software introduced in this dissertation will alleviate the computing bottleneck in future connectomics research.