Publication: Deep Learning as a Scientific Method and a Model Organism of Intelligence
Abstract
The nature and origin of intelligence is a fundamental question in science that has been studied throughout history in psychology, neuroscience, and artificial intelligence. Recent advances in machine learning point to a promising direction: deep learning. Training neural networks by optimizing their parameters via gradient descent has shown success both in practical AI applications and in the pursuit of artificial general intelligence. This thesis investigates both the practical applications of deep learning and its scientific foundations.
The first part focuses on using deep learning to accelerate experimental neuroscience. I present two applications developed during my PhD. The first combines deep learning with synthetic data generation to track neurons in multi-channel 3D videos, improving efficiency by generating training data for rare postures not covered in the original dataset. The second employs an uncertainty-aware system to actively guide electron microscope image acquisition in real time, achieving higher throughput by focusing the time budget on critical pixels.
The second part addresses how deep learning can make scientific data analysis more robust. I discuss how neural networks can either correct systematic errors in data or generate synthetic samples for better-calibrated error estimates. The first approach is applied to hyperspectral data to remove the effects of cloud shadows on acquired spectra, while the second is used to generate probabilistic dark matter maps that quantify uncertainties in density fields without known ground truth.
The third part examines how intelligent abilities emerge in modern AI models during training. I first explore how AI models learn underlying concepts and compose them, discovering that compositional abilities may emerge without obvious behavioral signs. I then investigate how models develop in-context learning abilities based on their training data distribution, revealing a phase diagram composed of different algorithms the model implements.
The final part analyzes how large language models perform complex intelligent tasks. One study reveals that models generate task-specific representations in their internal activations when presented with new data generation processes at inference time. Another evaluates how language models integrate new information into their internal world models. I conclude by discussing the fundamental cognitive abilities that current models still need to improve in order to arrive at a general form of intelligence.
In summary, this thesis presents investigations of deep learning both as a tool to enhance scientific discoveries and as a model organism for studying intelligence.