
Computationally Speaking: The Mathematical Foundation of Large Language Models and An Exploration Into How They Tell Stories


Date

2025-03-14

Published Version


The Harvard community has made this article openly available.


Citation

Melas-Kyriazi, Natalie. 2024. Computationally Speaking: The Mathematical Foundation of Large Language Models and An Exploration Into How They Tell Stories. Bachelor's Thesis, Harvard University Engineering and Applied Sciences.

Abstract

Over the past year, large language models such as ChatGPT have gained immense popularity, with hundreds of millions of active users. The adoption of these models in everyday tasks marks a significant shift in how we perceive and interact with technology, making it all the more crucial to understand how these new tools work. This thesis aims to elucidate the inner workings of large language models, starting from first principles. We begin with an introduction to foundational machine learning concepts. Next, we analyze the underlying architecture of neural networks, focusing on the evolution from basic feed-forward networks to Recurrent Feed-Forward networks, Long Short-Term Memory networks, and, most importantly, Transformer networks. In this analysis, we highlight key components such as the residual stream vector space and the attention block. We then explore the optimization algorithms used to train autoregressive Transformer networks, including deterministic gradient descent, stochastic gradient descent, and Adam, with an emphasis on their convergence properties. Finally, we present current research on Transformer network interpretability, including an ongoing research project on differentiating storytelling modes in the popular large language model Llama2. This thesis underscores that the first step to using machine learning responsibly is to understand it mathematically.


Keywords

Mathematics, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth in the Terms of Service.
