Large Language Models and How We Train Them
Abstract
Large language models (LLMs) are the technological revolution of the decade. Yet despite their popularity, how they were developed and how they fundamentally work are often misunderstood by the public, and these questions remain debated even among AI researchers and industry leaders. As a result, many people see these models as unexplainable black boxes. This can lead to misuse of, or over-reliance on, LLMs when users do not grasp how the models work or what their current limitations are. The dramatic increase in AI discussion over the last five years has also created a problem of information overload. Academic papers, blogs, news articles, and other sources cover many different topics across a vast literature, but few unify these ideas into a coherent framework that explains how modern LLMs are built and trained. Many resources are also inaccessible: they assume deep background knowledge or focus narrowly on implementation-level and mathematical details. This thesis is a structured, theoretical introduction to large language models. It combines theory with intuitive explanations and aims to bridge the gap between understanding AI and using it.