Publication: AIlice in Numberland: Comparing numerical understanding in language models and humans through multilingual reasoning puzzles
Abstract
Numbers in language constitute an extraordinary human cultural innovation. When counting, languages around the world use diverse mathematical strategies to construct and combine their numbers. People learn how to use these systems of numbers despite this diversity. But while large language models (LLMs) appear to independently excel at linguistic and mathematical tasks, they are unable to solve linguistic-mathematical puzzles about systems of numbers in different languages, which humans can learn to solve successfully. This thesis presents a detailed investigation into why this task is difficult for language models.
We design a series of experiments that untangle the linguistic and mathematical aspects of numbers in language, probing how individual parameters of numeral construction and combination affect model performance. Our experiments establish the novel finding that while individual mathematical features do not hinder the solving ability of current large language models, LLMs are unable to infer the compositional structure of numerals in these problems as humans can. LLMs cannot consistently solve such problems unless the mathematical operations in the problems are explicitly marked using known symbols (+, ×, etc.). Humans are able to use their understanding of numbers in language to make inferences about the implicit compositional structure of numerals; language models seem to lack this notion of numeral structure. We conclude that flexible, adaptive cross-domain use of language appears to remain a challenge for current language models.
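As a purely illustrative sketch, not drawn from the puzzles or experiments in the thesis, the contrast between implicit and explicitly marked numeral structure can be shown with the French numeral quatre-vingt-dix-sept (97), which is composed as 4 × 20 + 10 + 7. The function names and the operator list below are assumptions introduced only for this example.

```python
# Illustrative toy example; the data and helper names are hypothetical
# and are not taken from the thesis.
# French "quatre-vingt-dix-sept" (97) is built as 4 x 20 + 10 + 7.
WORD_VALUES = {"quatre": 4, "vingt": 20, "dix": 10, "sept": 7}

WORDS = ["quatre", "vingt", "dix", "sept"]
OPERATORS = ["×", "+", "+"]  # one operator between each pair of words


def implicit_form(words):
    """Numeral as it appears in language: the operations are not marked."""
    return "-".join(words)


def explicit_form(words, operators):
    """Same numeral with every operation marked by a known symbol."""
    parts = [words[0]]
    for op, word in zip(operators, words[1:]):
        parts += [op, word]
    return " ".join(parts)


def evaluate(words, operators):
    """Evaluate the numeral, applying multiplication before addition."""
    terms = [WORD_VALUES[words[0]]]
    for op, word in zip(operators, words[1:]):
        value = WORD_VALUES[word]
        if op == "×":
            terms[-1] *= value      # multiply into the current term
        elif op == "+":
            terms.append(value)     # start a new additive term
        else:
            raise ValueError(f"unsupported operator: {op}")
    return sum(terms)
```

In the implicit form, a solver has to infer which adjacent words multiply and which add; in the explicit form, that structure is given and only the arithmetic remains.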