AIlice in Numberland: Comparing numerical understanding in language models and humans through multilingual reasoning puzzles

Date

2025-06-24

The Harvard community has made this article openly available. Please share how this access benefits you.

Citation

Bhattacharya, Antara Raaghavi. 2025. AIlice in Numberland: Comparing Numerical Understanding in Language Models and Humans Through Multilingual Reasoning Puzzles. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

Numbers in language constitute an extraordinary human cultural innovation. When counting, languages around the world use diverse mathematical strategies to construct and combine their numbers. People learn how to use these systems of numbers despite this diversity. But while large language models (LLMs) appear to independently excel at linguistic and mathematical tasks, they are unable to solve linguistic-mathematical puzzles about systems of numbers in different languages, which humans can learn to solve successfully. This thesis presents a detailed investigation into why this task is difficult for language models.

We design a series of experiments that untangle the linguistic and mathematical aspects of numbers in language, probing how individual parameters of numeral construction and combination affect model performance. Our experiments establish the novel finding that while individual mathematical features do not hinder the solving ability of current large language models, LLMs are unable to infer the compositional structure of numerals in these problems as humans can. LLMs cannot consistently solve such problems unless the mathematical operations in the problems are explicitly marked using known symbols (+, ×, etc.). Humans are able to use their understanding of numbers in language to make inferences about the implicit compositional structure of numerals; language models seem to lack this notion of numeral structure. We conclude that flexible, adaptive cross-domain use of language appears to remain a challenge for current language models.

Keywords

language models, mathematics, multilingual, numbers, problem-solving, reasoning, Linguistics, Artificial intelligence, Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
