Luna: A Game-Based Rating System for Artificial Intelligence
Abstract: Research in Artificial Intelligence (AI) is driven by standardized tests and benchmarks. The level of success of a model on a popular benchmark can determine the amount of funding and attention from academia that the model receives. Despite this emphasis on testing, there are currently no widely accepted practical benchmarks for general AI. The Turing Test has long occupied this void in theory, but it has proven to be a poor practical guide for research, prompting a recent push in the research community to move "beyond the Turing Test". In this thesis, I put forth the Luna Rating System as a practical benchmark for AI. The system takes inspiration from chess ratings; humans and machines participate in two-player language-based games called Luna Games, and "Smarts Ratings" are assigned to both players based on the outcomes. The Smarts Rating of a machine player is indicative of its proximity to AI. After presenting the Luna Rating System and defining the Luna Game, I evaluate the robustness of the system to likely human player strategies. I then describe the three machine learning problems implicit in the Luna Game: Question Generation, Question Answering, and a third previously uncharacterized problem that I call Luna Rating Prediction. Finally, I introduce a web-based implementation of the Luna Rating System and recruit over 1200 human participants. The complete thesis amounts to a comprehensive introduction and evaluation of Luna as a practical test for AI.
Citable link to this page: http://nrs.harvard.edu/urn-3:HUL.InstRepos:38811432
- FAS Theses and Dissertations