Publication:

The Pen and the Processor: A Turing-like Test to Gauge GPT-Generated Poetry


Date

2024-10-10

The Harvard community has made this article openly available.

Citation

Bechard, Deni. 2024. The Pen and the Processor: A Turing-like Test to Gauge GPT-Generated Poetry. Master's thesis, Harvard University Division of Continuing Education.

Abstract

The emergence of powerful large language models (LLMs) is rapidly transforming the cultural landscape through the generation of music, photography, illustration, video, and writing. In this experiment, I evaluate the capacity of ChatGPT-4 to emulate poetry, historically one of the most celebrated forms of creative expression for the ways that it pushes the boundaries of language to express the deepest aspects of human consciousness. Experiences such as love, belonging, mortality, and divinity are considered challenging to communicate, and the idea that an AI devoid of consciousness could create poetry that elicits deep feelings is not only controversial but also has profound implications for human culture and art. However, comparing human-written poetry with AI-generated poetry presents a challenge given that consciousness shapes human poetry in important ways that AIs cannot independently reproduce.

To emulate the influence of consciousness on GPT-4’s poetic capacity, I applied the idea of constraints from poetry: the notion that traditional and formal elements such as meter, structure, and rhyme shape a poet’s craft. Similarly, a poet’s culture, emotions, cognitive biases, and personal experiences can act as constraints that influence their creativity. Being without consciousness, an LLM does not work within such constraints, and an evaluation of its poetic abilities requires that it be given measurable constraints that mimic those imposed by consciousness. To achieve this, I attempted to give GPT-4 the constraints that it could perceive in human-written poems. I instructed GPT-4 to generate poems under three constraint levels: high (imitating human-written originals), medium (following detailed instructions derived from human-written originals), and low (with no constraints based on human-written originals). Whereas the high-constraint GPT-4 poems were composed within constraints intended to mimic human consciousness, the low-constraint poems relied more on its inherent abilities. In this way, I created three categories of decreasing constraints imposed upon GPT-4 in order to evaluate whether the constraints affected readers’ ability to distinguish AI poetry from human poetry.

In Turing-like tests, 236 participants indicated whether each poem was written by an AI or by a human in a two-alternative forced-choice manner. I evaluated the three constraint categories according to two metrics: (1) the proportion of responses (aggregated from all participants in each category) that correctly identified the AI poems; and (2) the percentage of participants in each category whose ability to identify AI-generated poems was statistically significant or marginally significant. The results showed that constraint levels influence GPT-4’s ability to emulate human poetry. As constraints decreased, participants more accurately identified AI-written poems. Additionally, as constraints decreased, a higher percentage of participants correctly identified AI poetry at significant or marginally significant levels. Low-constraint poems were more often identified as AI, and high-constraint poems were more often mistaken for human.

At a time when prompt engineering is developing as a field of study and AI copiloting is becoming increasingly commonplace, these findings highlight AI’s strengths and the risks it may pose. This study shows that GPT-4, when properly constrained, can be a powerful imitator, capable of humanlike writing, with significant implications for how AI may be used to shape human culture and experiences in the years ahead.
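
The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which statistical test the thesis used, so this example assumes a one-sided exact binomial test against 50% chance (the guessing rate in a two-alternative forced-choice task), with p < 0.05 treated as significant and p < 0.10 as marginally significant. The function names, thresholds, and response data are all hypothetical and serve only to show how such metrics could be computed.

```python
from math import comb


def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= correct) if the rater is guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


def summarize(responses: dict[str, list[bool]],
              alpha: float = 0.05, marginal: float = 0.10) -> dict:
    """Compute both metrics for one constraint category.

    `responses` maps a participant id to a list of booleans, one per AI poem,
    where True means the poem was correctly identified as AI.
    The alpha and marginal thresholds are assumptions, not values from the thesis.
    """
    # Metric 1: proportion of all responses (pooled over participants) that are correct.
    total_correct = sum(sum(r) for r in responses.values())
    total_trials = sum(len(r) for r in responses.values())

    # Metric 2: share of participants whose individual accuracy beats chance.
    labels = {}
    for pid, r in responses.items():
        p = binomial_p_value(sum(r), len(r))
        if p < alpha:
            labels[pid] = "significant"
        elif p < marginal:
            labels[pid] = "marginal"
        else:
            labels[pid] = "not significant"
    flagged = sum(label != "not significant" for label in labels.values())

    return {
        "proportion_correct": total_correct / total_trials,
        "share_significant_or_marginal": flagged / len(responses),
        "labels": labels,
    }


# Hypothetical data: four participants, ten AI poems each.
example = {
    "p1": [True] * 9 + [False] * 1,
    "p2": [True] * 8 + [False] * 2,
    "p3": [True] * 5 + [False] * 5,
    "p4": [True] * 4 + [False] * 6,
}
summary = summarize(example)
print(summary)
```

Running `summarize` once per constraint level (high, medium, low) would give the per-category figures that the abstract compares. Pooling responses for the first metric and testing each participant separately for the second keeps the two measures independent of each other, which is one reasonable reading of the design described above.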

Keywords

ChatGPT, Poetry, Turing, Biology, Artificial intelligence, Literature

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service
