Publication:

DafnyBench

Loading...
Thumbnail Image

Date

2024-06-12

Published Version

Published Version

Journal Title

Journal ISSN

Volume Title

Publisher

The Harvard community has made this article openly available. Please share how this access benefits you.

Research Projects

Organizational Units

Journal Issue

Citation

Loughridge, Chloe Rose. 2024. DafnyBench. Bachelor's thesis, Harvard University Engineering and Applied Sciences.

Abstract

It is said that “program testing can be used to show the presence of bugs, but never to show their absence”. As Large Language Models are increasingly used to generate code, it is important that we verify the absence of bugs in their outputs. For this reason, we introduce DafnyBench, a novel benchmark for evaluating Large Language Models' abilities to write Dafny code, which is a verification-aware programming language. With over 1,000 sample programs, DafnyBench represents the largest benchmark of its kind to date. We argue this benchmark represents an important piece of infrastructure for an emerging field: AI-assisted formal verification. The benefits of having Large Language Models write verification-aware code cuts in two directions: the Dafny verifier provides a rich stream of feedback and clear learning signal for what constitutes correct code with respect to a set of pre- and post- conditions (helping machine learning research efforts), and also greater prevalence of Dafny code generally leads to a higher amount of safe software in the world.

Description

Other Available Sources

Research Data

Keywords

Computer science

Terms of Use

This article is made available under the terms and conditions applicable to Other Posted Material (LAA), as set forth at Terms of Service

Endorsement

Review

Supplemented By

Related Stories