Publication: Combining Datalog and SAT-Based Solving in Code-Reasoning Tools
Open/View Files
Date
Authors
Published Version
Published Version
Journal Title
Journal ISSN
Volume Title
Publisher
Citation
Abstract
Datalog and SAT solving are powerful forms of automated logic that are both used to reason about the correctness of software. However, they are very different, and they are not typically combined in the same automated reasoning tool. This dissertation explores how to build code-reasoning tools that combine these complementary technologies, through two main thrusts: 1) the design and implementation of the logic-centric programming language Formulog, and 2) a progression of tools for Datalog program synthesis. In doing so, it explores the challenge of combining Datalog and SAT-based reasoning at multiple levels, viewing it variously as a language design problem, a system implementation problem, and a constraint solving problem. A domain-specific language for writing static analyses (programs that analyze other programs), Formulog augments Datalog with first-order functional programming and SMT solving, an extension to SAT solving. In Formulog, SMT formulas are data that can be constructed and inspected; a novel type system allows expressive SMT formulas while ensuring that neither normal evaluation nor SMT solving crashes. Formulog’s combination of features makes it possible to state a range of SMT-based analyses in a way that is both close to their formal specifications, and amenable to high-level optimizations and efficient evaluation. In addition to the design of Formulog, this dissertation presents implementation strategies for achieving good runtime performance. This includes a compiler from Formulog to Soufflé (an established Datalog system), as well as “eager evaluation,” a novel parallel execution algorithm for Datalog that leads to improved SMT solving times when used to evaluate Formulog programs. Thanks to these techniques, Formulog’s performance is competitive with (and sometimes much better than) non-Datalog reference implementations on diverse SMT-based case studies: refinement type checking, bottom-up points-to analysis, and symbolic execution. Finally, the dissertation presents a progression of tools for synthesizing Datalog programs (i.e., generating them automatically from specifications). Each tool explores a different way of combining Datalog and SAT/SMT solving; the ultimate solution takes advantage of answer set programming (ASP)—a constraint solving paradigm that subsumes Datalog and SAT-like reasoning—and achieves nearly order-of-magnitude geometric mean speedups over the prior state-of-the-art solution. This work elucidates connections between Datalog and SAT/SMT solving (for example, it shows that every Datalog program is the basis of a theoretically efficient SMT theory), and suggests the potential of using ASP to reason about Datalog programs.