Mechanized Verification with Sharing

Abstract. We consider software verification of imperative programs by theorem proving in higher-order separation logic. Of particular interest are the difficulties of encoding and reasoning about sharing and aliasing in pointer-based data structures. Both of these are difficulties for reasoning in separation logic because they rely, fundamentally, on non-separate heaps. We show how sharing can be achieved while preserving abstraction using mechanized reasoning about fractional permissions in Hoare type theory.


Motivation
Axiomatic semantics [7] is one way to formally reason about programs. Under these semantics, programs are analyzed by considering the effect of primitive operations on predicates over the heap. Unfortunately, stating and reasoning about these predicates is complicated due to potential pointer aliasing. It was not until Reynolds proposed separation logic [16] that reasoning about imperative programs in a modular way became tractable. However, even with this logic some specifications are still not simple. For example, many algorithms are simplified by sharing data, which can be difficult to express in separation logic.
The difficulty comes from conflicting goals: we want to reason locally and compositionally about programs and, at the same time, we wish to share data globally to make algorithm and data structure implementations more efficient. Vanilla separation logic provides the first, but makes the second difficult because of the non-local effects illustrated by the following Java program:

    void error(List<T> lst) {
      Iterator<T> itr = lst.iterator();
      lst.remove(0);
      itr.next(); // throws ConcurrentModificationException
    }

Here, line 3 has removed the element that the iterator is referencing, so we have destroyed the view that the iterator is abstracting even though line 3 does not even mention the iterator. If problems like this go undetected at run-time, they can result in NullPointerExceptions in Java, or memory corruption or segmentation faults in lower-level languages such as C.
In this paper we show how type-directed formal verification can be used to verify data structures that share state, in particular collections and their iterators. Our data structures are heap-allocated and make liberal use of pointer aliasing. We have found that sharing makes automated formal reasoning about the correctness of programs difficult, and we believe general theorem proving techniques are the most suitable way to address problems that other techniques have not been able to handle.
We consider sharing of two sorts, external and internal. In external sharing, we wish to support multiple, simultaneous views of the same underlying memory for clients. In internal sharing, the sharing is completely hidden behind the abstraction, allowing the client to reason using a simple interface while the implementation uses aliasing to be more efficient.

Contributions
We begin with a brief overview of the Ynot verification library [4] (Section 2), demonstrating how higher-order separation logic can be used to provide abstraction. We then cover our contributions. We:
- Show how fractional permissions [8] can be applied to provide sharing of high-level abstractions, focusing on collections (Section 3).
- Show how external sharing can be leveraged to mechanically verify higher-order, effectful computations in Ynot, focusing on iterators (Section 4).
- Show how internal sharing can be expressed by describing the representation of B+ trees (skirted n-ary trees), and how our approach simplifies the implementation of an iterator (Section 5).
- Show, while formalizing B+ trees, a technique for formalizing data structures with a non-functional connection to their specification (Section 5).
In our presentation, we focus on interfaces in stylized Coq, but our implementation and verification are available at http://ynot.cs.harvard.edu/. After our contributions, we consider the burden of verification, the implications of our techniques, and related work (Section 6).
We believe that our methodology extends previous work describing aliasing in separation logic [3] by being amenable to machine-checkable proofs and embeddable in Hoare type theory. Previous work has developed paper-and-pencil proofs and, as has been seen in other contexts [1], the evolution from rigorous, manual proofs to mechanically verified proofs is not always straightforward.

Background
Ynot [4] is a Coq library that implements Hoare type theory [14] to reason about imperative programs using types. Hoare logic describes commands using Hoare triples: commands along with pre- and post-conditions. Ynot encodes these in the type of the Cmd monad:

    c : Cmd (P) (r : T ⇒ Q r)

where the command c has pre-condition P and post-condition Q that depends on the return value of c (bound to r). This type means that the command c can be run in any state that satisfies P and, if c terminates with value r, then the resulting state will satisfy Q r.
Ynot defines pre- and post-conditions in the logic of Coq as predicates over heaps, which are themselves defined as functions from pointers to optional values. Previous work [4] showed how using a stylized fragment of separation logic makes verification conditions more amenable to automation and therefore less burdensome for the programmer to prove. As in previous work, we use a shallow embedding of separation logic, which we extend with support for fractional permissions (Figure 1). The empty heap (emp) denotes a heap containing no allocated cells: all pointers are mapped to None. The permission to access the heap cell pointed to by p is given by the fractional points-to relation, $p \stackrel{q}{\mapsto} v$ [8,6,15]. We use the simple model of fractional permissions originally developed by Boyland [8]. In this work, the value of q is a rational number such that 0 < q ≤ 1; in all cases the points-to relation asserts that the heap contains a cell with the value v pointed to by p. When q = 1, the points-to assertion gives code the ability to read, write, and deallocate the cell. When q < 1, the points-to relation gives read-only access to the heap cell. The separating conjunction (*) states that the two conjuncts hold on two "disjoint" pieces of the heap. In the definition, h0 ⊥ h1 denotes disjointness, which is slightly complicated by the fractional permissions: two heaps are disjoint if each pointer is mapped by only one heap, or the values are the same and the fractions sum to a valid fraction. The accompanying union operator defines a similar notion of unioning disjoint heaps. Ynot also supports existential quantification and pure propositions (propositions that do not mention the heap, such as x = y or x < 5) in heap propositions.

Fig. 2. Axiomatic basis for Hoare type theory using separation logic.
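The disjointness and union conditions on fractional heaps described above can be made concrete with a small executable model. The following Python sketch is an illustration only, not Ynot's Coq definition: it assumes a heap is a dictionary from pointers to (fraction, value) pairs, and the names valid, disjoint, and union are hypothetical.

```python
from fractions import Fraction

def valid(q):
    """A fractional permission is valid when 0 < q <= 1."""
    return 0 < q <= 1

def disjoint(h0, h1):
    """Model of h0 ⊥ h1 (assumed encoding: ptr -> (fraction, value)).

    A pointer may appear in both heaps only if both agree on the value
    and the two fractions still sum to a valid permission.
    """
    for p in h0.keys() & h1.keys():
        q0, v0 = h0[p]
        q1, v1 = h1[p]
        if v0 != v1 or not valid(q0 + q1):
            return False
    return True

def union(h0, h1):
    """Join two disjoint heaps, adding the fractions of shared pointers."""
    if not disjoint(h0, h1):
        raise ValueError("heaps are not disjoint")
    merged = dict(h0)
    for p, (q1, v1) in h1.items():
        q0 = merged[p][0] if p in merged else Fraction(0)
        merged[p] = (q0 + q1, v1)
    return merged

# Two read-only halves of the same cell "p" recombine into full ownership.
half = Fraction(1, 2)
reader_a = {"p": (half, 42)}
reader_b = {"p": (half, 42), "r": (Fraction(1), 7)}
combined = union(reader_a, reader_b)
```

In this model a full permission can never be joined with any other permission on the same pointer, which is what makes a pointer held with q = 1 exclusive.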
Ynot axiomatizes the primitive heap operations using the commands given in Figure 2. The new command allocates memory by producing the read-write capability to access the memory cell pointed to by the return value. The pre-condition specifies that the command needs no heap capabilities, so the resulting pointer must be globally unique. The free command deallocates a memory cell by consuming the read-write permission to access the cell. The read command reads the value from a cell given a predicate, P, that describes the rest of the heap based on this value. The dependence on P allows us to enforce that the v in the pre-condition is the same as the v in the post-condition, because P could include a precise equation on v. For example, if p pointed to a pointer to v, we could pick P = fun r ⇒ r → v, so that the post-condition reduces to a heap in which p points to the result r and r points to v. The write command updates the value in a heap cell given a pointer and the new value.
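The permission discipline of these four primitives can be sketched as runtime checks. This is a hedged Python model, not Ynot's axiomatization (which is enforced statically by types): the dictionary encoding of the heap, the PermissionDenied exception, and the function names are all assumptions made for illustration.

```python
from fractions import Fraction
import itertools

class PermissionDenied(Exception):
    """Raised when an operation needs more permission than the caller holds."""

_fresh = itertools.count(1)  # pointers that have never been handed out

def _require_full(heap, p):
    # Writing and freeing both consume the full (q = 1) permission.
    if p not in heap or heap[p][0] != 1:
        raise PermissionDenied(f"need full permission on {p}")

def new(heap, v):
    """Allocate a fresh cell; the caller receives full permission."""
    p = next(_fresh)
    heap[p] = (Fraction(1), v)
    return p

def read(heap, p):
    """Reading is allowed with any fraction 0 < q <= 1."""
    if p not in heap:
        raise PermissionDenied(f"no permission on {p}")
    return heap[p][1]

def write(heap, p, v):
    _require_full(heap, p)
    heap[p] = (Fraction(1), v)

def free(heap, p):
    _require_full(heap, p)
    del heap[p]

heap = {}
p = new(heap, 5)
write(heap, p, 6)
value = read(heap, p)
# Simulate handing half of the permission to another view:
heap[p] = (Fraction(1, 2), 6)
```

After the last line the holder can still call read, but write and free raise PermissionDenied until the other half is returned.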
These commands are combined using monadic bind and return, in addition to a cast command that takes a proof and applies Hoare's consequence rule. The frame command extends the footprint of a command with extra capabilities that are invariant under the command. This is essential to local reasoning: it enables Ynot to run a command with pre-condition P and post-condition Q in an environment satisfying P * R and allows us to infer the post-condition Q * R.

Sharable Abstractions: Linked Lists
In this section, we develop the basis of our contributions by defining a simple interface for externally sharable list structures. Sharing will allow multiple read-only views of the list or a single read-write view. We will achieve this using fractional permissions in the same way that we do for heap cells.
In Ynot, abstract data types are defined by a representation predicate and associated theorems and imperative commands. The interface for sharable lists (ImpList) is given as a type-class [18] in Figure 3. The class is parametrized by the type of the elements in the list (T) and the type of handles to the list (tlst). The representation predicate (llist) relates a fractional permission (of type perm), the list handle, and a functional model of the list (the list T) to the imperative representation, i.e. the structure of the heap. The heap proposition llist q t l states that t is a handle to a q-fraction of an imperative representation of the functional list l. Conceptually, we can think of this as $t \stackrel{q}{\mapsto} l$. Assuming this, new and free are analogous to Ynot's new and free commands.

    (* Fractional merging and splitting of lists *)
    llist_split : ∀ q q' t m, q |#| q' →
      llist (q + q') t m ⇐⇒ llist q t m * llist q' t m ;
    (* Allocate an empty list *)
    new : Cmd (emp) (res : tlst ⇒ llist 1 res nil) ;
    (* Free the list *)
    free : Π (t : tlst),
      Cmd (∃ ls : list T, llist 1 t ls) (_ : unit ⇒ emp) ;
    (* Get the ith element from the list if it exists. *)
    sub : Π (t : tlst T) (i : nat) (m : # list T #) (q : # perm #),
      Cmd (llist q t m)
          (res : option T ⇒ llist q t m * [res = specNth m i]) ;
    (* Insert an element at the ith position in the list. *)
The specifications for sub and insert are expressed by relating their return value and post-condition to the result of pure functions (specNth and specInsert) that we take as specifications (we give the specNth function as an example of our specifications). We use the # in types to denote computationally irrelevant variables [4]. These can be thought of as compile-time-only values that are used to specify the behavior of computations without incurring run-time overhead.
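To illustrate what such pure specification functions look like, here is a minimal Python sketch of functional models in the spirit of specNth and specInsert. The names spec_nth and spec_insert are hypothetical, and the behavior of insertion past the end of the list (appending) is an assumption, since the source does not show that definition.

```python
def spec_nth(m, i):
    """Functional model of `sub`: the i-th element, or None if out of range."""
    return m[i] if 0 <= i < len(m) else None

def spec_insert(m, i, v):
    """Functional model of `insert`: a new list with v at position i.

    Assumption: an index past the end appends v to the list.
    """
    i = min(i, len(m))
    return m[:i] + [v] + m[i:]

model = [10, 20, 30]
second = spec_nth(model, 1)
missing = spec_nth(model, 3)
grown = spec_insert(model, 1, 99)
```

An imperative implementation is then correct when its observable result always agrees with these functions applied to the model list.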
One easy way to realize this interface is using singly-linked lists, as shown in Figure 4. Recursive equations specify the representation invariant llseg for singly-linked list segments between pointers from and to.
In equation (1), the model list is empty, so the start and end pointers are the same. When the model list is not empty, i.e. it is a cons (a :: b), the case llseg q (Ptr from) to (a :: b) applies: from must not be null, and there must exist a pointer x such that from points to a heap cell containing a and x ($from \stackrel{q}{\mapsto} mkNode\ a\ x$) and x points to the rest of the list (llseg q x to b). Equation (4) makes the list mutable by making tlst an indirection pointer, so the pointer to the head of the list can change.
Since the definition only claims a q-fraction of the list, all of the points-to assertions have fraction q. This allows us to prove the llist_split lemma, which states that a q0 + q1 fraction of the list is equivalent to a q0 fraction of the list disjoint from a q1 fraction of the list. We can use this proof to create two disjoint, read-only views of the same list to share.
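The splitting argument can be checked on a small executable model. The Python sketch below is an assumption-laden illustration, not the Coq proof: heaps are dictionaries from pointers to (fraction, (value, next)) pairs, the indirection cell of the list handle is omitted, and build_list, llseg, and split are hypothetical names.

```python
from fractions import Fraction

def build_list(q, model, base=100):
    """Lay out `model` as a singly-linked list whose cells all carry fraction q.

    Returns (head pointer or None, heap).
    """
    heap = {}
    nxt = None
    for k in reversed(range(len(model))):
        heap[base + k] = (q, (model[k], nxt))
        nxt = base + k
    return nxt, heap

def llseg(heap, q, frm, to, model):
    """Check that `heap` is exactly a q-fraction segment frm..to holding `model`."""
    cells = dict(heap)
    cur = frm
    for a in model:
        if cur is None or cur not in cells:
            return False
        perm, (value, nxt) = cells.pop(cur)
        if perm != q or value != a:
            return False
        cur = nxt
    # Empty model: start and end coincide; no unexplained cells may remain.
    return cur == to and not cells

def split(heap, q0, q1):
    """Turn a (q0 + q1)-fraction of every cell into a q0 view and a q1 view."""
    assert all(q == q0 + q1 for q, _ in heap.values())
    left = {p: (q0, node) for p, (_, node) in heap.items()}
    right = {p: (q1, node) for p, (_, node) in heap.items()}
    return left, right

head, whole = build_list(Fraction(1), [1, 2, 3])
view_a, view_b = split(whole, Fraction(1, 3), Fraction(2, 3))
```

Both views satisfy the segment predicate at their own fraction, and neither holds the full permission needed to mutate a cell.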

External Sharing: Iterators
The ability to share the list abstraction pays off when we need to develop another view of the list. Here, we develop a simple, efficient iterator over our list representation.
The representation predicate defines the heap by relating a fractional ownership of the list and the handle to the underlying list to the iterator handle, the functional contents of the list, and a natural number which defines the current position in the list. Here, the fractional permission is the fractional ownership of the underlying list, not of the iterator, so even if this fraction is not 1, we will still be able to call next. The open computation constructs an iterator to the beginning of a tlst T by converting the heap predicate from llist q t m to liter q t res m 0. The next command returns the current element in the list (or None if the iterator is past the end of the list) and advances the position, reflected in the index argument of liter. The close command reverses the effect of open by converting the liter back into an llist.
The owner parameter to the representation predicate is necessary for describing the heap precisely enough to support the close command. Its use is similar to the use of ownership types [5]. By making it a parameter, we can specify that the llist permissions in the post-condition of close are exactly those that went into the open command.
With this, we can describe an iterator by full ownership of a heap cell containing a pointer to the current node, and fractional ownership of the owner pointer and underlying list. To simplify implementing the interface, we break the specification of the list into two parts: the part that has already been visited (firstn i m), which goes from st to cur, and the rest (skipn i m), which goes from cur to Null.
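The split of the model into a visited prefix and a remaining suffix can be sketched at the specification level. The Python class below is a hypothetical model (ListIterModel is not a name from the source), and it assumes that calling next past the end leaves the position unchanged, which the text does not state.

```python
class ListIterModel:
    """Specification-level model of the list iterator (assumed encoding)."""

    def __init__(self, model):
        # Corresponds to open: position 0 over the model list.
        self.model = list(model)
        self.i = 0

    def visited(self):
        """firstn i m: the part already traversed (st to cur)."""
        return self.model[:self.i]

    def remaining(self):
        """skipn i m: the part still to come (cur to Null)."""
        return self.model[self.i:]

    def next(self):
        """Return the current element and advance, or None past the end."""
        if self.i < len(self.model):
            value = self.model[self.i]
            self.i += 1
            return value
        return None

    def close(self):
        """Rejoin both parts; this must give back the original model."""
        return self.visited() + self.remaining()

it = ListIterModel(["a", "b", "c"])
first = it.next()
prefix, suffix = it.visited(), it.remaining()
rest = [it.next() for _ in range(3)]
```

The invariant that visited followed by remaining equals the model at every step is what lets close return exactly the list permission that open consumed.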

Internal Sharing & Non-functional Heaps: B+ trees
We now turn to the problem of internal sharing. Recall that in internal sharing, we completely hide the sharing from the client. To demonstrate our technique, we discuss the representation of B+ trees that we presented in previous work [12]. We choose B+ trees to implement this interface because they have a structure that is tricky to reason about because of aliasing, and previous work only demonstrated an imperative fold rather than the more primitive iterator. Our implementation for this interface does not include fractional permissions, though we believe that it would be relatively straightforward to add them. Figure 5 gives our target interface for finite maps and their iterators, which we combine for brevity. The class is parametrized by the type of keys, values, and finite map handles. The logical model is a sorted association list (fmap K V) that we relate to the handle with the heap proposition repMap q t m. The remaining computations are similar to those of the list; we support allocation and deallocation as well as key lookup and key-value insertion. The iterator predicate is the same as the list iterator predicate except that it does not have the fractional permission. The open, next and close commands are the same as for the list.

Fig. 6. A B+ tree of arity 4 (n = 4) for the finite map from i ↦ v_i for 1 ≤ i ≤ 9.
B+ trees are balanced, ordered, n-ary trees that store data only at the leaves and maintain a pointer list in the fringe to make in-order iteration of the values efficient. Figure 6 shows a simple B+ tree with arity 4.
As with most tree structures, B+ trees are composed of two types of nodes:
- Leaf nodes store data as a sequence of at most n key-value pairs in increasing order by key. The trailing pointer position points to the next leaf node.
- Branch nodes contain a sequence of at most n key-subtree pairs and a final subtree. The pairs are ordered such that the keys in a subtree are less than or equal to the associated key (represented in the figure as treeSorted min max). For example, the second subtree can only contain values greater than 2 and less than or equal to 6. The final subtree covers the span greater than the last key; in the figure, this is the span greater than 6.
As with the iterator, the two main difficulties in formalizing B+ trees reveal themselves in the representation predicate. The first concerns the fact that multiple trees can represent the same finite map. The second concerns the aliasing at the leaves, which is necessary to make iteration efficient.
The standard way to address the first problem is to use a direct relational specification of the heap, existentially quantifying the splitting of the list into subtrees at each level [17]. While this works well for paper-and-pencil proofs, it makes automation difficult because tactics need to guess the way that the heap is broken up at every step in order to instantiate existentials. Following this approach can yield goals with many existential variables that are not trivial to pick automatically. To avoid this, we factor the relation between the interface model and the heap description into a relation and a function, as shown in Figure 7.
Our representation model is a functional tree that we index by the height to enforce the balancedness constraint. In Coq, this can be defined as a type indexed by height, though we modify the definition slightly to address the next problem.
The second difficulty deals more directly with sharing. In the standard representation for a tree, we existentially quantify the pointers at the parent pointer for each node, but if we follow this approach we cannot directly encode the aliasing at the leaves, because the predicate does not have access to both pointers. We could quantify the leaf pointers when the tree splits, but this gets ugly because we are working with n-ary trees. This would also lead to difficulties when defining iterators, because we will want to frame the trunk part of the computation and consider only the leaves. Instead, we embed the pointers directly in the representation model (the type ptree). Using this representation model, we can easily compute the pointers that alias without needing to worry about scoping, since all of the pointers will be quantified at the root.
With this model, we can turn to describing the heap. We define repTree h o p to hold on a heap representing the ptree p of height h when the rightmost leaf's next pointer equals o. The repTree predicate has two cases depending on the ptree's height. In the leaf case, the array holds the list of key-value pairs from the ptree.

Fig. 7. Decouple the relational mapping between the interface and the heap by factoring out a representation model that is functionally related to the heap.
In the branch case, the array holds key-pointer pairs such that each pointer points to the representation of the corresponding subtree in the ptree. This is captured by the repBranch predicate. At this point, we have defined the rep function from Figure 7; it remains to define rel. A standard relation would be fine to implement this, but since each tree corresponds to exactly one finite map, we can simplify things by computing the finite map (using as_map) associated with the tree and stating that it equals the desired model.
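The idea of a representation model that carries its own pointers, together with a function computing the finite map, can be sketched as follows. This Python model is an assumption made for illustration: the classes Leaf and Branch, the functions leaves, as_map, and next_ptrs, and the concrete pointer values are hypothetical, and the example tree only follows the key ranges mentioned in the text (separator keys 2 and 6).

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    ptr: int                      # address of this leaf, stored in the model
    pairs: List[Tuple[int, str]]  # key-value pairs in increasing key order

@dataclass
class Branch:
    ptr: int
    children: List[Tuple[int, "Tree"]]  # (separator key, subtree) pairs
    last: "Tree"                        # final subtree, keys above the last separator

Tree = Union[Leaf, Branch]

def leaves(t):
    """All leaves in left-to-right order."""
    if isinstance(t, Leaf):
        return [t]
    out = []
    for _, sub in t.children:
        out += leaves(sub)
    return out + leaves(t.last)

def as_map(t):
    """The finite map (sorted association list) that the tree represents."""
    return [kv for leaf in leaves(t) for kv in leaf.pairs]

def next_ptrs(t, optr):
    """Next-pointer of each leaf; the rightmost leaf points to optr."""
    ls = leaves(t)
    following = [leaf.ptr for leaf in ls[1:]] + [optr]
    return {leaf.ptr: nxt for leaf, nxt in zip(ls, following)}

leaf1 = Leaf(10, [(1, "v1"), (2, "v2")])
leaf2 = Leaf(20, [(3, "v3"), (4, "v4"), (5, "v5"), (6, "v6")])
leaf3 = Leaf(30, [(7, "v7"), (8, "v8"), (9, "v9")])
root = Branch(1, [(2, leaf1), (6, leaf2)], leaf3)
```

Because every pointer is stored in the model, the aliasing between a leaf's next field and its right neighbor is a plain equation computed by next_ptrs rather than something that must be guessed during a proof.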
We can pick the handle type to be a pointer and define the full representation predicate to be the conjunction of rep and rel with some additional pure facts. By packing a copy of the ptree with the root pointer, we avoid the need to search for a model during proofs. The alternative is to show that there is at most one ptree that a given pointer and heap can satisfy (i.e., that repTree is precise [15]). However, this is complicated by the fact that the ptree type is indexed by the height. The pure treeSorted predicate combines all of the facts about the key constraints, but it is not necessary for the iterator and was explained in previous work [12], so we do not explain it in detail.
With our representation for B+ trees, we can now turn to their iterators. Our approach is similar to the technique we applied to the list iterator. First, we state the heap predicate that divides the tree into the "trunk" and the leaves as disjoint entities. We can achieve this with only minor discomfort by parameterizing repTree by the leaf case and passing the empty heap when we only want to describe the trunk. We also implement a function repLeaves to describe a list of leaves in isolation. These two functions satisfy the following property, which is key to opening and closing our iterator:

    ∀h optr p. repTree optr p ⇐⇒ repTrunk optr p * repLeaves (Some (firstPtr p)) (leaves p) optr

Using these predicates, we can define the representation of the iterator. The first two lines after the existentials correspond to the framed heap and pure facts needed to re-establish the tree representation invariant. The third line declares the iterator state (h → (cur, i)), and the combined repLeaves specify the representation of the leaves. Because each leaf could have a different number of key-value pairs, it is difficult to use the built-in firstn and skipn functions, so we existentially quantify two lists of leaves (prev and rest) and assert that their concatenation (++) must be equal to the leaves of the tree. The final pure fact establishes the invariant on cur and the index into the current leaf: if there are elements left to iterate, i + length (as_map prev) = idx and i is a valid index in the list. Otherwise, i = 0 and rest = nil.
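The final pure fact of the iterator invariant can be illustrated by computing prev, rest, and i from a global position idx. The Python sketch below is a hypothetical model: leaves are represented only by their lists of key-value pairs, and the names locate and flatten do not come from the source.

```python
def flatten(leaf_list):
    """Concatenate the key-value pairs of a list of leaves (as_map on leaves)."""
    return [kv for leaf in leaf_list for kv in leaf]

def locate(all_leaves, idx):
    """Split the leaves around global position idx.

    Returns (prev, rest, i) with prev + rest == all_leaves. If elements
    remain, i + len(flatten(prev)) == idx and i indexes into rest[0];
    otherwise rest is empty and i == 0.
    """
    seen = 0
    for n, leaf in enumerate(all_leaves):
        if idx < seen + len(leaf):
            return all_leaves[:n], all_leaves[n:], idx - seen
        seen += len(leaf)
    return all_leaves, [], 0

tree_leaves = [
    [(1, "v1"), (2, "v2")],
    [(3, "v3"), (4, "v4"), (5, "v5"), (6, "v6")],
    [(7, "v7"), (8, "v8"), (9, "v9")],
]
prev, rest, i = locate(tree_leaves, 5)
done_prev, done_rest, done_i = locate(tree_leaves, 9)
```

Quantifying prev and rest as whole leaves, rather than slicing the flattened map, is what accommodates leaves of different sizes.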

Discussion
In this section we consider the overhead of verification (Section 6.1), summarize our sharing insights (Section 6.2), and review related work (Section 6.3).

The Burden of Mechanized Proofs
Our methodology places the burden of proof on the developer. Proof search scripts and lemmas are part of the final code, and running them considerably increases compilation time. However, our proofs confirm strong functional correctness properties, and our specifications document precise pre- and post-conditions for clients to use.
Figure 8 presents a quantitative look at the size of our development in number of lines. The Spec column counts command specifications; excluding the data structure invariants, this is the part of the code that a client of the library needs to reason about. The Impl column counts imperative code. The next two columns count auxiliary lemmas and automation. Of these, Sep. Lemmas counts lines that pertain to separation logic, while Log. Lemmas counts lines that only reason about pure structures, such as lists. The Overhead column gives the ratio of proofs to specification and code. The Time column gives the time required to prove all of the verification conditions, not including auxiliary lemmas. Line counts include only new lines needed for verifying the function, so if a lemma is required for both sub and insert, it is only counted against sub.
As Figure 8 shows, the first commands contribute the most to the proof burden because we are writing general lemmas about the model and representation predicate. Once these lemmas have been proven, the remainder of the commands are almost immediate. We believe that the logical lemmas required for our code are mostly within the capabilities of existing automated theorem provers [13], and integrating such tools would likely eliminate all of the overhead from this column. It is less likely that existing tools are directly applicable to our separation logic, though existing automation is fairly good at this. The time spent interactively verifying our implementation was mostly spent abstracting lemmas, which is straightforward but time consuming because of Coq's toplevel model.

Fig. 8. Breakdown of lines of code for lists and iterators.

Sharing Lessons
While originally proposed for parallel code, fractional permissions for external sharing are important for sequential code. This is a by-product of multiple views of the same data structure, in our case lists and iterators. Our solution is simple because the list and iterator are completely decoupled, so we do not need to correlate mutation through multiple views. Supporting mutation with a single iterator is relatively straightforward, though we need to change our iterator to carry the pointer to the list representation so that we can update the head pointer. The ConcurrentModificationException problem from Java is a general consequence of mutation of structures with multiple views over them, and giving natural semantics to these operations is similar to the difficulty of writing precise specifications for concurrent functions.
When describing internal sharing, we get to specify equations directly on pointers. The difficulties come from scoping the existential quantification of pointers in recursive representation predicates. We find that quantifying all of the pointers at the beginning is useful for addressing this problem, and it fits well with our solution to the problem of heap structures being loosely related to interface models, because we can store the pointers in the interface model and easily encode aliasing. This approach also allows us to state pure facts about the structure of the heap separately, rather than having to fold them into the representation predicate.

Related Work
Weide [20] uses model-oriented specification in Resolve to specify how iterators behave. These specifications follow a requires/ensures template on top of a purely logical model, similar to Ynot's interface model.
Bierhoff [2] proposed a technique for using type-state specifications [10] for iterators. This system uses finite state machines to define the state of an object and specify when operations are permitted. This technique is particularly useful for specifying "non-interference" properties [19], such as marking a collection read-only when an iterator exists. We achieve this using fractional permissions, but can encode the same functionality by adding a state parameter to the representation predicate of our data structures.
Our approach is most similar to the work of Krishnaswami [11], where separation and Hoare logic are combined to reason about iterators. His technique relies on the separating implication (−∗), the separation logic analog of implication. We are interested in incorporating this into our separation logic, but we have not yet developed effective automation for it, so the burden of using it can be considerable. More recent work by Jensen [9] shows how a similar approach using separating implication can be applied to mutable views of a container.
B+ trees have been formalized in two previous developments. Bornat et al. [3] proposed using classical conjunction to capture the B+ tree as a tree and a list in the same heap. This is convenient for representation, but it requires re-establishing both the heap as a tree and as a list at every step of the code. By unifying the two views, we only need to reason about the view that we are using in our code. We support the two views by proving that repTree is equivalent to a representation that exposes the leaves as a list.
Sexton and Thielecke [17] formulate B+ trees by defining a language of tree operations for a stack machine. Their representation is similar to our own in not using classical conjunction, but they quantify structure in the representation predicate, which forces them to state the pure properties there as well.

Conclusions
In this work we have demonstrated a technique for building verified imperative software using theorem proving in the Ynot library for Coq.
We showed how external sharing can be achieved using abstract predicates which quantify over fractional permissions, and showed how this technique can be applied to representing multiple views. Further, we showed how ownership types can be applied to make the view's representation predicate precise.
To address internal sharing, we suggest simplifying recursive definitions by existentially quantifying all of the salient aspects of the data structure at the beginning of the representation predicate. This makes stating facts such as aliasing equations simple and allows the programmer to implement his or her code to minimize the use of existential quantification, which can be difficult for automation to reason about.

Future Work
The use of the separating implication in so many developments [11,17] demonstrates its usefulness. It would benefit our own development by allowing us to

Fig. 1. The shallow embedding of separation logic used in Ynot.