Verified programming in F*
A tutorial
The F* Team
MSR, MSR-Inria, Inria

1. Note: This version is outdated

This is an old tutorial on F*. Some of the material here is outdated and incompatible with the current version of F*. Even though the new online book is still a work in progress, it is a more comprehensive tutorial on the latest version of F*. This page is present mainly for archival purposes.

2. Introduction

F* is a verification-oriented programming language developed at Microsoft Research, MSR-Inria, and Inria. It follows in the tradition of the ML family of languages in that it is a typed, strict, functional programming language. However, its type system is significantly richer than ML's, allowing functional correctness specifications to be stated and checked semi-automatically. This tutorial provides a first taste of verified programming in F*. More information about F*, including papers and technical reports, can be found on the F* website.

It will help if you are already familiar with functional programming languages in the ML family (e.g., OCaml, F#, Standard ML), or with Haskell. We provide a quick review of some basic concepts if you're a little rusty, but if you feel you need more background, there are many useful resources freely available on the web, e.g., Learn F#, F# for fun and profit, Introduction to Caml, the Real World OCaml book, or the OCaml MOOC.

The easiest way to try F* and solve the verification exercises in this tutorial is directly in your browser by using the online F* editor. You can load the boilerplate code needed for each exercise into the online editor by clicking the “Load in editor” link in the body of each exercise. Your progress on each exercise will be stored in browser local storage, so you can return to your code later (e.g., if your browser crashes).

You can also do this tutorial by installing F* locally on your machine. F* is open source and cross-platform, and you can get binary packages for Windows, Linux, and MacOS X or compile F* from the source code on github using these instructions.

You can edit F* code using your favorite text editor, but for Emacs the community maintains fstar-mode.el, a sophisticated extension that adds special support for F*, including syntax highlighting, completion, type hints, navigation, documentation queries, and interactive development (in the style of CoqIDE or ProofGeneral). You can find more details about editor support on the F* wiki.

The code for the exercises in this tutorial and their solutions are in the F* repository on Github. For some exercises, you have to include additional libraries, as done by the provided Makefiles. To include libraries for the Emacs interactive mode follow the instructions here.

By default F* only verifies the input code, it does not execute it. To execute F* code one needs to extract it to OCaml or F# and then compile it using the OCaml or F# compiler. More details on executing F* code on the F* wiki.

2.1. Your first F* program: a simple model of access control

Let's get started by diving into a simple model of access control on files. Along the way, we'll see some basic concepts from functional programming in general and a couple of F*-specific features.

Let's say we want to model a simple access control discipline on files: certain files are readable or writable, whereas others may be unauthorized. We'd like to write a policy to describe the privileges on each file, and we'd like to write programs whose file accesses are checked against this policy, guaranteeing statically that an unauthorized file is never accessed mistakenly.

Our model of this scenario (which we'll make more realistic in later sections) proceeds in three easy steps.

  1. Define the policy using simple pure functions.
  2. Identify the security-sensitive primitives in the program and protect them with the policy.
  3. Write the rest of the program, resting assured that the F* type-checker will verify statically that the policy-protected primitives are never improperly accessed.

2.1.1. Defining a policy

The syntax of F* is based closely on the syntax of OCaml and the non-light syntax of F#. Our F* program is made up of one module per file; the body of the module contains a number of definitions and, optionally, a ‘main’ expression.

Here are the first three definitions from the program:

type filename = string

(** [canWrite] is a function specifying whether a file [f] can be written *)
let canWrite (f:filename) = 
  match f with 
    | "demo/tempfile" -> true
    | _ -> false

(** [canRead] is also a function ... *)
let canRead (f:filename) = 
  canWrite f               (* writeable files are also readable *)
  || f="demo/README"       (* and so is demo/README *)

code/exercises/Ex01a.fstLoad in editor

The first definition defines a type synonym: we'll use strings to model filenames.

After that, we define two boolean functions: canWrite and canRead. The canWrite function inspects its argument f using a pattern-matching expression: it returns the boolean true when f equals "demo/tempfile" and false otherwise. The function canRead is similar: a file is readable if it is writable, or if it is the "demo/README" file.

2.1.2. Tying the policy to the file-access primitives

To enforce the policy, we need to connect the policy to the primitives in our programs that actually implement the file reading and writing operations.

F* provides a facility to specify interfaces to external modules that are implemented elsewhere. For example, operations that perform file input/output are implemented by the operating system and made available to F* programs via the underlying framework, e.g., .NET or OCaml. We can describe a simple interface provided by the framework as follows:

val read  : f:filename{canRead f}  -> ML string
let read f  = FStar.IO.print_string ("Dummy read of file " ^ f ^ "\n"); f

val write : f:filename{canWrite f} -> string -> ML unit
let write f s = FStar.IO.print_string ("Dummy write of string " ^ s ^ " to file " ^ f ^ "\n")

code/exercises/Ex01a.fstLoad in editor

This interface provides two functions, read and write, which in a real setup would implement the corresponding operations on files. In our simplified example they just print things on screen.

The crucial bits of these declarations are, of course, the use of the canRead and canWrite functions in the signature, which allow us to connect the types of these security-sensitive functions to our policy.

2.1.3. Writing programs and verifying their security

Trying to check that a piece of code conforms to an access-control policy is a tedious and error-prone process. Even if every access is guarded by a security check, how can we make sure that each check is adequate? We can use F*'s type-checker to automate this kind of code review.

Here's some simple, untrusted client code. It defines some common file names, and then a function called staticChecking, which tries to read and write a few files.

let passwd : filename = "demo/password"
let readme : filename = "demo/README"
let tmp    : filename = "demo/tempfile"

val staticChecking : unit -> ML unit
let staticChecking () =
  let v1 = read tmp in
  let v2 = read readme in
  (* let v3 = read passwd in // invalid read, fails type-checking *)
  write tmp "hello!"
  (* ; write passwd "junk" // invalid write , fails type-checking *)

code/exercises/Ex01a.fstLoad in editor

The type-checker ensures statically that this untrusted code complies with the security policy. The reads to tmp and readme, and the write to tmp, are allowed by the policy, so the program successfully type-checks. On the other hand, if we uncomment the lines that read the passwd file or write "junk" to it, the F* type-checker complains that the passwd file does not have the type f:filename{canRead f}, respectively f:filename{canWrite f}. For instance, if we uncomment the write we get the following error message:

.\Ex01a.fst(40,10-40,16): Subtyping check failed; 
expected type (f:Ex01a.filename{(Prims.b2t (Ex01a.canWrite f))}); 
got type Ex01a.filename

You'll understand more about what that message means as you work through this tutorial, but, for now, take it to mean that F* expected canWrite passwd to evaluate to true, which isn't the case.

The staticChecking function above illustrates how F* can be used to specify a security policy, and statically enforce complete and correct mediation of that policy via type-checking. We will now illustrate that checking of the policy doesn't have to happen entirely statically: dynamic checks added by the programmer are also taken into account by the type system. In particular, we implement a checkedRead function that consults the policy and only performs the read if the policy allows it; otherwise it raises an exception.

exception InvalidRead
val checkedRead : filename -> ML string
let checkedRead f =
  if canRead f then read f else raise InvalidRead

code/exercises/Ex01a.fstLoad in editor

Note that the type of checkedRead imposes no condition on the input file f. When the canRead f check succeeds, the type-checker knows that the read f call is safe.

Exercise 2a:
Write a function called checkedWrite that takes a filename f and a string s as argument, checks the policy to make sure the file f is writable, and only if that is the case writes s to f. If the file is not writable your checkedWrite should raise an exception. As with checkedRead, your checkedWrite should have no preconditions.

assume val checkedWrite : filename -> string -> ML unit

code/exercises/Ex01a.fstLoad in editor

exercise answer

exception InvalidWrite
let checkedWrite f s =
  if canWrite f then write f s else raise InvalidWrite

code/solutions/Ex01a.fstLoad in editor

You can use checkedRead and your checkedWrite to replace read and write in staticChecking, so that now even the accesses to passwd are well-typed.

let dynamicChecking () =
  let v1 = checkedRead tmp in
  let v2 = checkedRead readme in
  let v3 = checkedRead passwd in (* this raises exception *)
  checkedWrite tmp "hello!";
  checkedWrite passwd "junk" (* this raises exception *)

code/exercises/Ex01a.fstLoad in editor

This is secure because checkedRead and checkedWrite defer to runtime the same checks that were previously performed at compile time by F*, and perform the IO actions only if those checks succeed.

3. Basics: Types and effects

You may not have noticed them all, but our first example already covers several distinctive features of F*. Let's take a closer look.

3.1. Refinement types

One distinctive feature of F* is its use of refinement types. We have already seen two uses of this feature, e.g., the type f:filename{canRead f} is a refinement of the type filename. It's a refinement because only a subset of the filename values have this type, namely those that satisfy the canRead predicate. The refinement in the type of write is similar. In traditional ML-like languages, such types are inexpressible.

In general, a refinement type in F* has the form x:t{phi(x)}: a refinement of the type t to those elements x that satisfy the formula phi(x). For now, you can think of phi(x) as any pure boolean-valued expression. The full story is more general (4.5).
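
For example, here is a small illustrative snippet (not part of the exercise files) defining a refinement type of positive integers and a value of that type; F* asks Z3 to check that 1 satisfies the refinement:

type positive = x:int{x > 0}
let one : positive = 1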

3.2. Inferred types and effects for computations

Although we didn't write down any types for functions like canRead or canWrite, F* (like other languages in the ML family) infers types for your program, provided it is indeed type correct. However, what F* infers is more precise than what can be expressed in ML. For instance, in addition to inferring a type, F* also infers the side effects of an expression (e.g., exceptions, state, etc.).

For instance, in ML (canRead "foo.txt") is inferred to have type bool. However, in F*, we infer (canRead "foo.txt" : Tot bool). This indicates that canRead "foo.txt" is a pure total expression, which always evaluates to a boolean. For that matter, any expression that is inferred to have type-and-effect Tot t is guaranteed (provided the computer has enough resources) to evaluate to a t-typed result, without entering an infinite loop, reading or writing the program's state, throwing exceptions, performing input or output, or having any other effect whatsoever.

On the other hand, an expression like (read "foo.txt") is inferred to have type-and-effect ML string, meaning that this term may have arbitrary effects (it may loop, do IO, throw exceptions, mutate the heap, etc.), but if it returns, it always returns a string. The effect name ML is chosen to represent the default, implicit effect in all ML programs.

Tot and ML are just two of the possible effects. Some others include: Dv, the effect of computations that may diverge (loop forever); ST, the effect of computations that may diverge and may read, write, or allocate references in the heap; and Exn, the effect of computations that may diverge and may raise exceptions.

The primitive effects {Tot, Dv, ST, Exn, ML} are arranged in a lattice, with Tot at the bottom (less than all the others), ML at the top (greater than all the others), and with ST unrelated to Exn. This means that a computation with a particular effect can be treated as having any other effect that is greater than it in the effect ordering; we call this feature sub-effecting.
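
For instance, a Tot expression such as canRead f can be used directly inside code whose effect is ML, since Tot is below ML in the ordering. Here is a small sketch reusing the definitions from the access-control example (the function name is just for illustration):

val log_access : filename -> ML unit
let log_access f =
  if canRead f                                          (* a Tot expression, lifted into ML by sub-effecting *)
  then FStar.IO.print_string ("may read " ^ f ^ "\n")
  else FStar.IO.print_string ("no read access to " ^ f ^ "\n")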

In fact, you can configure F* with your own family of effects and effect ordering and have F* infer those effects for your programs, but that's more advanced and beyond the scope of this tutorial. (If curious you can check out this paper.)

From now on, we'll just refer to a type-and-effect pair (e.g. Tot int, where Tot is the effect and int is the result type) as a computation type.

3.3. Function types

F* is a functional language, so functions are first-class values that are assigned function types. For instance, in ML the function is_positive testing whether an integer is positive is given type int -> bool, i.e. it is a function taking an integer argument and producing a boolean result. Similarly, a maximum function on integers max is given type int -> int -> int, i.e. it is a function taking two integers and producing an integer. Function types in F* are more precise and capture not just the types of the arguments and the result, but also the effect of the function's body. As such, every function value has a type of the form t1 -> ... -> tn -> E t, where t1 .. tn are the types of each of the arguments; E is the effect of its body; and t is the type of its result. (Note to the experts: We will generalize this to dependent functions and indexed effects progressively.) The pure functions above are given a type using effect Tot in F*:

val is_positive : int -> Tot bool
let is_positive i = i > 0

val max : int -> int -> Tot int
let max i1 i2 = if i1 > i2 then i1 else i2

Impure functions, like reading the contents of a file, are given corresponding effects:

val read : filename -> ML string

F* is a call-by-value functional language (like OCaml and F#, and unlike Haskell), so function arguments are fully evaluated to values before being passed to the function. Values have no (immediate) effect, thus the types of a function's arguments have no effect annotation. On the other hand, the body of a function may perform some non-trivial computation and so the result is always given both a type and an effect, as explained above.

As in many functional languages, function application has higher precedence than almost any other operation. Remembering this can help you keep track of when you do (and do not) need parentheses. For example, if, using the definitions for is_positive and max above, you try to write:

let x = is_positive max 1 2

you will get a type resolution failure, since F* will try to apply is_positive to the max function itself, and it will complain that is_positive expects an int argument, whereas max is a function that maps two integers to another integer. In this case, you need one set of parentheses around the application max 1 2:

let x = is_positive (max 1 2)

Exercise 3a:
As you could see above, we use val to ask F* to check that a definition has a certain expected type. Write down the expected types for canRead and canWrite from 2.1.1 and ask F* to check them for you.

(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex02a
//can-read-write-types

type filename = string

//val canWrite : unit (* write a correct and precise type here *)
let canWrite (f:filename) =
  match f with 
    | "demo/tempfile" -> true
    | _ -> false

//val canRead : unit (* write correct and precise type here *)
let canRead (f:filename) =
  canWrite f               (* writeable files are also readable *)
  || f="demo/README"       (* and so is this file *)

code/exercises/Ex02a.fstLoad in editor

exercise answer
val canRead: filename -> Tot bool
val canWrite: filename -> Tot bool

code/solutions/Ex02a.fstLoad in editor

3.3.1. The default effect is Tot

Since total functions are very commonly used in verified programs, writing Tot is optional in F*'s syntax. As such, filename -> bool is a shorthand for filename -> Tot bool. Nevertheless, we consider it good practice to annotate functions with their result effect, even when it is Tot.
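
For instance, the following two definitions are equivalent; the first relies on the default effect, while the second spells it out (an illustrative snippet, not from the exercise files):

let is_readme  (f:filename) : bool     = f = "demo/README"
let is_readme' (f:filename) : Tot bool = f = "demo/README"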

For more details on the syntax of function types and definitions, in particular the rules for default effects, see this wiki page.

3.3.2. Computing functions

Skip this section on a first read, if you are new to functional programming.

Sometimes it is useful to return a function after performing some computation. For example, here is a function that makes a new counter, initializing it to init.

val new_counter: int -> St (unit -> St int)
let new_counter init =
  let c = ST.alloc init in
  fun () -> c := !c + 1; !c

F* can show that new_counter has type int -> St (unit -> St int), the type of a function that, given an integer, may read, write or allocate references, and then returns a unit -> St int function. Using sub-effecting, we can also write the type below and F* will check that new_counter has that type as well.

val new_counter: int -> ML (unit -> ML int)

Note that the type above is not the same as int -> unit -> ML int. The latter is a shorthand for int -> Tot (unit -> ML int).
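
As a small sketch of how such a computed function might be used (the test function and its name are purely illustrative):

val test_counter : unit -> ML unit
let test_counter () =
  let next = new_counter 0 in
  let x = next () in                    (* x = 1 *)
  let y = next () in                    (* y = 2 *)
  FStar.IO.print_string (string_of_int x ^ " " ^ string_of_int y ^ "\n")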

4. First proofs about functions on integers

Our first example on access control lists illustrated how F*'s type system can be used to automatically check useful security properties for us. But, before we can do more interesting security-related programming and verification, we will need to understand more about the basics of F*. We'll shift back a gear now and look at programming and proving properties of simple functions on integers.

4.1. Statically checked assertions

One way to prove properties about your code in F* is to write assertions.

let _ = assert (max 0 1 = 1)

Our first assertion is an obvious property about max. You may have seen a construct called assert in other languages; typically, it is used to check a property of your code at runtime, raising an assertion-failed exception if the asserted condition is not true.

In F*, assertions have no runtime significance. Instead, F* checks statically (i.e. before compiling your program) that the asserted condition is always true, and hence could never fail.

In this case, the proof that max 0 1 = 1 is fairly easy. Under the covers, F* translates the definition of max and the asserted property to a theorem prover called Z3, which then proves the property automatically.

We can assert and automatically check more general properties about max. For example:

let _ = assert (forall x y. max x y >= x
                && max x y >= y
                && (max x y = x || max x y = y))

4.2. Factorial and Fibonacci functions

Let's take a look at the factorial function:

open FStar.Mul
let rec factorial n = if n = 0 then 1 else n * (factorial (n - 1))

Remark. By default the op_Star operator of F* (the * above) is the type constructor of product types, also known as tuples. If, as in the example above, we are not doing anything with tuples, it is convenient to open the module FStar.Mul to let * be multiplication. Otherwise, one has to write op_Multiply a b instead of a * b.

For a recursive function, without further annotation, F* attempts to automatically prove that the recursion always terminates. In the case of factorial above, F* attempts to prove that it has type int -> Tot int, and fails to do so because factorial is actually not a total function on arbitrary integers! Think about factorial (-1).

Remark. F*'s int type represents unbounded mathematical integers.

The factorial function is, however, a total function on non-negative integers. We can ask F* to check that this is indeed the case by writing:

val factorial: x:int{x>=0} -> Tot int

The line above says several things: factorial takes a single argument x, which must be an integer at least 0 (i.e., a natural number); and, on any such argument, factorial is a total function that returns an int.

Once we write this down, F* automatically proves that factorial satisfies all these properties. The main interesting bit to prove, of course, is that factorial x actually does terminate when x is non-negative. We'll get into the details of how this is done in section 6, but, for now, it suffices to know that in this case F* is able to prove that every recursive call is on a non-negative number that is strictly smaller than the argument of the previous call. Since this cannot go on forever (we'll eventually hit the base case 0), F* agrees with our claim about the type of factorial.

The type of non-negative integers (or natural numbers) is common enough that you can define an abbreviation for it (in fact, F* already does this for you in its standard prelude).

type nat = x:int{x>=0}

Exercise 4a:
What are some other correct types you can give to factorial? Try writing them and see if F* agrees with you. Try describing in words what each of those types means.

val factorial: x:int{x>=0} -> Tot int
let rec factorial n = 
  if n = 0 then 1 else n * (factorial (n - 1))

code/exercises/Ex03a.fstLoad in editor

exercise answer

There are many possible correct types for factorial. Here are a few:

val factorial: int -> Dv int                  (* factorial may loop forever, but may return an integer *)
val factorial: nat -> Tot int                 (* total function on nat inputs *)
val factorial: nat -> Tot nat                 (* stronger result type *)
val factorial: nat -> Tot (y:int{y>=1})       (* even stronger result type *)

code/solutions/Ex03a.fstLoad in editor

Exercise 4b:
Give several types for the fibonacci function.

let rec fibonacci n =
  if n <= 1 then 1 else fibonacci (n - 1) + fibonacci (n - 2)

code/exercises/Ex03b.fstLoad in editor

exercise answer
(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex03b
//fibonacci


(* val fibonacci : int -> int *)
(* val fibonacci : int -> ML int (same as above) *)
(* val fibonacci : int -> Tot nat *)
(* val fibonacci : int -> Tot (y:int{y>=1}) *)
(* val fibonacci : x:int -> Tot (y:int{y>=1 /\ y >= x}) *)
(* val fibonacci : x:int -> Tot *)
(*   (y:int{y >= 1 /\ y >= x /\ (x>=3 ==> y>=2)}) *)
val fibonacci : int -> Tot (x:nat{x>0})
let rec fibonacci n =
  if n <= 1 then 1 else fibonacci (n - 1) + fibonacci (n - 2)

code/solutions/Ex03b.fstLoad in editor

4.3. Lemmas

Let's say you wrote the factorial function and gave it the type (nat -> Tot nat). Later, you care about some other property of factorial, e.g., that if x > 2 then factorial x > x. One option is to revise the type you wrote for factorial and get F* to reprove that it has this type. But this isn't always feasible. What if you also wanted to prove that if x > 3 then factorial x > 2 * x? Clearly, polluting the type of factorial with all these properties that you may or may not care about is impractical.

However, F* will allow you to write other functions (we call them lemmas) to prove more properties about factorial, after you've defined factorial and given it some generally useful type like nat -> Tot nat.

A lemma is a ghost total function that always returns the single unit value (). As such, computationally, lemmas are useless: they have no effect and always return the same value, so what's the point? The point is that the type of the value returned by the lemma carries some useful properties about your program.

As a first lemma about factorial, let's prove that it always returns a positive number.

val factorial_is_positive: x:nat -> GTot (u:unit{factorial x > 0})
let rec factorial_is_positive x =
  match x with
  | 0 -> ()
  | _ -> factorial_is_positive (x - 1)

The statement of the lemma is the type given to factorial_is_positive, a ghost total function on x:nat which returns a unit, but with the refinement that factorial x > 0. The definition that follows proves the lemma by induction on x. The basic concept here is that by programming total functions, we can write proofs about other pure expressions. We'll discuss such proofs in detail in the remainder of this section.

Remark. In F* an expression being ghost (computationally irrelevant) is an effect. The effect GTot in the type of factorial_is_positive indicates that it is a ghost, total function.

4.3.1. Dependent function types

Let's start by looking a little closer at the statement of the lemma.

val factorial_is_positive: x:nat -> GTot (u:unit{factorial x > 0})

This is a dependent function type in F*. Why dependent? Well, the type of the result depends on the value of the parameter x:nat. For example, factorial_is_positive 0 has the type GTot (u:unit{factorial 0 > 0}), which is different from the type of factorial_is_positive 1.

Now that we've seen dependent functions, we can elaborate a bit more on the general form of function types:

x1:t1 -> ... -> xn:tn[x1 ... xn-1] -> E t[x1 ... xn]

Each of a function's formal parameters is named xi, and each of these names is in scope to the right of the first arrow that follows it. The notation t[x1 ... xm] indicates that the variables x1 ... xm may appear free in t.

4.3.2. Some syntactic sugar for refinement types and lemmas

How shall we specify that factorial is strictly greater than its argument when its argument is greater than 2? Here's a first cut:

val factorial_is_greater_than_arg1: x:(y:nat{y > 2}) -> GTot (u:unit{factorial x > x})

This does the job, but the type of the function's argument is a bit clumsy, since we needed two names: y to restrict the domain to natural numbers greater than 2, and x to speak about the argument in the result type. This pattern is common enough that F* provides some syntactic sugar for it; we have seen it already in passing in the first type we gave to factorial. Using it, we can write the following type, which is equivalent to, but more concise than, the type above.

val factorial_is_greater_than_arg2: x:nat{x > 2} -> GTot (u:unit{factorial x > x})

Another bit of clumsiness that you have no doubt noticed is the result type of the lemmas, which is a refinement of the unit type. To make this lighter, F* provides the Lemma keyword, which can be used in place of the GTot effect.

For example, the type of factorial_is_greater_than_arg2 is equivalent to the type of factorial_is_greater_than_arg3 (below), which is just a little easier to read and write.

val factorial_is_greater_than_arg3: x:nat{x > 2} -> Lemma (factorial x > x)

We can also write it in the following way, when that's more convenient:

val factorial_is_greater_than_arg4: x:nat -> Lemma (requires (x > 2))
                                              (ensures (factorial x > x))

The formula after requires is the pre-condition of the function/lemma, while the one after ensures is its post-condition.

4.3.3. A simple lemma proved by induction, in detail

Now, let's look at a proof of factorial_is_greater_than_arg in detail. Here it is:

let rec factorial_is_greater_than_arg x =
  match x with
  | 3 -> ()
  | _ -> factorial_is_greater_than_arg (x - 1)

The proof is a recursive function (as indicated by the let rec); the function is defined by cases on x.

In the first case, the argument is 3; so, we need to prove that factorial 3 > 3. F* generates a proof obligation corresponding to this property and asks Z3 to see if it can prove it, given the definition of factorial that we provided earlier. Z3 figures out that factorial 3 = 6 and, of course, 6 > 3, so the proof for this case is done. This is the base case of the induction.

The cases are taken in order, so if we reach the second case, then we know that the first case is inapplicable, i.e., in the second case, we know from the type for factorial_is_greater_than_arg that x:nat{x > 2}, and from the inapplicability of the previous case that x <> 3. In other words, the second case is the induction step and handles the proof when x > 3.

Informally, the proof in this case works by using the induction hypothesis, which gives us that

forall n, 2 < n < x ==> factorial n > n

and our goal is to prove factorial x > x. Or, equivalently, that x * factorial (x - 1) > x, knowing that factorial (x - 1) > x - 1; or that x*x - x > x, which is true for x > 3. Let's see how this works formally in F*.

When defining a total recursive function, the function being defined (factorial_is_greater_than_arg in this case) is available for use in the body of the definition, but only at a type that ensures that the recursive call will terminate. In this case, the type of factorial_is_greater_than_arg in the body of the definition is:

n:nat{2 < n && n < x} -> Lemma (factorial n > n)

This is, of course, exactly our induction hypothesis. By calling factorial_is_greater_than_arg (x - 1) in the second case, we, in effect, make use of the induction hypothesis to prove that factorial (x - 1) > x - 1, provided we can prove that 2 < x - 1 && x - 1 < x. We give this to Z3 to prove, which it can, since in this case we know that x > 3. Finally, we have to prove the goal factorial x > x from x > 3 and the property we obtained from our use of the induction hypothesis: again, Z3 proves this easily.

Exercise 4c:
Prove the following property for the fibonacci function:

val fibonacci : nat -> Tot nat
let rec fibonacci n =
  if n <= 1 then 1 else fibonacci (n - 1) + fibonacci (n - 2)

val fibonacci_greater_than_arg : n:nat{n >= 2} -> Lemma (fibonacci n >= n)

code/exercises/Ex03c.fstLoad in editor

exercise answer
let rec fibonacci_greater_than_arg n =
  match n with
  | 2 -> ()
  | _ -> fibonacci_greater_than_arg (n-1)

(*     
 Z3 proves the base case.
  
 The proof uses the induction hypothesis:
   forall x, 2 <= x < n ==> fibonacci x >= x
   
 Our goal is to prove that:
  fibonacci n >= n, or equivalently fibonacci (n-1) + fibonacci (n-2) >= n

 We make use of the induction hypothesis to prove that
  fibonacci (n-1) >= n-1

 For fibonacci (n-2) we use the property
   forall x, fibonacci x >= 1

 This can be seen by giving fibonacci the stronger type
   val fibonacci : nat -> Tot (r:nat{r>=1})

 From this Z3 can prove the postcondition.
*)

code/solutions/Ex03c.fstLoad in editor

4.4. Admit

When constructing proofs, it is often useful to assume some parts of the proof and focus on others. For instance, when doing case analysis we often want to know which cases are trivial and automatically solved by F*, and which ones need manual work. We can achieve this by admitting all but one case: if F* can still prove the lemma automatically, then the remaining case is trivial. For this F* provides an admit function, which can be used as follows:

    let rec factorial_is_greater_than_arg x =
      match x with
      | 3 -> ()
      | _ -> admit()

Since type-checking this succeeds, we know that the base case of the induction is trivially proved, and that we need to focus our effort on the inductive case.

4.5. Under the hood: bool versus Type

You can probably skip this section on a first reading, if the title sounds mysterious to you. However, as you start to use quantified formulas in refinement types, you should make sure you understand this section.

We mentioned earlier that a refinement type has the form x:t{phi(x)}. In its primitive form, phi is in fact a type, a predicate on the value x:t. We allow you to use a boolean function in place of a type: under the covers, we implicitly coerce the boolean to a type, by adding the coercion

b2t (b:bool) : Type = (b == true)

The symbol '==' is the type constructor for the homogeneous equality predicate; it is different from '=', which is a boolean function comparing a pair of values (F* also has a separate constructor for heterogeneous equality, '===').

Remark. Equality is more complex than it looks! See also equality on F* wiki.

The propositional connectives /\, \/, and ~, for conjunction, disjunction, and negation, are type constructors. We also include ==> and <==> for single and bidirectional implication. In contrast, &&, ||, and not are boolean functions.

Often, because of the implicit conversion from bool to Type, you can almost ignore the distinction between the two. However, when you start to write refinement formulas with quantifiers, the distinction between the two will become apparent.

Both the universal quantifier forall and the existential quantifier exists are only available in Type. When you mix boolean functions and quantifiers in a refinement, you need to use the propositional connectives. For example, the following formula is well-formed:

canRead e /\ forall x. canWrite x ==> canRead x

It is a shorthand for the following, more elaborate form:

b2t(canRead e) /\ forall x. b2t(canWrite x) ==> b2t(canRead x)

On the other hand, the following formula is not well-formed; F* will reject it:

canRead e && forall x. canWrite x ==> canRead x

The reason is that the quantifier is a type, whereas the && operator expects a boolean. While a boolean can be coerced to a type, there is no coercion in the other direction.
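
To make the distinction concrete, here is a small illustrative pair of assertions (not from the exercise files): the first mixes a boolean expression with a quantifier using /\ and ==> and is accepted; the commented-out second one applies && to a quantified formula and is rejected.

let _ = assert ((0 < 1) /\ (forall (x:int). x > 2 ==> x > 1))
// let _ = assert ((0 < 1) && (forall (x:int). x > 2 ==> x > 1))   -- rejected: && expects booleans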

5. Simple inductive types

Here's the standard definition of (immutable) lists in F*.

type list 'a =
  | Nil  : list 'a
  | Cons : hd:'a -> tl:list 'a -> list 'a

We note some important conventions for the constructors of a data type in F*: constructor names (like Nil and Cons) must begin with an uppercase letter, and each constructor is declared with a full function type whose final result is the type being defined. The names given to constructor arguments (hd and tl above) will matter shortly, when we discuss the functions F* generates from them (5.4).

The list type is defined in F*'s standard prelude and is available implicitly in all F* programs. We provide special (but standard) syntax for the list constructors: [] stands for Nil; hd :: tl stands for Cons hd tl; and [v1; ...; vn] stands for the n-element list Cons v1 (Cons ... (Cons vn Nil)).

You can always just write out the constructors like Nil and Cons explicitly, if you find that useful (e.g., to partially apply Cons).

5.1. Proofs about basic functions on lists

The usual functions on lists are programmed in F* just as in any other ML dialect. Here is the length function:

val length: list 'a -> Tot nat
let rec length l = match l with
  | [] -> 0
  | _ :: tl -> 1 + length tl

The type of length proves it is a total function on lists of 'a-typed values, returning a non-negative integer.

Exercise 5a:
Here's the append function that concatenates two lists.

let rec append l1 l2 = match l1 with
  | [] -> l2
  | hd :: tl -> hd :: append tl l2

code/exercises/Ex04a.fstLoad in editor

Give append a type that proves it always returns a list whose length is the sum of the lengths of its arguments.

exercise answer
    l1:list 'a -> l2:list 'a -> Tot (l:list 'a{length l = length l1 + length l2})

code/solutions/Ex04a.fstLoad in editor

Exercise 5b:
Give append the weak type below, then prove the same property as in the previous exercise using a lemma.

val append:list 'a -> list 'a -> Tot (list 'a)

code/exercises/Ex04b.fstLoad in editor

exercise answer
val append_len: l1:list 'a -> l2:list 'a
             -> Lemma (requires True)
                      (ensures (length (append l1 l2) = length l1 + length l2))
let rec append_len l1 l2 = match l1 with
    | [] -> ()
    | hd::tl -> append_len tl l2

code/solutions/Ex04b.fstLoad in editor

We wrote the type of the function append above using an ML-like type variable 'a to stand for an arbitrary type. Under the hood F* expands this out to something like this:

val append : #a:Type -> list a -> list a -> Tot (list a)
let rec append #a l1 l2 = match l1 with
  | [] -> l2
  | hd :: tl -> hd :: append #a tl l2

This exposes the fact that a is really just an implicit argument of append. The implicit argument a is preceded by the # symbol both in the type of append, where we say that a has type Type, and in the definition of append, which takes an extra a argument. When calling append recursively we have a choice: we can either pass a explicitly by preceding it with the # sign, or we can leave it out and the F* type-checker will infer it automatically. (Implicit arguments are further discussed in 9.3.1.)
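
For instance (a small illustration), an application of append may instantiate the implicit type argument explicitly, or leave it for F* to infer:

let ones = append #int [1; 1] [1]       (* explicit instantiation of the implicit argument *)
let twos = append [2; 2] [2]            (* the type argument is inferred *)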

We would like to define a function mem: x:'a -> l:list 'a -> Tot bool that decides whether x is present in a list l or not. However, not all types in F* have decidable equality, so in order to write mem we cannot quantify over arbitrary types, but only over those with decidable equality. Thus the following definition of mem has to use #a:Type{hasEq a} instead of #a:Type (hasEq a holds if a has decidable equality, that is, if there exists a function eq_a: a -> a -> Tot bool such that eq_a a1 a2 holds iff a1 == a2). For convenience, the standard library provides eqtype as an abbreviation of a: Type{hasEq a}.

val mem: #a:eqtype -> a -> list a -> Tot bool
let rec mem #a x xs =
  match xs with
  | [] -> false
  | hd :: tl -> hd = x || mem x tl

Exercise 5c:
Prove that mem satisfies the following property:

val append_mem:  #a:eqtype -> l1:list a -> l2:list a -> x:a
        -> Lemma (mem x (append l1 l2) <==> mem x l1 || mem x l2)

code/exercises/Ex04c.fstLoad in editor

exercise answer
let rec append_mem #a l1 l2 x =
  match l1 with
  | [] -> ()
  | hd::tl -> append_mem tl l2 x

(*
 Z3 proves the base case: mem x (append [] l2) <==> mem x l2.
  
 The proof uses the induction hypothesis:
   tl << l1 ==> mem x (append tl l2) <==> mem x tl || mem x l2.
   
 Our goal is to prove that:
  mem x (append l1 l2) <==> mem x l1 || mem x l2, or equivalently
  let hd::tl = l1 in mem x (hd::(append tl l2)) <==> hd=x || mem x tl || mem x l2.

 From this Z3 can prove the postcondition.
*)

code/solutions/Ex04c.fstLoad in editor

5.2. To type intrinsically, or to prove lemmas?

As the previous exercises illustrate, you can prove properties either by enriching the type of a function or by writing a separate lemma about it; we call these the ‘intrinsic’ and ‘extrinsic’ styles, respectively. Which style to prefer is a matter of taste and convenience: generally useful properties are often good candidates for intrinsic specification (e.g., that length returns a nat); more specific properties are better stated and proven as lemmas. However, in some cases, as in the following example, it may be impossible to prove a property of a function directly in its type; you must resort to a lemma.

val reverse: list 'a -> Tot (list 'a)
let rec reverse = function
  | [] -> []
  | hd :: tl -> append (reverse tl) [hd]

Let's try proving that reversing a list twice is the identity function. It's possible to specify this property in the type of reverse using a refinement type.

val reverse: #a:Type -> f:(list a -> Tot (list a)){forall l. l == f (f l)}

However, F* refuses to accept this as a valid type for reverse: proving this property requires two separate inductions, neither of which F* can perform automatically.

Instead, one can use two lemmas to prove the property we care about. Here it is:

let snoc l h = append l [h]

val snoc_cons: l:list 'a -> h:'a -> Lemma (reverse (snoc l h) == h :: reverse l)
let rec snoc_cons l h = match l with
  | [] -> ()
  | hd :: tl -> snoc_cons tl h

val rev_involutive: l:list 'a -> Lemma (reverse (reverse l) == l)
let rec rev_involutive l = match l with
  | [] -> ()
  | hd :: tl ->
    // (1) [reverse (reverse tl) == tl]
    rev_involutive tl;
    // (2) [reverse (append (reverse tl) [hd]) == hd :: reverse (reverse tl)]
    snoc_cons (reverse tl) hd
    // These two facts are enough for Z3 to prove the lemma:
    //   reverse (reverse (hd :: tl))
    //   =def= reverse (append (reverse tl) [hd])
    //   =(2)= hd :: reverse (reverse tl)
    //   =(1)= hd :: tl
    //   =def= l

In the Cons case of rev_involutive we are explicitly applying not just the induction hypothesis but also the snoc_cons lemma we proved above.

Exercise 5d:
Prove that reverse is injective, i.e.,

val rev_injective : l1:list 'a -> l2:list 'a
                -> Lemma (requires (reverse l1 == reverse l2))
                         (ensures  (l1 == l2))

code/exercises/Ex04d.fstLoad in editor

exercise answer

Perhaps you did it like this:

val snoc_injective: l1:list 'a -> h1:'a -> l2:list 'a -> h2:'a
                 -> Lemma (requires (snoc l1 h1 == snoc l2 h2))
                          (ensures (l1 == l2 /\ h1 == h2))
let rec snoc_injective l1 h1 l2 h2 = match l1, l2 with
  | _::tl1, _::tl2 -> snoc_injective tl1 h1 tl2 h2
  | _ -> ()

val rev_injective1: l1:list 'a -> l2:list 'a
                -> Lemma (requires (reverse l1 == reverse l2))
                         (ensures  (l1 == l2)) (decreases l1)
let rec rev_injective1 l1 l2 =
  match l1,l2 with
  | h1::t1, h2::t2 ->
      // assert(reverse (h1::t1) = reverse (h2::t2));
      // assert(snoc (reverse t1) h1  = snoc (reverse t2) h2);
      snoc_injective (reverse t1) h1 (reverse t2) h2;
      // assert(reverse t1 = reverse t2 /\ h1 = h2);
      rev_injective1 t1 t2
      // assert(t1 = t2 /\ h1::t1 = h2::t2)
  | _, _ -> ()

code/solutions/Ex04d.fstLoad in editor

That's quite a tedious proof, isn't it? Here's a simpler proof.

val rev_injective2: l1:list 'a -> l2:list 'a
                -> Lemma (requires (reverse l1 == reverse l2))
                         (ensures  (l1 == l2))
let rev_injective2 #t l1 l2 = rev_involutive l1; rev_involutive l2

code/solutions/Ex04d.fstLoad in editor

The rev_injective2 proof is based on the idea that every involutive function is injective. We've already proven that reverse is involutive. So, we invoke our lemma, once for l1 and once for l2. This gives the SMT solver the information that reverse (reverse l1) == l1 and reverse (reverse l2) == l2, which suffices to complete the proof. As usual, when structuring proofs, lemmas are your friends!

5.3. Higher-order functions on lists

5.3.1. Mapping

We can write higher-order functions on lists, i.e., functions taking other functions as arguments. Here is the familiar map:

let rec map f l = match l with
  | [] -> []
  | hd::tl -> f hd :: map f tl

It takes a function f and a list l and it applies f to each element in l producing a new list. More precisely map f [a1; ...; an] produces the list [f a1; ...; f an]. For example:

map (fun x -> x + 1) [0; 1; 2] = [1; 2; 3]

As usual, without any specification, F* will infer for map the type ('a -> 'b) -> list 'a -> list 'b. Written explicitly, the type of map is ('a -> Tot 'b) -> Tot (list 'a -> Tot (list 'b)). Notice that the type of f also defaults to a function with Tot effect.

By adding a type annotation we can give a different type to map:

val map: ('a -> ML 'b) -> list 'a -> ML (list 'b)

This type is neither a subtype nor a supertype of the type F* inferred on its own (for example, the manual annotation above allows you to write map (fun x -> FStar.IO.print_any x) l, whereas the automatically inferred type does not).

No parametric polymorphism on effects: You might hope to give map a very general type, one that accepts a function with any effect E and returns a function with that same effect. This would require a kind of effect polymorphism, which F* does not support. This may lead to some code duplication. We plan to address this by allowing you to write more than one type for a value, an ad hoc overloading mechanism.

5.3.2. Finding

Exercise 5e:

We define a find function that given a boolean function f and a list l returns the first element in l for which f holds. If no element is found find returns None; for this we use the usual option type (shown below and available in F*'s standard prelude).

    type option 'a =
       | None : option 'a
       | Some : v:'a -> option 'a

    let rec find f l = match l with
      | [] -> None
      | hd :: tl -> if f hd then Some hd else find f tl

code/exercises/Ex04e.fstLoad in editor

Prove that if find returns Some x then f x = true. Is it better to do this intrinsically or extrinsically? Do it both ways.

exercise answer
(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex04e
//find

type option 'a =  
   | None : option 'a
   | Some : v:'a -> option 'a

(* The intrinsic style is more convenient in this case *)
val find : f:('a -> Tot bool) -> list 'a -> Tot (option (x:'a{f x}))
let rec find f l = match l with
  | [] -> None
  | hd :: tl -> if f hd then Some hd else find f tl

(* Extrinsically things are more verbose, since we are basically duplicating
   the structure of find in find_some. It's still not too bad. *)

val find' : #t:Type -> f:(t -> Tot bool) -> list t -> Tot (option t)
let rec find' #t f l = match l with
  | [] -> None
  | hd :: tl -> if f hd then Some hd else find' f tl

val find_some : f:('a -> Tot bool) -> xs:(list 'a) ->
    Lemma (None? (find' f xs) || f (Some?.v (find' f xs)))
let rec find_some f xs =
    match xs with
    | [] -> ()
    | x :: xs' -> find_some f xs'

(* [Some?.v] and [None?] are convenient ways to manipulate options.  They are
   described later in the tutorial.  Here's a more verbose way to achieve the
   same thing: *)

val find_some' : f:('a -> Tot bool) -> xs:(list 'a) ->
    Lemma (match find f xs with
           | None -> true
           | Some v -> f v)
let rec find_some' f xs =
    match xs with
    | [] -> ()
    | x :: xs' -> find_some' f xs'

code/solutions/Ex04e.fstLoad in editor

Exercise 5f:
Define the fold_left function on lists, so that

fold_left f [b1; ...; bn] a = f bn (... (f b2 (f b1 a)))

Prove that (fold_left Cons l [] == reverse l). (The proof is a level harder than what we've done so far. You will need to strengthen the induction hypothesis, and possibly to prove that append is associative.)

val fold_left_cons_is_reverse: l:list 'a -> l':list 'a ->
      Lemma (fold_left Cons l l' == append (reverse l) l')

code/exercises/Ex04f.fstLoad in editor

exercise answer
val fold_left: f:('b -> 'a -> Tot 'a) -> l:list 'b -> 'a -> Tot 'a
let rec fold_left f l a = match l with
  | Nil -> a
  | Cons hd tl -> fold_left f tl (f hd a)

val append_assoc : #a:Type -> l1:list a -> l2:list a -> l3: list a ->
  Lemma (append l1 (append l2 l3) == append (append l1 l2) l3)
let rec append_assoc #a l1 l2 l3 =
  match l1 with
  | [] -> ()
  | h1 :: t1 -> append_assoc t1 l2 l3

let rec fold_left_Cons_is_rev #a (l1 l2: list a) :
  Lemma (fold_left Cons l1 l2 == append (reverse l1) l2) =
  match l1 with
  | [] -> ()
  | h1 :: t1 ->
    // (1) [append (append (reverse t1) [h1]) l2
    //      == append (reverse t1) (append [h1] l2)]
    append_assoc (reverse t1) [h1] l2;
    // (2) [fold_left Cons t1 (h1 :: l2) = append (reverse t1) (h1 :: l2)]
    fold_left_Cons_is_rev t1 (h1 :: l2)
    // append (reverse l1) l2
    // =def= append (append (reverse t1) [h1]) l2
    // =(1)= append (reverse t1) (append [h1] l2)
    // =def= append (reverse t1) (h1 :: l2)
    // =(2)= fold_left Cons t1 (h1 :: l2)
    // =def= fold_left Cons l1 l2

code/solutions/Ex04f.fstLoad in editor

5.4. Implicitly defined functions on inductive types

For each definition of an inductive type, F* automatically introduces a number of utility functions: for each constructor, we have a discriminator; and for each argument of each constructor, we have a projector. We describe these next, using the list type (repeated below) as a running example.

type list 'a =
  | Nil  : list 'a
  | Cons : hd:'a -> tl:list 'a -> list 'a

5.4.1. Discriminators

A discriminator is a boolean-valued total function that tests whether a term is the application of a particular constructor. For the list type, we have two discriminators that are automatically generated.

val Nil?: list 'a -> Tot bool
let Nil? xs = match xs with | [] -> true | _ -> false

val Cons?: list 'a -> Tot bool
let Cons? xs = match xs with | _ :: _ -> true | _ -> false

Identifiers like Nil? and Cons? cannot be defined by the user directly. They are reserved to be automatically generated by the compiler. So, the above code will not typecheck as written; it's only for illustration purposes.
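
You can, however, freely use the generated discriminators in your own code; for instance, F* discharges the following illustrative assertion statically:

let empty_int_list : list int = []
let _ = assert (Nil? empty_int_list && Cons? [1; 2])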

5.4.2. Projectors

From the type of the Cons constructor, F* generates the following signatures:

val Cons?.hd : l:list 'a{Cons? l} -> Tot 'a
val Cons?.tl : l:list 'a{Cons? l} -> Tot (list 'a)

This indicates why the names hd and tl in the types of Cons are significant: they correspond to the names of the auto-generated projectors. If you don't name the arguments of a data constructor, F* will generate names automatically (although these names are derived from the position of the arguments and their form is unspecified; so, it's good practice to name the arguments yourself).

Again, names like Cons?.hd are reserved for internal use.
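
Likewise, the generated projectors can be used directly in your code; for example, this illustrative assertion is proved statically:

let _ = assert (Cons?.hd [1; 2; 3] = 1)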

Exercise 5g:
Write functions (call them hd and tl) whose types match those of Cons?.hd and Cons?.tl.

(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex04g
//hd-tl

val hd : l:list 'a{Cons? l} -> Tot 'a
val tl : l:list 'a{Cons? l} -> Tot (list 'a)

code/exercises/Ex04g.fstLoad in editor

exercise answer
(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex04g
//hd-tl

val hd : l:list 'a{Cons? l} -> Tot 'a
let hd l =
  match l with
  | h :: t -> h

val tl : l:list 'a{Cons? l} -> Tot (list 'a)
let tl l =
  match l with
  | h :: t -> t

code/solutions/Ex04g.fstLoad in editor

Exercise 5h:
Write a function that returns the nth element of a list. Give a type that ensures it is total.

(* Write your code here *)

code/exercises/Ex04h.fstLoad in editor

exercise answer
val nth : l:list 'a -> n:nat{n < length l} -> 'a
let rec nth l n =
  match l with
  | h :: t -> if n = 0 then h else nth t (n - 1)

(* Note how the solution above omits the [] case: that's because whenever you
   pattern match a term (to analyze it by cases), F* insists on proving that you
   have handled all the cases. We call this the 'exhaustiveness check', and unlike
   in F# or OCaml, it is implemented semantically. *)

code/solutions/Ex04h.fstLoad in editor

5.5. Tuples and records

F* provides syntactic sugar for tuples. Under the covers, tuples are just inductive types: tuples of size 2 to 8 are defined in the standard prelude. (You can add to this, if you like.) Here's the definition of pairs from the prelude:

type tuple2 'a 'b =
  | MkTuple2: _1:'a -> _2:'b -> tuple2 'a 'b

You can write (5, 7) instead of MkTuple2 5 7, and give it the type (int * int) as a shorthand for tuple2 int int.

This notation generalizes to tuples of arbitrary size, so long as the corresponding tupleN data type is defined in the prelude.
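
For instance (an illustrative snippet), here is a triple written with the sugared notation, together with a pattern-matching let that takes it apart:

let point : int * int * int = (1, 2, 3)
let sum_point = let (x, y, z) = point in x + y + z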

F* also provides syntax for records, although internally records (like tuples) boil down to data types with a single constructor. Here's an example:

type triple 'a 'b 'c = { fst:'a; snd:'b; third:'c }
let f x = x.fst + x.snd + x.third

F* accepts this program, giving f the type (triple int int int -> Tot int).

This is equivalent to the following, more verbose, source program:

type triple 'a 'b 'c =
  | MkTriple: fst:'a -> snd:'b -> third:'c -> triple 'a 'b 'c

let f x = MkTriple?.fst x + MkTriple?.snd x + MkTriple?.third x

6. Proving termination

In F* every pure function is proved to terminate. The termination check used by F* is based on a semantic criterion, not syntactic conditions. We quickly sketch the basic structure of the F* termination check; you'll need to understand a bit of this in order to write more interesting programs.

6.1. A well-founded partial order on values

In order to prove a function terminating in F* one provides a measure: a pure expression depending on the function's arguments. F* checks that this measure strictly decreases on each recursive call. The measure for the arguments of the call is compared to the measure for the previous call according to a well-founded partial order on F* values. We write v1 << v2 when v1 precedes v2 in this order.

A relation R is a well-founded partial order on a set S if, and only if, R is a partial order on S and there are no infinite descending chains in S related by R. For example, taking S to be nat, the set of natural numbers, the integer ordering < is a well-founded partial order (in fact, it is a total order).

Since the measure strictly decreases on each recursive call, and there are no infinite descending chains, this guarantees that the function eventually stops making recursive calls, i.e., it terminates.

6.1.1. The precedes relation

  1. The ordering on integers: Given i, j : nat, we have i << j <==> i < j.

    Note: Negative integers are not related by the << relation, which is only a partial order.

  2. The subterm ordering on inductive types: For any value v = D v1 ... vn of an inductive type (where D is a constructor applied to arguments v1 to vn) we include vi << v, for all i. That is, the sub-terms of a well-typed constructed term precede the constructed term. (The exception to this rule is lex_t below).

  3. The lexicographic ordering on lex_t, as described below.

6.1.2. Lexically ordered pairs and lists

Take a look at the classic function, ackermann:

val ackermann: m:nat -> n:nat -> Tot nat
let rec ackermann m n =
  if m=0 then n + 1
  else if n = 0 then ackermann (m - 1) 1
  else ackermann (m - 1) (ackermann m (n - 1))

Why does it terminate? At each recursive call, it is not the case that all the arguments are strictly less than the arguments at the previous call, e.g, in the second branch, n increases from 0 to 1. However, this function does in fact terminate, and F* proves that it does.

The reason is that although each argument does not decrease in every recursive call, when taken together, the ordered pair of arguments (m,n) does decrease according to a lexical ordering on pairs.

In its standard prelude, F* defines the following inductive type:

type lex_t =
  | LexTop  : lex_t
  | LexCons : #a:Type -> a -> lex_t -> lex_t

(The # sign marks the first argument as implicit, see 9.3.1.) This is a list of heterogeneously typed elements.

The lexicographic ordering on lex_t is the following: LexCons v1 v2 << LexCons u1 u2 if, and only if, either v1 << u1, or v1 == u1 and v2 << u2. Additionally, any LexCons value precedes LexTop.

We include the following syntactic sugar:

%[v1;...;vn] is shorthand for (LexCons v1 ... (LexCons vn LexTop))

6.2. A termination proof in detail

Suppose we are defining a function by recursion, using the following form:

val f: y1:t1 -> .. -> yn:tn -> Tot t
let rec f x1 .. xn = e

Usually, in ML, when type-checking e, the recursively bound function f is available for use in e at the type y1:t1 -> .. -> yn:tn -> Tot t, exactly the type written by the programmer. This type is, however, not sufficient to guarantee that an invocation of f does not trigger an infinite chain of recursive calls.

To prove termination (i.e., to rule out infinitely deep recursive calls), F* uses a different rule for type-checking recursive functions. The recursively bound name f is available for use in e, but only at the more restrictive type shown below:

   y1:t1
-> ...
-> yn:tn{%[y1;...;yn] << %[x1;...;xn]}
-> Tot t

That is, by default, F* requires the function's parameters to decrease according to the lexicographic ordering of the parameters (in the order in which they appear, excluding all function-typed parameters). (There is a way to override this default; see Section 6.2.2.)
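
For instance, in a higher-order function such as the one below (a small sketch; map_ex is a hypothetical name chosen to avoid clashing with the library), the function-typed parameter f is excluded from the default measure, which therefore amounts to just %[l]; the recursive call is on the sub-term tl, so F* accepts the definition without any annotation.

val map_ex : ('a -> Tot 'b) -> list 'a -> Tot (list 'b)
let rec map_ex f l = match l with
  | [] -> []
  | hd :: tl -> f hd :: map_ex f tl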

Let's see how this works on a couple of examples.

6.2.1. Ackermann

F* can show automatically that the ackermann function

(* 1 *) let rec ackermann m n =
(* 2 *)   if m=0 then n + 1
(* 3 *)   else if n = 0 then ackermann (m - 1) 1
(* 4 *)   else ackermann (m - 1)
(* 5 *)                  (ackermann m (n - 1))

is terminating and has type:

ackermann : nat -> nat -> Tot nat

F* gives this function a default termination metric, so this expands out to:

ackermann : m:nat -> n:nat -> Tot nat (decreases %[m;n])

The decreases clause will be discussed in the next section. In this example it leads to the F* type-checker using the following more restrictive type for the recursive calls in the body of ackermann:

ackermann: m':nat
        -> n':nat{%[m';n'] << %[m;n]}
        -> Tot nat

At each of the three recursive calls to ackermann, we need to prove not only that the arguments are non-negative, but also that they precede the previous arguments according to the refinement formula above.

6.2.2. Decreases clauses

Of course, sometimes you will need to override the default measure for recursive functions. If we just swap the arguments of the ackermann function the default measure no longer works and we need to add a decreases clause explicitly choosing a lexicographic ordering that first compares m and then n:

val ackermann_swap: n:nat -> m:nat -> Tot nat (decreases %[m;n])
let rec ackermann_swap n m =
   if m=0 then n + 1
   else if n = 0 then ackermann_swap 1 (m - 1)
   else ackermann_swap (ackermann_swap (n - 1) m)
                       (m - 1)

The optional decreases clause is a second argument to the Tot effect. It is, in general, a total expression over the arguments of the function.
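
For example (a sketch; interleave is a hypothetical name, and we use the list length function from F*'s library), the arguments are swapped at the recursive call below, so the default measure does not decrease; a measure summing the lengths of the two lists does.

val interleave : l1:list 'a -> l2:list 'a -> Tot (list 'a)
      (decreases (FStar.List.Tot.length l1 + FStar.List.Tot.length l2))
let rec interleave l1 l2 =
  match l1 with
  | [] -> l2
  | hd :: tl -> hd :: interleave l2 tl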

Exercise 6a:
Here is a more efficient (tail-recursive) variant of the list reverse function.

val rev : l1:list 'a -> l2:list 'a -> Tot (list 'a) (decreases l2)
let rec rev l1 l2 =
  match l2 with
  | []     -> l1
  | hd :: tl -> rev (hd :: l1) tl

code/exercises/Ex05a.fstLoad in editor

Prove the following lemma showing the correctness of this efficient implementation with respect to the previous simple implementation:

val rev_is_ok : l:list 'a -> Lemma (rev [] l = reverse l)

code/exercises/Ex05a.fstLoad in editor

exercise answer
val append_assoc : xs:list 'a -> ys:list 'a -> zs:list 'a -> Lemma
      (ensures (append (append xs ys) zs = append xs (append ys zs)))
let rec append_assoc xs ys zs =
  match xs with
  | [] -> ()
  | x :: xs' -> append_assoc xs' ys zs

val rev_is_ok_aux : l1:list 'a -> l2:list 'a -> Lemma
      (ensures (rev l1 l2 = append (reverse l2) l1)) (decreases l2)
let rec rev_is_ok_aux l1 l2 =
  match l2 with
  | [] -> ()
  | hd :: tl  -> rev_is_ok_aux (hd :: l1) tl; append_assoc (reverse tl) [hd] l1

val append_nil : xs:list 'a -> Lemma
      (ensures (append xs [] = xs))
let rec append_nil xs =
  match xs with
  | [] -> ()
  | _ :: tl  -> append_nil tl

val rev_is_ok : l:list 'a -> Lemma (rev [] l = reverse l)
let rev_is_ok l = rev_is_ok_aux [] l; append_nil (reverse l)

code/solutions/Ex05a.fstLoad in editor

Exercise 6b:
(Challenge!) Here is a more efficient (linear-time) variant of the Fibonacci function.

val fib : nat -> nat -> n:nat -> Tot nat (decreases n)
let rec fib a b n =
  match n with
  | 0 -> a
  | _ -> fib b (a+b) (n-1)

code/exercises/Ex05b.fstLoad in editor

Prove the following lemma showing the correctness of this efficient implementation with respect to the previous simple implementation:

val fib_is_ok : n:nat -> Lemma (fib 1 1 n = fibonacci n)

code/exercises/Ex05b.fstLoad in editor

exercise answer
val fib_is_ok_aux : (n: nat) -> (k: nat) ->
  Lemma (fib (fibonacci k) (fibonacci (k + 1)) n == fibonacci (k + n))
let rec fib_is_ok_aux n k =
  if n = 0 then () else fib_is_ok_aux (n - 1) (k + 1)

val fib_is_ok : n:nat -> Lemma (fib 1 1 n = fibonacci n)
let fib_is_ok n = fib_is_ok_aux n 0

code/solutions/Ex05b.fstLoad in editor

6.3. Mutually recursive functions

The same strategy for proving termination also works for mutually recursive functions. In the case of mutually recursive functions, the F* termination checker requires that all directly or mutually recursive calls decrease the termination metric of the called function. This is a quite strong requirement. Consider for instance the following example:

val foo : l:list int -> Tot int
val bar : l:list int -> Tot int
let rec foo l = match l with
    | [] -> 0
    | x :: xs -> bar xs
and bar l = foo l

These functions are terminating because foo is always called (via bar) on a sub-list. The call from bar to foo is not on a smaller list, though, and with the default metrics this example is rejected.

We can use a custom metric to convince F* that the two functions are terminating:

val foo : l:list int -> Tot int (decreases %[l;0])
val bar : l:list int -> Tot int (decreases %[l;1])

The metrics of foo and bar are both lexicographic tuples with the argument l in the first position and 0 or 1 in the second. The call from foo to bar decreases l, while the call from bar to foo keeps l the same, but decreases the second component from 1 to 0.
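
Combining the two snippets above, the complete definitions that F* accepts read as follows:

val foo : l:list int -> Tot int (decreases %[l;0])
val bar : l:list int -> Tot int (decreases %[l;1])
let rec foo l = match l with
    | [] -> 0
    | x :: xs -> bar xs
and bar l = foo l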

7. Putting the pieces together

We now look at two complete programs that exercise the various features we have learned about so far. In each case, there are some exercises to modify or generalize the programs.

7.1. Quicksort on lists

A canonical example for program verification is proving various sorting algorithms correct.

We'll start with lists of integers and describe some properties that we'd like to hold true of a sorting algorithm, starting with a function sorted, which decides when a list of integers is sorted in increasing order.

  val sorted: list int -> Tot bool
  let rec sorted l = match l with
      | [] -> true
      | [x] -> true
      | x :: y :: xs -> x <= y && sorted (y :: xs)
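
As a quick sanity check (the solver can evaluate sorted on small, concrete lists by unfolding its definition):

  let _ = assert (sorted [1;1;2])
  let _ = assert (not (sorted [2;1]))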

Given a sorting algorithm sort, we would like to prove the following property, where mem is the list membership function from the earlier exercise on lists (Section 5.1).

sorted (sort l) /\ ( forall i. mem i l <==> mem i (sort l) )

We will see below that this specification can still be improved.

7.1.1. Implementing Quicksort

Here's a simple implementation of the quicksort algorithm. It always picks the first element of the list as the pivot; partitions the rest of the list into those elements greater than or equal to the pivot, and the rest; recursively sorts the partitions; and slots the pivot in the middle before returning.

let rec sort l = match l with
  | [] -> []
  | pivot :: tl ->
    let hi, lo  = partition (fun x -> pivot <= x) tl in
    append (sort lo) (pivot :: sort hi)

Exercise 7a:
Write the partition function and prove it total.

val partition: ('a -> Tot bool) -> list 'a -> Tot (list 'a * list 'a)

code/exercises/Ex06a.fstLoad in editor

exercise answer

The following definition is included in F*'s list library.

val partition: ('a -> Tot bool) -> list 'a -> Tot (list 'a * list 'a)
let rec partition f = function
  | [] -> [], []
  | hd::tl ->
     let l1, l2 = partition f tl in
     if f hd
     then hd::l1, l2
     else l1, hd::l2

code/solutions/Ex06a.fstLoad in editor

7.1.2. Specifying sort

We'll verify sort intrinsically, although we'll have to use some additional lemmas. Here's a first-cut at a specification. Try asking F* to prove it.

val sort0: l:list int -> Tot (m:list int{sorted m /\ (forall i. mem i l <==> mem i m) })

Unsurprisingly, it fails, on several counts:

  1. We fail to prove the function total. On what grounds can we claim that each recursive call to sort is on a smaller argument? Intuitively, at each recursive call, the length of the list passed as an argument is strictly smaller, since it is a partition of tl. To express this, we'll need to add a decreases clause.
  2. We fail to prove both the sortedness property and the membership property: we need some lemmas about partition and also about the sorted function.

Let's try again:

val sort: l:list int -> Tot (m:list int{sorted m
                                     /\ (forall i. mem i l <==> mem i m)})
                            (decreases (length l))

And here are a couple of lemmas about partition and sorted:

 val partition_lemma: f:('a -> Tot bool)
    -> l:list 'a
    -> Lemma (requires True)
             (ensures (
                  let l1, l2 = partition f l in
                  length l1 + length l2 = length l /\
                  (forall x. mem x l1 ==> f x) /\
                  (forall x. mem x l2 ==> not (f x)) /\
                  (forall x. mem x l = (mem x l1 || mem x l2))))
 let rec partition_lemma f l = match l with
   | [] -> ()
   | hd :: tl -> partition_lemma f tl

 val sorted_concat_lemma: l1:list int{sorted l1}
                       -> l2:list int{sorted l2}
                       -> pivot:int
                       -> Lemma (requires ((forall y. mem y l1 ==> not (pivot <= y))
                                        /\ (forall y. mem y l2 ==> pivot <= y)))
                                (ensures (sorted (append l1 (pivot :: l2))))
 let rec sorted_concat_lemma l1 l2 pivot = match l1 with
  | [] -> ()
  | hd :: tl -> sorted_concat_lemma tl l2 pivot

Finally, we also need a lemma about append and mem:

 val append_mem:  l1:list 'a
               -> l2:list 'a
               -> Lemma (requires True)
                        (ensures (forall a. mem a (append l1 l2) = (mem a l1 || mem a l2)))
 let rec append_mem l1 l2 = match l1 with
   | [] -> ()
   | hd :: tl -> append_mem tl l2

Armed with these lemmas, let's return to our implementation of quicksort. Unfortunately, with the lemmas as they are, we still can't prove our original implementation of quicksort correct. As we saw with our proofs of rev_injective (Section 5.1), we need to explicitly invoke our lemmas in order to use their properties. We'll be able to do better in a minute, but let's see how to modify sort by calling the lemmas to make the proof go through.

let cmp i j = i <= j
val sort_tweaked: l:list int -> Tot (m:list int{sorted m /\ (forall i. mem i l = mem i m)})
                                    (decreases (length l))
let rec sort_tweaked l = match l with
  | [] -> []
  | pivot :: tl ->
    let hi', lo' = partition (cmp pivot) tl in
    partition_lemma (cmp pivot) tl;
    let hi = sort_tweaked hi' in
    let lo = sort_tweaked lo' in
    append_mem lo (pivot :: hi);
    sorted_concat_lemma lo hi pivot;
    append lo (pivot :: hi)

That works, but it's pretty ugly. Not only did we have to pollute our implementation with calls to the lemmas; to make those calls, we also had to rewrite our code so that we could bind names for the lemmas' arguments. We can do much better.

7.1.3. SMT Lemmas: Bridging the gap between intrinsic and extrinsic proofs

If only we had given partition and sorted richer intrinsic types, types that captured the properties that we proved using lemmas, we wouldn't have had to pollute our implementation of quicksort. This is a significant advantage of using the intrinsic proof style: every call of a function like partition (fun x -> pivot <= x) tl immediately carries all the properties that were proven intrinsically about the function.

However, polluting the type of a general-purpose function with a very application-specific type would also be awful. Consider enriching the type of sorted with the property proven by sorted_concat_lemma. The lemma is highly specialized for the quicksort algorithm, whereas sorted is a nice, general specification of sortedness that would work for insertion sort or merge sort, just as well. Polluting its type is a non-starter. So, it seems we are caught in a bind.

Wouldn't it be nice if there were some way to retroactively associate a property proven about a function f by a lemma directly with f, so that every application f x immediately carries the lemma properties? F* provides exactly such a mechanism, which gives a way to bridge the gap between extrinsic and intrinsic proofs.

Let's say we wrote the type of partition_lemma as below; the only difference is in the very last line, the addition of [SMTPat (partition f l)]. That line instructs F* and the SMT solver to associate with every occurrence of the term partition f l the property proven by the lemma.

val partition_lemma: f:('a -> Tot bool)
  -> l:list 'a
  -> Lemma (requires True)
           (ensures (
               let l1, l2 = partition f l in
               length l1 + length l2 = length l /\
               (forall x. mem x l1 ==> f x) /\
               (forall x. mem x l2 ==> not (f x)) /\
               (forall x. mem x l = (mem x l1 || mem x l2))))
           [SMTPat (partition f l)]

Likewise, here's a revised type for sorted_concat_lemma. This time, we instruct F* and the SMT solver to associate the lemma property only with very specific terms: just those that have the shape

sorted (append l1 (pivot :: l2))

The more specific a pattern you can provide, the more discriminating the SMT solver will be in applying lemmas, keeping your verification times acceptable.

val sorted_concat_lemma: l1:list int{sorted l1}
                      -> l2:list int{sorted l2}
                      -> pivot:int
                      -> Lemma (requires ((forall y. mem y l1 ==> not (pivot <= y))
                                       /\ (forall y. mem y l2 ==> pivot <= y)))
                               (ensures (sorted (append l1 (pivot :: l2))))
                         [SMTPat (sorted (append l1 (pivot :: l2)))]

Finally, we also revise the type for append_mem in the same way:

val append_mem:  l1:list 'a
              -> l2:list 'a
              -> Lemma (requires True)
                       (ensures (forall a. mem a (append l1 l2) = (mem a l1 || mem a l2)))
                       [SMTPat (append l1 l2)]
let rec append_mem l1 l2 = match l1 with
  | [] -> ()
  | hd :: tl -> append_mem tl l2

Remark. Use SMTPat with care! If you specify a bad pattern, you will get very unpredictable performance from the verification engine. To understand the details behind SMTPat, you should read up a bit on patterns (aka triggers) for quantifiers in SMT solvers.

With these revised signatures, we can write sort just as we originally did, and give it the type we want. The SMT solver implicitly derives the properties it needs from the lemmas that we have provided.

let cmp i j = i <= j
val sort: l:list int -> Tot (m:list int{sorted m /\ (forall i. mem i l = mem i m)})
                            (decreases (length l))
let rec sort l = match l with
  | [] -> []
  | pivot :: tl ->
    let hi, lo = partition (cmp pivot) tl in
    append (sort lo) (pivot :: sort hi)

7.1.4. Exercises

Exercise 7b:
Starting from this implementation (given below), generalize sort so that it works on lists with an arbitrary element type, instead of just lists of integers. Prove it correct.

(* Start from the code in the "Load in editor" link *)

code/exercises/Ex06b.fstLoad in editor

exercise answer
(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex06b
//quick-sort-poly

val mem: #t:eqtype -> t -> list t -> Tot bool
let rec mem #t a l = match l with
  | [] -> false
  | hd::tl -> hd=a || mem a tl


val append : list 'a -> list 'a -> Tot (list 'a)
let rec append l1 l2 = match l1 with
  | [] -> l2
  | hd :: tl -> hd :: append tl l2


val append_mem: #t:eqtype 
          -> l1:list t
              -> l2:list t
              -> Lemma (requires True)
                       (ensures (forall a. mem a (append l1 l2) = (mem a l1 || mem a l2)))
                       [SMTPat (append l1 l2)]
let rec append_mem #t l1 l2 = match l1 with
  | [] -> ()
  | hd::tl -> append_mem tl l2


val length: list 'a -> Tot nat
let rec length l = match l with
  | [] -> 0
  | _ :: tl -> 1 + length tl


(* Specification of sortedness according to some comparison function f *)
val sorted: ('a -> 'a -> Tot bool) -> list 'a -> Tot bool
let rec sorted f l = match l with
    | [] -> true
    | [x] -> true
    | x::y::xs -> f x y && sorted f (y::xs)


val partition: ('a -> Tot bool) -> list 'a -> Tot (list 'a * list 'a)
let rec partition f = function
  | [] -> [], []
  | hd::tl ->
     let l1, l2 = partition f tl in
     if f hd
     then hd::l1, l2
     else l1, hd::l2

val partition_lemma: #t:eqtype -> f:(t -> Tot bool)
   -> l:list t
   -> Lemma (requires True)
           (ensures (
             let l1, l2 = partition f l in
              length l1 + length l2 = length l /\
              (forall x. mem x l1 ==> f x) /\
              (forall x. mem x l2 ==> not (f x)) /\
              (forall x. mem x l = (mem x l1 || mem x l2))))
            [SMTPat (partition f l)]
let rec partition_lemma #t f l = match l with
    | [] -> ()
    | hd::tl -> partition_lemma f tl


(* Defining a new predicate symbol *)
type total_order (a:eqtype) (f: (a -> a -> Tot bool)) =
    (forall a. f a a)                                           (* reflexivity   *)
    /\ (forall a1 a2. (f a1 a2 /\ a1<>a2)  <==> not (f a2 a1))  (* anti-symmetry *)
    /\ (forall a1 a2 a3. f a1 a2 /\ f a2 a3 ==> f a1 a3)        (* transitivity  *)


val sorted_concat_lemma: #t:eqtype 
              -> f:(t -> t -> Tot bool)
                      -> l1:list t{sorted f l1}
                      -> l2:list t{sorted f l2}
                      -> pivot:t
                      -> Lemma (requires (total_order t f
                                       /\ (forall y. mem y l1 ==> not (f pivot y))
                                       /\ (forall y. mem y l2 ==> f pivot y)))
                               (ensures (sorted f (append l1 (pivot::l2))))
                               [SMTPat (sorted f (append l1 (pivot::l2)))]
let rec sorted_concat_lemma #t f l1 l2 pivot = match l1 with
    | [] -> ()
    | hd::tl -> sorted_concat_lemma f tl l2 pivot


val sort: #t:eqtype -> f:(t -> t -> Tot bool){total_order t f}
       -> l:list t
       -> Tot (m:list t{sorted f m /\ (forall i. mem i l = mem i m)})
              (decreases (length l))

let rec sort #t f l = match l with
  | [] -> []
  | pivot::tl ->
    let hi, lo = partition (f pivot) tl in
    let m = append (sort f lo) (pivot::sort f hi) in
    assert (forall i. mem i (pivot :: sort f hi) = mem i (append [pivot] (sort f hi)));
    m

code/solutions/Ex06b.fstLoad in editor

Exercise 7c:
Our specification for sort is incomplete. Why? Can you write a variation of sort that has the same specification as the one above, but discards some elements of the list?

(* Start from the code in the "Load in editor" link *)

code/exercises/Ex06b.fstLoad in editor

exercise answer

The specification does not ensure that the resulting list is a permutation of the initial list: it could discard or repeat some elements.

(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex06c


val mem: #t:eqtype-> t -> list t -> Tot bool
let rec mem #t a l = match l with
  | [] -> false
  | hd::tl -> hd=a || mem a tl


(* append now duplicates the elements of the first list *)
val append : list 'a -> list 'a -> Tot (list 'a)
let rec append l1 l2 = match l1 with
  | [] -> l2
  | hd :: tl -> hd :: hd :: append tl l2


val append_mem:  #t:eqtype -> l1:list t
              -> l2:list t
              -> Lemma (requires True)
                       (ensures (forall a. mem a (append l1 l2) = (mem a l1 || mem a l2)))
                       [SMTPat (append l1 l2)]
let rec append_mem #t l1 l2 = match l1 with
  | [] -> ()
  | hd::tl -> append_mem tl l2


val length: list 'a -> Tot nat
let rec length l = match l with
  | [] -> 0
  | _ :: tl -> 1 + length tl


val sorted: list int -> Tot bool
let rec sorted l = match l with
    | [] -> true
    | [x] -> true
    | x::y::xs -> x <= y && sorted (y::xs)


val partition: ('a -> Tot bool) -> list 'a -> Tot (list 'a * list 'a)
let rec partition f = function
  | [] -> [], []
  | hd::tl ->
     let l1, l2 = partition f tl in
     if f hd
     then hd::l1, l2
     else l1, hd::l2


val partition_lemma: #t:eqtype -> f:(t -> Tot bool)
                   -> l:list t
                   -> Lemma (requires True)
                            (ensures ((length (fst (partition f l)) + length (snd (partition f l)) = length l
                                  /\ (forall x. mem x (fst (partition f l)) ==> f x)
                                  /\ (forall x. mem x (snd (partition f l)) ==> not (f x))
                                  /\ (forall x. mem x l = (mem x (fst (partition f l)) || mem x (snd (partition f l)))))))
                            [SMTPat (partition f l)]
let rec partition_lemma #t f l = match l with
    | [] -> ()
    | hd::tl -> partition_lemma f tl


val sorted_concat_lemma: l1:list int{sorted l1}
                      -> l2:list int{sorted l2}
                      -> pivot:int
                      -> Lemma (requires ((forall y. mem y l1 ==> not (pivot <= y))
                                       /\ (forall y. mem y l2 ==> pivot <= y)))
                               (ensures (sorted (append l1 (pivot::l2))))
                               [SMTPat (sorted (append l1 (pivot::l2)))]
let rec sorted_concat_lemma l1 l2 pivot = match l1 with
    | [] -> ()
    | hd::tl -> sorted_concat_lemma tl l2 pivot


type match_head (l1:list int) (l2:list int) =
  (l1 = [] /\ l2 = []) \/
  (exists h t1 t2. l1 = h::t1 /\ l2 = h::t2)

val dedup: l:list int{sorted l} -> Tot (l2:list int{sorted l2 /\ (forall i. mem i l = mem i l2) /\ match_head l l2})
let rec dedup l =
  match l with
  | [] -> []
  | [x] -> [x]
  | h::h2::t ->
    if h = h2 then dedup (h2::t)
    else h::dedup (h2::t)


let cmp i j = i <= j
val sort: l:list int -> Tot (m:list int{sorted m /\ (forall i. mem i l = mem i m)})
                            (decreases (length l))
let rec sort l = match l with
  | [] -> []
  | pivot::tl ->
    let hi, lo = partition (cmp pivot) tl in
    let l' = append (sort lo) (pivot::sort hi) in
    assert (forall i. mem i (pivot::sort hi) = mem i (append [pivot] (sort hi)));
    dedup l'

code/solutions/Ex06c.fstLoad in editor

Exercise 7d:
Fix the specification to ensure that sort cannot discard any elements in the list. Prove that sort meets the specification.

(* Start from either the problem or the solution of exercise 6b *)

code/exercises/Ex06b.fstLoad in editor

exercise answer
(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex06d
#set-options "--z3rlimit_factor 2"

(* Instead of a Boolean check that an element belongs to a list, count
the number of occurrences of an element in a list *)
val count : #t:eqtype -> t -> list t -> Tot nat
let rec count #t (x:t) (l:list t) = match l with
  | hd::tl -> if hd=x then 1 + count x tl else count x tl
  | [] -> 0


let mem #t x l = count #t x l > 0


val append : list 'a -> list 'a -> Tot (list 'a)
let rec append l1 l2 = match l1 with
  | [] -> l2
  | hd :: tl -> hd :: append tl l2


val append_count: #t:eqtype -> l:list t -> m:list t
               -> Lemma (requires True)
                        (ensures (forall x. count x (append l m) = (count x l + count x m)))
let rec append_count #t l m = match l with
  | [] -> ()
  | hd::tl -> append_count tl m


val append_mem: #t:eqtype -> l:list t -> m:list t
               -> Lemma (requires True)
                        (ensures (forall x. mem x (append l m) = (mem x l || mem x m)))
let append_mem #t l m = append_count l m


val length: list 'a -> Tot nat
let rec length l = match l with
  | [] -> 0
  | _ :: tl -> 1 + length tl


(* Specification of sortedness according to some comparison function f *)
val sorted: ('a -> 'a -> Tot bool) -> list 'a -> Tot bool
let rec sorted f l = match l with
    | [] -> true
    | [x] -> true
    | x::y::xs -> f x y && sorted f (y::xs)


val partition: ('a -> Tot bool) -> list 'a -> Tot (list 'a * list 'a)
let rec partition f = function
  | [] -> [], []
  | hd::tl ->
     let l1, l2 = partition f tl in
     if f hd
     then hd::l1, l2
     else l1, hd::l2


val partition_lemma: #t:eqtype -> f:(t -> Tot bool)
                   -> l:list t
                   -> Lemma (requires True)
                            (ensures ((length (fst (partition f l)) + length (snd (partition f l)) = length l
                                  /\ (forall x. mem x (fst (partition f l)) ==> f x)
                                  /\ (forall x. mem x (snd (partition f l)) ==> not (f x))
                                  /\ (forall x. mem x l = (mem x (fst (partition f l)) || mem x (snd (partition f l))))
                                  /\ (forall x. count x l = (count x (fst (partition f l)) + count x (snd (partition f l)))))))
let rec partition_lemma #t f l = match l with
    | [] -> ()
    | hd::tl -> partition_lemma f tl


(* Defining a new predicate symbol *)
type total_order (a:eqtype) (f: (a -> a -> Tot bool)) =
    (forall a. f a a)                                           (* reflexivity   *)
    /\ (forall a1 a2. (f a1 a2 /\ a1<>a2)  <==> not (f a2 a1))  (* anti-symmetry *)
    /\ (forall a1 a2 a3. f a1 a2 /\ f a2 a3 ==> f a1 a3)        (* transitivity  *)


val sorted_concat_lemma: #a:eqtype
                      -> f:(a -> a -> Tot bool)
                      -> l1:list a{sorted f l1}
                      -> l2:list a{sorted f l2}
                      -> pivot:a
                      -> Lemma (requires (total_order a f
                                       /\ (forall y. mem y l1 ==> not (f pivot y))
                                       /\ (forall y. mem y l2 ==> f pivot y)))
                               (ensures (sorted f (append l1 (pivot::l2))))

let rec sorted_concat_lemma #a f l1 l2 pivot = 
  match l1 with
  | [] -> ()
  | hd::tl -> sorted_concat_lemma f tl l2 pivot


val sort: #t:eqtype -> f:(t -> t -> Tot bool){total_order t f}
       -> l:list t
       -> Tot (m:list t{sorted f m /\ (forall i. mem i l = mem i m) /\ (forall i. count i l = count i m)})
              (decreases (length l))
let rec sort #t f l = match l with
  | [] -> []
  | pivot::tl ->
    let hi, lo = partition (f pivot) tl in
    partition_lemma (f pivot) tl;
    let lo' = sort f lo in
    let hi' = sort f hi in
    append_count lo' (pivot::hi');
    append_mem lo' (pivot::hi');
    sorted_concat_lemma f lo' hi' pivot;
    append lo' (pivot::hi')

code/solutions/Ex06d.fstLoad in editor

Exercise 7e:
Implement insertion sort and prove the same properties.

val sort : l:list int -> Tot (m:list int{sorted m /\ (forall x. mem x l == mem x m)})

code/exercises/Ex06e.fstLoad in editor

exercise answer
(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex06e
//insertion-sort

(* Check that a list is sorted *)
val sorted: list int -> Tot bool
let rec sorted l = match l with
    | [] -> true
    | [x] -> true
    | x::y::xs -> (x <= y) && (sorted (y::xs))

(* Definition of the [mem] function *)
val mem: int -> list int -> Tot bool
let rec mem a l = match l with
  | [] -> false
  | hd::tl -> hd=a || mem a tl

(* A lemma to help Z3 *)
val sorted_smaller: x:int
                ->  y:int
                ->  l:list int
                ->  Lemma (requires (sorted (x::l) /\ mem y l))
                          (ensures (x <= y))
                          [SMTPat (sorted (x::l)); SMTPat (mem y l)]
let rec sorted_smaller x y l = match l with
    | [] -> ()
    | z::zs -> if z=y then () else sorted_smaller x y zs

(* Insertion function *)
val insert : i:int -> l:list int{sorted l} -> Tot (m:list int{sorted m /\ (forall x. mem x m <==> x==i \/ mem x l)})
let rec insert i l = match l with
  | [] -> [i]
  | hd::tl ->
     if i <= hd then
       i::l
     else
       (* Solver implicitly uses the lemma: sorted_smaller hd (Cons.hd i_tl) tl *)
       hd::(insert i tl)

(* Insertion sort *)
val sort : l:list int -> Tot (m:list int{sorted m /\ (forall x. mem x l == mem x m)})
let rec sort l = match l with
    | [] -> []
    | x::xs -> insert x (sort xs)

code/solutions/Ex06e.fstLoad in editor

8. Case study: simply-typed lambda-calculus

We now look at a larger case study: proving the soundness of a type-checker for the simply-typed $\lambda$-calculus (STLC). If you're not familiar with STLC, you can have a look at the Software Foundations book for a gentle introduction given by the textual explanations (you can ignore the Coq parts there). The formalization and proof here closely follows the one in Software Foundations. Our proofs are, however, shorter and much more readable than Coq proofs.

8.1. Syntax

We represent STLC types by the F* inductive datatype ty.

type ty =
  | TBool  : ty
  | TArrow : tin:ty -> tout:ty -> ty

We consider Booleans as the only base type (TBool). Function types are represented by the TArrow constructor taking two type arguments. For instance we write TArrow TBool TBool for the type of functions taking a Boolean argument and returning a Boolean result. This would be written as bool -> bool in F* syntax, and $\mathsf{bool} \to \mathsf{bool}$ in paper notation.

We represent the expressions of STLC by the datatype exp.

type exp =
  | EVar   : v:int -> exp
  | EApp   : fn:exp -> arg:exp -> exp
  | EAbs   : v:int -> vty:ty -> body:exp -> exp
  | ETrue  : exp
  | EFalse : exp
  | EIf    : test:exp -> btrue:exp -> bfalse:exp -> exp

Variables are represented as integer “names” decorated by the constructor EVar. Variables are “bound” by lambda abstractions (EAbs). For instance the identity function on Booleans is written EAbs 0 TBool (EVar 0). In paper notation one would write this function as $(\lambda x:\mathsf{bool}.~x)$. The type annotation on the argument (TBool) allows for very simple type-checking. We are not considering type inference here, to keep things simple. The expression that applies the identity function to the ETrue constant is written

let stlc_app_id_to_true = EApp (EAbs 0 TBool (EVar 0)) ETrue

(in paper notation $(\lambda x:\mathsf{bool}.~x)~\mathsf{true}$).

The language also has a conditional construct (if-then-else). For instance, the Boolean “not” function can be written as

let stlc_not = EAbs 0 TBool (EIf (EVar 0) EFalse ETrue)

(in paper notation $\lambda x:\mathsf{bool}.~\mathsf{if }~x~\mathsf{ then~false~else~true}$).

8.2. Operational semantics

We define a standard small-step call-by-value interpreter for STLC. The final result of successfully evaluating an expression is called a value. We postulate that functions and the Boolean constants are values by defining is_value, a boolean predicate on expressions (a total function):

val is_value : exp -> Tot bool
let is_value e =
  match e with
  | EAbs _ _ _
  | ETrue
  | EFalse     -> true
  | _          -> false

The EAbs, ETrue, and EFalse cases share the same right-hand-side (true), which is a way to prevent duplication in definitions.

In order to give a semantics to function applications we define a function subst x e e' that substitutes x with e in e':

val subst : int -> exp -> exp -> Tot exp
let rec subst x e e' =
  match e' with
  | EVar x' -> if x = x' then e else e'
  | EAbs x' t e1 ->
      EAbs x' t (if x = x' then e1 else (subst x e e1))
  | EApp e1 e2 -> EApp (subst x e e1) (subst x e e2)
  | ETrue -> ETrue
  | EFalse -> EFalse
  | EIf e1 e2 e3 -> EIf (subst x e e1) (subst x e e2) (subst x e e3)

We traverse the expression and when we reach a variable (EVar x') we check whether this is the variable x we want to substitute, and if it is we replace it by e. For lambda abstractions (EAbs x' t e1) we only substitute inside the body e1 if x and x' are different; if they are the same we leave the body unchanged. The reason for this is that the x in e1 is bound by the abstraction: it is a new, local name that just happens to be spelled the same as some global name x. The global x is no longer accessible in this scope, since it is shadowed by the local x. The other cases are straightforward.
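
For instance, here are two small checks: substitution replaces a free occurrence of the variable, but leaves the body alone when the binder shadows the variable.

let _ = assert (subst 0 ETrue (EVar 0) = ETrue)
let _ = assert (subst 0 ETrue (EAbs 0 TBool (EVar 0)) = EAbs 0 TBool (EVar 0))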

Note for experts: Because we will only reduce closed expressions, where all variables are bound by previous lambdas, we will only ever substitute closed expressions e, and this naive definition of substitution is good enough. Substitution would become trickier to define or the representation of variables would have to change if we were considering the case where e, the expression replacing a variable in some other expression, may itself contain free variables.

Given the definition of values and of substitution we can now define a small-step interpreter, as a function step that takes an expression e and it either returns the expression to which e reduces in a single step, or it returns None in case of an error (all errors in this language are typing errors, and will be prevented statically by the type system).

val step : exp -> Tot (option exp)
let rec step e =
  match e with
  | EApp e1 e2 ->
      if is_value e1 then
        if is_value e2 then
          match e1 with
          | EAbs x t e' -> Some (subst x e2 e')
          | _           -> None
        else
          match (step e2) with
          | Some e2' -> Some (EApp e1 e2')
          | None     -> None
      else
        (match (step e1) with
        | Some e1' -> Some (EApp e1' e2)
        | None     -> None)
  | EIf e1 e2 e3 ->
      if is_value e1 then
        match e1 with
        | ETrue   -> Some e2
        | EFalse  -> Some e3
        | _       -> None
      else
        (match (step e1) with
        | Some e1' -> Some (EIf e1' e2 e3)
        | None     -> None)
  | _ -> None

We execute an application expression EApp e1 e2 in multiple steps by first reducing e1 to a value, then reducing e2 to a value (following a call-by-value evaluation order), and, if additionally e1 is an abstraction EAbs x t e', we continue by substituting the formal argument x by the actual argument e2. If not, we signal a dynamic typing error (a non-functional value is applied to arguments) by returning None. For EIf e1 e2 e3 we first reduce the guard e1: if the guard reduces to true, we continue with e2; if the guard reduces to false, we continue with e3; and if the guard reduces to something else (e.g. a function), we report a dynamic typing error. The None -> None cases simply propagate errors to the top level.

let _ = assert (step (EApp (EAbs 0 TBool (EVar 0)) ETrue) = Some ETrue)
let _ = assert (step (EApp ETrue ETrue) = None)

8.3. Type-checker

In order to assign a type to a term, we need to know what assumptions we should make about the types of its free variables. So typing happens with respect to a typing environment: a mapping from the variables in scope to their types. We represent such partial maps as functions taking an integer variable name and returning an optional type:

type env = int -> Tot (option ty)

We start type-checking closed terms in the empty environment, i.e. initially no variables are in scope.

val empty : env
let empty = fun _ -> None

When we move under a binder we extend the typing environment.

val extend : env -> int -> ty -> Tot env
let extend g x t = fun x' -> if x = x' then Some t else g x'

For instance we type-check EAbs x t e in typing environment g by first type-checking the body e in the environment extend g x t.

The type-checker is a total function taking an environment g and an expression e and producing either the type of e or None if e is not well-typed.

val typing : env -> exp -> Tot (option ty)
let rec typing g e =
  match e with
  | EVar x -> g x
  | EAbs x t e1 ->
      (match typing (extend g x t) e1 with
      | Some t' -> Some (TArrow t t')
      | None    -> None)
  | EApp e1 e2 ->
      (match typing g e1, typing g e2 with
      | Some (TArrow t11 t12), Some t2 -> if t11 = t2 then Some t12 else None
      | _                    , _       -> None)
  | ETrue  -> Some TBool
  | EFalse -> Some TBool
  | EIf e1 e2 e3 ->
      (match typing g e1, typing g e2, typing g e3 with
      | Some TBool, Some t2, Some t3 -> if t2 = t3 then Some t2 else None
      | _         , _      , _       -> None)

Variables are simply looked up in the environment. For abstractions EAbs x t e1 we type-check the body e1 under the environment extend g x t, as explained above. If that succeeds and produces a type t', then the whole abstraction is given type TArrow t t'. For applications EApp e1 e2 we type-check e1 and e2 separately, and if e1 has a function type TArrow t11 t12 and e2 has type t11, then the whole application has type t12. ETrue and EFalse have type TBool. For conditionals, we require that the guard has type TBool and the two branches have the same type, which is also the type of the conditional.

8.4. Soundness proof

We prove progress and preservation for STLC. The progress theorem tells us that closed, well-typed terms do not produce (immediate) dynamic typing errors: either a well-typed term is a value, or it can take an evaluation step. The proof is a relatively straightforward induction.

val progress : e:exp -> Lemma
      (requires (Some? (typing empty e)))
      (ensures (is_value e \/ (Some? (step e))))
let rec progress e =
  match e with
  | EApp e1 e2 -> progress e1; progress e2
  | EIf e1 e2 e3 -> progress e1; progress e2; progress e3
  | _ -> ()

Variables are not well-typed in the empty environment, so the theorem holds vacuously for variables. Boolean constants and abstractions are values, so the theorem holds trivially for these. All these simple cases are proved automatically by F*. For the remaining cases we need to use the induction hypothesis, but otherwise the proofs are fully automated. Under the hood F* and Z3 are doing quite a bit of work though.

In case e = (EApp e1 e2), F* and Z3 automate the following intuitive argument: we case split on the first instance of the induction hypothesis (is_value e1 \/ (Some? (step e1))). If e1 can take a step, then so can the application. If e1 is a value, we case split on the second instance of the induction hypothesis for e2: if e2 can take a step, so can the application; and if e2 is also a value, then the typing hypothesis tells us that e1 must be an abstraction, so the application can take a beta-reduction step.

The intuitive proof of the EIf case is similar.

The preservation theorem (sometimes also called “subject reduction”) states that when a well-typed expression takes a step, the result is a well-typed expression of the same type. In order to show preservation we need to prove a couple of auxiliary results for reasoning about variables and substitution.

The case for function application has to reason about “beta reduction” steps, i.e. substituting the formal argument of a function with an actual value. To see that this step preserves typing, we need to know that the substitution itself does. So we prove a substitution lemma, stating that substituting a (closed) term v for a variable x in an expression e preserves the type of e. The tricky cases in the substitution proof are the ones for variables and for function abstractions. In both cases, we discover that we need to take an expression e that has been shown to be well-typed in some context g and consider the same expression e in a slightly different context g'. For this we prove a context invariance lemma, showing that typing is preserved under “inessential changes” to the context g: in particular, changes that do not affect any of the free variables of the expression. For this, we need a definition of the free variables of an expression, i.e., the variables occurring in the expression that are not bound by an abstraction.

A variable x appears free in e if e contains some occurrence of x that is not under an abstraction labeled x:

val appears_free_in : x:int -> e:exp -> Tot bool
let rec appears_free_in x e =
  match e with
  | EVar y -> x = y
  | EApp e1 e2 -> appears_free_in x e1 || appears_free_in x e2
  | EAbs y _ e1 -> x <> y && appears_free_in x e1
  | EIf e1 e2 e3 ->
      appears_free_in x e1 || appears_free_in x e2 || appears_free_in x e3
  | ETrue
  | EFalse -> false

We also need a technical lemma connecting free variables and typing contexts. If a variable x appears free in an expression e, and if we know that e is well typed in context g, then it must be the case that g assigns some type to x.

val free_in_context : x:int -> e:exp -> g:env -> Lemma
      (requires (Some? (typing g e)))
      (ensures (appears_free_in x e ==> Some? (g x)))
let rec free_in_context x e g =
  match e with
  | EVar _
  | ETrue
  | EFalse -> ()
  | EAbs y t e1 -> free_in_context x e1 (extend g y t)
  | EApp e1 e2 -> free_in_context x e1 g; free_in_context x e2 g
  | EIf e1 e2 e3 -> free_in_context x e1 g;
                    free_in_context x e2 g; free_in_context x e3 g

The proof is a straightforward induction. As a corollary for g == empty we obtain that expressions typable in the empty environment have no free variables.

val typable_empty_closed : x:int -> e:exp -> Lemma
      (requires (Some? (typing empty e)))
      (ensures (not(appears_free_in x e)))
      [SMTPat (appears_free_in x e)]
let typable_empty_closed x e = free_in_context x e empty

The quantifier pattern [SMTPat (appears_free_in x e)] signals to Z3 that it should consider applying this lemma when its context contains a term of the form appears_free_in x e.

Sometimes, we know that typing g e = Some t, and we will need to replace g by an “equivalent” context g'. We still need to define formally when two environments are equivalent. A natural definition is extensional equivalence of functions:

logic type equal (g1:env) (g2:env) = forall (x:int). g1 x = g2 x

According to this definition, two environments are equivalent if they have the same domain and they map all variables in the domain to the same type. We remark that equal in particular, and logical formulas in general, are types in F*; hence the different syntax for this definition.

The context invariance lemma uses in fact a weaker variant of this equivalence in which the two environments only need to agree on the variables that appear free in an expression e:

logic type equalE (e:exp) (g1:env) (g2:env) =
  forall (x:int). appears_free_in x e ==> g1 x = g2 x

The context invariance lemma is then easily proved by induction:

val context_invariance : e:exp -> g:env -> g':env -> Lemma
  (requires (equalE e g g'))
  (ensures (typing g e == typing g' e))
let rec context_invariance e g g' =
  match e with
  | EAbs x t e1 ->
     context_invariance e1 (extend g x t) (extend g' x t)

  | EApp e1 e2 ->
     context_invariance e1 g g';
     context_invariance e2 g g'

  | EIf e1 e2 e3 ->
     context_invariance e1 g g';
     context_invariance e2 g g';
     context_invariance e3 g g'

  | _ -> ()

Because equal is a stronger relation than equalE we obtain the same property for equal:

val typing_extensional : g:env -> g':env -> e:exp -> Lemma
  (requires (equal g g'))
  (ensures (typing g e == typing g' e))
let typing_extensional g g' e = context_invariance e g g'

We can use these results to show the following substitution lemma:

val substitution_preserves_typing : x:int -> e:exp -> v:exp -> g:env -> Lemma
  (requires (Some? (typing empty v) /\
             Some? (typing (extend g x (Some?.v (typing empty v))) e)))
  (ensures (typing g (subst x v e) ==
            typing (extend g x (Some?.v (typing empty v))) e))
let rec substitution_preserves_typing x e v g =
  let Some t_x = typing empty v in
  let gx = extend g x t_x in
  match e with
  | ETrue -> ()
  | EFalse -> ()
  | EVar y ->
     if x=y
     then context_invariance v empty g (* uses lemma typable_empty_closed *)
     else context_invariance e gx g

  | EApp e1 e2 ->
     substitution_preserves_typing x e1 v g;
     substitution_preserves_typing x e2 v g

  | EIf e1 e2 e3 ->
     substitution_preserves_typing x e1 v g;
     substitution_preserves_typing x e2 v g;
     substitution_preserves_typing x e3 v g

  | EAbs y t_y e1 ->
     let gxy = extend gx y t_y in
     let gy = extend g y t_y in
     if x=y
     then typing_extensional gxy gy e1
     else
       (let gyx = extend gy x t_x in
        typing_extensional gxy gyx e1;
        substitution_preserves_typing x e1 v gy)

The proof proceeds by induction on the expression e; we give the intuition for the two most interesting cases. For e = EVar y: if x = y, then subst x v e is just v, which is closed, so by context invariance (using typable_empty_closed) it has the same type in g as in the empty environment; if x <> y, the variable is left untouched and context invariance between gx and g gives the result. For e = EAbs y t_y e1: if x = y, the binder shadows x, so substitution leaves the body unchanged and the contexts gxy and gy are extensionally equal; if x <> y, we first reorder the two extensions (typing_extensional gxy gyx) and then apply the induction hypothesis in the extended context gy.

We now have the tools we need to prove preservation: if a closed expression e has type t and takes an evaluation step to e', then e' is also a closed expression with type t. In other words, the small-step evaluation relation preserves types.

val preservation : e:exp -> Lemma
  (requires (Some? (typing empty e) /\ Some? (step e) ))
  (ensures (Some? (step e) /\
            typing empty (Some?.v (step e)) == typing empty e))
let rec preservation e =
  match e with
  | EApp e1 e2 ->
     if is_value e1
     then (if is_value e2
           then let EAbs x _ ebody = e1 in
                substitution_preserves_typing x ebody e2 empty
           else preservation e2)
     else preservation e1

  | EIf e1 _ _ ->
      if not (is_value e1) then preservation e1

We only have two cases to consider, since only applications and conditionals can take successful execution steps. The case for e = EIf e1 e2 e3 is simple: either e1 is a value and thus the conditional reduces to e2 or e3, which by the typing hypothesis both have type t, or e1 takes a successful step and we can apply the induction hypothesis. We use the Some?.v projector, which requires F* to prove that indeed e1 can take a step; this is immediate since we know that the whole conditional takes a step and we know that e1 is not a value.

The case for e = EApp e1 e2 is a bit more complex. If e1 steps, or if e1 is a value and e2 steps, then we just apply the induction hypothesis. If both e1 and e2 are values, it must be the case that e1 is an abstraction EAbs x targ ebody and step e = Some (subst x e2 ebody). From the typing assumption we have typing (extend empty x targ) ebody = Some t and typing empty e2 = Some targ, so we can use the substitution lemma to obtain that typing empty (subst x e2 ebody) = Some t, which concludes the proof.

8.5. Exercises for STLC

Exercise 8a:
Define a typed_step function that takes a well-typed expression e that is not a value and produces the expression to which e steps. Give typed_step the following strong type (basically this type captures both progress and preservation):

  val typed_step : e:exp{Some? (typing empty e) /\ not(is_value e)} ->
                   Tot (e':exp{typing empty e' = typing empty e})

code/exercises/Ex07a.fstLoad in editor

(Hint: the most direct solution to this exercise fits on one line)

exercise answer
    let typed_step e = progress e; preservation e; Some?.v (step e)

code/solutions/Ex07a.fstLoad in editor

Exercise 8b:
To add pairs to this formal development we add the following to the definition of types, expressions, values, substitution, and step:

type ty =
  ...
  | TPair : ty -> ty -> ty

type exp =
  ...
  | EPair  : exp -> exp -> exp
  | EFst   : exp -> exp
  | ESnd   : exp -> exp

code/exercises/Ex07b.fstLoad in editor

(note the extra rec in is_value below!)

let rec is_value e =
  match e with
  ...
  | EPair e1 e2 -> is_value e1 && is_value e2

let rec subst x e e' =
  match e' with
  ...
  | EPair e1 e2 -> EPair (subst x e e1) (subst x e e2)
  | EFst e1 -> EFst (subst x e e1)
  | ESnd e1 -> ESnd (subst x e e1)

let rec step e =
  ...
  | EPair e1 e2 ->
      if is_value e1 then
        if is_value e2 then None
        else
          (match (step e2) with
          | Some e2' -> Some (EPair e1 e2')
          | None     -> None)
      else
        (match (step e1) with
        | Some e1' -> Some (EPair e1' e2)
        | None     -> None)
  | EFst e1 ->
      if is_value e1 then
        (match e1 with
        | EPair v1 v2 -> Some v1
        | _           -> None)
      else
        (match (step e1) with
        | Some e1' -> Some (EFst e1')
        | None     -> None)
  | ESnd e1 ->
      if is_value e1 then
        (match e1 with
        | EPair v1 v2 -> Some v2
        | _           -> None)
      else
        (match (step e1) with
        | Some e1' -> Some (ESnd e1')
        | None     -> None)

code/exercises/Ex07b.fstLoad in editor

Add cases to typing for the new constructs and fix all the proofs.

exercise answer

We extend the typing and appears_free_in functions with cases for EPair, EFst, and ESnd:

val typing : env -> exp -> Tot (option ty)
let rec typing g e =
...
  | EPair e1 e2 ->
      (match typing g e1, typing g e2 with
      | Some t1, Some t2 -> Some (TPair t1 t2)
      | _      , _       -> None)
  | EFst e1 ->
      (match typing g e1 with
      | Some (TPair t1 t2) -> Some t1
      | _                  -> None)
  | ESnd e1 ->
      (match typing g e1 with
      | Some (TPair t1 t2) -> Some t2
      | _                  -> None)

val appears_free_in : x:int -> e:exp -> Tot bool
...
  | EPair e1 e2 -> appears_free_in x e1 || appears_free_in x e2
  | EFst e1 -> appears_free_in x e1
  | ESnd e1 -> appears_free_in x e1

code/solutions/Ex07b.fstLoad in editor

The proofs of the lemmas are also easy to extend by just calling the induction hypothesis:

val free_in_context : x:int -> e:exp -> g:env -> Lemma
      (requires (Some? (typing g e)))
      (ensures (appears_free_in x e ==> Some? (g x)))
let rec free_in_context x e g =
...
  | EPair e1 e2 -> free_in_context x e1 g; free_in_context x e2 g
  | EFst e1
  | ESnd e1 -> free_in_context x e1 g

val context_invariance : e:exp -> g:env -> g':env
                     -> Lemma
                          (requires (equalE e g g'))
                          (ensures (typing g e == typing g' e))
let rec context_invariance e g g' =
...
| EPair e1 e2 ->
     context_invariance e1 g g';
     context_invariance e2 g g'

  | EFst e1
  | ESnd e1 -> context_invariance e1 g g'

val substitution_preserves_typing : x:int -> e:exp -> v:exp ->
      g:env{Some? (typing empty v) &&
            Some? (typing (extend g x (Some?.v (typing empty v))) e)} ->
      Tot (u:unit{typing g (subst x v e) ==
                  typing (extend g x (Some?.v (typing empty v))) e})
let rec substitution_preserves_typing x e v g =
...
  | EPair e1 e2 ->
     (substitution_preserves_typing x e1 v g;
      substitution_preserves_typing x e2 v g)

  | EFst e1
  | ESnd e1 ->
      substitution_preserves_typing x e1 v g

code/solutions/Ex07b.fstLoad in editor

As for the other cases, the preservation proof when e = EPair e1 e2 follows the structure of the step function. If e1 is not a value then it further evaluates, so we apply the induction hypothesis to e1. If e1 is a value, then since we know that the pair evaluates, it must be the case that e2 is not a value and further evaluates, so we apply the induction hypothesis to it. The cases for EFst and ESnd are similar.

val preservation : e:exp{Some? (typing empty e) /\ Some? (step e)} ->
      Tot (u:unit{typing empty (Some?.v (step e)) == typing empty e})
let rec preservation e =
...
  | EPair e1 e2 ->
      (match is_value e1, is_value e2 with
      | false, _     -> preservation e1
      | true , false -> preservation e2)

  | EFst e1
  | ESnd e1 ->
      if not (is_value e1) then preservation e1

code/solutions/Ex07b.fstLoad in editor

Exercise 8c:
We want to add a let construct to this formal development. We add the following to the definition of expressions:

type exp =
  ...
  | ELet  : int -> exp -> exp -> exp

code/exercises/Ex07c.fstLoad in editor

Add cases for ELet to all definitions and proofs.

Load answer in editor

Exercise 8d:
Define a big-step interpreter for STLC as a recursive function eval that given a well-typed expression e either produces the well-typed value v to which e evaluates or it diverges if the evaluation of e loops. Give eval the following strong type ensuring that v has the same type as e (basically this type captures both progress and preservation):

  val eval : e:exp{Some? (typing empty e)} ->
             Dv (v:exp{is_value v && typing empty v = typing empty e})

code/exercises/Ex07d.fstLoad in editor

The Dv effect is that of potentially divergent computations. We cannot mark this as Tot since a priori STLC computations could loop, and it is hard to prove that well-typed ones don't.

exercise answer

Here is a solution that only uses typed_step (suggested by Santiago Zanella):

val eval : e:exp{Some? (typing empty e)} ->
           Dv (v:exp{is_value v && typing empty v = typing empty e})
let rec eval e =
  if is_value e then e
  else eval (typed_step e)

code/solutions/Ex07d.fstLoad in editor

or using the progress and preservation lemmas instead of typed_step (suggested by Guido Martinez):

val eval : e:exp{Some? (typing empty e)} ->
  Dv (v:exp{is_value v && typing empty v = typing empty e})
let rec eval e = match step e with
  | None -> progress e; e
  | Some e' -> preservation e; eval e'

Here is another solution that only uses the substitution lemma:

val eval' : e:exp{Some? (typing empty e)} ->
            Dv (v:exp{is_value v && typing empty v = typing empty e})
let rec eval' e =
  let Some t = typing empty e in
  match e with
  | EApp e1 e2 ->
     (let EAbs x _ e' = eval' e1 in
      let v = eval' e2 in
      substitution_preserves_typing x e' v empty;
      eval' (subst x v e'))
  | EAbs _ _ _
  | ETrue
  | EFalse     -> e
  | EIf e1 e2 e3 ->
     (match eval' e1 with
      | ETrue  -> eval' e2
      | EFalse -> eval' e3)
  | EPair e1 e2 -> EPair (eval' e1) (eval' e2)
  | EFst e1 ->
     let EPair v1 _ = eval' e1 in v1
  | ESnd e1 ->
     let EPair _ v2 = eval' e1 in v2
  | ELet x e1 e2 ->
     (let v = eval' e1 in
      substitution_preserves_typing x e2 v empty;
      eval' (subst x v e2))

code/solutions/Ex07d.fstLoad in editor

9. Higher kinds, indexed types, implicit arguments, and type-level functions

So far, we have mostly programmed total functions and given them specifications using rich types. Under the covers, F* has a second level of checking at work to ensure that the specifications that you write are also meaningful: this is F*'s kind system. Before moving on to more advanced F* features, you'll have to understand the kind system a little. In short, just as types describe properties of programs, kinds describe properties of types.

9.1. Type: The type of types

The basic construct in the kind system is the constant Type, which is the basic type of a type. For example, just as we say that 0 : int (zero has type int), we also say int : Type (int has kind Type), bool : Type, nat : Type etc.

9.2. Arrow kinds

Beyond the base types, consider types like list int, list bool etc. These are also types, of course, and in F* list int : Type and list bool : Type. But what about the unapplied type constructor list? It's not a Type on its own; it only produces a Type when applied to a Type. In that sense, list is a type constructor. To describe this, just as we use the -> type constructor to describe functions at the level of programs, we use the -> kind constructor to describe functions at the level of types. As such, we write list : Type -> Type, meaning that it produces a Type when applied to a Type.
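For instance (this small declaration is ours, not from the tutorial's code), a two-argument constructor has a correspondingly longer arrow kind:

type pair (a:Type) (b:Type) =
  | Pair : fst:a -> snd:b -> pair a b

Here pair : Type -> Type -> Type, so that pair int bool : Type.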

9.3. Indexed types and implicit arguments

Beyond inductive types like list, F* also supports inductive type families, or GADTs. For example, here is one definition of a vector, a length-indexed list.

type vector (a:Type) : nat -> Type =
   | Nil :  vector a 0
   | Cons : hd:a -> n:nat -> tl:vector a n -> vector a (n + 1)

Here, we define a type constructor vector (a:Type) : nat -> Type, which takes two arguments: it produces a Type when applied first to a Type and then to a nat.

The point is to illustrate that just as functions at the level of programs may take either type- or term-arguments, types themselves can take either types or pure expressions as arguments.

9.3.1. Implicit arguments

When writing functions over terms with indexed types, it is convenient to mark some arguments as implicit, asking the type-checker to infer them rather than having to provide them manually.

For example, the type of a function that reverses a vector can be written as follows:

val reverse_vector: #a:Type -> #n:nat -> vector a n -> Tot (vector a n)

This indicates that reverse_vector expects three arguments, but the # sign marks the first two arguments as implicit. Given a term v : vector t m, simply writing reverse_vector v will type-check as vector t m: internally, F* elaborates the function call to reverse_vector #t #m v, instantiating the two implicit arguments automatically. To view what F* inferred, you can pass the --print_implicits option to the compiler, which causes all implicitly inferred arguments to be shown in any message from the compiler. Alternatively, you can write reverse_vector #t #m v yourself to force the choice of implicit arguments manually. Applying this to the Cons constructor's n argument, we get:

type vector (a:Type) : nat -> Type =
   | Nil :  vector a 0
   | Cons : hd:a -> #n:nat -> tl:vector a n -> vector a (n + 1)

With this, given v: vector int m, we can write Cons 0 v to build a vector int (m + 1).
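For instance, assuming some implementation of reverse_vector with the signature above, the two calls below are equivalent; the first leaves the implicit arguments to be inferred (this snippet is ours, for illustration only).

let v : vector int 2 = Cons 1 (Cons 2 Nil)
let r1 = reverse_vector v           (* elaborated to reverse_vector #int #2 v *)
let r2 = reverse_vector #int #2 v   (* implicit arguments given explicitly *)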

When you write a type like 'a -> M t wp (note the “ticked” variable 'a), this is simply syntactic sugar for #a:Type -> a -> M t wp, with every occurrence of 'a replaced by a in M t wp. In other words, the “ticked” type variable is an implicit type argument. So the signature below is equivalent to the previous one.

val reverse_vector' : #n:nat -> vector 'a n -> Tot (vector 'a n)

9.4. Dependent kind arrows and type functions

Even some of the built-in type constructors can be given kinds. Consider the refinement type constructor: _:_{_}, whose instances include types we've seen before, like x:int{x >= 0}, which is the type nat, of course. With less syntactic sugar, you can view the type x:int{phi(x)} as the application of a type constructor refinement to two arguments: int and a refinement formula fun x -> phi(x). As discussed in Section 4.5, the formula phi(x) is itself a Type. Thus, the kind of the refinement constructor is a:Type -> (a -> Type) -> Type, meaning it takes two arguments: first, a type a; then a predicate on a (whose kind depends on a); and builds a Type.

The type fun x -> phi(x) is a type function, a lambda abstraction at the level of types, which when applied to a particular value, say 0, reduces as usual to phi(0).
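In practice, one often names such a predicate and applies it inside a refinement; F* unfolds the application back to the underlying formula. A minimal sketch (the names below are ours):

let non_negative (x:int) = x >= 0     (* a boolean predicate on integers *)
type nat' = x:int{non_negative x}     (* the refinement applies the predicate; it unfolds to x >= 0 *)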

10. Specifying effects

F* provides a mechanism to define new effects. As mentioned earlier, the system is already configured with several basic effects, including non-termination, state and exceptions. In this section, we look briefly at how effects are defined, using the PURE and STATE effects for illustration.

On a first reading, you may wish to skip this section and proceed directly to the next chapter.

10.1. Computation types

Expressions in F* are given computation types C. In its most primitive form, each effect in F* introduces a computation type of the form M t1..tn t wp. The M t1..tn indicates the name of the effect M and several user-defined indexes t1..tn. The type t indicates the type of the result of the computation; wp is a predicate transformer that is not weaker than the weakest pre-condition of e. For the sake of brevity, we call wp the weakest pre-condition.

For the purposes of the presentation here, we start by discussing a simplified form of these C types, i.e., we will consider computation types of the form M t wp, those that have a non-parameterized effect constructor M (and generalize to the full form towards the end of this chapter).

A computation e has type M t wp if when run in an initial configuration satisfying wp post, it (either runs forever, if allowed, or) produces a result of type t in a final configuration f such that post f is true, all the while exhibiting no more effects than are allowed by M. In other words, M is an upper bound on the effects of e; t is its result type; and wp is a predicate transformer that for any post-condition of the final configuration of e, produces a sufficient condition on the initial configuration of e.

These computation types (specifically their predicate transformers) form what is known as a Dijkstra monad. Some concrete examples should provide better intuition.

Remark. We follow an informal syntactic convention that the non-abbreviated, all-caps names for computation types are the primitive forms; other forms are derived from these.

For example, the computation type constructor PURE is a more primitive version of Pure, which we have used earlier in this tutorial. By the end of this chapter, you will see how Pure, Tot, etc. are expressed in terms of PURE. Likewise STATE is the primitive form of ST; EXN is the primitive form of Exn; etc.

10.2. The PURE effect

The PURE effect is primitive in F*. No user should ever have to (re-)define it. Its definition is provided once and for all in lib/prims.fst. It's instructive to look at its definition inasmuch as it provides some insight into how F* works under the covers. It also provides some guidance on how to define new effects. We discuss a fragment of it here.

The type of pure computations is PURE t wp where wp : (t -> Type) -> Type. This means that pure computations produce t-typed results and are described by predicate transformers that transform result predicates (post-conditions of kind t -> Type) to pre-conditions, which are propositions of kind Type.

Seen another way, the signature of these wps has the following form: they transform pure post-conditions into pure pre-conditions.

let pure_post (a:Type) = a -> Type
let pure_pre = Type
let pure_wp (a:Type) = pure_post a -> pure_pre

For example, one computation type for 0 is PURE int return_zero where

return_zero = fun (p:(int -> Type)) -> p 0

This means that in order to prove any property p about the result of the computation 0, it suffices to prove p 0, which is of course what you would expect. The type return_zero is an instance of the more general form below:

type return_PURE (#a:Type) (x:a) = fun (post: pure_post a) -> post x

When sequentially composing two pure computations using let x = e1 in e2, if e1: PURE t1 wp1 and e2: (x:t1 -> PURE t2 (wp2 x)), we type the composed computation as

PURE t2 (bind_PURE wp1 wp2)

where:

type bind_PURE (#a:Type) (#b:Type)
               (wp1: pure_wp a)
               (wp2: a -> pure_wp b)
 = fun (post:pure_post b) -> wp1 (fun (x:a) -> wp2 x post)

One can show that return_PURE and bind_PURE together form a monad over pure_wp.
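For intuition, here is a hand unfolding (ours, not code from the library) of the wp that these combinators assign to let x = 0 in x + 1:

    bind_PURE (return_PURE 0) (fun x -> return_PURE (x + 1))
  = fun (post:pure_post int) -> return_PURE 0 (fun x -> post (x + 1))
  = fun (post:pure_post int) -> (fun x -> post (x + 1)) 0
  = fun (post:pure_post int) -> post 1

So, to prove any post-condition post of let x = 0 in x + 1, it suffices to prove post 1, as expected.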

10.2.1. Pure and Tot

The Pure effect is an abbreviation for PURE that allows you to write specifications with pre- and post-conditions instead of predicate transformers. It is defined as follows:

effect Pure (a:Type) (pre:Type) (post: a -> Type) =
       PURE a (fun (p:pure_post a) -> pre /\ (forall (x:a). post x ==> p x))

That is, Pure a pre post is a computation which, when pre is true, produces a result v:a such that post v is true.
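For example (this snippet is ours, not from the library), a simple function can be specified with Pure directly, using requires and ensures clauses:

val incr_pos : x:int -> Pure int (requires (x > 0)) (ensures (fun y -> y > 1))
let incr_pos x = x + 1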

The Tot effect is defined below:

effect Tot (a:Type) =
       PURE a (fun (p:pure_post a) -> forall (x:a). p x)

This means that a computation type Tot a only reveals that the computation always terminates with an a-typed result.

10.3. The STATE effect

Stateful programs operate on an input heap producing a result and an output heap. The computation type STATE t wp describes such a computation, where wp : state_wp t has the signature below.

let state_post t = t -> heap -> Type
let state_pre    = heap -> Type
let state_wp t   = state_post t -> state_pre

In other words, WPs for stateful programs transform stateful post-conditions (relating the t-typed result of a computation to the final heap) into a pre-condition, a predicate on the input heap.

The type heap is axiomatized in lib/heap.fst and is one model of the primitive heap provided by the F* runtime system. Specifically, we have functions to select, update, and test the presence of a reference in a heap.

module Heap
assume type heap
assume val sel : #a:Type -> heap -> ref a -> Tot a
assume val upd : #a:Type -> heap -> ref a -> a -> Tot heap
assume val contains : #a:Type -> heap -> ref a -> Tot bool

These functions are interpreted using standard axioms, which allows the SMT solver to reason about heaps, for instance:

assume SelUpd1:       forall (a:Type) (h:heap) (r:ref a) (v:a).
                      {:pattern (sel (upd h r v) r)}
                      sel (upd h r v) r = v

assume SelUpd2:       forall (a:Type) (b:Type) (h:heap) (k1:ref a) (k2:ref b) (v:b).
                      {:pattern (sel (upd h k2 v) k1)}
                      k2=!=k1 ==> sel (upd h k2 v) k1 = sel h k1

The Heap library axiomatizes several other functions; the interested reader can refer to lib/heap.fst for more details.

Remark. Note that although sel is marked as a total function, the axioms underspecify its behavior; e.g., sel h r has no further interpretation unless h can be proven to be an upd.

Also note the use of the {:pattern ...} form in the axioms above. This provides the SMT solver with a trigger for the quantifiers, in effect orienting the equalities in SelUpd1 and SelUpd2 as left-to-right rewritings.
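For instance, the following small lemma (ours, not from the library) is discharged automatically by the SMT solver using SelUpd1:

val sel_upd_same: h:heap -> r:ref int -> x:int -> Lemma (sel (upd h r x) r = x)
let sel_upd_same h r x = ()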

The functions sel and upd provide a logical theory of heaps. At the programmatic level, we require stateful operations to allocate, read and write references in the current heap of the program. The signatures of these stateful primitives are provided in lib/st.fst, as illustrated (in slightly simplified form) below.

Let's start with the signature of read:

let wp_read (#a:Type) (r:ref a) = fun (post:state_post a) (h0:heap) -> post (sel h0 r) h0

assume val read: #a:Type -> r:ref a -> STATE a (wp_read r)

This says that read r returns a result v:a when r:ref a; and, to prove any post-condition post:state_post a (relating v to the resulting heap), it suffices to prove post (sel h0 r) h0 when read r is run in the initial heap h0.

Next, here's the signature of write:

let wp_write (#a:Type) (r:ref a) (v:a) = fun (post:state_post unit) (h0:heap) -> post () (upd h0 r v)

assume val write: #a:Type -> r:ref a -> v:a -> STATE unit (wp_write r v)

This says that write r v returns a unit-typed result when r:ref a and v:a; and, to prove any post-condition post:state_post unit (relating () to the resulting heap), it suffices to prove post () (upd h0 r v) when write r v is run in the initial heap h0.

Remark. F* provides support for user-defined custom operators. To allow you to write !r for ST.read r and r := v for ST.write r v, we define

let op_Bang x = ST.read x

and

let op_Colon_Equals x v = ST.write x v

As with the pure_wp, the predicate transformers state_wp t form a monad, as shown by the combinators below.

type return_STATE (#a:Type) (v:a) = fun (post:state_post a) (h0:heap) -> post v h0

type bind_STATE (#a:Type) (#b:Type) (wp1:state_wp a) (wp2:a -> state_wp b)
     = fun (post:state_post b) (h0:heap) -> wp1 (fun (x:a) -> wp2 x post) h0

10.3.1. The ST effect

ST is to STATE what Pure is to PURE: it provides a way to write stateful specifications using pre- and post-conditions instead of predicate transformers. It is defined as shown below (in lib/st.fst):

effect ST (a:Type) (pre:state_pre) (post: (heap -> state_post a)) = STATE a
  (fun (p:state_post a) (h:heap) ->
     pre h /\ (forall a h1. (pre h /\ post h a h1) ==> p a h1)) (* WP *)

10.3.2. The St effect

We define another abbreviation on top of ST for stateful programs with trivial pre- and post-conditions.

  effect St (a:Type) = ST a (fun h -> True) (fun _ _ _ -> True)
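For example (this snippet is ours), here is a stateful increment with a trivial specification, written with the ! and := operators from the remark above:

val incr : r:ref int -> St unit
let incr r = r := !r + 1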

10.4. Lifting effects

When programming with multiple effects, you can instruct F* to automatically infer a specification for your program with a predicate transformer that captures the semantics of all the effects you use. The way this works is that F* computes the least effect for each sub-term of your program, and then lifts the specification of each sub-term into some larger effect, as may be needed by the context.

For example, say you write:

let y = !x in y + 1

The sub-term !x has the STATE effect; whereas the sub-term y+1 is PURE. Since the whole term has at least STATE effect, we'd like to lift the pure sub-computation to STATE. To do this, you can specify that PURE is a sub-effect of STATE, as follows (adapted from lib/prims.fst):

sub_effect
  PURE   ~> STATE = fun (a:Type) (wp:pure_wp a) (p:state_post a) (h:heap) ->
                    wp (fun a -> p a h)

This says that in order to lift a PURE computation to STATE, we lift its wp:pure_wp a to a state_wp a which says that the pure computation leaves the state unmodified.

Note that the type of PURE ~> STATE is (a:Type) -> pure_wp a -> state_post a -> heap -> Type which corresponds to (a:Type) -> pure_wp a -> state_wp a since:

let state_wp a   = state_post a -> state_pre
let state_pre    = heap -> Type
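For instance (again a hand unfolding, not library code), applying this lift at type int to the pure wp of y + 1, namely fun p -> p (y + 1), gives:

    fun (post:state_post int) (h:heap) -> (fun p -> p (y + 1)) (fun a -> post a h)
  = fun (post:state_post int) (h:heap) -> post (y + 1) h

That is, the lifted computation returns y + 1 and leaves the heap h unchanged, as promised.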

10.5. Indexed computation types

Finally, as mentioned earlier, in its full form a computation type has the shape M t1..tn t wp, where t1..tn are some user-chosen indices to the effect constructor M.

This is convenient in that it allows effects to be defined parametrically, and specific effects derived just by instantiation.

For example, the STATE effect is actually defined as an instance of a more general parameterized effect STATE_h (mem:Type), which is parametric in the type of the memory used by the state monad.

Specifically, we have in lib/prims.fst

new_effect STATE = STATE_h heap

However, as we will see in subsequent sections, it is often convenient to define other variants of the state monad using different memory structures.

11. Verifying Stateful Programs

In the previous chapter, we saw how F*'s ST monad was specified. We now look at several programs illustrating its use. If you skipped the previous chapter, you should be ok; we'll explain what's needed about the ST monad as we go along, although, eventually, you'll probably want to refer to the previous chapter for more details.

11.1. Stateful access control

We'll start with a simple example, based on the access control example from 2.1. A restriction of that example was that the access control policy was fixed by the two functions canWrite and canRead. What if we want to administer access rights using an access control list (ACL)? Here's one simple way to model it; of course, a full implementation of stateful access control would have to pay attention to many more details.

The main idea is to maintain a reference that holds the current access control list. In order to access a resource, we need to prove that the current state contains the appropriate permission.

Here's how it goes:

(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex10a
open FStar.All
open FStar.List.Tot
open FStar.Ref

type file = string

(* Each entry is an element in our access-control list *)
type entry =
  | Readable of string
  | Writable of string
type db = list entry

(* We define two pure functions that test whether
   the suitable permission exists in some db *)
let canWrite db file =
  Some? (tryFind (function Writable x -> x=file | _ -> false) db)

let canRead db file =
  Some? (tryFind (function Readable x | Writable x -> x=file) db)

(* The acls reference stores the current access-control list, initially empty *)
val acls: ref db
let acls = ST.alloc []

(*
   Here are two stateful functions which alter the access control list.
   In reality, these functions may themselves be protected by some
   further authentication mechanism to ensure that an attacker cannot
   alter the access control list

   F* infers a fully precise predicate transformer semantics for them.
*)

(* 
// Uncomment these types and make them precise enough to pass the test
// BEGIN: Ex10aExercise
val grant : e:entry -> ST unit (requires (fun h -> True))
                               (ensures (fun h x h' -> True))
val revoke: e:entry -> ST unit (requires (fun h -> True))
                               (ensures (fun h x h' -> True))
// END: Ex10aExercise
*)

let grant e = acls := e::!acls

let revoke e =
  let db = filter (fun e' -> e<>e') !acls in
  acls := db

(* Next, we model two primitives that provide access to files *)

(* We use two heap predicates, which will serve as stateful pre-conditions *)
type canRead_t f h  = canRead  (sel h acls) f  == true
type canWrite_t f h = canWrite (sel h acls) f == true

(* In order to call `read`, you need to prove
   the `canRead f` permission exists in the input heap *)
assume val read:   file:string -> ST string
                                     (requires (canRead_t file))
                                     (ensures (fun h s h' -> modifies_none h h'))

(* Likewise for `delete` *)
assume val delete: file:string -> ST unit
                                     (requires (canWrite_t file))
                                     (ensures (fun h s h' -> modifies_none h h'))

(* Then, we have a top-level API, which provides protected
   access to a file by first checking the access control list.

   If the check fails, it raises a fatal exception using `failwith`.
   As such, it is defined to have effect `All`, which combines
   both state and exceptions.

   Regardless, the specification proves that `safe_delete`
   does not change the heap.
 *)
val safe_delete: file -> All unit 
                (requires (fun h -> True))
                (ensures (fun h x h' -> modifies_none h h'))


let safe_delete file =
  if canWrite !acls file
  then delete file
  else failwith "unwritable"

(* Finally, we have a top-level client program *)
val test_acls: file -> ML unit
let test_acls f =
  grant (Readable f);     (* ok *)
  let _ = read f in       (* ok --- it's in the acl *)
  //delete f;               (* not ok --- we're only allowed to read it *) 
  safe_delete f;          (* ok, but fails dynamically *)
  revoke (Readable f);
  //let _ = read f in       (* not ok any more *) 
  ()

Exercise 11a:
Write down some types for grant and revoke that are sufficiently precise to allow the program to continue to type-check.

val grant : e:entry -> ST unit (requires (fun h -> True))
                               (ensures (fun h x h' -> True))
val revoke: e:entry -> ST unit (requires (fun h -> True))
                               (ensures (fun h x h' -> True))

code/exercises/Ex10a.fstLoad in editor

exercise answer
val grant : e:entry -> ST unit (requires (fun h -> True))
                               (ensures (fun h _ h' -> sel h' acls == e::sel h acls))
val revoke: e:entry -> ST unit (requires (fun h -> True))
                               (ensures (fun h x h' -> not(mem e (sel h' acls))))

code/solutions/Ex10a.fstLoad in editor

11.2. Reasoning about (anti-)aliased heap structures

A central problem in reasoning about heap-manipulating programs has to do with reasoning about updates to references in the presence of aliasing. In general, any two references (with compatible types) can alias each other (i.e., they may refer to the same piece of memory), so updating one can also change the contents of the other. In this section, we'll look at two small but representative samples that illustrate how this form of reasoning can be done in F*.

11.2.1. Dynamic frames

For our first example, we define the type of a 2-dimensional mutable point. The refinement on y ensures that the references used to store each coordinate are distinct.

type point =
  | Point : x:ref int -> y:ref int{y<>x} -> point

Allocating a new point is straightforward:

let new_point x y =
  let x = ST.alloc x in
  let y = ST.alloc y in
  Point x y

As is moving a point by a unit along the x-axis.

let shift_x p = Point?.x p := !(Point?.x p) + 1

Now, things get a bit more interesting. Let's say we have a pair of points, p1 and p2, we want to shift p1, and reason that p2 is unchanged. For example, we'd like to identify some conditions that are sufficient to prove that the assertion in the program below always succeeds.

let shift_x_p1 p1 p2 =
    let p2_0 = !(Point?.x p2), !(Point?.y p2)  in //p2 is initially p2_0
    shift_x p1;
    let p2_1 = !(Point?.x p2), !(Point?.y p2) in
    assert (p2_0 = p2_1)                        //p2 is unchanged

If you give this program (reproduced in its entirety below) to F*, it will report no errors. This may surprise you: after all, if you call shift_x_p1 p p the assertion will indeed fail. What's going on here is that for functions in the ST effect, F* infers a most precise type, and if the programmer did not write down any other specification, this precise inferred type is what F* will use. This means that without an annotation on shift_x_p1, F* does not check that the assertion will succeed on every invocation of shift_x_p1; instead, using the inferred type, it checks at each call site that the assertion succeeds for that particular call. Let's try:

val test0: unit -> St unit
let test0 () =
  let p1 = new_point 0 0 in
  shift_x_p1 p1 p1

Here, we wrote a specification for test0, asking it to have a trivial pre-condition. When we ask F* to check this, it complains at the call to shift_x_p1 p1 p1, saying that the assertion failed.

If we try calling shift_x_p1 with two distinct points, as below, then we can prove that the assertion succeeds.

val test1: unit -> St unit
let test1 () =
  let p1 = new_point 0 0 in
  let p2 = new_point 0 0 in
  shift_x_p1 p1 p2

In simple code like our example above, it is a reasonable trade-off to let F* infer the most precise type of a function, and to have it check later, at each call site, that those calls are safe.

As you write larger programs, it's a good idea to write down the types you expect, at least for tricky top-level functions. To see what these types look like for stateful programs, let's decorate each of the top-level functions in our example with their types. For these tiny functions, the types can often be larger than the code itself; but this is just an exercise. Typically, one wouldn't write down the types of such simple functions.

11.2.2. A type for shift_x and the Heap.modifies predicate

Let's start with shift_x, since it is the simplest:

Here's one very precise type for it:

val shift_x : p:point -> ST unit
  (requires (fun h -> True))
  (ensures (fun h0 _ h1 -> h1=upd h0 (Point?.x p) (sel h0 (Point?.x p) + 1)))

Another, less precise but more idiomatic type for it is:

val shift_x: p:point -> ST unit
  (requires (fun h -> True))
  (ensures (fun h0 x h1 -> modifies (only (Point?.x p)) h0 h1))

Informally, the ensures-clause above says that the heap after calling this function (h1) differs from the initial heap h0 only at Point?.x p (and possibly at some newly allocated locations). Let's look at it in more detail.

The predicate modifies is defined in the module Heap (lib/heap.fst). First, let's look at its signature, which says that it relates a set of aref values to a pair of heaps. The type aref is the type of a reference, but with the type of the referent abstracted. As such, the first argument to modifies is a set of heterogeneously typed references.

  type aref =
    | Ref : #a:Type -> ref a -> aref
  let only x = Set.singleton (Ref x)
  type modifies : set aref -> heap -> heap -> Type

The definition of modifies follows:

type modifies mods h0 h1 =
  Heap.equal h1 (Heap.concat h1 (Heap.restrict (Set.complement mods) h0))

Heap.equal: Heap.equal is an equivalence relation on heap capturing the notion of extensional equality; i.e., two heaps are extensionally equal if they map each reference to the same value.

Heap.concat: Each reference r in the heap concat h h' contains a value that is the same as in h, except if r is in h', in which case its value is whatever value h' contains at r. More formally:

   sel r (concat h h')
   = if contains h' r then sel h' r else sel h r

Heap.restrict: The function restrict s h provides a way to shrink the domain of the heap h to be at most s. Specifically:

  contains (restrict s h) r = Set.mem (Ref r) s && contains h r

Altogether, the definition of modifies says that for every reference r, if r is not in the set mods and the heap h0 contains r, then sel h1 r = sel h0 r, i.e., it hasn't changed. Otherwise, all we get is sel h1 r = sel h1 r, meaning we know nothing else about it.

Remark. We define the following infix operators to build sets of heterogeneous references.

let op_Hat_Plus_Plus (#a:Type) (r:ref a) (s:set aref) = // ^++
    Set.union (Set.singleton (Ref r)) s

let op_Plus_Plus_Hat (#a:Type) (s:set aref) (r:ref a) = // ++^
    Set.union s (Set.singleton (Ref r))

let op_Hat_Plus_Hat (#a:Type) (#b:Type) (r1:ref a) (r2:ref b) = // ^+^
    Set.union (Set.singleton (Ref r1)) (Set.singleton (Ref r2))

11.2.3. A type for new_point and freshness of references

Here's a precise type for new_point, with its definition reproduced for convenience.

val new_point: x:int -> y:int -> ST point
  (requires (fun h -> True))
  (ensures (fun h0 p h1 ->
                modifies Set.empty h0 h1
                /\ Heap.fresh (Point?.x p) h0 h1
                /\ Heap.fresh (Point?.y p) h0 h1
                /\ Heap.sel h1 (Point?.x p) = x
                /\ Heap.sel h1 (Point?.y p) = y
                /\ Heap.contains h1 (Point?.x p) //these two lines should be captured by fresh
                /\ Heap.contains h1 (Point?.y p)))

let new_point x y =
  let x = ST.alloc x in
  let y = ST.alloc y in
  Point x y

The modifies clause says the function modifies no existing reference; that's easy to see, since it only allocates new references.

In the body of the definition, we build Point x y and the type of Point requires us to prove that x <> y. The only way to prove that is if we know that ST.alloc returns a distinct reference each time, i.e., we need freshness.

The type of ST.alloc gives us the freshness we need to prove that x <> y. The type below says that the returned reference does not exist in the initial heap and does exist in the final heap (initialized appropriately).

val alloc:  #a:Type -> init:a -> ST (ref a)
    (requires (fun h -> True))
    (ensures (fun h0 r h1 -> fresh (only r) h0 h1 /\ h1==upd h0 r init))

type fresh (refs:set aref) (h0:heap) (h1:heap) =
  (forall (a:Type) (r:ref a). mem (Ref r) refs
                         ==> not(contains h0 r) /\ contains h1 r)

Coming back to the type of new_point: By stating that the new heap contains both Point?.x and Point?.y, we provide the caller with enough information to prove that any references that it might allocate subsequently will be different from the references in the newly allocated point. For example, the assertion in the code below succeeds, only because new_point guarantees that the heap prior to the allocation of z contains Point?.x p; and ST.alloc guarantees that z is different from any reference that exists in the prior heap. Carrying Heap.contains predicates in specifications is important for this reason.

val test2: unit -> St unit
let test2 () =
  let p = new_point 0 0 in
  let z :ref nat = ST.alloc 0 in
  assert (addr_of (Point?.x p) <> addr_of z)

On the other hand, since the ST library we are working with provides no ability to deallocate a reference (other versions of it do; see, for example, lib/stperm.fst), we know that if we have a value v:ref t, the current heap must contain it. Capturing this invariant, the ST library also provides the following primitive (whose implementation is a no-op).

val recall: #a:Type -> r:ref a -> ST unit
  (requires (fun h -> True))
  (ensures (fun h0 _ h1 -> h0=h1 /\ Heap.contains h1 r))

Using this, we can write the following code.

val f : ref int -> St unit
let f (x:ref int) =
  recall x;         //gives us that the initial heap contains x, for free
  let y = ST.alloc 0 in
  assert (y <> x)

As a matter of style, we prefer to keep the use of recall to a minimum, and instead carry Heap.contains predicates in our specifications, as much as possible.

11.2.4. A type for shift_x_p1

Now, we have all the machinery we need to give shift_x_p1 a sufficiently strong type to prove that its assertion will never fail (if its pre-condition is met).

val shift_x_p1: p1:point
           -> p2:point{   addr_of (Point?.x p2) <> addr_of (Point?.x p1)
                       /\ addr_of (Point?.y p2) <> addr_of (Point?.x p1) }
           -> ST unit
    (requires (fun h -> Heap.contains h (Point?.x p2)
                    /\  Heap.contains h (Point?.y p2)))
    (ensures (fun h0 _ h1 -> modifies (only (Point?.x p1)) h0 h1))

In order to prove that the assignment to Point?.x p1 did not change p2, we need to know that Point?.x p1 does not alias either component of p2. In general, if p1 can change arbitrarily, we also need the other two conjuncts, i.e., we will need that the references of p1 and p2 are pairwise distinct.

Remark. A note on style: we could have placed the refinement on p2 within the requires clause. However, since the refinement is independent of the state, we prefer to write it in a manner that makes its state independence explicit.

Exercise 11b:
Change shift_x to the function shift below that moves both the x and y components of its argument by some amount. Prove that calling this function in a function like shift_p1 does not change p2.

let shift p = Point?.x p := ...; Point?.y p := ...

code/exercises/Ex10b.fstLoad in editor

exercise answer
let shift p =
  Point?.x p := !(Point?.x p) + 1;
  Point?.y p := !(Point?.y p) + 1


val shift_p1: p1:point
           -> p2:point{   addr_of (Point?.x p2) <> addr_of (Point?.x p1)
                       /\ addr_of (Point?.y p2) <> addr_of (Point?.x p1)
                       /\ addr_of (Point?.x p2) <> addr_of (Point?.y p1)
                       /\ addr_of (Point?.y p2) <> addr_of (Point?.y p1) }
           -> ST unit
    (requires (fun h -> Heap.contains h (Point?.x p2)
                    /\  Heap.contains h (Point?.y p2)))
    (ensures (fun h0 _ h1 -> modifies (Point?.x p1 ^+^ Point?.y p1) h0 h1))

let shift_p1 p1 p2 =
    let p2_0 = !(Point?.x p2), !(Point?.y p2)  in //p2 is initially p2_0
    shift p1;
    let p2_1 = !(Point?.x p2), !(Point?.y p2) in
    assert (p2_0 = p2_1)                        //p2 is unchanged

code/solutions/Ex10b.fstLoad in editor

12. Lightweight framing with hyper-heaps

The style of the previous section is quite general. But, as programs scale up, specifying and reasoning about anti-aliasing can get to be quite tedious. In this chapter, we look at an alternative way of proving stateful programs correct by making use of a more structured representation of memory that we call a “HyperHeap”. First, we illustrate the problem of scaling up the approach of the previous chapter to even slightly larger examples.

12.1. Flying robots


Let's have some fun with flying robots. Maybe you've seen this classic Japanese TV series from the 60s: https://en.wikipedia.org/wiki/Giant_Robo_(tokusatsu). Let's build a tiny model of this flying robot.

Our bot has a 3d-point representing its position; and two arms, both in polar coordinates.

  type bot = {
    pos:point;
    left:arm;
    right:arm
  }
  and point = {
    x:ref int;
    y:ref int;
    z:ref int;
  }
  and arm = {
    polar:ref int;
    azim:ref int
  }

Our bot has an invariant: whenever it is flying, it has its arms up.


We can write a function to make our robot fly, which sets its arms up and then shoots the robot into the sky.

  let fly b =
    b.left.polar := 0;
    b.right.polar := 0;
    b.pos.z := 100

Now, we'd like to prove properties like the following:

  //initially, in h0: b1 not flying, b2 not flying
      fly b1
  //in h1: b1 flying, b2 not flying

To prove this property, and to prove that the bot's invariant is maintained, we need many anti-aliasing invariants, including the following (stated informally).

forall r1, r2. r1 in refs_in b1
              /\ r2 in refs_in b2 ==> r1 <> r2
refs in b.left.arm `disjoint` refs in b.pos
refs in b.right.arm `disjoint` refs in b.pos
refs in b.left.arm `disjoint` refs in b.right.arm

As you can see, what started out as a bit of fun programming is now a painful exercise in managing anti-aliasing invariants. Worse, even if we state all these invariants, proving that they are preserved involves reasoning about a quadratic number of inequalities between all the references within each robot, as well as a further quadratic number of inequalities between the references of the two bots.

F*'s effect mechanism provides an alternative to this tedium. Instead of working with the primitive ST effect (and its flat model of memory via the heap type) that we have been working with so far, we can easily define a different, more structured view of memory as a new effect. One such alternative is the HyperHeap, a model of memory which provides a convenient built-in notion of disjoint sets of references.

12.2. Hyper-heaps

So far, we have been working with the heap type, a flat, typed map from references to their contents. The ref type is abstract: it has no structure and only supports equality operations. Conceptually, it looks like this:

(figure: a flat heap, mapping each typed reference to its contents)

A hyper-heap organizes the heap into many disjoint fragments, or regions. Each region is collectively addressed by a region identifier, and these identifiers are organized in a tree-shaped hierarchy. Each region is itself a map from typed references to values, just like our heaps from before. The picture below depicts the structure.

(figure: a hyper-heap, a map from region identifiers to heaps, with the regions organized in a tree-shaped hierarchy)

At the top-level we have regions R0 ... Rn, each associated with a fragment of heap references. Region identifiers R0.0 ... R0.m are within the hierarchy rooted at R0; each has a set of references distinct from all other regions (including its parent R0); and so on.

The HyperHeap module in F*'s standard library defines this structure, a type rid for region ids, and a type rref (r:rid) (a:Type), which is the type of a reference in region r that points to an a-typed value. We prove that values of type rref r a are related by an injection to values of type ref a, meaning that hyper-heaps are in 1-1 correspondence with the underlying heap; except, as we will see, the additional structure provided helps with stating and proving memory invariants.

The module HyperHeap defines:

type t = Map.t rid heap
new_effect STATE = STATE_h t
effect ST (a:Type) (pre:t -> Type) (post:t -> a -> t -> Type) =
       STATE a (fun 'p m0 -> pre m0 /\ (forall x m1. post m0 x m1 ==> 'p x m1))

That is, HyperHeap.STATE and HyperHeap.ST are just like the STATE and ST we've been working with, except that instead of the type of memory being a flat heap, the memory has type HyperHeap.t, a map from region ids rid to heaps.

12.2.1. The type of region identifiers, rid

The type rid is abstract to clients, with a few operations on it revealing its hierarchical structure. Specifically, we have three functions:

type rid
val root : rid
val includes : rid -> rid -> Tot bool
val extends  : rid -> rid -> Tot bool
let disjoint i j = not (includes i j) && not (includes j i)

One way to think of rid is as a list int, where includes i j holds if and only if i is a suffix of j (i.e., i lies on the path from j up to the root); and extends i j is true if i = _ :: j. However, while this may provide you with some intuition, formally, the only properties that these functions are known to satisfy are revealed by the following lemmas:

val lemma_includes_refl: i:rid
-> Lemma (requires (True))
         (ensures (includes i i))
         [SMTPat (includes i i)]

val lemma_disjoint_includes: i:rid -> j:rid -> k:rid
-> Lemma (requires (disjoint i j /\ includes j k))
         (ensures (disjoint i k))
         [SMTPat (disjoint i j);
          SMTPat (includes j k)]

val lemma_extends_includes: i:rid -> j:rid
-> Lemma (requires (extends j i))
         (ensures (includes i j))
         [SMTPat (extends j i)]

val lemma_extends_disjoint: i:rid -> j:rid -> k:rid
-> Lemma (requires (extends j i /\ extends k i /\ j<>k))
         (ensures (disjoint j k))
         [SMTPat (extends j i);
          SMTPat (extends k i)]

12.2.2. The type of region-indexed references: rref

Next, we have a type rref for region-indexed references, as well as a function as_ref that coerces an rref to its underlying heap reference.

Remark. The functions as_ref and ref_as_rref have effect GTot, indicating that they are ghost, total functions. A subsequent chapter will describe ghost effects in more detail. For now, it suffices to know that computations with effect GTot can only be used in specifications. Concrete executable programs cannot use GTot. In this case, the GTot effect is used to ensure that client programs of HyperHeap cannot break the abstraction of the rref type, except in their specifications.

  type rref : rid -> Type -> Type
  val as_ref      : #a:Type -> #id:rid -> r:rref id a -> GTot (ref a)
  val ref_as_rref : #a:Type -> i:rid -> r:ref a -> GTot (rref i a)
  val lemma_as_ref_inj: #a:Type -> #i:rid -> r:rref i a
      -> Lemma (requires (True))
               (ensures ((ref_as_rref i (as_ref r) = r)))
         [SMTPat (as_ref r)]

The functions Heap.sel and Heap.upd are adapted to work with hyper-heaps instead; in effect, we select first the region i corresponding to rref i a and then perform the select/update within the heap within that region. An important point to note is that although the rid type is structured hierarchically, the map of all regions is itself flat. Thus, selecting the region corresponding to i is a constant-time operation (rather than requiring a tree traversal). As we will see below, this is crucial for good automation in proofs.

let sel (#a:Type) (#i:rid) (m:t) (r:rref i a) = Heap.sel (Map.sel m i) (as_ref r)
let upd (#a:Type) (#i:rid) (m:t) (r:rref i a) (v:a)
  = Map.upd m i (Heap.upd (Map.sel m i) (as_ref r) v)

12.2.3. Stateful operations on regions and references

Creating a new region: HyperHeap also provides four main stateful functions. The first of these, new_region r0, allocates a new, empty region rooted at r0:

val new_region: r0:rid -> ST rid
      (requires (fun m -> True))
      (ensures (fun (m0:t) (r1:rid) (m1:t) ->
                           extends r1 r0
                        /\ fresh_region r1 m0 m1
                        /\ m1=Map.upd m0 r1 Heap.emp))

Allocating a reference in a region: The function ralloc r v allocates a new reference in region r and initializes its contents to v.

val ralloc: #a:Type -> i:rid -> init:a -> ST (rref i a)
    (requires (fun m -> True))
    (ensures (fun m0 x m1 ->
                    Let (Map.sel m0 i) (fun region_i ->
                    not (Heap.contains region_i (as_ref x))
                    /\ m1=Map.upd m0 i (Heap.upd region_i (as_ref x) init))))

Reading and writing references: These next two operations are written !r and r := v, for dereference and assignment of rrefs, respectively.

val op_Bang: #a:Type -> #i:rid -> r:rref i a -> ST a
  (requires (fun m -> True))
  (ensures (fun m0 x m1 -> m1=m0 /\ x=Heap.sel (Map.sel m0 i) (as_ref r)))

val op_Colon_Equals: #a:Type -> #i:rid -> r:rref i a -> v:a -> ST unit
  (requires (fun m -> True))
  (ensures (fun m0 _u m1 -> m1=Map.upd m0 i (Heap.upd (Map.sel m0 i) (as_ref r) v)))
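Here is a small usage sketch of our own, combining these operations under a trivial specification; F* composes the wps of new_region, ralloc, ! and := to check it:

val example: r0:rid -> ST int (requires (fun m -> True))
                              (ensures (fun m0 x m1 -> True))
let example r0 =
  let r = new_region r0 in   (* a fresh sub-region of r0 *)
  let c = ralloc r 41 in     (* a fresh reference in r, initialized to 41 *)
  c := !c + 1;               (* read and write through the rref *)
  !c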

12.2.4. Framing conditions for hyper-heaps: modifies revised

Finally, we redefine modifies clauses to work with hyper-heaps instead of heaps. Crucially, given region ids r1, ..., rn, we intend to write modifies (r1 ^++ ... ++^ rn) h0 h1 to mean that h1 may differ from h0 in all regions rooted at some region in the set {r1, ..., rn} (and, as before, any other regions that are newly allocated in h1). As we will see, generalizing modifies clauses to work with hierarchical region names is a crucial feature, both for compactness of specifications and for better automation.

type modifies (s:Set.set rid) (m0:t) (m1:t) =
  Map.equal m1 (Map.concat m1 (Map.restrict (Set.complement (mod_set s)) m0))

where
val mod_set : Set.set rid -> GTot (Set.set rid)
and forall (x:rid) (s:Set.set rid).
           Set.mem x (mod_set s) <==> (exists (y:rid). Set.mem y s /\ includes y x)

Remark. The interpretation of Map.restrict, Map.concat and Map.equal is analogous to the counterparts for Heap discussed in the previous section.

The definition of mod_set s is non-constructive, since deciding Set.mem x (mod_set s) requires finding a witness, an element y of s, such that includes y x. As such, we mark it as a ghost function with the GTot effect.

12.3. Robot, fly!

We now look at the use of hyper-heaps to solve our problem of programming our robots.

The main idea is to define our data structures in a way that distinct components of the robot are allocated in distinct regions, thereby maintaining an invariant that the components do not interfere. For example, by allocating the left arm and the right arm of the robot in distinct regions, we know that the arms do not alias each other: moving one arm does not disturb the other.

Here's how it goes:

Each type carries an extra implicit parameter, recording the region in which its mutable references reside. Within a region, we get no free anti-aliasing properties; so, for those references, we write explicit inequalities that say, for example, that the x, y and z fields of a point are not aliased.

For the bot itself, we say that its components are allocated in disjoint sub-regions of r, the region of the bot. Note that the hierarchical region names allow us to state this conveniently. With just a flat namespace of regions, we would have had to explicitly specify the set of regions that are transitively allocated within a bot (possibly breaking the abstraction of sub-objects, which would have to reveal how many regions they use).

module Robot
open FStar.Heap
open FStar.HyperHeap
open FStar.Set

noeq type point =
  | Point : r:rid
          -> x:ref int{frameOf x = r}
          -> y:ref int{frameOf y = r}
          -> z:ref int{frameOf z = r /\ as_addr x <> as_addr y /\ as_addr y <> as_addr z /\ as_addr x <> as_addr z}
          -> point

noeq type arm =
  | Arm : r:rid
       -> polar:ref int{frameOf polar = r}
       -> azim:ref int{frameOf azim = r /\ as_addr polar <> as_addr azim}
       -> arm

noeq type bot =
  | Bot : r:rid
        -> pos  :point{includes r (Point?.r pos)}
        -> left :arm  {includes r (Arm?.r left)}
        -> right:arm  {includes r (Arm?.r right)
                      /\ disjoint (Point?.r pos) (Arm?.r left)
                      /\ disjoint (Point?.r pos) (Arm?.r right)
                      /\ disjoint (Arm?.r left)  (Arm?.r right)}
        -> bot

Next, we define our robot invariant: whenever it is flying, its arms are up. Additionally, we state that all the regions of the bot are included in the heap. This is a matter of taste: alternatively, we could have omitted the contains predicates in our invariant, and resorted to ST.recall (its analog for hyper-heaps), to prove that a region exists whenever we need that property.

let flying (b:bot) (h:mem) =
  sel h (Point?.z (Bot?.pos b)) > 0

let arms_up (b:bot) (h:mem) =
    sel h (Arm?.polar (Bot?.right b)) = 0
  /\ sel h (Arm?.polar (Bot?.left b))  = 0

type robot_inv (b:bot) (h:mem) =
    Map.contains (get_hmap h) (Bot?.r b)
  /\ Map.contains (get_hmap h) (Point?.r (Bot?.pos b))
  /\ Map.contains (get_hmap h) (Arm?.r (Bot?.left b))
  /\ Map.contains (get_hmap h) (Arm?.r (Bot?.right b))
  /\ (flying b h ==> arms_up b h)

Now, we come to the code that builds new points, arms, and robots. In each case, the caller specifies r0, the region within which the object is to be allocated. We create a new sub-region of r0, allocate the object within it, prove that the region is fresh (meaning the initial heap does not contain that region, while the final heap does), and that the object is initialized as expected. In the case of new_robot, we also prove that the returned bot satisfies the robot invariant.

val new_point: r0:rid{is_eternal_region r0} -> x:int -> y:int -> z:int -> ST point
  (requires (fun h0 -> True))
  (ensures (fun h0 p h1 ->
              modifies empty h0 h1
            /\ extends (Point?.r p) r0
            /\ fresh_region (Point?.r p) h0 h1
            /\ sel h1 (Point?.x p) = x
            /\ sel h1 (Point?.y p) = y
            /\ sel h1 (Point?.z p) = z))
let new_point r0 x y z =
  let r = new_region r0 in
  let x = ralloc r x in
  let y = ralloc r y in
  let z = ralloc r z in
  Point r x y z

val new_arm: r0:rid{is_eternal_region r0} -> ST arm
  (requires (fun h0 -> True))
  (ensures (fun h0 x h1 ->
              modifies empty h0 h1
            /\ extends (Arm?.r x) r0
            /\ fresh_region (Arm?.r x) h0 h1))
let new_arm r0 =
  let r = new_region r0 in
  let p = ralloc r 0 in
  let a = ralloc r 0 in
  Arm r p a

val new_robot: r0:rid{is_eternal_region r0} -> ST bot
  (requires (fun h0 -> True))
  (ensures (fun h0 x h1 ->
          modifies empty h0 h1
            /\ extends (Bot?.r x) r0
            /\ fresh_region (Bot?.r x) h0 h1
            /\ robot_inv x h1))
let new_robot r0 =
  let r = new_region r0 in
  let p = new_point r 0 0 0 in
  let left = new_arm r in
  let right = new_arm r in
  Bot r p left right

Next, we write our function fly, which sets the robot's arms up and then shoots it into the sky. As an illustration, we show that we can also modify some other reference arbitrarily, and the anti-aliasing properties of our invariant suffice to prove that this assignment does not mess with the robot invariant.

val fly: b:bot -> ST unit
  (requires (fun h -> robot_inv b h))
  (ensures (fun h0 _u h1 ->
              modifies_transitively (only (Bot?.r b)) h0 h1
            /\ robot_inv b h1
            /\ flying b h1))
#set-options "--z3rlimit 20"
let fly b =
  recall (Arm?.azim (Bot?.right b));
  recall (Arm?.polar (Bot?.left b));
  recall (Arm?.polar (Bot?.right b));
  recall (Point?.z (Bot?.pos b));
  Arm?.azim (Bot?.right b)  := 17;
  Arm?.polar (Bot?.left b)  := 0;
  Arm?.polar (Bot?.right b) := 0;
  Point?.z (Bot?.pos b)     := 100

Finally, if we're given two robots, b0 and b1, that are allocated in disjoint regions and satisfy the robot invariant, then we can fly one without impacting the invariant of the other; then fly the other; and have both of them flying at the end.

val fly_both: b0:bot -> b1:bot{disjoint (Bot?.r b0) (Bot?.r b1)}
  -> ST unit
  (requires (fun h -> robot_inv b0 h /\ robot_inv b1 h))
  (ensures (fun h0 x h1 ->
              modifies_transitively (Bot?.r b0 ^+^ Bot?.r b1) h0 h1
            /\ robot_inv b0 h1
            /\ robot_inv b1 h1
            /\ flying b0 h1
            /\ flying b1 h1))
#set-options "--z3rlimit 50"
let fly_both b0 b1 =
  fly b0;
  fly b1

Exercise 12a:
Extend this example by adding a function which takes a list of robots, each in a region disjoint from all others, and flies all of them.

val fly_robot_army: ...

code/exercises/Ex11a.fstLoad in editor

exercise answer
val fly_robot_army:  #rs:Set.set rid
        -> bs:bots rs
        -> ST unit
           (requires (fun h -> (forall b.{:pattern (trigger b)}
                                            (trigger b /\ mem b bs ==> robot_inv b h))))
           (ensures  (fun h0 _u h1 ->
                          modifies rs h0 h1
                           /\ (forall b.{:pattern (trigger b)}
                                  (trigger b /\ mem b bs ==> robot_inv b h1 /\ flying b h1))))

code/solutions/Ex11a.fstLoad in editor

12.4. Why hyper-heaps work

The style of dynamic frames and explicit inequalities between objects is not incompatible with hyper-heaps: as we've also seen, within a region, we resort to pairwise inequalities, as usual. However, using hyper-heaps where possible, as we've just seen, makes specifications much more concise. As it turns out, they also make verifying programs much more efficient. On some benchmarks, we've noticed a speedup in verification time of more than a factor of 20x when using hyper-heaps for disjointness invariants. In this section, we look a bit more closely at what's happening under the covers and why hyper-heaps help.

First, without hyper-heaps, consider a computation f () run in a heap h0 and producing a heap h1 related by modifies {x1,...,xn} h0 h1, for some set of references {x1,...,xn}. Consider also some predicate Q = fun h -> P (sel h y1) ... (sel h ym), which is initially true in h0. We would like to prove that Q h1 is also true. In other words, we need to prove the formula Q h0 /\ modifies {x1,...,xn} h0 h1 ==> Q h1.

Expanding definitions:
  P (sel h0 y1) ... (sel h0 ym)
  /\ h1 = concat h1 (restrict (complement {x1..xn}) h0)
  ==> P (sel h1 y1) ... (sel h1 ym)

In general, for an arbitrary uninterpreted predicate P, to prove this one must prove a quadratic number of inequalities; e.g., proving that sel h1 y1 = sel h0 y1 requires proving that y1 is not equal to any of the {x1..xn}.

However, if one can group related references into regions (meaning related references are generally read and updated together), one can do much better. For example, moving to hyper-heaps, suppose we place all the {x1..xn} in a region included by rx. Suppose {y1,...,ym} are all allocated in some region ry. Now, our goal is to prove

  P (Heap.sel (Map.sel h0 ry) y1) ... (Heap.sel (Map.sel h0 ry) ym)
  /\ h1 = concat h1 (restrict (mod_set {rx}) h0)
  ==> P (Heap.sel (Map.sel h1 ry) y1) ... (Heap.sel (Map.sel h1 ry) ym)

To prove this, we only need to prove that Map.sel h0 ry = Map.sel h1 ry, which involves proving not (includes rx ry). Having proven this fact, the SMT solver simply unifies the representation of all occurrences of these sub-terms everywhere in the formula above and concludes the proof immediately. Thus, in such (arguably common) cases, what initially required a quadratic number of reasoning steps now only requires a constant number of steps!

In more general scenarios, we may still have to perform a quadratic number of reasoning steps, but with hyper-heaps, we are quadratic only in the number of regions involved, which is often much smaller than the number of references that they may contain. The use of region hierarchies serves to further reduce the number of region identifiers that one refers to, making the constants smaller still. Of course, in the degenerate case where one has just one reference per region, this devolves back to the performance one would get without regions at all.

13. Cryptography examples for authentication

Cryptographic reasoning is inherently modular reasoning. We want to show that, using some cryptographic primitive, we can secure some application goal. Using types, we will formalize both the guarantees afforded by the cryptographic module and the goals of the application employing it.

13.1. Message authentication codes (MAC)

We begin with a simple cryptographic library for message authentication codes (MAC) in module MAC.

MACs provide integrity protection using shared key cryptography, often using keyed cryptographic hashes.

A MAC scheme typically consists of three algorithms: a keygen algorithm for generating the shared key, a mac function for generating a MAC tag on a message text using a key, and a verify function for verifying that a tag is a valid MAC for a given text and key.

We say that messages for which verify key text tag = true have been authenticated. But what does this mean formally? Does it mean that the owner of key endorsed the message text? But who is the owner of key, which after all is a bitstring that could be learned and copied? And for deployed MAC functionalities, verify is a pure function, so even if the key was kept perfectly secret, a randomly picked tag will verify with some probability.

Cryptographers have grappled with these problems and specify the properties of a MAC negatively, by bounding the probability that realistic adversaries forge a tag. This is typically done in the form of a random experiment:

13.1.1. The message authentication experiment

A MAC scheme implementing keygen, mac, verify is unforgeable against chosen message attacks (UF-CMA) if the success probability of all ‘realistic’ adversaries A (modelled as F* programs) is small in the following probabilistic experiment: a key k is generated with keygen; the adversary A may repeatedly request mac k text for messages text of its choice; A then outputs a pair (text', tag') and wins if verify k text' tag' = true although text' was never submitted to the mac oracle.

We refer to the success probability of A in this experiment as its UF-CMA advantage ε_ufcma(A).

Does the appearance of probabilities in this section make you suspicious? It should, as nowhere in this tutorial have we so far discussed the use of F* for probabilistic reasoning. The F* type checker is a deductive reasoning tool; it trades in certainties, not probabilities. While there are ways to extend the semantics of F* with probabilities so as to express probabilistic experiments in the language of F*, here we instead provide an abstraction, or idealization, of the guarantees of the MAC functionality. We discuss below how one can formally establish the soundness of this idealization.

13.1.2. The MAC type interface

We define an interface that models perfect authentication guarantees. We will get back to how it can be (approximately) implemented by an F* module later.

val keygen: p:(text -> Type0) -> ML (pkey p)
val mac:    k:key -> t:text{key_prop k t} -> ST tag
  (requires (fun h -> True))
  (ensures (fun h x h' -> modifies (Set.singleton (addr_of log)) h h'))
val verify: k:key -> t:text -> tag -> ST (b:bool{b ==> key_prop k t})
  (requires (fun h -> True))
  (ensures (fun h x h' -> modifies Set.empty h h'))

code/solutions/Ex12.MAC.fstLoad in editor

The specification of keygen attaches an authentication property p : text -> Type to each key, using the function key_prop and the type pkey p:

The function key_prop : key -> p:(text -> Type) maps each key to an application-defined authentication property p, and the type pkey p = k:key{key_prop k == p} denotes keys with property p.

The property (key_prop k) t is used as a pre-condition for MACing and as a postcondition for verifying a message t with key k.

When implementing the message authentication experiment using a key_prop k that witnesses the monotone property (mem text log) before answering a mac k query, the postcondition of verify guarantees that A has UF-CMA advantage 0. This confirms that the interface models perfect authentication guarantees.

In the RPC example below (13.3) we will define key_prop k == reqresp and use keys of type pkey reqresp, where

type reqresp text = 
     (exists s.   text = Formatting.request  s   /\ pRequest s )
  \/ (exists s t. text = Formatting.response s t /\ pResponse s t)

which states that messages that verify with that key are either Requests or Responses. The formatting functions use tags to disambiguate between requests and responses.

13.2. Cryptographic Capabilities for Accessing Files

We extend the access control example (2.1.1) with cryptographic capabilities, so that our trusted ACLs library can now issue and accept MAC tags as evidence that a given file is readable. Recall that this exercise employed predicates ACLs.canRead f and ACLs.canWrite f on filenames that determined whether a file could be read or written.

We now attach these predicates to unforgeable cryptographic tokens also called capabilities.

What is the advantage of capabilities over the dynamic access control based on state of Section 11.1?

Capabilities need to be kept secret, since anyone who knows a capability is granted access to the corresponding file. However, parties no longer need to keep a synchronized access control database: in order to verify capabilities, they only need to store the cryptographic key used to generate them. Tokens generated in this fashion work much like the tokens used in the Kerberos authentication protocol.

Exercise 13a:
Relying on ACLs and MAC, implement a pair of functions with the following specification, and verify them by typing under the security assumption that the MAC is UF-CMA. Use failwith "bad capability" in case the verification fails.

val issue: f:string{ ACLs.canRead f } -> ML MAC.tag
val redeem: f:string -> m:MAC.tag -> ML (u:unit{ ACLs.canRead f })

code/exercises/Ex12a1.Cap.fstLoad in editor

To this end, assume we are given a UTF8 encoding function from strings to bytes, with the following specification:

assume val utf8: s:string  -> Tot bytes

assume UTF8_inj:
  forall s0 s1.{:pattern (utf8 s0); (utf8 s1)}
     b2t (equalBytes (utf8 s0) (utf8 s1)) ==> s0==s1

This says that if two utf8 encodings are equal, then so are the values that were encoded. The pattern instructs the SMT solver to associate the injectivity axiom only with very specific terms.
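As an illustration (this lemma is hypothetical and not part of the exercise files), a statement that mentions both trigger terms should be provable automatically:

val utf8_cancel: s0:string -> s1:string ->
  Lemma (requires (b2t (equalBytes (utf8 s0) (utf8 s1))))
        (ensures  (s0 == s1))
let utf8_cancel s0 s1 = ()
(* both utf8 s0 and utf8 s1 occur in the statement, so the SMT solver
   can instantiate the UTF8_inj axiom and close the proof *)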

exercise answer
val issue: f:string{ ACLs.canRead f } -> ML MAC.tag
val redeem: f:string -> m:MAC.tag -> ML (u:unit{ ACLs.canRead f })

let issue f = 
  assert(ACLs.canRead f);
  let bs = (utf8 f) in
  assert(capRead bs);
  MAC.mac k bs

let redeem f t =
  let bs = (utf8 f) in
  if MAC.verify k bs t then
    (MAC.from_key_prop k bs ;
    assert(ACLs.canRead f))
  else
    failwith "bad capability"

code/solutions/Ex12a1.Cap.fstLoad in editor

How would we extend it to also cover write access?

exercise answer
type capWrite (msg:bytes) = (forall f. msg = utf8 f ==> ACLs.canWrite f)

let k_write = MAC.keygen capWrite

val issue_write: f:string{ ACLs.canWrite f } -> ML MAC.tag
val redeem_write: f:string -> m:MAC.tag -> ML(u:unit{ ACLs.canWrite f })

let issue_write f =
  assert(ACLs.canWrite f);
  let bs = (utf8 f) in
  assert(capWrite bs);
  MAC.mac k_write bs


let redeem_write f t =
  let bs = (utf8 f) in
  if MAC.verify k_write bs t then
    (MAC.from_key_prop k_write bs ;
    assert(ACLs.canWrite f))
  else
    failwith "bad capability"

code/solutions/Ex12a2.Cap.fstLoad in editor

13.3. Secure RPC

We describe a simple secure RPC implementation, which consists of three modules: MAC, our library for MACs, Format, a module for message formatting, and RPC, a module for the rest of the protocol code. (Placing the formatting functions request and response in a separate module is convenient to illustrate modular programming and verification.)

13.3.1. Secure RPC protocol

To start with we consider a single client and server. The security goals of our RPC protocol are that (1) whenever the server accepts a request message s from the client, the client has indeed sent the message to the server and, conversely, (2) whenever the client accepts a response message t from the server, the server has indeed sent the message in response to a matching request from the client.

To this end, the protocol uses message authentication codes (MACs) computed as keyed hashes, such that each symmetric MAC key kab is associated with (and known to) the pair of principals a and b. Our protocol may be informally described as follows. An Authenticated RPC Protocol:

1. client->server  utf8 s | (mac kab (request s))
2. server->client  utf8 t | (mac kab (response s t))

In the protocol narration, each line indicates an intended communication of data from one principal to another.

utf8 marshals strings such as s and t into byte arrays that can be sent over the network. The operator |, also written append, concatenates bytes (in F* we will use the infix operator @|). The functions request and response build message digests (the authenticated values); they may, for instance, be implemented as tagged concatenations of their utf8-encoded arguments, and are detailed in the Format module.

Below we give an implementation of the protocol using these functions, the functions of the ‘MAC’ module, and networking functions

private val send: message -> St unit
private val recv: (message -> ML unit) -> ML unit

These functions model an untrusted network controlled by the adversary, who decides through their implementation when to deliver messages.

Two events record genuine requests and responses

val pRequest : string -> Type0
val pResponse : string -> string -> Type0

in the following protocol:

val client : string16 -> ML unit
let client (s:string16) =
  client_send s;
  client_recv s

val server : unit -> ML unit
let server () =
  recv (fun msg ->
    if length msg < macsize then failwith "Too short"
    else
      let (v,m) = split msg (length msg - macsize) in
      if length v > 65535 then failwith "Too long"
      else
        let s = iutf8 v in
        if verify k (Formatting.request s) m
        then
          (
            from_key_prop k (Formatting.request s);
            assert (pRequest s);
            recall_token log (req s);
            let xs = !log in
            assert (Request s `mem` xs);
            print_string "\nserver verified:";
            print_string s;
            let t = "42" in
            add_to_log log (Response s t);
            witness_token log (resp s t);
            print_string "\nserver sent:";
            print_string t;
            send ( (utf8 t) @| (mac k (Formatting.response s t)))
            )
        else failwith "Invalid MAC" )

code/solutions/Ex12b.RPC.fstLoad in editor

For simplicity we initially elide the client code, which is similar; you can open the module in the online editor to look at the full protocol.

The assertions guarantee that messages received by a party have indeed been sent by its peer.

Remark. One can imagine an extended protocol where we have a population of principals, represented as concrete strings ranged over by a and b, that intend to perform various remote procedure calls (RPCs) over a public network. The RPCs consist of requests and responses, both also represented as strings. The security goals of our RPC protocol are that (1) whenever a principal b accepts a request message s from a, principal a has indeed sent the message to b and, conversely, (2) whenever a accepts a response message t from b, principal b has indeed sent the message in response to a matching request from a.

13.3.2. Formatting code

The Format module used by the protocol implementation exports among others the following types and functions.

type message = bytes
type msg (l:nat) = lbytes l

val request : string -> Tot message
val response: string16 -> string -> Tot message

(* -------- implementation *)

let tag0 = createBytes 1 0uy
let tag1 = createBytes 1 1uy

let request s = tag0 @| (utf8 s)

let response s t =
  lemma_repr_bytes_values (length (utf8 s));
  let lb = uint16_to_bytes (length (utf8 s)) in
  tag1 @| (lb @| ( (utf8 s) @| (utf8 t)))

We require three lemmas on message formats:

For authentication properties, we require both formatting functions to be injective and their ranges to be disjoint, so that authenticating the marshalled bytes unambiguously authenticates the text before marshalling.

Exercise 13b:
Try to state this formally as three lemmas.

val req_resp_distinct:
  s:string -> s':string16 -> t':string ->
  Lemma (requires True)
        (ensures  True) //replace with actual lemma post-condition
        [SMTPat (request s); SMTPat (response s' t')]

val req_injective:
  s0:string -> s1:string ->
  Lemma (requires True) //replace with actual lemma pre-condition
        (ensures  True) //replace with actual lemma post-condition
        (*[SMTPat (request s0); SMTPat (request s1)]*)

val resp_injective:
  s0:string16 -> t0:string -> s1:string16 -> t1:string ->
  Lemma (requires True) //replace with actual lemma pre-condition
        (ensures  True) //replace with actual lemma post-condition
        [SMTPat (response s0 t0); SMTPat (response s1 t1)]

code/exercises/Ex12b1.Format.fstLoad in editor

exercise answer
val req_resp_distinct:
  s:string -> s':string16 -> t':string ->
  Lemma (requires True)
        (ensures (request s <> response s' t'))
        [SMTPat (request s); SMTPat (response s' t')]

val req_injective:
  s0:string -> s1:string ->
  Lemma (requires (b2t (Seq.eq (request s0) (request s1))))
        (ensures  (s0==s1))
        (*[SMTPat (request s0); SMTPat (request s1)]*)

val resp_injective:
  s0:string16 -> t0:string -> s1:string16 -> t1:string ->
  Lemma (requires (b2t (Seq.eq (response s0 t0) (response s1 t1))))
        (ensures  (s0==s1 /\ t0==t1))
        [SMTPat (response s0 t0); SMTPat (response s1 t1)]

code/solutions/Ex12b1.Format.fstLoad in editor

Prove these three lemmas.

let req_resp_distinct s s' t' = admit()

let req_injective s0 s1 = admit()

let resp_injective s0 t0 s1 t1 = admit()

code/exercises/Ex12b2.Format.fstLoad in editor

exercise answer
let req_resp_distinct s s' t' = 
  lemma_repr_bytes_values (length (utf8 s));
  lemma_repr_bytes_values (length (utf8 s'));
  assert (Seq.index (request s) 0 == 0uy);
  assert (Seq.index (response s' t') 0 == 1uy)

let req_injective s0 s1 = ()

let resp_injective s0 t0 s1 t1 =
  lemma_repr_bytes_values (length (utf8 s0));
  lemma_repr_bytes_values (length (utf8 s1))

code/solutions/Ex12b2.Format.fstLoad in editor

These lemmas rely on a lemma about the injectivity of append:

val append_inj_lemma: b1:message -> b2:message
                   -> c1:message -> c2:message
                   -> Lemma (requires (length b1==length c1 /\ b2t (Seq.eq (b1 @| b2) (c1 @| c2))))
                            (ensures (b2t (Seq.eq b1 c1) /\ b2t (Seq.eq b2 c2)))
                            [SMTPat (b1 @| b2); SMTPat (c1 @| c2)] (* given to the SMT solver *)

Exercise 13c:
Complete the proof of the lemma.

val append_inj_lemma: b1:message -> b2:message
                   -> c1:message -> c2:message
                   -> Lemma (requires (length b1==length c1 
                              /\ b2t (Seq.eq (b1 @| b2) (c1 @| c2))))
                           (ensures (b2t (Seq.eq b1 c1) /\ b2t (Seq.eq b2 c2)))
                           [SMTPat (b1 @| b2); SMTPat (c1 @| c2)] 
               (* given to the SMT solver *)
let append_inj_lemma b1 b2 c1 c2 =
  lemma_append_len_disj b1 b2 c1 c2;
  Classical.forall_intro 
    #_ 
    #(fun (x:(i:nat{i < length b1})) -> index b1 x == index c1 x) 
    (lemma_append_inj_l b1 b2 c1 c2); 
    //currently the 2nd implicit argument has to be provided explicitly
  admit()

code/exercises/Ex12c.Format.fstLoad in editor

13.4. Verified Padding

We intend to MAC and/or encrypt plaintexts of variable sizes using algorithms that operate on fixed-size blocks, such as the block cipher AES.

To this end, we use the following classic padding scheme, in the style of PKCS#7 (a similar scheme is used, for instance, in TLS).

let pad n = 
  Seq.create n (n2b (n-1))  

let encode a = append a (pad (blocksize - length a))

let decode (b:block) = 
  let padsize = b2n(index b (blocksize - 1)) + 1 in
  if padsize <= blocksize then 
    let (plain,padding) = split b (blocksize - padsize) in
    if  Seq.eq padding (pad padsize)
    then Some plain
    else None   
  else None

The function n2b converts numbers less than 256 into a byte; pad n creates a sequence of n bytes, each encoding the number n-1; encode fills up texts shorter than a block to a full block using such pads; and decode removes the pad.
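For instance, with a hypothetical blocksize of 4, encoding the two-byte text [| 0x61; 0x62 |] appends pad 2 = [| 1; 1 |], yielding the block [| 0x61; 0x62; 1; 1 |]; decode then reads the last byte 1, computes padsize = 2, checks that the last two bytes equal pad 2, and returns Some [| 0x61; 0x62 |].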

Exercise 13d:
Write typed specifications for these functions. In particular, show that decoding cancels out encoding.

(* fill in type here and remove the assumption*)
let pad (n:int) =
  assume (1 <= n /\ n < 256);
  Seq.create n (n2b (n-1))  

(* pad 1 = [| 0 |]; pad 2 = [| 1; 1 |]; ... *)

(* fill in type here *)
let encode a = append a (pad (blocksize - length a))


(* fill in type here *)
let decode (b:block) = 
  let padsize = b2n(index b (blocksize - 1)) + 1 in
  if op_LessThan padsize blocksize then 
    let (plain,padding) = split b (blocksize - padsize) in
    if  Seq.eq padding (pad padsize)
    then Some plain
    else None   
  else None

code/exercises/Ex12d.Pad.fstLoad in editor

exercise answer
val pad: n:nat { 1 <= n /\ n <= blocksize } -> Tot (nbytes n)
val encode: a: text -> Tot block
val decode: b:block -> option (t:text { equal b (encode t) })

code/solutions/Ex12d.Pad.fstLoad in editor

Exercise 13e:
Is it a good padding scheme? What are its main properties?

Prove a lemma that expresses its injectivity.

Provide examples of other schemes, both good and bad.

exercise answer

A padding scheme should be invertible. For authentication properties, we require it to be injective, so that authenticating a padded text unambiguously authenticates the text before padding. For instance:

val inj: a: text -> b: text -> Lemma (requires (equal (encode a) (encode b)))
                                     (ensures (equal a b))
                                     [SMTPat (encode a); SMTPat (encode b)]

val lemma_append_inj: #a:Type -> s1:seq a -> s2:seq a -> t1:seq a -> t2:seq a 
                      {length s1 = length t1 \/ length s2 = length t2}
  -> Lemma (requires (equal (append s1 s2) (append t1 t2)))
           (ensures (equal s1 t1 /\ equal s2 t2))
           [SMTPat (append s1 s2); SMTPat (append t1 t2)]
           //good example to explain patterns.
let lemma_append_inj #a s1 s2 t1 t2 = Seq.lemma_append_inj #a s1 s2 t1 t2

let inj a b = 
  if length a = length b
  then () 
       //lemma_append_inj a (pad (blocksize - length a)) b  (pad (blocksize - length a))
       //no longer needed because of pattern
  else let aa = encode a in
       let bb = encode b in
       cut (index aa 31 <> index bb 31)

code/solutions/Ex12e.Pad.fstLoad in editor

In this context, another good padding scheme is to append a 1 and then enough zeros to fill the block. One could also format the message by explicitly including its length. A bad scheme is to fill the rest of the block with zeros (or random values), as the same padded text may then be parsed into several texts ending with 0s.

Exercise 13f:
Relying on that specification of encode, implement and verify a MAC interface for messages of type text of less than blocksize bytes, using the MAC interface below, which only supports messages of type block.

To this end, we assume we are given an implementation of the following module (adapted from the MAC module):

(*
   Copyright 2008-2018 Microsoft Research

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
*)
module Ex12.BMAC

open Ex12.Pad 

let keysize = 16 (* these are the sizes for SHA1 *) 
let macsize = 20  
type key = nbytes keysize
type tag = nbytes macsize

assume type key_prop : key -> block -> Type
type pkey (p:(block -> Type)) = k:key{key_prop k == p}

assume val keygen: p:(block -> Type) -> pkey p
assume val mac:    k:key -> t:block{key_prop k t} -> tag
assume val verify: k:key -> t:block -> tag -> b:bool{b ==> key_prop k t}


Write a similar module that accepts texts shorter than blocksize instead of blocks:

Write the implementations for keygen, mac, and verify first, then devise type annotations to derive their authentication property (on texts) from that of BMAC (on blocks).

val keygen: p:(text -> Type) -> pkey p
val mac:    p:(text -> Type) -> k:pkey p -> t:text{p t} -> tag
val verify: p:(text -> Type) -> k:pkey p -> t:text -> tag -> b:bool{b ==> p t}
(*implement these functions*)

code/exercises/Ex12f.TMAC.fstLoad in editor

exercise answer
let keygen (spec: text -> Type) = 
  let k = BMAC.keygen (bspec spec) in
  assume (key_prop k == spec);
  k

let mac (p:text -> Type) k t = BMAC.mac k (encode t)

let verify p k t tag = BMAC.verify k (encode t) tag

code/solutions/Ex12f.TMAC.fstLoad in editor

Exercise 13g:
1. Can we similarly construct a MAC scheme for texts with type text:bytes {length text <= n}?
2. Implement, then verify this construction.

exercise answer
  1. Yes, using two keys: one for texts of type b:bytes {length b = n} and another for texts of type b:bytes {length b < n}.

  2. Code:

type text2 = b:bytes { length b <=  blocksize } 

let keysize = 2 * BMAC.keysize
let macsize = BMAC.macsize
noeq type key = | Keys: k0:BMAC.key -> k1:BMAC.key -> key
type tag = BMAC.tag

type bspec0 (spec: (text2 -> Type)) (b:block) =
  (forall (t:text). equal b (encode t) ==> spec t)

type bspec1 (spec: (text2 -> Type)) (b:block) = 
  spec b

assume type key_prop : key -> text2 -> Type

type pkey (p:(text2 -> Type)) = 
  k:key{ key_prop k == p
      /\ BMAC.key_prop (Keys?.k0 k) == bspec0 p
      /\ BMAC.key_prop (Keys?.k1 k) == bspec1 p }

val keygen: p:(text2 -> Type) -> pkey p
val mac:    p:(text2 -> Type) -> k:pkey p -> t:text2{p  t} -> tag
val verify: p:(text2 -> Type) -> k:pkey p -> t:text2 -> tag -> b:bool{b ==> p t}


let keygen (spec: text2 -> Type) = 
  let k0 = BMAC.keygen (bspec0 spec) in
  let k1 = BMAC.keygen (bspec1 spec) in
  let k = Keys k0 k1 in
  assert (BMAC.key_prop k0 == bspec0 spec);
  assert (BMAC.key_prop k1 == bspec1 spec);
  assume (key_prop k == spec);
  k
 

let mac p (Keys k0 k1) t = 
  if length t < blocksize 
  then BMAC.mac k0 (encode t)
  else BMAC.mac k1 t

let verify p (Keys k0 k1) t tag =   
  if length t < blocksize
  then BMAC.verify k0 (encode t) tag
  else BMAC.verify k1 t tag

code/solutions/Ex12g.TMAC2.fstLoad in editor

Exercise 13h:
What about MAC schemes for texts of arbitrary lengths? Hint: have a look e.g. at the HMAC construction.

14. Cryptography examples for confidentiality

14.1. Theoretical background on cryptographic verification in F*

Those unfamiliar with functional programming may appreciate the following security-centered look at the differences between a functional and an imperative programming style.

We use a cryptographic verification-by-typing approach, adapted to make use of new F* features. We briefly recall its notation and main concepts.

a) F* Types: A program is a sequential composition of modules, written $A_1 \cdot A_2 \cdot \dots \cdot A_n$. A module $A$ depends on types, values, and functions from its environment and exports types, values, and functions in its own interface $I$. A module is well typed, written $I_1 ,\dots ,I_\ell \vdash A \leadsto I$ if it correctly implements $I$ using modules with interfaces $I_1 ,\dots,I_\ell$. The simplest well typed program is an expression $A$ that has type $t$ in typing environment $I$, written $I \vdash A : t$. A program is closed if it is well typed in the empty environment $\emptyset$. A program is safe if, in every run of the program, all runtime values satisfy their type. A key property of a type system is that well-typed programs are always safe.

Theorem 1 (Safety by Typing): If $\emptyset \vdash A : t$, then $A$ is safe.

In F* randomness originates from calls to effectful external functions and at the meta level we model these functions probabilistically via a sample expression, which reduces to either true or false with probability 1/2. This expression models a perfect source of random bits. In this probabilistic interpretation, two expressions $A_0$ and $A_1$ are equivalent, written $A_0 \approx A_1$, when they return the same probability distribution of values.
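For instance, the expressions if sample then 0 else 1 and if sample then 1 else 0 are equivalent in this sense: each returns 0 or 1 with probability 1/2.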

An interface or module can declare a type as abstract, e.g. abstract type key = bytes, and keep its representation hidden: the qualified name is bound outside the scope of the module, but its definition is not visible. This guarantees that modules using this interface treat key values as opaque, preserving their integrity and secrecy.

As in the example above, one use of abstract types is to protect keys from being leaked by programming mistakes. Another use of abstract types is to verify simple equivalences for programs $P \cdot A$ written such that operations on secrets are performed by a pure (side-effect free) module $P$ that exports a restricted interface $I_\alpha$ with an abstract type $\alpha$ for secrets. The rest of the program can be passed secrets and can pass them back to $P$ but cannot access the representation of secrets. For instance, it can never branch on secrets.

Theorem 2 (Equivalence by Typing): If $\emptyset \vdash P_b \leadsto I_\alpha$ for $b = 0,1$ and $I_\alpha \vdash A : bool$, then $P_0 \cdot A \approx P_1 \cdot A$.

b) $\epsilon$-Safety and $\epsilon$-Equivalence: Instead of relying on the asymptotic safety and equivalence notions of the original cryptographic verification-by-typing approach, we use a concrete $\epsilon$ loss term. A module $A$ is $\epsilon$-safe if the probability of an assertion failing in $A$ is $\epsilon$. Two closed boolean expressions $A_0$ and $A_1$ are $\epsilon$-equivalent when the statistical distance $\frac{1}{2} \sum_{M=true,false} |Pr[A_0 \Downarrow M]-Pr[A_1 \Downarrow M]|$ is $\epsilon$. If $A$ is not closed, then the value of $\epsilon$ may depend on the context in which it is executed; in this case we model $\epsilon(\cdot)$ as a function of the context and call it an advantage function.
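For instance, if $A_0$ returns true with probability 0.5 and $A_1$ returns true with probability 0.6, their statistical distance is $\frac{1}{2}(|0.5-0.6| + |0.5-0.4|) = 0.1$, so $A_0$ and $A_1$ are 0.1-equivalent.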

c) Keyed module: Keyed modules export gen, leak, and coerce functions and provide some cryptographic functionality modeled using idealization for honest keys and strong algorithms. To this end keys can be associated with a unique index upon generation. Keyed modules enforce that gen is called at most once per index. The index describes the context the key was created in and the algorithms it is to be used with.
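A keyed-module interface could be sketched as follows; the names and signatures here are only illustrative assumptions and do not correspond to a particular module in the exercise files:

assume type index
assume type key : index -> Type
assume val gen:    i:index -> ML (key i)          (* honest key generation, at most once per index *)
assume val coerce: i:index -> bytes -> ML (key i) (* import attacker-supplied key material as a dishonest key *)
assume val leak:   #i:index -> key i -> ML bytes  (* reveal the representation of a key to the adversary *)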

d) Plain module: To define confidentiality using types, consider a module $P$ that is typeable using an interface $IP$ of the form

(* abstract *) type plain = bytes

assume Plain_hasEq: hasEq plain

(* abstract *) val repr: p:plain{not conf} -> Tot bytes
let repr p = p

(* abstract *) val coerce: r:bytes{not auth} -> Tot plain
let coerce r = r

(* abstract *) val length: plain -> Tot nat
let length p = length p

If conf is true, the function repr cannot be called, intuitively making the interface parametric in type plain except for the information leaked by length.

When auth is true, the function coerce cannot be called, guaranteeing authenticity. The module $P$ does not hide the length of plain values, as modelled by the function length returning their length.

To achieve the strong confidentiality property conf=true, a keyed module $C$ using $P$ must not be able to access the representation of plain and thus needs to be idealized. For encryption modules this is typically done by encrypting a value that is independent of the plaintext, such as a string of zeros. To also make the decryption of such encryptions of zeros indistinguishable from decryptions of the actual plaintext, the idealization of $C$ decrypts using a table lookup.

Let's make the theory more concrete by looking at Encrypt-then-MAC based authenticated encryption.

14.2. Encrypt-then-MAC

At a high level, we define five modules:

1. the Ideal module defines auth and conf flags and relates them to assumptions that allow idealization of the underlying cryptographic building blocks. Flag ind_cpa controls the idealization for an encryption module that is indistinguishable under chosen plaintext attack (IND-CPA), and flag uf_cma controls the idealization for a UF-CMA secure MAC module. UF-CMA security was defined above using a cryptographic game (13.1.1); we will see a similar game for IND-CPA below.

2. the Plain module is a secrecy module defining a plain type. Its representation is abstract and can only be accessed using the repr function when conf=false.

3. the CPA module models an IND-CPA secure encryption scheme. Intuitively, when the adversary is restricted to only query decrypt on ciphertexts resulting from encrypt (under the same key), then the idealization of CPA can encrypt zeros and CPA protects the secrecy of plain. This is modelled by setting conf=ind_cpa.

4. the MAC module models a UF-CMA secure message authentication scheme. The idealization of MAC guarantees that only message and tag pairs that were maced can verify.

5. the AE module composes CPA and MAC. An AE key k consists of a key k.ke for a CPA secure encryption scheme, and a MAC key k.km. The AE ciphertext is computed by encrypting the plaintext and then macing the ciphertext:

let encrypt k plain =
  (* let h0 = ST.get () in *)
  let c = CPA.encrypt k.ke plain in
  let t = MAC.mac k.km c in
  write_at_end k.log (plain, (c, t));
  (c, t)

code/solutions/EtM.AE.fstLoad in editor

The monotonic log k.log is only used to model security properties and does not exist in concrete schemes.

Before decrypting the CPA ciphertext, AE verifies the corresponding MAC tag:

let decrypt k (c,tag) =
  if MAC.verify k.km c tag
  then ( Some (CPA.decrypt k.ke c) )
  else ( None )

code/solutions/EtM.AE.fstLoad in editor

AE treats CPA and MAC keys abstractly, and never reveals them to an adversary $A$. This, together with the fact that AE only ever macs after encrypting, guarantees that AE$\cdot A$ is a restricted CPA adversary.

Let us now look in more detail at the CPA and MAC modules and how they model the security of the underlying primitives.

14.2.1. Chosen Plaintext Attack (CPA) secure encryption

Our concrete implementation of CPA secure encryption uses AES_128_CBC. It first samples a random initialization vector iv; obtains a representation of the key using k.raw and the plaintext using repr m; and calls a library function implementing AES_128_CBC:

let encrypt k m =
  recall k.log;
  let iv = random ivsize in
  let text = if ind_cpa then createBytes (length m) 0z else repr m in
  let c = CoreCrypto.block_encrypt AES_128_CBC k.raw iv text in
  let c = iv@|c in
  write_at_end k.log (m,c);
  c

Calling the library function, which expects bytes as input, requires breaking the abstraction barriers. This is not a problem for the AES key, as its type is declared in the same module. It is, however, forbidden for the Plain module, in which plain is defined, since calls to repr are only allowed when conf=false.

Instead, when conf=true, module CPA computes text in the following way:

  let text = if ind_cpa then createBytes (length m) 0z else repr m in

Here we make use of the fact that we defined conf=ind_cpa in the Ideal module. This change is justified by plaintext indistinguishability, as adversaries cannot distinguish encryptions of m from encryptions of createBytes (length m) 0z, a string of zeros of the same length.

For IND-CPA secure schemes this indistinguishability is only guaranteed to hold for adversaries that can provide plaintext to encrypt, but cannot provide ciphertexts to decrypt.

To see this consider the following classical definition.

The CPA indistinguishability experiment

An encryption scheme $\Pi=($keygen, encrypt, decrypt$)$ is IND-CPA secure if the advantage of ‘realistic’ adversaries $A$ in the following probabilistic experiment is small:
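In its standard formulation, the experiment can be sketched informally as follows:

1. The challenger runs keygen to obtain a fresh key k and samples a secret bit $b$.
2. $A$ repeatedly submits plaintexts of its choice and receives their encryptions under k.
3. $A$ submits two plaintexts $m_0$ and $m_1$ of equal length and receives the challenge ciphertext encrypt k $m_b$.
4. $A$ may keep querying the encryption oracle and finally outputs a guess $b'$.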

The advantage of $A$ is defined as $\epsilon(A) = 2 Pr[b=b']-1$.

The CPA encryption type interface

Without loss of generality, we can allow ‘decryption’ of ciphertexts resulting from encrypt. To model this we change the last line of encrypt to ‘write_at_end k.log (m,c); c’, updating the log of ciphertexts. By decrypting using a table lookup as follows, we still hide whether it was m or a string of zeros that was encrypted:

let decrypt k c =
  if ind_cpa_rest_adv then
    let log = !k.log in
    match seq_find (fun mc -> snd mc = c) log with
    | Some mc -> fst mc
  else
    let iv,c' = split c ivsize in
    coerce (CoreCrypto.block_decrypt AES_128_CBC k.raw iv c')

Note that in the match statement, we did not define the ‘| None’ branch. This is because, for IND-CPA restricted adversaries (ind_cpa_rest_adv=true), the precondition of decrypt requires that the table lookup in log always succeeds.

This can be expressed as a stateful precondition for decrypt:

val decrypt: k:key -> c:cipher -> ST msg
  (requires (fun h0 ->
    Map.contains (get_hmap h0) k.region /\
    (let log0 = sel h0 k.log in
      (b2t ind_cpa_rest_adv) ==> Some? (seq_find (fun mc -> snd mc = c) log0 )
    )
  ))

code/solutions/EtM.CPA.fstLoad in editor

where the function sel h0 k.log retrieves the content of k.log as stored in heap h0.

Exercise 14a:
Play with the type of EtM.CPA.decrypt above. What fails if you remove the precondition? What is the connection with chosen plaintext attacks?

exercise answer

The type checker fails with a “Patterns are incomplete” error, as it can no longer guarantee that it will find the ciphertext in the log. To preserve functionality one would have to use block_decrypt as in the else branch, which could leak information to the attacker and might make the scheme insecure. The stronger security notion of indistinguishability against chosen ciphertext attacks (IND-CCA) guarantees security in such a setting.

14.2.2. Using authentication to achieve stronger confidentiality

We already introduced the MAC unforgeability property in (13.1). Here we model MAC security using an idealization involving a log of maced messages and their tags:

let mac k m =
  let t = hmac_sha1 k.raw m in
  write_at_end k.log (m,t);
  t

Instead of modelling unforgeability using a predicate that is required upon macing and guaranteed upon verifying, we now express the unforgeability guarantee directly on the log. In the idealization, only (m,t) pairs that are in the log verify.

val verify: k:key -> m:msg -> t:tag -> ST bool
  (requires (fun h -> True))
  (ensures  (fun h0 res h1 ->
     modifies_none h0 h1 /\
     (( Ideal.uf_cma && res ) ==> mem (m,t) (sel h0 k.log))))

As for encryption, we also give an ideal functionality that is well typed with respect to this specification:

let verify k m t =
  let t' = hmac_sha1 k.raw m in
  let verified = (t = t') in
  let log = !k.log in
  let found = mem (m,t) log in
  if Ideal.uf_cma then
    verified && found
  else
    verified

code/solutions/EtM.MAC.fstLoad in editor

As any adversary that causes a difference between verified and found is also a successful UF-CMA adversary, we have that uf_cma=false, MAC⋅$A$ $\approx_{\epsilon_{ufcma}(A)}$ uf_cma=true, MAC⋅$A$.

Said otherwise, as long as the MAC construction is UF-CMA secure, the adversary's behaviour cannot differ substantially between the ideal world and the real world.

For our Encrypt-then-MAC scheme we establish the invariant that all maced messages correspond to encrypted ciphertexts. This allows us to establish the precondition of decrypt after MAC verify.

When the MAC is idealized, adversaries cannot compute ciphertexts that pass MAC verification.

The invariant is expressed over logs and at first sight looks quite daunting:

(* abstract *) let invariant (h:mem) (k:key) =
  let log = get_log h k in
  let mac_log = get_mac_log h k in
  let cpa_log = get_cpa_log h k in
  Map.contains (get_hmap h) k.region /\
  Map.contains (get_hmap h) (MAC.Key?.region k.km) /\
  Map.contains (get_hmap h) (CPA.Key?.region k.ke) /\
  Seq.length log = Seq.length mac_log /\
  Seq.length mac_log = Seq.length cpa_log /\
  (forall (i:int). indexable log i ==>
    (let m1,t = Seq.index mac_log i in
     let m2,c = Seq.index cpa_log i in
     m1 = c /\
     Seq.index log i == (m2,(c,t))
    )
  )

code/solutions/EtM.AE.fstLoad in editor

Crucially, all messages in the MAC log correspond to ciphertexts that resulted from previous encryptions. This allows us to prove that uf_cma=true, MAC $\cdot$ AE $\cdot A$ is a restricted CPA adversary. We can thus set ind_cpa_rest_adv=uf_cma.

14.2.3. The computational security proof

We want to prove that for all $A$ that are well typed with respect to an unrestricted Plain and AE interface, that is conf=false, auth=false, $IP$, $IC \vdash A$, the following computational equivalence holds:

(1) conf=auth=ind_cpa_rest_adv=uf_cma=ind_cpa=false, Plain ⋅ CPA ⋅ MAC ⋅ AE ⋅ A

$\approx_{\epsilon_{indcpa}(B_1)+\epsilon_{ufcma}(B_2)}$

(2) conf=auth=ind_cpa_rest_adv=uf_cma=ind_cpa=true, Plain ⋅ CPA ⋅ MAC ⋅ AE ⋅ A.

In other words, the distinguishing advantage of an adversary A between the concrete implementation and the idealized implementation of Encrypt-then-MAC is bounded by $\epsilon_{indcpa}(B_1)+\epsilon_{ufcma}(B_2)$.

To see this we reason about several game transformations.

Game 1 is the same as program (1), except that we swap the order of MAC and Plain⋅CPA, with the goal of moving all modules to the right of the MAC module. This change does not affect the output distribution of A and we thus have

(1) ≈ conf=false, auth=false, MAC⋅Plain⋅CPA⋅AE⋅A

Game 2 is the same as Game 1, except that we activate the idealizations of the MAC module by setting uf_cma=true.

conf=false, auth=false, MAC⋅Plain⋅CPA⋅AE⋅A

$\approx_{\epsilon_{ufcma}(B_1)}$

conf=false, auth=false, uf_cma=true, MAC⋅Plain⋅CPA⋅AE⋅A

$B_1$ corresponds to Plain⋅CPA⋅AE⋅A, and can be interpreted as a reduction to UF-CMA security.

Game 3 is the same as Game 2, except that we swap the order of MAC and Plain⋅CPA back and also set auth and ind_cpa_rest_adv to true. These changes have no effect on the output distribution of the program.

≈ conf=false, auth=ind_cpa_rest_adv=uf_cma=true, Plain⋅CPA⋅MAC⋅AE⋅A

Because of the MAC guarantees we can establish the main invariant, which is sufficient to prove that we never call CPA.decrypt on ciphertexts that are not in the log. This allows us to type the program with auth and ind_cpa_rest_adv set to true.

Game 4 is the same as Game 3, except that we activate the idealizations of the CPA module by setting ind_cpa=true.

conf=false, auth=ind_cpa_rest_adv=uf_cma=true, Plain⋅CPA⋅MAC⋅AE⋅A

$\approx_{\epsilon_{indcpa}(B_2)}$

conf=false, auth=ind_cpa_rest_adv=uf_cma=ind_cpa=true, Plain⋅CPA⋅MAC⋅AE⋅A ≈ (2)

$B_2$ corresponds to MAC⋅AE⋅A, and can be interpreted as a reduction to IND-CPA security.

Finally, CPA⋅MAC⋅AE⋅A does not make use of Plain.repr, and we can thus type it as

conf=auth=ind_cpa_rest_adv=uf_cma=ind_cpa=true, Plain⋅CPA⋅MAC⋅AE⋅A = (2)

$\qed$

Exercise 14b:
Explain the behaviour of the type checker when you try to prove that AE is confidential (conf=true), without the MAC being uf_cma secure.

Set the flags explicitly to see what happens:

let ind_cpa = true
let uf_cma = false
let ind_cpa_rest_adv = uf_cma
let conf = true
let auth = false
exercise answer

The type checker fails to prove that the plaintext satisfies the refinement not conf:

.\EtM.CPA.fst(70,85-70,86): Subtyping check failed; 
expected type (p#25:EtM.Plain.plain{Prims.op_Negation EtM.Ideal.conf}); 
got type EtM.Plain.plain

This is needed to call the repr function in the encrypt function:

let text = if ind_cpa && ind_cpa_rest_adv then createBytes (length m) 0z else repr m 

This is because we cannot encrypt zero bytes when we do not have a restricted CPA adversary (ind_cpa_rest_adv=false).

If we set ind_cpa_rest_adv=true even though uf_cma=false, we get a type error:

.\EtM.AE.fst(139,11-139,31): failed to prove a pre-condition
Error: 1 errors were reported (see above)

This results from failing to prove the pre-condition of CPA.decrypt in AE.decrypt.

15. Low* Examples

15.1. Exercise 1: Machine integers

module LowStar.Ex1

/// Programming with machine integers

code/exercises/LowStar.Ex1.fstLoad in editor

15.1.1. Answer

module LowStar.Ex1

/// Programming with machine integers

code/solutions/LowStar.Ex1.fstLoad in editor

15.2. Exercise 2: References and buffers

module LowStar.Ex2

/// Programming with references and buffers

code/exercises/LowStar.Ex2.fstLoad in editor

15.2.1. Answer

module LowStar.Ex2

/// Programming with references and buffers

code/solutions/LowStar.Ex2.fstLoad in editor

15.3. Exercise 3: Finite lists

module LowStar.Ex3

/// Implementing a finite list ADT in a fixed-size buffer

code/exercises/LowStar.Ex3.fstLoad in editor

15.3.1. Answer

module LowStar.Ex3

/// Implementing a finite list ADT in a fixed-size buffer

code/solutions/LowStar.Ex3.fstLoad in editor

16. A note on this document

This document was created using Daan Leijen's https://madoko.net markdown processor.

If you find any errors in it, or would like to improve it in any way, this is how you do it:

  1. Point your browser at https://madoko.net
  2. Select the folder icon in the top-left corner, then “Open … (Alt-O)”
  3. Select the “GitHub” icon
  4. Authorize the Madoko application to access your GitHub repositories
  5. Open /FStarLang/FStar/doc/tutorial/tutorial.mdk, or your fork of the FStar repo if you don't have access, and start editing.
  6. Ctrl+S (or Synchronize from the folder tab) will bring up a dialog to commit and push this file to GitHub after merging with the latest changes there.

For included code snippets:

The snippets must be derived from files that are stored in the same subtree of the git repo, i.e., all the samples should be in sub-directories of FStarLang/FStar/doc/tutorial/code.

Note: in this mode (i.e., github instead of dropbox), the document will not live update with the changes that anyone else may be making to the document concurrently. You will only see others' changes when you synchronize with github.

If you use Emacs to edit this, you can use markdown-mode (https://jblevins.org/projects/markdown-mode/) for syntax highlighting.

If you experience problems with the online editor, e.g., many examples failing with strange error messages, please ring the bells on the mailing list or raise a GitHub issue so we can investigate the problem further. Someone else might have filed one before, or an old bug may have crawled back in.