# Erasure and the Ghost Effect

When writing proof-oriented programs, inevitably, some parts of the program serve only to state and prove properties about the code that actually executes. Our first non-trivial effect separates the computationally relevant parts of the program from the computationally irrelevant (i.e., specificational or ghost) parts of a program. This separation enables the F* compiler to guarantee that all the ghost parts of a program are optimized away entirely.

For a glimpse of what all of this is about, let’s take a look again at length-indexed vectors—we saw them first here.

```type vec (a:Type) : nat -> Type =
| Nil : vec a 0
| Cons : #n:nat -> hd:a -> tl:vec a n -> vec a (n + 1)
```

and a function to concatenate two vectors:

```let rec append #a #n #m (v1:vec a n) (v2:vec a m)
: vec a (n + m)
= match v1 with
| Nil -> v2
| Cons hd tl -> Cons hd (append tl v2)
```

Compare this with concatenating two lists:

```let rec list_append #a (l1 l2:list a) =
match l1 with
| [] -> []
| hd::tl -> hd :: list_append tl l2
```

Superficially, because of the implicit arguments, it may look like concatenating vectors with `append` is just as efficient as a concatenating lists—the length indexes seem to impose no overhead. But, let’s look at the code that F* extracts to OCaml for length-indexed vectors.

First, in the definition of the `vec` type, since OCaml is not dependently typed, the `nat`-index of the F* `vec` is replaced by a `'dummy` type argument—that’s fine. But, notice that the `Cons` constructor contains three fields: a `Prims.nat` for the length of the tail of the list, the head of the list, and then then tail, i.e., the length of the tail of the list is stored at every `Cons` cell, so the `vec` type is actually less space efficient than an ordinary `list`.

```type ('a, 'dummy) vec =
| Nil
| Cons of Prims.nat * 'a * ('a, unit) vec
```

Next, in the OCaml definition of `append`, we see that it receives additional arguments `n` and `m` for the lengths of the vectors, and worse, in the last case, it incurs an addition to sum `n' + m` when building the result vector. So, `append` is also less time-efficient than `List.append`.

```let rec append :
'a .
Prims.nat ->
Prims.nat -> ('a, unit) vec -> ('a, unit) vec -> ('a, unit) vec
=
fun n ->
fun m ->
fun v1 ->
fun v2 ->
match v1 with
| Nil -> v2
| Cons (n', hd, tl) ->
Cons ((n' + m), hd, (append n' m tl v2))
```

This is particularly unfortunate, since the computational behavior of `append` doesn’t actually depend on the length indexes of the input vectors. What we need is a principled way to indicate to the F* compiler that some parts of a computation are actually only there for specification or proof purposes and that they can be removed when compiling the code, without changing the observable result computed by the program. This is what erasure is about—removing the computationally irrelevant parts of a term for compilation.

Here’s a revised version of vectors, making use of the `erased` type from the `FStar.Ghost` library to indicate to F* which parts must be erased by the compiler.

```module VecErased
open FStar.Ghost

noeq
type vec a : nat -> Type =
| Nil : vec a 0
| Cons : #n:erased nat -> hd:a -> tl:vec a n -> vec a (n + 1)

let rec append #a (#n #m:erased nat) (v0:vec a n) (v1:vec a m)
: vec a (n + m)
= match v0 with
| Nil -> v1
| Cons hd tl -> Cons hd (append tl v1)
```

We’ll look into this in much more detail in what follows, but notice for now that:

1. The first argument of `Cons` now has type `erased nat`.

2. The implicit arguments of `append` corresponding to the indexes of the input vectors have type `erased nat`.

If we extract this code to OCaml, here’s what we get:

```type ('a, 'dummy) vec =
| Nil
| Cons of unit * 'a * ('a, unit) vec
```
```let rec append :
'a . unit -> unit -> ('a, unit) vec -> ('a, unit) vec -> ('a, unit) vec =
fun n ->
fun m ->
fun v0 ->
fun v1 ->
match v0 with
| Nil -> v1
| Cons (uu___, hd, tl) -> Cons ((), hd, (append () () tl v1))
```

Notice that the erased arguments have all been turned into the unit value `()`, and the needless addition in `append` is gone too.

Of course, the code would be cleaner if F* were to have entirely removed the argument instead of leaving behind a unit term, but we leave it to the downstream compiler, e.g., OCaml itself, to remove these needless units. Further, if we’re compiling the ML code extracted from F* to C, then KaRaMeL does remove these additional units in the C code it produces.

## Ghost: A Primitive Effect

The second, primitive effect in F*’s effect system is the effect of ghost computations, i.e., computation types whose effect label is `GTot`. 1 The label `GTot` is strictly above `Tot` in the effect hierarchy, i.e., `Tot < GTot`. This means that a term with computation type `GTot t` cannot influence the behavior of a term whose type is `Tot s`. Conversely, every `Tot` computation can be implicitly promoted to a `GTot` computation.

Ghost computations are just as well-behaved as pure, total terms—they always terminate on all inputs and exhibit no observable effects, except for the value they return. As such, F*’s logical core really includes both `Tot` and `GTot` computations. The distinction between `Tot` and `GTot` is only relevant when considering how programs are compiled. Ghost computations are guaranteed to be erased by the the compiler, while `Tot` computations are retained.

Since `Tot` terms are implicitly promoted to `GTot`, it is easy to designate that some piece of code should be erased just by annotating it with a `GTot` effect label. For example, here is an ghost version of the factorial function:

```let rec factorial (n:nat)
: GTot nat
= if n = 0 then 1 else n * factorial (n - 1)
```

Its definition is identical to the corresponding total function that we saw earlier, except that we have annotated the return computation type of the function as `GTot nat`. This indicates to F* that `factorial` is to be erased during compilation, and the F* type-and-effect checker ensures that `Tot` computation cannot depend on an application of `factorial n`.

1

The name `GTot` is meant to stand for “Ghost and Total” computations, and is pronounced “gee tote”. However, it’s a poor name and is far from self-explanatory. We plan to change the name of this effect in the future (e.g., to something like `Spec`, `Ghost`, or `Erased`), though this is a breaking change to a large amount of existing F* code.

## Ghost Computations as Specifications

A ghost function like `factorial` can be used in specifications, e.g., in a proof that a tail recursion optimization `factorial_tail` is equivalent to `factorial`.

```let rec factorial_tail (n:nat) (out:nat)
: Tot (r:nat { r == out * factorial n })
= if n = 0 then out
else factorial_tail (n - 1) (n * out)

let fact (n:nat)
: Tot (r:nat { r == factorial n })
= factorial_tail n 1
```

This type allows a client to use the more efficient `fact`, but for reasoning purposes, one can use the more canonical `factorial`, proven equivalent to `fact`.

In contrast, if we were to try to implement the same specification by directly using the factorial ghost function, F* complains with a effect incompatibility error.

```[@@expect_failure]
: Tot (r:nat { r == out * factorial n })
= out * factorial n
```

The error is:

```Computed type "r: nat{r == out * factorial n}" and
effect "GTot" is not compatible with the annotated
type "r: nat{r == out * factorial n}" effect "Tot"
```

So, while F* forbids using ghost computations in `Tot` contexts, it seems to be fine with accepting a use of factorial in specifications, e.g., in the type `r:nat { r == out * factorial n }`. We’ll see in a moment why this is permitted.

## Erasable and Non-informative Types

In addition to using the `GTot` effect to classifies computations that must be erased, F* also provides a way to mark certain value types as erasable.

Consider introducing an inductive type definition that is meant to describe a proof term only and for that proof term to introduce to runtime overhead. In a system like Coq, the type of Coq propositions `Prop` serves this purpose, but `prop` in F* is quite different. Instead, F* allows an inductive type definition to be marked as `erasable`.

For example, when we looked at the simply typed lambda calculus (STLC), we introduced the inductive type below, to represent a typing derivation for an STLC term. One could define a typechecker for STLC and give it the type shown below to prove it correct:

```val check (g:env) (e:exp) : (t : typ & typing g e t)
```

However, this function returns both the type `t:typ` computed for `e`, we well as the typing derivation. Although the typing derivation may be useful in some cases, often returning the whole derivation is unnecessary. By marking the definition of the `typing` inductive as shown below (and keeping the rest of the definition the same), F* guarantees that the compiler will extract `typing g e t` to the `unit` type and correspondinly, all values of ```typing g e t``` will be erased to the unit value `()`

```[@@erasable]
noeq
type typing : env -> exp -> typ -> Type = ...
```

Marking a type with the `erasable` attribute and having it be erased to `unit` is safe because F* restricts how `erasable` types can be used. In particular, no `Tot` computations should be able to extract information from a value of an erasable type.

Closely related to erasable types are a class of types that are called non-informative, defined inductively as follows:

1. The type `Type` is non-informative

2. The type `prop` is non-informative (i.e., unit and all its subtypes)

3. An erasable type is non-informative

4. A function type `x:t -> Tot s` is non-informative, if `s` is non-informative

5. A ghost function type `x:t -> GTot s` is non-informative

6. A function type `x:t -> C`, with user-defined computation type `C`, is non-informative if the effect label of `C` has the erasable attribute.

Intuitively, a non-informative type is a type that cannot be case-analyzed in a `Tot` context.

With this notion of non-informative types, we can now define the restrictions on an `erasable` type:

1. Any computation that pattern matches on an erasable type must return a non-informative type.

2. Inductive types with the `erasable` attribute do not support built-in decidable equality and must also be marked `noeq`.

## The erased type, reveal, and hide

The `erasable` attribute can only be added to new inductive type definitions and every instance of that type becomes erasable. If you have a type like `nat`, which is not erasable, but some occurrences of it (e.g., in the arguments to `Vector.append`) need to be erased, the F* standard library `FStar.Ghost.fsti` offers the following:

```(** [erased t] is the computationally irrelevant counterpart of [t] *)
[@@ erasable]
val erased (t:Type u#a) : Type u#a
```

`FStar.Ghost` also offers a pair of functions, `reveal` and `hide`, that form a bijection between `a` and `erased a`.

```val reveal (#a: Type u#a) (v:erased a) : GTot a

val hide (#a: Type u#a) (v:a) : Tot (erased a)

val hide_reveal (#a: Type) (x: erased a)
: Lemma (ensures (hide (reveal x) == x))
[SMTPat (reveal x)]

val reveal_hide (#a: Type) (x: a)
: Lemma (ensures (reveal (hide x) == x))
[SMTPat (hide x)]
```

Importantly, `reveal v` breaks the abstraction of `v:erased a` returning just an `a`, but doing so incurs a `GTot` effect—so, `reveal` cannot be used in an arbitrary `Tot` contexts.

Dually, `hide v` can be used to erase `v:a`, since a `Tot` context cannot depend on the value of an `erased a`.

The SMT patterns on the two lemmas allow F* and Z3 to automatically instantiate the lemmas to relate a value and its hidden counterpart—this chapter provides more details on how SMT patterns work.

Implicit coercions

`FStar.Ghost.erased`, `reveal`, and `hide` are so commonly used in F* that the compiler provides some special support for it. In particular, when a term `v:t` is used in a context that expects an `erased t`, F* implictly coerces `v` to `hide v`. Likewise, when the context expects a `t` where `v:erased t` is provided, F* implicitly coerces `v` to `reveal v`.

The following examples illustrates a few usages and limitations. You can ask F* to print the code with implicits enabled by using `--dump_module RevealHideCoercions --print_implicits`.

```module RevealHideCoercions
open FStar.Ghost

let auto_hide (x:nat) : erased nat = x
let auto_reveal (x:erased nat) : GTot nat = x

[@@expect_failure] //Expect GTot
let auto_reveal_2 (x:erased nat) : Tot nat = x

let incr (x:nat) : nat = x + 1
let incr_e (x:erased nat) : erased nat = incr x

let incr' (x:nat) : GTot nat = incr_e x

[@@expect_failure]
let poly (x:nat) (y:erased nat) = x == y
```

A few comments on these examples:

• The first two functions illustrate how a `nat` is coerced implicitly to `erased nat`. Note, the effect of `auto_reveal` is `GTot`

• `auto_reveal_2` fails, since the the annotation claims, incorrectly, that the effect label is `Tot`

• `incr` is just a `nat -> nat` function.

• `incr_e` is interesting because it calls `incr` with an `erased nat` and the annotation expects an `erased nat` too. The body of `incr_e` is implicitly coerced to ```hide (incr (reveal x))```

• `incr'` is interesting, since it calls `incr_e`: its body is implicitly coerced to `reveal (incr_e (hide x))`

• Finally, `poly` shows the limitations of implicit coercion: F* only inserts coercions when the expected type of the term in a context and the type of the term differ by an `erased` constructor. In `poly`, since `==` is polymorphic, the expected type of the context is just an unresolved unification variable and, so, no coercion is inserted. Instead, F* complains that `y` has type `erased nat` when the type `nat` was expected.

## Using Ghost Computations in Total Contexts

We have already noted that `Tot < GTot`, enabling `Tot` computations to be re-used in `GTot` contexts. For erasure to be sound, it is crucial that `GTot` terms cannot be used in `Tot` contexts, and indeed, F* forbids this in general. However, there are two exceptions.

### Effect Promotion for Non-informative Types

Consider a term `f` with type `GTot s`, where `s` is a non-informative type. Since `s` is non-informative, no total context can extract any information from `f`. As such, F* allows implicitly promoting `GTot s` to `Tot s`, when `s` is a non-informative type.

For instance, the following is derivable, `hide (factorial 0) : Tot (erased nat)`: let’s work through it in detail.

1. We know that that `factorial n : GTot nat`

2. Recall from the discussion on evaluation order and the application of functions to effectful arguments, `hide (factorial 0)` is equivalent to ```let x = factorial 0 in hide x```, where `x:nat` and `hide x : Tot (erased nat)`.

3. From the rule for sequential composition of effectful terms, the type of `let x = factorial 0 in hide x` should be ```GTot (erased nat)```, since `GTot = lub GTot Tot`.

4. Since `erased nat` is a non-informative type, ```GTot (erased nat)``` is promoted to `Tot (erased nat)`, which is then the type of `hide (factorial 0)`.

Effect promotion for ghost functions returning non-informative types is very useful. It allows one to mix ghost computations with total computations, so long as the result of the ghost sub-computation is hidden with an erased type. For instance, in the code below, we use `hide (factorial (n - 1))` and use the result `f_n_1` in an assertion to or some other proof step, all within a function that is in the `Tot` effect.

```let rec factorial_tail_alt (n:nat) (out:nat)
: Tot (r:nat { r == out * factorial n })
= if n = 0 then out
else (
let f_n_1 = hide (factorial (n - 1)) in
let result = factorial_tail_alt (n - 1) (n * out) in
assert (result == (n * out) * f_n_1);
result
)
```

### FStar.Ghost.Pull.pull

Consider applying `map factorial [0;1;2]`, where ```map : (a -> Tot b) -> list a -> list b```. F* refuses to typecheck this written, since `factorial : nat -> GTot nat`, which is not a subtype of ```nat -> Tot nat```.

Using effect promotion, one could write ```map (fun x -> hide (factorial x)) [0;1;2]```, but this has type `list (erased nat)`, which is not the same as `list nat`.

This is unfortunate since it seems that basic libraries designed for use with with higher-order total functions cannot be reused with ghost functions. 2

The library function `FStar.Ghost.Pull.pull` can help in such situatinos. It has the following signature:

```val pull (#a:Type) (#b:a -> Type) (f: (x:a -> GTot (b x)))
: GTot (x:a -> b x)

val pull_equiv (#a:Type) (#b:a -> Type) (f: (x:a -> GTot (b x))) (x:a)
: Lemma (ensures pull f x == f x)
[SMTPat (pull f x)]
```

This type states that for any ghost function `f`, we can exhibit a total function `g` with the same type as `f`, while the lemma `pull_equiv` states that `g` is pointwise equal to `f`. 3 However, it may not be possible, in general, to compute `g` in a way that enables it to be compiled by F*. So, `pull f` itself has ghost effect, indicating that applications of `pull` cannot be used in compilable code.

Using `pull`, one can write ```map (pull factorial) [0;1;2] : GTot (list nat)```, thereby reusing ghost functions where total functions are expected.

2

F* does not have any built-in support for effect-label polymorphism. That is, one cannot write a type like this: ```map #a #b #e (f: a -> e b) (l:list a) : e (list b)```, where `e` is an effect variable. However, using F*’s dependent types, one can code up various ad hoc forms of effect polymorphism. Some other languages with effect systems, notably Koka, do support effect polymorphism primitively.

3

The `SMTPat` annotation on the `pull_equiv` lemma instructs F* to allow Z3 to instantiate this lemma automatically. You can learn more about SMT patterns and quantifier instantiation in a later chapter.

## Revisiting Vector Concatenation

We now have all the ingredients to understand how the vector append example shown at the start of this chapter works. Here, below, is a version of the same code with all the implicit arguments and reveal/hide operations made explicit.

```module VecErasedExplicit
open FStar.Ghost

noeq
type vec a : nat -> Type =
| Nil : vec a 0
| Cons : #n:erased nat -> hd:a -> tl:vec a (reveal n) -> vec a (reveal n + 1)

let rec append #a (#n #m:erased nat) (v0:vec a (reveal n)) (v1:vec a (reveal m))
: Tot (vec a (reveal n + reveal m))
(decreases (reveal n))
= match v0 with
| Nil -> v1
| Cons #_ #n_tl hd tl ->
Cons #a
#(hide (reveal n_tl + reveal m))
hd
(append #a
#n_tl
#m
tl
v1)
```

Definition of vec

In the definition of the inductive type `vec a`, we have two occurrences of `reveal`. Consider `vec a (reveal n)`, the type of the `tl` of the vector. `reveal n` is a ghost computation of type `GTot nat`, so `vec a (reveal n) : GTot Type`. But, since `Type` is non-informative, `GTot Type` is promoted to `Tot Type`. The promotion from `GTot Type` to `Tot Type` is pervasive in F* and enables ghost computations to be freely used in types and other specifications.

The `vec a (reveal n + 1)` in the result type of `Cons` is similar. Here `reveal n + 1` has type `GTot nat`, but applying it to `vec a` produces a `GTot Type`, which is promoted to ```Tot Type```.

Type of append

The type of `append` has four occurrences of `reveal`. Three of them, in the type of `v0`, `v1`, and the return type behave the same as the typing the fields of `Cons`: the `GTot Type` is promoted to `Tot Type`.

One additional wrinkle is in the decreases clause, where we have an explicit `reveal n`, since what decreases on each recursive call is the `nat` that’s in bijection with the parameter `n`, rather than `n` itself. When F* infers a decreases clause for a function, any erased terms in the clause are automatically revealed.

Definition of append

The recursive call instantiates the index parameters to `n_tl` and `m`, which are both erased.

When constructing the `Cons` node, its index argument is instantiated to `hide (reveal n_tl + reveal m)`. The needless addition is marked with a `hide` enabling that F* compiler to erase it. As we saw before in `factorial_tail_alt`, using `hide` allows one to mingle ghost computations (like `(reveal n - 1)`) with total computations, as needed for specifications and proofs.

All of this is painfully explicit, but the implicit reveal/hide coercions inserted by F* go a long way towards make things relatively smooth.