Erasure and the Ghost Effect
When writing proof-oriented programs, inevitably, some parts of the program serve only to state and prove properties about the code that actually executes. Our first non-trivial effect separates the computationally relevant parts of the program from the computationally irrelevant (i.e., specificational or ghost) parts of a program. This separation enables the F* compiler to guarantee that all the ghost parts of a program are optimized away entirely.
For a glimpse of what all of this is about, let’s take a look again at length-indexed vectors—we saw them first here.
type vec (a:Type) : nat -> Type =
| Nil : vec a 0
| Cons : #n:nat -> hd:a -> tl:vec a n -> vec a (n + 1)
and a function to concatenate two vectors:
let rec append #a #n #m (v1:vec a n) (v2:vec a m)
: vec a (n + m)
= match v1 with
| Nil -> v2
| Cons hd tl -> Cons hd (append tl v2)
Compare this with concatenating two lists:
let rec list_append #a (l1 l2:list a) =
match l1 with
| [] -> []
| hd::tl -> hd :: list_append tl l2
Superficially, because of the implicit arguments, it may look like
concatenating vectors with append
is just as efficient as a
concatenating lists—the length indexes seem to impose no
overhead. But, let’s look at the code that F* extracts to OCaml for
length-indexed vectors.
First, in the definition of the vec
type, since OCaml is not
dependently typed, the nat
-index of the F* vec
is replaced by
a 'dummy
type argument—that’s fine. But, notice that the
Cons
constructor contains three fields: a Prims.nat
for the
length of the tail of the list, the head of the list, and then then
tail, i.e., the length of the tail of the list is stored at every
Cons
cell, so the vec
type is actually less space efficient
than an ordinary list
.
type ('a, 'dummy) vec =
| Nil
| Cons of Prims.nat * 'a * ('a, unit) vec
Next, in the OCaml definition of append
, we see that it receives
additional arguments n
and m
for the lengths of the vectors,
and worse, in the last case, it incurs an addition to sum n' + m
when building the result vector. So, append
is also less
time-efficient than List.append
.
let rec append :
'a .
Prims.nat ->
Prims.nat -> ('a, unit) vec -> ('a, unit) vec -> ('a, unit) vec
=
fun n ->
fun m ->
fun v1 ->
fun v2 ->
match v1 with
| Nil -> v2
| Cons (n', hd, tl) ->
Cons ((n' + m), hd, (append n' m tl v2))
This is particularly unfortunate, since the computational behavior of
append
doesn’t actually depend on the length indexes of the input
vectors. What we need is a principled way to indicate to the F*
compiler that some parts of a computation are actually only there for
specification or proof purposes and that they can be removed when
compiling the code, without changing the observable result computed by
the program. This is what erasure is about—removing the
computationally irrelevant parts of a term for compilation.
Here’s a revised version of vectors, making use of the erased
type
from the FStar.Ghost
library to indicate to F* which parts must be
erased by the compiler.
module VecErased
open FStar.Ghost
noeq
type vec a : nat -> Type =
| Nil : vec a 0
| Cons : #n:erased nat -> hd:a -> tl:vec a n -> vec a (n + 1)
let rec append #a (#n #m:erased nat) (v0:vec a n) (v1:vec a m)
: vec a (n + m)
= match v0 with
| Nil -> v1
| Cons hd tl -> Cons hd (append tl v1)
We’ll look into this in much more detail in what follows, but notice for now that:
The first argument of
Cons
now has typeerased nat
.The implicit arguments of
append
corresponding to the indexes of the input vectors have typeerased nat
.
If we extract this code to OCaml, here’s what we get:
type ('a, 'dummy) vec =
| Nil
| Cons of unit * 'a * ('a, unit) vec
let rec append :
'a . unit -> unit -> ('a, unit) vec -> ('a, unit) vec -> ('a, unit) vec =
fun n ->
fun m ->
fun v0 ->
fun v1 ->
match v0 with
| Nil -> v1
| Cons (uu___, hd, tl) -> Cons ((), hd, (append () () tl v1))
Notice that the erased arguments have all been turned into the unit
value ()
, and the needless addition in append
is gone too.
Of course, the code would be cleaner if F* were to have entirely removed the argument instead of leaving behind a unit term, but we leave it to the downstream compiler, e.g., OCaml itself, to remove these needless units. Further, if we’re compiling the ML code extracted from F* to C, then KaRaMeL does remove these additional units in the C code it produces.
Ghost: A Primitive Effect
The second, primitive effect in F*’s effect system is the effect of
ghost computations, i.e., computation types whose effect label is
GTot
. [1] The label GTot
is strictly above Tot
in the
effect hierarchy, i.e., Tot < GTot
. This means that a term with
computation type GTot t
cannot influence the behavior of a term
whose type is Tot s
. Conversely, every Tot
computation can be
implicitly promoted to a GTot
computation.
Ghost computations are just as well-behaved as pure, total
terms—they always terminate on all inputs and exhibit no observable
effects, except for the value they return. As such, F*’s logical core
really includes both Tot
and GTot
computations. The
distinction between Tot
and GTot
is only relevant when
considering how programs are compiled. Ghost computations are
guaranteed to be erased by the the compiler, while Tot
computations are retained.
Since Tot
terms are implicitly promoted to GTot
, it is easy to
designate that some piece of code should be erased just by annotating
it with a GTot
effect label. For example, here is an ghost version
of the factorial function:
let rec factorial (n:nat)
: GTot nat
= if n = 0 then 1 else n * factorial (n - 1)
Its definition is identical to the corresponding total function that
we saw earlier, except that we have annotated the return computation
type of the function as GTot nat
. This indicates to F* that
factorial
is to be erased during compilation, and the F*
type-and-effect checker ensures that Tot
computation cannot depend
on an application of factorial n
.
Ghost Computations as Specifications
A ghost function like factorial
can be used in specifications,
e.g., in a proof that a tail recursion optimization factorial_tail
is equivalent to factorial
.
let rec factorial_tail (n:nat) (out:nat)
: Tot (r:nat { r == out * factorial n })
= if n = 0 then out
else factorial_tail (n - 1) (n * out)
let fact (n:nat)
: Tot (r:nat { r == factorial n })
= factorial_tail n 1
This type allows a client to use the more efficient fact
, but for
reasoning purposes, one can use the more canonical factorial
,
proven equivalent to fact
.
In contrast, if we were to try to implement the same specification by directly using the factorial ghost function, F* complains with a effect incompatibility error.
[@@expect_failure]
let factorial_bad (n:nat) (out:nat)
: Tot (r:nat { r == out * factorial n })
= out * factorial n
The error is:
Computed type "r: nat{r == out * factorial n}" and
effect "GTot" is not compatible with the annotated
type "r: nat{r == out * factorial n}" effect "Tot"
So, while F* forbids using ghost computations in Tot
contexts, it
seems to be fine with accepting a use of factorial in specifications,
e.g., in the type r:nat { r == out * factorial n }
. We’ll see in a
moment why this is permitted.
Erasable and Non-informative Types
In addition to using the GTot
effect to classifies computations
that must be erased, F* also provides a way to mark certain value
types as erasable.
Consider introducing an inductive type definition that is meant to
describe a proof term only and for that proof term to introduce no
runtime overhead. In a system like Coq, the type of Coq propositions
Prop
serves this purpose, but prop
in F* is quite
different. Instead, F* allows an inductive type definition to be
marked as erasable
.
For example, when we looked at the simply typed lambda calculus (STLC), we introduced the inductive type below, to represent a typing derivation for an STLC term. One could define a typechecker for STLC and give it the type shown below to prove it correct:
val check (g:env) (e:exp) : (t : typ & typing g e t)
However, this function returns both the type t:typ
computed for
e
, we well as the typing derivation. Although the typing
derivation may be useful in some cases, often returning the whole
derivation is unnecessary. By marking the definition of the typing
inductive as shown below (and keeping the rest of the definition the
same), F* guarantees that the compiler will extract typing g e t
to the unit
type and correspondinly, all values of typing g e
t
will be erased to the unit value ()
[@@erasable]
noeq
type typing : env -> exp -> typ -> Type = ...
Marking a type with the erasable
attribute and having it be erased
to unit
is safe because F* restricts how erasable
types can be
used. In particular, no Tot
computations should be able to extract
information from a value of an erasable type.
Closely related to erasable types are a class of types that are called non-informative, defined inductively as follows:
The type
Type
is non-informativeThe type
prop
is non-informative (i.e., unit and all its subtypes)An erasable type is non-informative
A function type
x:t -> Tot s
is non-informative, ifs
is non-informativeA ghost function type
x:t -> GTot s
is non-informativeA function type
x:t -> C
, with user-defined computation typeC
, is non-informative if the effect label ofC
has the erasable attribute.
Intuitively, a non-informative type is a type that cannot be
case-analyzed in a Tot
context.
With this notion of non-informative types, we can now define the
restrictions on an erasable
type:
Any computation that pattern matches on an erasable type must return a non-informative type.
Inductive types with the
erasable
attribute do not support built-in decidable equality and must also be markednoeq
.
The erased type, reveal, and hide
The erasable
attribute can only be added to new inductive type
definitions and every instance of that type becomes erasable. If you
have a type like nat
, which is not erasable, but some occurrences
of it (e.g., in the arguments to Vector.append
) need to be erased,
the F* standard library FStar.Ghost.fsti
offers the following:
(** [erased t] is the computationally irrelevant counterpart of [t] *)
[@@ erasable]
val erased (t:Type u#a) : Type u#a
FStar.Ghost
also offers a pair of functions, reveal
and
hide
, that form a bijection between a
and erased a
.
val reveal (#a: Type u#a) (v:erased a) : GTot a
val hide (#a: Type u#a) (v:a) : Tot (erased a)
val hide_reveal (#a: Type) (x: erased a)
: Lemma (ensures (hide (reveal x) == x))
[SMTPat (reveal x)]
val reveal_hide (#a: Type) (x: a)
: Lemma (ensures (reveal (hide x) == x))
[SMTPat (hide x)]
Importantly, reveal v
breaks the abstraction of v:erased a
returning just an a
, but doing so incurs a GTot
effect—so,
reveal
cannot be used in an arbitrary Tot
contexts.
Dually, hide v
can be used to erase v:a
, since a Tot
context cannot depend on the value of an erased a
.
The SMT patterns on the two lemmas allow F* and Z3 to automatically instantiate the lemmas to relate a value and its hidden counterpart—this chapter provides more details on how SMT patterns work.
Implicit coercions
FStar.Ghost.erased
, reveal
, and hide
are so commonly used
in F* that the compiler provides some special support for it. In
particular, when a term v:t
is used in a context that expects an
erased t
, F* implictly coerces v
to hide v
. Likewise, when
the context expects a t
where v:erased t
is provided, F*
implicitly coerces v
to reveal v
.
The following examples illustrates a few usages and limitations. You
can ask F* to print the code with implicits enabled by using
--dump_module RevealHideCoercions --print_implicits
.
module RevealHideCoercions
open FStar.Ghost
let auto_hide (x:nat) : erased nat = x
let auto_reveal (x:erased nat) : GTot nat = x
[@@expect_failure] //Expect GTot
let auto_reveal_2 (x:erased nat) : Tot nat = x
let incr (x:nat) : nat = x + 1
let incr_e (x:erased nat) : erased nat = incr x
let incr' (x:nat) : GTot nat = incr_e x
[@@expect_failure]
let poly (x:nat) (y:erased nat) = x == y
A few comments on these examples:
The first two functions illustrate how a
nat
is coerced implicitly toerased nat
. Note, the effect ofauto_reveal
isGTot
auto_reveal_2
fails, since the the annotation claims, incorrectly, that the effect label isTot
incr
is just anat -> nat
function.incr_e
is interesting because it callsincr
with anerased nat
and the annotation expects anerased nat
too. The body ofincr_e
is implicitly coerced tohide (incr (reveal x))
incr'
is interesting, since it callsincr_e
: its body is implicitly coerced toreveal (incr_e (hide x))
Finally,
poly
shows the limitations of implicit coercion: F* only inserts coercions when the expected type of the term in a context and the type of the term differ by anerased
constructor. Inpoly
, since==
is polymorphic, the expected type of the context is just an unresolved unification variable and, so, no coercion is inserted. Instead, F* complains thaty
has typeerased nat
when the typenat
was expected.
Using Ghost Computations in Total Contexts
We have already noted that Tot < GTot
, enabling Tot
computations to be re-used in GTot
contexts. For erasure to be
sound, it is crucial that GTot
terms cannot be used in Tot
contexts, and indeed, F* forbids this in general. However, there are
two exceptions.
Effect Promotion for Non-informative Types
Consider a term f
with type GTot s
, where s
is a
non-informative type. Since s
is non-informative, no total context
can extract any information from f
. As such, F* allows implicitly
promoting GTot s
to Tot s
, when s
is a non-informative
type.
For instance, the following is derivable,
hide (factorial 0) : Tot (erased nat)
: let’s work through it in detail.
We know that that
factorial n : GTot nat
Recall from the discussion on evaluation order and the application of functions to effectful arguments,
hide (factorial 0)
is equivalent tolet x = factorial 0 in hide x
, wherex:nat
andhide x : Tot (erased nat)
.From the rule for sequential composition of effectful terms, the type of
let x = factorial 0 in hide x
should beGTot (erased nat)
, sinceGTot = lub GTot Tot
.Since
erased nat
is a non-informative type,GTot (erased nat)
is promoted toTot (erased nat)
, which is then the type ofhide (factorial 0)
.
Effect promotion for ghost functions returning non-informative types
is very useful. It allows one to mix ghost computations with total
computations, so long as the result of the ghost sub-computation is
hidden with an erased type. For instance, in the code below, we use
hide (factorial (n - 1))
and use the result f_n_1
in an
assertion to or some other proof step, all within a function that is
in the Tot
effect.
let rec factorial_tail_alt (n:nat) (out:nat)
: Tot (r:nat { r == out * factorial n })
= if n = 0 then out
else (
let f_n_1 = hide (factorial (n - 1)) in
let result = factorial_tail_alt (n - 1) (n * out) in
assert (result == (n * out) * f_n_1);
result
)
FStar.Ghost.Pull.pull
Consider applying map factorial [0;1;2]
, where map : (a ->
Tot b) -> list a -> list b
. F* refuses to typecheck this written,
since factorial : nat -> GTot nat
, which is not a subtype of nat
-> Tot nat
.
Using effect promotion, one could write map (fun x -> hide
(factorial x)) [0;1;2]
, but this has type list (erased nat)
,
which is not the same as list nat
.
This is unfortunate since it seems that basic libraries designed for use with with higher-order total functions cannot be reused with ghost functions. [2]
The library function FStar.Ghost.Pull.pull
can help in such
situatinos. It has the following signature:
val pull (#a:Type) (#b:a -> Type) (f: (x:a -> GTot (b x)))
: GTot (x:a -> b x)
val pull_equiv (#a:Type) (#b:a -> Type) (f: (x:a -> GTot (b x))) (x:a)
: Lemma (ensures pull f x == f x)
[SMTPat (pull f x)]
This type states that for any ghost function f
, we can exhibit a
total function g
with the same type as f
, while the lemma
pull_equiv
states that g
is pointwise equal to f
. [3] However,
it may not be possible, in general, to compute g
in a way that
enables it to be compiled by F*. So, pull f
itself has ghost
effect, indicating that applications of pull
cannot be used in
compilable code.
Using pull
, one can write map (pull factorial) [0;1;2] : GTot
(list nat)
, thereby reusing ghost functions where total functions
are expected.
F* does not have any built-in support for effect-label
polymorphism. That is, one cannot write a type like this: map
#a #b #e (f: a -> e b) (l:list a) : e (list b)
, where e
is an effect variable. However, using F*’s dependent types, one
can code up various ad hoc forms of effect polymorphism. Some other languages
with effect systems, notably Koka, do support
effect polymorphism primitively.
The SMTPat
annotation on the pull_equiv
lemma instructs
F* to allow Z3 to instantiate this lemma automatically. You
can learn more about SMT patterns and quantifier
instantiation in a later chapter.
Revisiting Vector Concatenation
We now have all the ingredients to understand how the vector append example shown at the start of this chapter works. Here, below, is a version of the same code with all the implicit arguments and reveal/hide operations made explicit.
module VecErasedExplicit
open FStar.Ghost
noeq
type vec a : nat -> Type =
| Nil : vec a 0
| Cons : #n:erased nat -> hd:a -> tl:vec a (reveal n) -> vec a (reveal n + 1)
let rec append #a (#n #m:erased nat) (v0:vec a (reveal n)) (v1:vec a (reveal m))
: Tot (vec a (reveal n + reveal m))
(decreases (reveal n))
= match v0 with
| Nil -> v1
| Cons #_ #n_tl hd tl ->
Cons #a
#(hide (reveal n_tl + reveal m))
hd
(append #a
#n_tl
#m
tl
v1)
Definition of vec
In the definition of the inductive type vec a
, we have two
occurrences of reveal
. Consider vec a (reveal n)
, the type of
the tl
of the vector. reveal n
is a ghost computation of type
GTot nat
, so vec a (reveal n) : GTot Type
. But, since Type
is non-informative, GTot Type
is promoted to Tot Type
. The
promotion from GTot Type
to Tot Type
is pervasive in F* and
enables ghost computations to be freely used in types and other
specifications.
The vec a (reveal n + 1)
in the result type of Cons
is
similar. Here reveal n + 1
has type GTot nat
, but applying it
to vec a
produces a GTot Type
, which is promoted to Tot
Type
.
Type of append
The type of append
has four occurrences of reveal
. Three of
them, in the type of v0
, v1
, and the return type behave the
same as the typing the fields of Cons
: the GTot Type
is
promoted to Tot Type
.
One additional wrinkle is in the decreases clause, where we have an
explicit reveal n
, since what decreases on each recursive call is
the nat
that’s in bijection with the parameter n
, rather than
n
itself. When F* infers a decreases clause for a function, any
erased terms in the clause are automatically revealed.
Definition of append
The recursive call instantiates the index parameters to n_tl
and
m
, which are both erased.
When constructing the Cons
node, its index argument is
instantiated to hide (reveal n_tl + reveal m)
. The needless
addition is marked with a hide
enabling that F* compiler to erase
it. As we saw before in factorial_tail_alt
, using hide
allows
one to mingle ghost computations (like (reveal n - 1)
) with total
computations, as needed for specifications and proofs.
All of this is painfully explicit, but the implicit reveal/hide coercions inserted by F* go a long way towards make things relatively smooth.