Higher-order Abstract Syntax
In the previous chapter, we looked at a deep embedding of the simply typed lambda calculus (STLC). The encoding is “deep” in the sense that we used an inductive type to represent the syntax of the lambda calculus in F*, and then defined and proved some properties of its semantics represented mathematically in F*.
Another way to embed a language like the STLC in F* is a shallow embedding. F* is itself a functional programming language, and it has a type system that is certainly powerful enough to represent simply typed terms, so why not use lambda terms in F* itself to represent STLC, rather than merely encoding STLC’s abstract syntax in F*. This kind of encoding is called a shallow embedding, where we use semantic constructs in the host (or meta) language (F*) to represent analogous features of the embedded (or object) language (STLC, in our example).
In this chapter, we look at a particularly elegant technique for doing this called higher-order abstract syntax (or HOAS). For more background about this, a 2008 paper by Adam Chlipala is a good resource, though it develops a more sophisticated parametric version.
Our small case study in HOAS is meant to illustrate the use of inductive types with non-trivial indexes while also containing strictly positive functions as arguments, and also a bit of type-level computation.
Roadmap
The type typ
below represents the types we’ll use in our STLC
object language, i.e., the base types Bool
and Int
, and
function types Arrow t1 t2
.
type typ =
| Bool
| Int
| Arrow : typ -> typ -> typ
This is analogous to our representation of STLC types in the deep embedding of the previous chapter.
Where things get interesting is in the representation of STLC terms and their semantics. To set the goal posts, we want to
Give an interpretation of STLC types into F* types, by defining a function
denote_typ : typ -> Type
Define a type
term t
, to represent well-typed STLC terms whose type ist:typ
Give an intepretation of STLC terms in to F* terms of the suitable type, i.e., define a function
denote_term (#t:typ) (e:term t) : denote_typ t
, proving that every well-typed STLC term at typet
can be represented in F* as a function of typedenote_typ t
.
Such a result would encompass the type soundness results we proved in the previous chapter (proving that well-typed programs can always make progress), but would go substantially further to show that the reduction of all such terms always terminates producing F* values that model their semantics.
Denotation of types
Step 1 in our roadmap is to give an interpreration of STLC types
typ
into F* types. This is easily done, as shown below.
let rec denote_typ (t:typ)
: Type
= match t with
| Bool -> bool
| Int -> int
| Arrow t1 t2 -> (denote_typ t1 -> denote_typ t2)
We have here a recursive function that computes a Type from its argument. This is may seem odd at first, but it’s perfectly legal in a dependently typed language like F*.
The function denote_typ
The interpretation of Bool
and Int
are the F* type bool
and int
, while the interpretation of
Arrow
is an F* arrow. Note, the function terminates because the
two recursive calls are on strict sub-terms of the argument.
Term representation
The main difficulty in representing a language like STLC (or any language with lambda-like variable-binding structure), is the question of how to represent variables and their binders.
In the deep embedding of the previous chapter, our answer to this questions was very syntactic—variables are de Bruijn indexes, where, at each occurrence, the index used for a variable counts the number of lambdas to traverse to reach the binder for that variable.
The HOAS approach to answering these questions is very different. The idea is to use the binding constructs and variables already available in the host language (i.e., lambda terms in F*) to represent binders and variables in the object language (STLC).
The main type in our representation of terms is the term
defined
below. There are several clever subtleties here, which we’ll try to
explain briefly.
noeq
type term : typ -> Type =
| Var : #t:typ -> denote_typ t -> term t
| TT : term Bool
| FF : term Bool
| I : int -> term Int
| App : #t1:typ -> #t2:typ ->
term (Arrow t1 t2) ->
term t1 ->
term t2
| Lam : #t1:typ -> #t2:typ ->
(denote_typ t1 -> term t2) ->
term (Arrow t1 t2)
First, the term
type represents both the abstract syntax of the
STLC as well as its typing rules. We’ll see in detail how this works,
but notice already that term
is is indexed by a t:typ
, which
describes the type of encoded STLC term. The indexing structure
encodes the typing rules of STLC, ensuring that only well-typed STLC
terms can be constructed.
The second interesting part is the use of denote_typ
within the
syntax—variables and binders at a given STLC type t
will be
represented by F* variables and binders of the corresponding F* type
denote_typ t
.
Var
: Variables are represented asVar #t n : term t
, wheren
is a term of typedenote_typ t
.
TT
andFF
: The two boolean constansts are represented by these constructors, both of typeterm Bool
, where the index indicates that they have typeBool
.
I
: STLC integers are represented by tagged F* integers.
App
: To applye1
toe2
in a well-typed way, we must prove thate1
has an arrow typeTArrow t1 t2
, whilee2
has typet1
, and the resulting termApp e1 e2
has typet2
. Notice how the indexing structure of theApp
constructor precisely captures this typing rule.
Lam
: Finally, and crucially, we represent STLC lambda terms using F* functions, i.e.,Lam f
has typeArrow t1 t2
, whenf
is represented by an F* function from arguments of typedenote_typ t1
, to terms of typeterm t
. TheLam
case includes a function-typed field, but the type of that field,denote_typ t1 -> term t2
is strictly positive—unlike the thedyn
type, shown earlier.
Denotation of terms
Finally, we come to Step 3 (below), where we give an interpretation to
term t
as an F* term of type denote_typ t
. The trickiest part
of such a interpretation is to handle functions and variables, but
this part is already done in the representation, since these are
already represented by the appropriate F* terms.
let rec denote_term (#t:typ) (e:term t)
: Tot (denote_typ t)
(decreases e)
= match e with
| Var x -> x
| TT -> true
| FF -> false
| I i -> i
| App f a -> (denote_term f) (denote_term a)
| Lam f -> fun x -> denote_term (f x)
Let’s look at each of the cases:
The
Var
case is easy, since the variablex
is already interpreted into an element of the appropriate F* type.The constants
TT
,FF
, andI
are easy too, since we can just interpret them as the suitable boolean or integer constants.For the
App #t1 #t2 f a
case, we recursively interpretf
anda
. The type indices tell us thatf
must be interpreted into an F* function of typedenote_typ t1 -> denote_typ t2
and thatdenote_term a
has typedenote_typ t1
. So, we can simply apply the denotation off
to the denotation ofa
to get a term of typedenote_typ t2
. In other words, function application in STLC is represented semantically by function application in F*.Finally, in the
Lam #t1 #t2 f
case, we need to produce a term of typedenote_typ t1 -> denote_typ t2
. So, we build an F* lambda term (where the argumentx
has typedenote_typ t1
), and in the body we applyf x
and recursively calldenote_term
onf x
.
If that felt a little bit magical, it’s because it almost is! We’ve defined the syntax, typing rules, and an interpreter that doubles as a denotational semantics for the STLC, and proved everything sound in around 30 lines of code and proof. By picking the right representation, everything just follows very smoothly.
Termination
You may be wondering why F* accepts that denote_term e
terminates. There are three recursive calls to consider
The two calls in the
App
case are easy: The recursive calls are on strict sub-terms ofApp f a
.In the
Lam f
case, we have one recursive calldenote_term (f x)
, and F* accepts thatf x
is strictly smaller thanLam f
. This is an instance of the sub-term ordering on inductive types explained earlier, as part of F*’sprecedes
relation, explained earlier.
For a bit of intuition, one way to understand the type term
is by
thinking of it as a tree of finite depth, but possibly infinite width:
The leaves of the tree are the
Var
,TT
, andFF
cases.The internal node
App e0 e1
composes two sub-trees,e0
ande1
.The internal node
Lam #t1 #t2 f
composes a variable number of sub-trees, where the number of sub-trees depends on the parametert1
. For example:
If
t1 = Unit
, thenf : unit -> term v t
, i.e., there is only one child node,f()
.If
t1 = Bool
, thenf : bool -> term v t
, i.e., there are two children,f true
andf false
.if
t1 = Int
, thenf : int -> term v t
, i.e., there are infinitely many children,..., f -1, f 0, f 1, ...
.
With this intuition, informally, it is safe to recursively call
denote_term e
on any of the children of e
, since the depth of
the tree will decrease on each recursive call. Hence the call
denote_term (f x)
terminates.
We’ll revisit termination arguments for recursive functions more formally in a subsequent chapter on well-founded recursion.
Exercises
Giving a semantics to STLC is just the tip of the iceberg. There’s a lot more one can do with HOAS and Chlipala’s paper gives lots of examples and sample code in Coq.
For several more advanced exercises, based on the definitions shown below, try reconstructing other examples from Chlipala’s paper, including a proof of correctness of a compiler implementing a continuation-passing style (CPS) transformation of STLC.
(* This example is transcribed from Adam Chlipala's
Parametric Higher-order Abstract Syntax (ICFP 2008) paper.
http://adam.chlipala.net/papers/PhoasICFP08/PhoasICFP08.pdf
*)
module Part2.PHOAS
type typ =
| Bool
| Arrow : typ -> typ -> typ
noeq
type term0 (v: typ -> Type) : typ -> Type =
| Var : #t:typ -> v t -> term0 v t
| TT : term0 v Bool
| FF : term0 v Bool
| App : #t1:typ -> #t2:typ ->
term0 v (Arrow t1 t2) ->
term0 v t1 ->
term0 v t2
| Lam : #t1:typ -> #t2:typ ->
(v t1 -> term0 v t2) ->
term0 v (Arrow t1 t2)
let term (t:typ) = v:(typ -> Type) -> term0 v t
let rec denote_typ (t:typ)
: Type
= match t with
| Bool -> bool
| Arrow t1 t2 -> (denote_typ t1 -> denote_typ t2)
let rec denote_term0 (#t:typ) (e:term0 denote_typ t)
: Tot (denote_typ t)
(decreases e)
= match e with
| Var x -> x
| TT -> true
| FF -> false
| App f a -> (denote_term0 f) (denote_term0 a)
| Lam f -> fun x -> denote_term0 (f x)
let denote_term (t:typ) (e:term t)
: denote_typ t
= denote_term0 (e _)