Inductive type definitions

An inductive type definition, sometimes called a datatype, has the following general structure.

\[\begin{split}\mathsf{type}~T_1~\overline{(x_1:p_1)} : \overline{y_1:q_1} \rightarrow \mathsf{Type} = \overline{| D_1 : t_1} \\ \ldots\qquad\qquad\qquad\qquad\\ \mathsf{and}~T_n~\overline{(x_n:p_n)} : \overline{y_n:q_n} \rightarrow \mathsf{Type} = \overline{| D_n : t_n} \\\end{split}\]

This defines \(n\) mutually inductive types, named \(T_1 \ldots T_n\), called the type constructors. Each type constructor \(T_i\) has a number of parameters, the \(\overline{x_i : p_i}\), and a number of indexes, the \(\overline{y_i:q_i}\).

Each type constructor \(T_i\) has zero or more data constructors \(\overline{D_i:t_i}\). For each data constructor \(D_{ij}\), its type \(t_{ij}\) must be of the form \(\overline{z:s} \rightarrow T_i~\bar{x_i}~\bar{e}\), i.e., it must be a function type returning an instance of \(T_i\) with the same parameters \(\overline{x_i}\) as in the type constructor’s signature, but with any other well-typed terms \(\overline{e}\) for the index arguments. This is the main difference between a parameter and an index—a parameter of a type constructor cannot vary in the result type of the data constructors, while the indexes can.

Further, in each of the arguments \(\overline{z:s}\) of the data constructor, none of the mutually defined type constructors \(\overline{T}\) may appear to the left of an arrow. That is, all occurrences of the type constructors must be strictly positive. This is to ensure that the inductive definitions are well-founded, as explained below. Without this restriction, it is easy to break soundness by writing non-terminating functions with Tot types.

Also related to ensuring logical consistency is the universe level of an inductive type definition. We’ll return to that later, once we’ve done a few examples.

Strictly positive definitions

As a strawman, consider embedding a small dynamically typed programming language within F*. All terms in our language have the same static type dyn, although at runtime values could have type Bool, or Int, or Function.

One attempt at representing a language like this using a data type in F* is as follows:

noeq
type dyn =
  | Bool : bool -> dyn
  | Int  : int -> dyn
  | Function : (dyn -> dyn) -> dyn

The three cases of the data type represent our three kinds of runtime values: Bool b, Int b, and Function f. The Function case, however, is problematic: The argument f is itself a function from dyn -> dyn, and the constructor Function allows promoting a dyn -> dyn function into the type dyn itself, e.g., one can represent the identity function in dyn as Function (fun (x:dyn) -> x). However, the Function case is problematic: as we will see below, it allows circular definitions that enable constructing instances of dyn without actually providing any base case. F* rejects the definition of dyn, saying “Inductive type dyn does not satisfy the strict positivity condition”.

Consider again the general shape of an inductive type definition:

\[\begin{split}\mathsf{type}~T_1~\overline{(x_1:p_1)} : \overline{y_1:q_1} \rightarrow \mathsf{Type} = \overline{| D_1 : t_1} \\ \ldots\qquad\qquad\qquad\qquad\\ \mathsf{and}~T_n~\overline{(x_n:p_n)} : \overline{y_n:q_n} \rightarrow \mathsf{Type} = \overline{| D_n : t_n} \\\end{split}\]

This definition is strictly positive when

for every type constructor \(T \in T_1, ..., T_n\),

and every data constructor \(D : t \in \overline{D_1}, ... \overline{D_n}\), where t is of the form \(x0:s_0 \rightarrow ... \rightarrow xn:s_n \rightarrow T_i ...\), and \(s_0, ..., s_n\) are the types of the fields of \(D\)

and for all instantiations \(\overline{v}\) of the type parameters \(\overline{p}\) of the type \(T\),

\(T\) does not appear to the left of any arrow in any \(s \in (s_0, ..., s_k)[\overline{v}/\overline{p}]\).

Our type dyn violates this condition, since the defined typed dyn appears to the left of an arrow type in the dyn -> dyn-typed field of the Function constructor.

To see what goes wrong if F* were to accept this definition, we can suppress the error reported by using the option __no_positivity and see what happens.

#push-options "--__no_positivity"
noeq
type dyn =
  | Bool : bool -> dyn
  | Int  : int -> dyn
  | Function : (dyn -> dyn) -> dyn
#pop-options

Note

F* maintains an internal stack of command line options. The #push-options pragma pushes additional options at the top of the stack, while #pop-options pops the stack. The pattern used here instructs F* to typecheck dyn only with the __no_positivity option enabled. As we will see, the __no_positivity option can be used to break soundness, so use it only if you really know what you’re doing.

Now, having declared that dyn is a well-formed inductive type, despite not being strictly positive, we can break the soundness of F*. In particular, we can write terms and claim they are total, when in fact their execution will loop forever.

let loop' (f:dyn)
  : dyn
  = match f with
    | Function g -> g f
    | _ -> f

let loop
  : dyn
  = loop' (Function loop')

Here, the type of loop claims that it is a term that always evaluates in a finite number of steps to a value of type dyn. Yet, reducing it produces an infinite chain of calls to loop' (Function loop'). Admitting a non-positive definition like dyn has allowed us to build a non-terminating loop.

Such loops can also allow one to prove False (breaking soundness), as the next example shows.

#push-options "--__no_positivity"
noeq
type non_positive =
  | NP : (non_positive -> False) -> non_positive
#pop-options

let almost_false (f:non_positive)
  : False
  = let NP g = f in g f

let ff
  : False
  = almost_false (NP almost_false)

This example is very similar to dyn, except NP stores a non-positive function that returns False, which allows use to prove ff : False, i.e., in this example, not only does the violation of strict positivity lead to an infinite loop at runtime, it also renders the entire proof system of F* useless, since one can prove False.

Finally, in the example below, although the type also_non_pos does not syntactically appear to the left of an arrow in a field of the ANP constructor, an instantiation of the type parameter f (e.g., with the type f_false) does make it appear to the left of an arrow—so this type too is deemed not strictly positive, and can be used to prove False.

#push-options "--__no_positivity"
noeq
type also_non_pos (f:Type -> Type) =
  | ANP : f (also_non_pos f) -> also_non_pos f
#pop-options

let f_false
  : Type -> Type
  = fun a -> (a -> False)

let almost_false_again
  : f_false (also_non_pos f_false)
  = fun x -> let ANP h = x in h x

let ff_again
  : False
  = almost_false_again (ANP almost_false_again)

We hope you are convinced that non-strictly positive types should not be admissible in inductive type definitions. In what follows, we will no longer use the __no_positivity option. In a later section, once we’ve introduced the effect of divergence, we will see that non-positive definitions can safely be used in a context where programs are not expected to terminate, allowing one to safely model things like the dyn type, without compromising the soundness of F*.

Strictly Positive Annotations

Sometimes it is useful to parameterize an inductive definition with a type function, without introducing a non-positive definition as we did in also_non_pos above.

For example, the definition below introduces a type free f a, a form of a tree whose leaf nodes contain a values, and whose internal nodes branch according the type function f.

noeq
type free (f:([@@@ strictly_positive] _:Type -> Type))
          (a:Type) 
  : Type =
  | Leaf : a -> free f a
  | Branch : f (free f a) -> free f a

We can instantiate this generic free to produce various kinds of trees. Note: when instantiating free list a in variable_branching_list below, we need to explicitly re-define the list type with a strict-positivity annotation: F* does not correctly support rechecking type constructors to prove that they are strictly positive when they are used at higher order.

let binary_tree (a:Type) = free (fun t -> t & t) a
let list_redef ([@@@strictly_positive] a:Type) = list a
let variable_branching_list a = free list_redef a
let infinite_branching_tree a = free (fun t -> nat -> t) a

However, we should only be allowed to instantate f with type functions that are strictly positive in their argument, since otherwise we can build a proof of False, as we did with also_non_pos. The @@@strictly_positive attribute on the formal parameter of f enforces this.

If we were to try to instantiate free with a non-strictly positive type function,

let free_bad = free (fun t -> (t -> False)) int

then F* raises an error:

Binder (t: Type) is marked strictly positive, but its use in the definition is not

Unused Annotations

Sometimes one indexes a type by another type, though the index has no semantic meaning. For example, in several F* developments that model mutable state, the a heap reference is just a natural number modeling its address in the heap. However, one might use the type let ref (a:Type) = nat to represent the type of a reference, even though the type a is not used in the definition. In such cases, it can be useful to mark the parameter as unused, to inform F*’s positivity checker that the type index is actually irrelevant. The snippet below shows an example:

irreducible
let ref ([@@@unused] a:Type) = nat
noeq
type linked_list (a:Type) =
  | LL : ref (a & linked_list a) -> linked_list a
noeq
type neg_unused =
  | NU : ref (neg_unused -> bool) -> neg_unused

Here, we’ve marked the parameter of ref with the unused attribute. We’ve also marked ref as irreducible just to ensure for this example that F* does not silently unfold the definition of ref.

Now, knowing that the parameter of ref is unused, one can define types like linked_list a, where although linked_list a appears as an argument to the ref type, the positivity checker accepts it, since the parameter is unused. This is similar to the use of a strictly_positive annotation on a parameter.

However, with the unused attribute, one can go further: e.g., the type neg_unused shows that even a negative occurrence of the defined type is accepted, so long as it appears only as an instantiation of an unused parameter.