Stateful Services

It’s one thing to have agents program & prove textbook algorithms. These have clear specifications, there are many variants of algorithms for the same problem, so there is ample opportunity for abstraction and sharing, and each algorithm is relatively small and self-contained (though their invariants can be quite subtle).

In contrast, many real-world applications are stateful services, long-running processes that often maintain some internal state (either in memory or persisted, say, to a database), and where multiple services may interact with each other over a network.

How is one to build such services in a proof-oriented way, and how can agents help? In the next few chapters, we explore one way to do this, using state transition systems as an abstract, mathemtical model for such services and to use a technique called state-machine refinement to build verified implementations of such services.

To get agents to do this, we will develop several rubrics, templates, and audits, eventually showing how to apply our methodology to a verified, agentically generated implementation of a client and server for the TLS-1.3 protocol—nearly 150,000 lines of F* and Pulse code, with proofs of correctness, all generated by agents

We will start small, developing a very simple calculator service, which still has many of the properties we want for such services in general—it receives requests from clients, parses those requests, maintains some internal state, processes requests, and responds by serializing a message back to the client.

By starting small, we aim to develop well-understood rubrics and templates which we can then use to develop more complex services. While it might seem like a leap too far to go from a calculator service to a TLS-1.3 client and server (and, indeed, for most human proof engineers, such a leap would be daunting), our experience is that agents can handle such leaps quite well, provided they are given the right rubrics, templates, and audits.

A Calculator Service

Consider a simple calculator service whose internal state is a stack of integers, and which processes messages of the form shown below:

type request =
| Push of int
| Peek
| Add
| Sub
| Mul
| Div

and produces responses of the form:

type response =
| Ok
| Result of int
| Error

where Ok indicates that the request was processed successfully; Result n is the response to a Peek request, and Error indicates that the request could not be processed successfully (e.g., an Add request when the stack is empty, a division by zero, or when the stack exceeds a certain size).

A pure specification of the behavior of this service is given by the following transition relation:

type calc_stack = list int

let max_stack_size = 10

(** State transition function **)
let step (s:calc_stack) (req:request) : calc_stack & response =
    match req with
    | Push x ->
        if L.length s >= max_stack_size
        then (s, Error)  // Stack overflow - return error
        else (x :: s, Ok)

    | Peek ->
        begin match s with
        | [] -> (s, Error)  // Stack underflow
        | x :: _ -> (s, Result x)
        end

    | Add ->
        begin match s with
        | x :: y :: rest -> (((x + y) % 4294967296) :: rest, Ok)  // Modular add (32-bit)
        | _ -> (s, Error)  // Stack underflow
        end

    | Sub ->
        begin match s with
        | x :: y :: rest -> (((y - x) % 4294967296) :: rest, Ok)  // Modular sub
        | _ -> (s, Error)  // Stack underflow
        end

    | Mul ->
        begin match s with
        | x :: y :: rest -> (((x * y) % 4294967296) :: rest, Ok)  // Modular mul
        | _ -> (s, Error)  // Stack underflow
        end

    | Div ->
        begin match s with
        | x :: y :: rest ->
            if x = 0
            then (s, Error)  // Division by zero
            else ((y / x) :: rest, Ok)  // Integer div doesn't wrap
        | _ -> (s, Error)  // Stack underflow
        end

Now, hopefully, this specification is simple enough to understand and audit by a human (we’ll revisit the question of auditing shortly, especially as state machine specifications become more complex).

We also need a way to parse requests and serialize responses, so that the service can interact with clients over a network. For example, we might define the following serialization and parsing functions, defining a wire format for our service:

let parse_request (b:bytes{Seq.length b == 5}) : option request =
    let tag = U8.v (Seq.index b 0) in
    let data_bytes = Seq.slice b 1 5 in
    let data = be_to_n data_bytes in
    match tag with
    | 0 -> Some (Push data)
    | 1 -> Some Peek
    | 2 -> Some Add
    | 3 -> Some Sub
    | 4 -> Some Mul
    | 5 -> Some Div
    | _ -> None

let serialize_response (r:response) : b:bytes{Seq.length b == 5} =
    let tag, data = match r with
        | Ok -> 0, 0
        | Result x -> 1, x
        | Error -> 2, 0
    in
    let tag_byte = U8.uint_to_t tag in
    let data_bytes = n_to_be data in
    Seq.cons tag_byte data_bytes

Now, in principle, we have all we need to define a stateful service: at least in this case, these purely functional specifications are executable, and one could imagine simply standing up a functional service by, say, extracting this code to OCaml, reading bytes of the network, parsing it, transitioning the state, and serializing the response, and looping.

While this would work for a simple calculator service, for more complex protocols, this is not always what one wants. For example, often, one would prefer to have a more efficient implementation in a low-level language like C or Rust, without needing to use a garbage collector. Or it might be that the state transition is not actually executable and may be specified a relation, rather than a function, leaving some freedom in how the transition is implemented.

What we’d like instead is to implement the service in a proof-oriented language like Pulse, with low-level control over memory and other resources, and to prove that a Pulse implementation of our service refines the abstract state machine specification.

A Pulse Implementation of the Calculator Service

You can ask an agent to implement the calculator service in Pulse, proving that it refines the abstract state machine specification, and it will produce a solution. Here is the signature of one solution it produced:

fn process_request
    (srv: server_state)
    (req_buf: Vec.vec U8.t)
    (resp_buf: Vec.vec U8.t)
    (#log0: erased calc_log)
requires
    server_exactly srv log0 **
    Vec.pts_to req_buf 'req_bytes **
    Vec.pts_to resp_buf 'resp_bytes **
    pure (
        Seq.length 'req_bytes == 5 /\
        Seq.length 'resp_bytes == 5 /\
        parse_request 'req_bytes <> None
    )
ensures exists* (resp_bytes1: bytes) (log1: calc_log).
    server_exactly srv log1 **
    Vec.pts_to req_buf 'req_bytes **
    Vec.pts_to resp_buf resp_bytes1 **
    pure (
        Seq.length 'req_bytes == 5 /\
        parse_request 'req_bytes <> None /\
        log_single_step log0 log1 /\
        log1.input_bytes `Seq.equal` Seq.append log0.input_bytes 'req_bytes /\
        log1.output_bytes `Seq.equal` Seq.append log0.output_bytes resp_bytes1 /\
        CalcP.calc_frame_network_step_ok log0 'req_bytes log1 resp_bytes1
    )

Now, is this is a correct implementation of the calculator service? It does compile to C and correctly runs a suite of network tests, but how can one be sure that it is correct? Well, one has to read the specificxtion carefully, and check that the preconditions and postconditions are what one expects, and that the implementation is correct with respect to the specification.

For an arbitrary, agent-generated specification, this is a non-trivial task. Even for this calculator service, one has to read the preconditions and postconditions carefully, and check that they all make sense. Can we do better? Can we have a more structured way to ensure that the implementation is correct, without needing to read the entire specification and implementation? We can by setting a better, generic rubric.

State Machine Refinement

We would like a way specify that a Pulse implementation of a service refines an abstract state machine. There are many ways to do this, but we want to settle on a generic, reusable structure that can be reviewed once by a human, and then used as a rubric for agents to implement other services in a similar way.

It will take us a few steps to get there, so a little roadmap:

  1. We will define a class for an abstract state machine, with typed inputs, outputs, and a transition relation, e.g., request, response and step as above.

  2. Then we will define a notion of a wire format, with parsers and serializers for the inputs and outputs, and compose it with our abstract state machine, to get a notion of a wire-level state machine specification, which is what we actually want to implement.

  3. Finally, we will define a typeclass for a Pulse implementation of an imperative event handler, with a type signature that ensures that the event handler processes events in accordance with the state machine specification.

Given instances of these typeclasses, we can be assured that the Pulse implementation refines the abstract state machine, and further analysis of the system for correctness properties can be done at the level of the abstract state machine, ignoring the efficient low-level implementation.

Note, this process of arriving at the state machine refinement class was itself done using a collaborative process between an agent and a human, emphasized our focus on structured interaction between humans and agents.

State Machine Class

For generality, our state machine class will process two kinds of input events: “wire events” which are received over a network, and “local events” which are generated by the service itself. For our calculator service, there are no local events, but we include them for generality.

Similarly, the state machine will produce two kinds of outputs: “wire outputs” which are sent over a network, and “local outputs” which are consumed by the service itself.

type event (wire_message:Type0) (local_event:Type0) =
| WireEvent: wire_message -> event wire_message local_event
| LocalEvent: local_event -> event wire_message local_event

type step_output (wire_message:Type0) (local_output:Type0) =
{
    so_wire_outputs: list wire_message;
    so_local_outputs: list local_output;
}

A state machine is described by the following class, which is parametric in the type of the state, the wire messages, the local events, and the local outputs.

The class defines the initial state and the transition relation step, which given a current state and an event relates it to a new state and a step output.

class state_machine (state wire_message local_event local_output:Type0)  =
{
    sm_initial_state:
        state;

    sm_step:
        state ->
        event wire_message local_event ->
        state ->
        step_output wire_message local_output ->
        prop;
}

The state machine induces a natural preorder among states, where one state is reachable from another if there is a sequence of transitions from the first state to the second, as defined below by state_evolves.

type transition (state wire_message local_event local_output:Type0) =
{
    tr_event: event wire_message local_event;
    tr_next_state: state;
    tr_output: step_output wire_message local_output;
}

let rec trace_reaches
    (sm:state_machine state wire_message local_event local_output)
    (st0:state)
    (trace:list (transition state wire_message local_event local_output))
    (st1:state)
: prop = (* applying trace steps from st0 reaches st1 *)

let state_evolves
    (#state #wire_message #local_event #local_output:Type0)
    (sm:state_machine state wire_message local_event local_output)
    (st0:state) (st1:state)
: prop = exists trace. trace_reaches sm st0 trace st1

Wire-Level State Machines

The wire_format class below defines a wire format for a type wire_message with parser and serializer specifiction (ghost) functions, and a proof that parsing is an inverse of serialization.

class wire_format (wire_message:Type0) =
{
    wf_serialize:
        wire_message -> GTot bytes;

    wf_parse:
        bytes -> GTot (option (wire_message & bytes));

    wf_parse_serialize_exact:
        msg:wire_message ->
        Lemma (wf_parse (wf_serialize msg) == Some (msg, Seq.empty))
}

A wire-level state machine is defined by composing a state machine with a wire format:

class wire_format_state_machine (state wire_message local_event local_output:Type0)  =
{
    wfsm_state_machine:
        SM.state_machine state wire_message local_event local_output;

    wfsm_wire_format:
        WF.wire_format wire_message;
}

And using this class, one can define a notion of a wire-level state machine trace.

let valid_byte_trace (#state #wire_message #local_event #local_output:Type0)
        (system:wire_format_state_machine state wire_message local_event local_output)
        (input_bytes:bytes)
        (st1:state)
        (output_bytes:bytes)
        (residual_input:bytes)
: prop =
exists trace.
    SM.trace_reaches
        system.wfsm_state_machine
        system.wfsm_state_machine.SM.sm_initial_state
        trace
        st1 /\
    WF.parses_as
        system.wfsm_wire_format
        input_bytes
        (trace_input_messages trace)
        residual_input /\
    Seq.equal
        output_bytes
        (WF.serialize_all
            system.wfsm_wire_format
            (SM.trace_wire_outputs trace))

This valid_byte_trace predicate is the main predicate we’ll use to connect the Pulse implementation to a state machine specification.

State Machine Implementation Refinement

Our final step is a typeclass to relate a Pulse implementation of an event handler to a wire-level state machine. It’s quite a bit to take in, so we’ll explain it in pieces.

First, we start with a class parameterized by an implementation type impl and the types of the state, wire messages, local events, and local outputs.

class protocol_implementation
    (impl:Type0)
    (state:Type0)
    (wire_message:Type0)
    (local_event:Type0)
    (local_output:Type0)

The class includes a field, pi_system which defines the wire-level state machine specification that the implementation is supposed to refine.

pi_system: WFSM.wire_format_state_machine state wire_message local_event local_output);

Next, comes a key part of the class, which is a separation logic predicate pi_invariant that ties the implementation to the state machine specification. Intuitively, pi_invariant impl received sent st is mean to signify that the implementation impl is in a state that corresponds to the state machine state st obtained after processing received and emitting sent.

This intuition is made precise by the next field, pi_invariant_valid, which states that one can derive valid_byte_trace from pi_invariant.

pi_invariant_valid:
  ghost fn (i:impl) (received sent:bytes) (st:state)
  requires pi_invariant i received sent st
  ensures pi_invariant i received sent st
  ensures pure (WFSM.valid_byte_trace (pi_system i) received st sent empty);

We can now define the type of event handler for a wire-level event. This is the type of a function that expects as arguments and precondition:

  • an implementation impl and its invariant pi_invariant i received sent st

  • input and output buffers, satifying buffers_wf, relating the lengths of the buffers to the pointers

  • some additional data frame:pi_network_frame that an implementation can choose for additional processing, satisfying a precondition pi_network_frame_pre.

And returns a result of type process_result, satisfying a postconditions:

  • the invariant pi_invariant i received1 sent1 st1 for some new state and received and sent bytes history

  • a postcondition pi_network_frame_post describing how the frame:pi_network_frame has evolved

  • an unchanged input buffer

  • an output buffer updated to contains out_contents

  • and a pure relation network_process_correct that ties it all the together, which we will look closely at next.

pi_process_network:
    fn (i:impl)
       (input:array U8.t) (input_len:SZ.t)
       (out:array U8.t) (out_len:SZ.t)
    requires pi_invariant i received sent st
    requires input |-> input_contents
    requires out |-> out0
    requires buffers_wf input_contents input_len old_out out_len
    returns result:process_result
    ensures exists* received1 sent1 st1 out1 consumed wire_outputs local_outputs
        pi_invariant i received1 sent1 st1 **
        input |-> input_contents **
        out |-> out1 **
        pure (network_process_correct i .. local_outputs)

Here is the main correctness property, network_process_correct, which ties together the implementation, the state machine specification, and the input and output buffers. The process_result contains a status, the number of bytes consumed from the input buffer and the number of bytes produced to the output buffer.

type process_status =
| StepOk
| NeedMoreInput
| ParseFailed
| DecodeError
| IllegalTransition
| OutputBufferTooSmall
| ConnectionFailed

noeq
type process_result = {
    process_status: process_status;
    process_consumed_len: SZ.t;
    process_produced_len: SZ.t;
}

For network_process_correct, we’ll look mainly at the success case, when the status is StepOk. In this case, the implementation has consumed some bytes from the input buffer, parsing it into msg leaving some residual bytes in the input; the state mcahine has stepped from st0 to st1 on the event msg, producing some wire and local outputs, which have been serialized to the output buffer; and the new indexes on the pi_invariant predicate exactly record the extended inputs and outputs, and the new state of the state machine.

let network_process_correct
    (#state:Type0)
    (#wire_message:Type0)
    (#local_event:Type0)
    (#local_output:Type0)
    (system:WFSM.wire_format_state_machine state wire_message local_event local_output)
    (input:TCP.bytes)
    (input_len:SZ.t)
    (old_out out_bytes:TCP.bytes)
    (out_len:SZ.t)
    (received0 sent0:TCP.bytes)
    (st0:state)
    (result:process_result)
    (received1 sent1:TCP.bytes)
    (st1:state)
    (consumed:TCP.bytes)
    (wire_outputs:list wire_message)
    (local_outputs:list local_output)
: prop =
match result.process_status with
| StepOk ->
    exists msg residual produced.
        consumed_by_parse
            system.WFSM.wfsm_wire_format
            (input_bytes input input_len)
            msg
            consumed
            residual /\
        SZ.v result.process_consumed_len == Seq.length consumed /\
        system.WFSM.wfsm_state_machine.SM.sm_step
            st0
            (SM.WireEvent msg)
            st1
            (step_output wire_outputs local_outputs) /\
        Seq.equal
            produced
            (WF.serialize_all system.WFSM.wfsm_wire_format wire_outputs) /\
        output_written out_bytes result.process_produced_len produced /\
        Seq.equal received1 (Seq.append received0 consumed) /\
        Seq.equal sent1 (Seq.append sent0 produced)
| NeedMoreInput
| ParseFailed
| OutputBufferTooSmall
| DecodeError
| IllegalTransition
| ConnectionFailed -> ...

A similar specification applies for processing local events, which we’ll omit here.

Monotonicity

There is one more technicality to account for: monotonicity. We know that pi_invariant i received sent st implies WFSM.valid_byte_trace (pi_system i) received st sent empty, meaning that the implementation is always in a state that corresponds to running the state machine from the initial state on the bytes received so far. But, nothing tells us that the implementation could not also suddenly rewrite history and claim that it had received some different bytes and produced some different outputs. We need a way to ensure that once the implementation has processed some inputs, it commits to the history of the inputs and outputs so far, and cannot change it.

Enforcing this sort of property is easily done in Pulse and other modern separation logics in a variety of ways. We saw an simple instance of monotonicity in the previous chapter where we used a monotonic ghost reference to ensure that a counter could only increase. But, in this case, we want a more abstract view of monotonicity, allowing an implementation to internalize the history of inputs and outputs in any way it wants, so long as it maintains the invariant that the history of inputs and outputs is monotonic.

For this, we add three more fields to our typeclass:

First, pi_snapshot i received sent st is a separation logic predicate, intuitively stating that the implementation was once in a given state st after having consumed received bytes and produced sent bytes.

pi_snapshot: impl -> bytes -> bytes -> state -> slprop

At any point, one can use pi_take_snapshot to record the current state of the implementation, and commit to a history.

pi_take_snapshot:
  ghost fn (i:impl) (received sent:bytes) (st:state)
  requires pi_invariant i received sent st
  ensures pi_invariant i received sent st
  ensures pi_snapshot i received sent st

And, having taken a snapshot of the history, one can use pi_recall_snapshot to prove that the current state is consistent with the snapshot, and ensure that an implementation cannot rewrite history.

pi_recall_snapshot:
  ghost fn (i:impl) (snap_recv snap_sent snap_st recv sent st:_)
  requires pi_snapshot i snap_recv snap_sent snap_st
  requires pi_invariant i recv sent st
  ensures  pi_snapshot i snap_recv snap_sent snap_st
  requires pi_invariant i recv sent st
  ensures pure (state_ahead (pi_system i) snap_recv snap_sent snap_st recv sent st)

Refined Calculator Service

Setting up this rubric for a state machine refinement may seem like a lot of work, but it is a one-time cost, and once it is done, it can be reused for many different services, especially since state machine refinement is such a common pattern.

Providing this as a rubric to an agent for our calculator service works well. GPT-5.5 with Copilot CLI and FStarLang/proof-copilot produces an instances of the protocol_implementation class for our calculator service without any human intervention. Here is the instantiation:

  let calc_server_protocol_implementation
  : CPI.protocol_implementation
      canonical_server calc_log CalcP.calc_frame CalcP.calc_frame_local_event unit
  =
{
  CPI.pi_system = (fun _ -> CalcP.calc_frame_wire_format_state_machine); ... }

The only thing we need to review is that the pi_system field is indeed the correct state machine for the calculator service. The rest of the class ensures there there is a Pulse implementation that refines it. Turns out that implementation and proof is about 4,500 lines of F* and Pulse code, but we do not need to inspect any of it. But, we do not need to inspect any of it, since the typeclass instance guarantees that the implementation is a refinement of the following state machine.

We do need to inspect the state machine specification, though, and here it is:

let calc_frame_step
    (log0:calc_log)
    (ev:SM.event calc_frame calc_frame_local_event)
    (log1:calc_log)
    (output:SM.step_output calc_frame unit)
: prop =
match ev with
| SM.WireEvent msg ->
    let req = calc_frame_request msg in
    let (next_stack, resp) = step log0.current_state req in (* This is the pure state machine step *)
    let resp_msg = calc_response_frame resp in
    log1.current_state == next_stack /\
    log1.requests == log0.requests @ [req] /\
    log1.responses == log0.responses @ [resp] /\
    Seq.equal log1.input_bytes (Seq.append log0.input_bytes msg) /\
    Seq.equal log1.output_bytes (Seq.append log0.output_bytes resp_msg) /\
    output.SM.so_wire_outputs == [resp_msg] /\
    output.SM.so_local_outputs == []
| SM.LocalEvent CalcFrameLocalNoop ->
    log1 == log0 /\
    output.SM.so_wire_outputs == [] /\
    output.SM.so_local_outputs == []

let calc_frame_state_machine
: SM.state_machine calc_log calc_frame calc_frame_local_event unit =
{
    SM.sm_initial_state = initial_log;
    SM.sm_step = calc_frame_step;
}

Auditing the Calculator Service

Of course, even knowing that it refines the pure state machine, one should also apply other auditing steps to the calculator service, including basic testing.

We don’t show it here, but we also define verified code to turn a protocol event handler in a full implementation, tied to a TCP channel, and with a top-level event handling loop that repeatedly calls the event handler and sends the results back to the client.

The resulting verified code extracts to about 300 lines of C code and correctly runs a suite of network tests.