Stateful Services
It’s one thing to have agents program & prove textbook algorithms. These have clear specifications, there are many variants of algorithms for the same problem, so there is ample opportunity for abstraction and sharing, and each algorithm is relatively small and self-contained (though their invariants can be quite subtle).
In contrast, many real-world applications are stateful services, long-running processes that often maintain some internal state (either in memory or persisted, say, to a database), and where multiple services may interact with each other over a network.
How is one to build such services in a proof-oriented way, and how can agents help? In the next few chapters, we explore one way to do this, using state transition systems as an abstract, mathemtical model for such services and to use a technique called state-machine refinement to build verified implementations of such services.
To get agents to do this, we will develop several rubrics, templates, and audits, eventually showing how to apply our methodology to a verified, agentically generated implementation of a client and server for the TLS-1.3 protocol—nearly 150,000 lines of F* and Pulse code, with proofs of correctness, all generated by agents
We will start small, developing a very simple calculator service, which still has many of the properties we want for such services in general—it receives requests from clients, parses those requests, maintains some internal state, processes requests, and responds by serializing a message back to the client.
By starting small, we aim to develop well-understood rubrics and templates which we can then use to develop more complex services. While it might seem like a leap too far to go from a calculator service to a TLS-1.3 client and server (and, indeed, for most human proof engineers, such a leap would be daunting), our experience is that agents can handle such leaps quite well, provided they are given the right rubrics, templates, and audits.
A Calculator Service
Consider a simple calculator service whose internal state is a stack of integers, and which processes messages of the form shown below:
type request =
| Push of int
| Peek
| Add
| Sub
| Mul
| Div
and produces responses of the form:
type response =
| Ok
| Result of int
| Error
where Ok indicates that the request was processed successfully; Result n
is the response to a Peek request, and Error indicates that the request
could not be processed successfully (e.g., an Add request when the stack is
empty, a division by zero, or when the stack exceeds a certain size).
A pure specification of the behavior of this service is given by the following transition relation:
type calc_stack = list int
let max_stack_size = 10
(** State transition function **)
let step (s:calc_stack) (req:request) : calc_stack & response =
match req with
| Push x ->
if L.length s >= max_stack_size
then (s, Error) // Stack overflow - return error
else (x :: s, Ok)
| Peek ->
begin match s with
| [] -> (s, Error) // Stack underflow
| x :: _ -> (s, Result x)
end
| Add ->
begin match s with
| x :: y :: rest -> (((x + y) % 4294967296) :: rest, Ok) // Modular add (32-bit)
| _ -> (s, Error) // Stack underflow
end
| Sub ->
begin match s with
| x :: y :: rest -> (((y - x) % 4294967296) :: rest, Ok) // Modular sub
| _ -> (s, Error) // Stack underflow
end
| Mul ->
begin match s with
| x :: y :: rest -> (((x * y) % 4294967296) :: rest, Ok) // Modular mul
| _ -> (s, Error) // Stack underflow
end
| Div ->
begin match s with
| x :: y :: rest ->
if x = 0
then (s, Error) // Division by zero
else ((y / x) :: rest, Ok) // Integer div doesn't wrap
| _ -> (s, Error) // Stack underflow
end
Now, hopefully, this specification is simple enough to understand and audit by a human (we’ll revisit the question of auditing shortly, especially as state machine specifications become more complex).
We also need a way to parse requests and serialize responses, so that the service can interact with clients over a network. For example, we might define the following serialization and parsing functions, defining a wire format for our service:
let parse_request (b:bytes{Seq.length b == 5}) : option request =
let tag = U8.v (Seq.index b 0) in
let data_bytes = Seq.slice b 1 5 in
let data = be_to_n data_bytes in
match tag with
| 0 -> Some (Push data)
| 1 -> Some Peek
| 2 -> Some Add
| 3 -> Some Sub
| 4 -> Some Mul
| 5 -> Some Div
| _ -> None
let serialize_response (r:response) : b:bytes{Seq.length b == 5} =
let tag, data = match r with
| Ok -> 0, 0
| Result x -> 1, x
| Error -> 2, 0
in
let tag_byte = U8.uint_to_t tag in
let data_bytes = n_to_be data in
Seq.cons tag_byte data_bytes
Now, in principle, we have all we need to define a stateful service: at least in this case, these purely functional specifications are executable, and one could imagine simply standing up a functional service by, say, extracting this code to OCaml, reading bytes of the network, parsing it, transitioning the state, and serializing the response, and looping.
While this would work for a simple calculator service, for more complex protocols, this is not always what one wants. For example, often, one would prefer to have a more efficient implementation in a low-level language like C or Rust, without needing to use a garbage collector. Or it might be that the state transition is not actually executable and may be specified a relation, rather than a function, leaving some freedom in how the transition is implemented.
What we’d like instead is to implement the service in a proof-oriented language like Pulse, with low-level control over memory and other resources, and to prove that a Pulse implementation of our service refines the abstract state machine specification.
A Pulse Implementation of the Calculator Service
You can ask an agent to implement the calculator service in Pulse, proving that it refines the abstract state machine specification, and it will produce a solution. Here is the signature of one solution it produced:
fn process_request
(srv: server_state)
(req_buf: Vec.vec U8.t)
(resp_buf: Vec.vec U8.t)
(#log0: erased calc_log)
requires
server_exactly srv log0 **
Vec.pts_to req_buf 'req_bytes **
Vec.pts_to resp_buf 'resp_bytes **
pure (
Seq.length 'req_bytes == 5 /\
Seq.length 'resp_bytes == 5 /\
parse_request 'req_bytes <> None
)
ensures exists* (resp_bytes1: bytes) (log1: calc_log).
server_exactly srv log1 **
Vec.pts_to req_buf 'req_bytes **
Vec.pts_to resp_buf resp_bytes1 **
pure (
Seq.length 'req_bytes == 5 /\
parse_request 'req_bytes <> None /\
log_single_step log0 log1 /\
log1.input_bytes `Seq.equal` Seq.append log0.input_bytes 'req_bytes /\
log1.output_bytes `Seq.equal` Seq.append log0.output_bytes resp_bytes1 /\
CalcP.calc_frame_network_step_ok log0 'req_bytes log1 resp_bytes1
)
Now, is this is a correct implementation of the calculator service? It does compile to C and correctly runs a suite of network tests, but how can one be sure that it is correct? Well, one has to read the specificxtion carefully, and check that the preconditions and postconditions are what one expects, and that the implementation is correct with respect to the specification.
For an arbitrary, agent-generated specification, this is a non-trivial task. Even for this calculator service, one has to read the preconditions and postconditions carefully, and check that they all make sense. Can we do better? Can we have a more structured way to ensure that the implementation is correct, without needing to read the entire specification and implementation? We can by setting a better, generic rubric.
State Machine Refinement
We would like a way specify that a Pulse implementation of a service refines an abstract state machine. There are many ways to do this, but we want to settle on a generic, reusable structure that can be reviewed once by a human, and then used as a rubric for agents to implement other services in a similar way.
It will take us a few steps to get there, so a little roadmap:
We will define a class for an abstract state machine, with typed inputs, outputs, and a transition relation, e.g.,
request,responseandstepas above.Then we will define a notion of a wire format, with parsers and serializers for the inputs and outputs, and compose it with our abstract state machine, to get a notion of a wire-level state machine specification, which is what we actually want to implement.
Finally, we will define a typeclass for a Pulse implementation of an imperative event handler, with a type signature that ensures that the event handler processes events in accordance with the state machine specification.
Given instances of these typeclasses, we can be assured that the Pulse implementation refines the abstract state machine, and further analysis of the system for correctness properties can be done at the level of the abstract state machine, ignoring the efficient low-level implementation.
Note, this process of arriving at the state machine refinement class was itself done using a collaborative process between an agent and a human, emphasized our focus on structured interaction between humans and agents.
State Machine Class
For generality, our state machine class will process two kinds of input events: “wire events” which are received over a network, and “local events” which are generated by the service itself. For our calculator service, there are no local events, but we include them for generality.
Similarly, the state machine will produce two kinds of outputs: “wire outputs” which are sent over a network, and “local outputs” which are consumed by the service itself.
type event (wire_message:Type0) (local_event:Type0) =
| WireEvent: wire_message -> event wire_message local_event
| LocalEvent: local_event -> event wire_message local_event
type step_output (wire_message:Type0) (local_output:Type0) =
{
so_wire_outputs: list wire_message;
so_local_outputs: list local_output;
}
A state machine is described by the following class, which is parametric in the type of the state, the wire messages, the local events, and the local outputs.
The class defines the initial state and the transition relation step, which
given a current state and an event relates it to a new state and a step output.
class state_machine (state wire_message local_event local_output:Type0) =
{
sm_initial_state:
state;
sm_step:
state ->
event wire_message local_event ->
state ->
step_output wire_message local_output ->
prop;
}
The state machine induces a natural preorder among states, where one state is
reachable from another if there is a sequence of transitions from the first state to
the second, as defined below by state_evolves.
type transition (state wire_message local_event local_output:Type0) =
{
tr_event: event wire_message local_event;
tr_next_state: state;
tr_output: step_output wire_message local_output;
}
let rec trace_reaches
(sm:state_machine state wire_message local_event local_output)
(st0:state)
(trace:list (transition state wire_message local_event local_output))
(st1:state)
: prop = (* applying trace steps from st0 reaches st1 *)
let state_evolves
(#state #wire_message #local_event #local_output:Type0)
(sm:state_machine state wire_message local_event local_output)
(st0:state) (st1:state)
: prop = exists trace. trace_reaches sm st0 trace st1
Wire-Level State Machines
The wire_format class below defines a wire format for a type
wire_message with parser and serializer specifiction (ghost) functions, and
a proof that parsing is an inverse of serialization.
class wire_format (wire_message:Type0) =
{
wf_serialize:
wire_message -> GTot bytes;
wf_parse:
bytes -> GTot (option (wire_message & bytes));
wf_parse_serialize_exact:
msg:wire_message ->
Lemma (wf_parse (wf_serialize msg) == Some (msg, Seq.empty))
}
A wire-level state machine is defined by composing a state machine with a wire format:
class wire_format_state_machine (state wire_message local_event local_output:Type0) =
{
wfsm_state_machine:
SM.state_machine state wire_message local_event local_output;
wfsm_wire_format:
WF.wire_format wire_message;
}
And using this class, one can define a notion of a wire-level state machine trace.
let valid_byte_trace (#state #wire_message #local_event #local_output:Type0)
(system:wire_format_state_machine state wire_message local_event local_output)
(input_bytes:bytes)
(st1:state)
(output_bytes:bytes)
(residual_input:bytes)
: prop =
exists trace.
SM.trace_reaches
system.wfsm_state_machine
system.wfsm_state_machine.SM.sm_initial_state
trace
st1 /\
WF.parses_as
system.wfsm_wire_format
input_bytes
(trace_input_messages trace)
residual_input /\
Seq.equal
output_bytes
(WF.serialize_all
system.wfsm_wire_format
(SM.trace_wire_outputs trace))
This valid_byte_trace predicate is the main predicate we’ll use to connect
the Pulse implementation to a state machine specification.
State Machine Implementation Refinement
Our final step is a typeclass to relate a Pulse implementation of an event handler to a wire-level state machine. It’s quite a bit to take in, so we’ll explain it in pieces.
First, we start with a class parameterized by an implementation type impl
and the types of the state, wire messages, local events, and local outputs.
class protocol_implementation
(impl:Type0)
(state:Type0)
(wire_message:Type0)
(local_event:Type0)
(local_output:Type0)
The class includes a field, pi_system which defines the wire-level state
machine specification that the implementation is supposed to refine.
pi_system: WFSM.wire_format_state_machine state wire_message local_event local_output);
Next, comes a key part of the class, which is a separation logic predicate
pi_invariant that ties the implementation to the state machine
specification. Intuitively, pi_invariant impl received sent st is mean to
signify that the implementation impl is in a state that corresponds to the
state machine state st obtained after processing received and emitting
sent.
This intuition is made precise by the next field, pi_invariant_valid, which
states that one can derive valid_byte_trace from pi_invariant.
pi_invariant_valid:
ghost fn (i:impl) (received sent:bytes) (st:state)
requires pi_invariant i received sent st
ensures pi_invariant i received sent st
ensures pure (WFSM.valid_byte_trace (pi_system i) received st sent empty);
We can now define the type of event handler for a wire-level event. This is the type of a function that expects as arguments and precondition:
an implementation
impland its invariantpi_invariant i received sent stinput and output buffers, satifying
buffers_wf, relating the lengths of the buffers to the pointerssome additional data
frame:pi_network_framethat an implementation can choose for additional processing, satisfying a preconditionpi_network_frame_pre.
And returns a result of type process_result, satisfying a postconditions:
the invariant
pi_invariant i received1 sent1 st1for some new state and received and sent bytes historya postcondition
pi_network_frame_postdescribing how theframe:pi_network_framehas evolvedan unchanged input buffer
an output buffer updated to contains
out_contentsand a pure relation
network_process_correctthat ties it all the together, which we will look closely at next.
pi_process_network:
fn (i:impl)
(input:array U8.t) (input_len:SZ.t)
(out:array U8.t) (out_len:SZ.t)
requires pi_invariant i received sent st
requires input |-> input_contents
requires out |-> out0
requires buffers_wf input_contents input_len old_out out_len
returns result:process_result
ensures exists* received1 sent1 st1 out1 consumed wire_outputs local_outputs
pi_invariant i received1 sent1 st1 **
input |-> input_contents **
out |-> out1 **
pure (network_process_correct i .. local_outputs)
Here is the main correctness property, network_process_correct, which ties
together the implementation, the state machine specification, and the input and
output buffers. The process_result contains a status, the number of bytes
consumed from the input buffer and the number of bytes produced to the output
buffer.
type process_status =
| StepOk
| NeedMoreInput
| ParseFailed
| DecodeError
| IllegalTransition
| OutputBufferTooSmall
| ConnectionFailed
noeq
type process_result = {
process_status: process_status;
process_consumed_len: SZ.t;
process_produced_len: SZ.t;
}
For network_process_correct, we’ll look mainly at the success case, when the
status is StepOk. In this case, the implementation has consumed some bytes
from the input buffer, parsing it into msg leaving some residual bytes in
the input; the state mcahine has stepped from st0 to st1 on the event
msg, producing some wire and local outputs, which have been serialized to
the output buffer; and the new indexes on the pi_invariant predicate exactly
record the extended inputs and outputs, and the new state of the state machine.
let network_process_correct
(#state:Type0)
(#wire_message:Type0)
(#local_event:Type0)
(#local_output:Type0)
(system:WFSM.wire_format_state_machine state wire_message local_event local_output)
(input:TCP.bytes)
(input_len:SZ.t)
(old_out out_bytes:TCP.bytes)
(out_len:SZ.t)
(received0 sent0:TCP.bytes)
(st0:state)
(result:process_result)
(received1 sent1:TCP.bytes)
(st1:state)
(consumed:TCP.bytes)
(wire_outputs:list wire_message)
(local_outputs:list local_output)
: prop =
match result.process_status with
| StepOk ->
exists msg residual produced.
consumed_by_parse
system.WFSM.wfsm_wire_format
(input_bytes input input_len)
msg
consumed
residual /\
SZ.v result.process_consumed_len == Seq.length consumed /\
system.WFSM.wfsm_state_machine.SM.sm_step
st0
(SM.WireEvent msg)
st1
(step_output wire_outputs local_outputs) /\
Seq.equal
produced
(WF.serialize_all system.WFSM.wfsm_wire_format wire_outputs) /\
output_written out_bytes result.process_produced_len produced /\
Seq.equal received1 (Seq.append received0 consumed) /\
Seq.equal sent1 (Seq.append sent0 produced)
| NeedMoreInput
| ParseFailed
| OutputBufferTooSmall
| DecodeError
| IllegalTransition
| ConnectionFailed -> ...
A similar specification applies for processing local events, which we’ll omit here.
Monotonicity
There is one more technicality to account for: monotonicity. We know that
pi_invariant i received sent st implies WFSM.valid_byte_trace (pi_system
i) received st sent empty, meaning that the implementation is always in a
state that corresponds to running the state machine from the initial state on
the bytes received so far. But, nothing tells us that the implementation could
not also suddenly rewrite history and claim that it had received some different
bytes and produced some different outputs. We need a way to ensure that once the
implementation has processed some inputs, it commits to the history of the
inputs and outputs so far, and cannot change it.
Enforcing this sort of property is easily done in Pulse and other modern separation logics in a variety of ways. We saw an simple instance of monotonicity in the previous chapter where we used a monotonic ghost reference to ensure that a counter could only increase. But, in this case, we want a more abstract view of monotonicity, allowing an implementation to internalize the history of inputs and outputs in any way it wants, so long as it maintains the invariant that the history of inputs and outputs is monotonic.
For this, we add three more fields to our typeclass:
First, pi_snapshot i received sent st is a separation logic predicate,
intuitively stating that the implementation was once in a given state st
after having consumed received bytes and produced sent bytes.
pi_snapshot: impl -> bytes -> bytes -> state -> slprop
At any point, one can use pi_take_snapshot to record the current state of
the implementation, and commit to a history.
pi_take_snapshot:
ghost fn (i:impl) (received sent:bytes) (st:state)
requires pi_invariant i received sent st
ensures pi_invariant i received sent st
ensures pi_snapshot i received sent st
And, having taken a snapshot of the history, one can use pi_recall_snapshot
to prove that the current state is consistent with the snapshot, and ensure that
an implementation cannot rewrite history.
pi_recall_snapshot:
ghost fn (i:impl) (snap_recv snap_sent snap_st recv sent st:_)
requires pi_snapshot i snap_recv snap_sent snap_st
requires pi_invariant i recv sent st
ensures pi_snapshot i snap_recv snap_sent snap_st
requires pi_invariant i recv sent st
ensures pure (state_ahead (pi_system i) snap_recv snap_sent snap_st recv sent st)
Refined Calculator Service
Setting up this rubric for a state machine refinement may seem like a lot of work, but it is a one-time cost, and once it is done, it can be reused for many different services, especially since state machine refinement is such a common pattern.
Providing this as a rubric to an agent for our calculator service works well.
GPT-5.5 with Copilot CLI and FStarLang/proof-copilot produces an instances of
the protocol_implementation class for our calculator service without any
human intervention. Here is the instantiation:
let calc_server_protocol_implementation
: CPI.protocol_implementation
canonical_server calc_log CalcP.calc_frame CalcP.calc_frame_local_event unit
=
{
CPI.pi_system = (fun _ -> CalcP.calc_frame_wire_format_state_machine); ... }
The only thing we need to review is that the pi_system field is indeed the
correct state machine for the calculator service. The rest of the class ensures
there there is a Pulse implementation that refines it. Turns out that
implementation and proof is about 4,500 lines of F* and Pulse code, but we do
not need to inspect any of it. But, we do not need to inspect any of it, since
the typeclass instance guarantees that the implementation is a refinement of the
following state machine.
We do need to inspect the state machine specification, though, and here it is:
let calc_frame_step
(log0:calc_log)
(ev:SM.event calc_frame calc_frame_local_event)
(log1:calc_log)
(output:SM.step_output calc_frame unit)
: prop =
match ev with
| SM.WireEvent msg ->
let req = calc_frame_request msg in
let (next_stack, resp) = step log0.current_state req in (* This is the pure state machine step *)
let resp_msg = calc_response_frame resp in
log1.current_state == next_stack /\
log1.requests == log0.requests @ [req] /\
log1.responses == log0.responses @ [resp] /\
Seq.equal log1.input_bytes (Seq.append log0.input_bytes msg) /\
Seq.equal log1.output_bytes (Seq.append log0.output_bytes resp_msg) /\
output.SM.so_wire_outputs == [resp_msg] /\
output.SM.so_local_outputs == []
| SM.LocalEvent CalcFrameLocalNoop ->
log1 == log0 /\
output.SM.so_wire_outputs == [] /\
output.SM.so_local_outputs == []
let calc_frame_state_machine
: SM.state_machine calc_log calc_frame calc_frame_local_event unit =
{
SM.sm_initial_state = initial_log;
SM.sm_step = calc_frame_step;
}
Auditing the Calculator Service
Of course, even knowing that it refines the pure state machine, one should also apply other auditing steps to the calculator service, including basic testing.
We don’t show it here, but we also define verified code to turn a protocol event handler in a full implementation, tied to a TCP channel, and with a top-level event handling loop that repeatedly calls the event handler and sends the results back to the client.
The resulting verified code extracts to about 300 lines of C code and correctly runs a suite of network tests.
Here is the calculator service: Calc.Server.CanonicalProtocol.fst