Hillel Wayne recently published a blog post titled "Modeling Adversaries with TLA+". The post does a good job showing how adversaries and environmental effects can be modeled in TLA+. The post is also subtly incorrect when it comes to two of the liveness properties. In other words, "Modeling Adversaries with TLA+" not only shows how to model adversaries but also comes with a spec to discuss the intricacies of temporal logic. :-)
Before we get started, let me make clear that the spec indeed satisfies all of the properties described in prose. The error is that some of the specified properties do not match their description but are instead trivially true. This is why Lamport - in e.g. the TLA+ video lecture #5 - cautions us to "Always be suspicious of success". Reasoning about liveness is delicate business and easy to get wrong.
To quickly recap, the spec models a system that converges to a set of goal states (x \in Goal
) and an environment World
that perturbs the system by knocking it out of the goal states. Below is a slightly simplified excerpt of the original spec:
VARIABLES x
TotalInterval == 0..10
Goal == 2..4
TypeInvariant == x \in TotalInterval
Init == x \in TotalInterval
Machine == IF x < 3 THEN x' = x + 1 ELSE x' = x - 1
World == x' \in TotalInterval
Next == Machine \/ World
Spec == Init /\ [][Next]_x /\ WF_x(Machine)
Several invariants and liveness properties are discussed that are satisfied by Spec
. Two properties however are incorrect; FiniteWorldStable
and RareWorldResilient
.
FiniteWorldStable
The comment for the FiniteWorldStable
property says:
[...] If the world is only kicking things out of alignment a finite number of times, say one million, then it should still be stable. My argument is that after the millionth kick, we're now somewhere in TotalInterval and the spec is equivalent to one without the World. We can represent “finite World actions” by saying “it's not always eventually the case that World happens”, which we'd write in TLA+ as
~[]<><<World>>_x
. Here the <<>> means “an action that changes x”, not sequence.While Stable still doesn't hold, FiniteWorldStable does.
This is translated into the following TLA+ property:
Safe == x \in Goal
Stable == <>[]Safe
FiniteWorldStable == ~[]<><<World>>_x => Stable
The reasoning in the comment is correct. Checking FiniteWorldStable
with TLC confirms that the property holds. Being suspicious of success though, let us alter FiniteWorldStable
to ~[]<><<World>>_x => ~<>[](x \in Goal)
; we negated the implication's consequent Stable
. Unfortunately, this also holds! Even substituting the consequent with 23=42 /\ 23#42
holds. The reason is that Spec => FiniteWorldStable
is vacuously true. We will discuss why it is vacuously true later. First we will shift our attention to RareWorldResilient
:
RareWorldResilient
[...] to get resilience we don't need to require World to only happen finite times. Instead, we only need to guarantee it happens finite times while we're out of equilibrium. If eventually the world only kicks x out of Goal when it's already in Goal, then we're giving our machine enough time to return x to Goal and we have resilience.
Another way of looking at it: if World happens rarely enough, say one-tenth as often as Machine, then we'll return to Goal before the next World action pushes us out again.
This property holds.
This is translated into RareWorldResilient
:
Resilient == []<>Safe
RareWorldResilient == <>[][World => Safe]_x => Resilient
As with FiniteWorldStable
above, the prose is convincing and TLC once again confirms that RareWorldResilient
holds. But as we have now learned, implication can be tricky. Let us try and replace the consequent of the inner implication: <>[][World => 23 # 42]_x => Resilient
. Checking the modified property with TLC should make us suspicious.
What is the problem?
At this point we will zoom out and discuss the underlying problem. For that we will first need Lamport's (see "The Temporal Logic of Actions" (section 4.1)) definition of what an action-step is. Let s,t
be a pair of states (a state s
and its successor state t
such as x=8,x=7
). The pair s,t
is a World
step iff s[[World]]t
equals true. We obtain s[[World]]t
by replacing each unprimed variable x
in World
with the value of x
in state s
and each primed occurrence of the variable x
with the value of x
in the successor state t
. For example, the pair x=8,x=7
is a World
step because 7 \in TotalInterval
equals true. However, the same argument applies to s[[Machine]]t
because 7 = 8 - 1
. Thus, the pair x=8,x=7
is both a World
and a Machine
step. The pair x=8,x=7
is not the only pair which is a World
and Machine
step. Due to the definitions of the World
and Machine
actions, the spec implies that any Machine
step is also a World
step (but not vice versa)! The only behaviors satisfying the spec and the property ~[]<><<World>>_x
are ones with a (finite) prefix of World
steps followed by infinite stuttering. The spec's fairness conjunct WF_x(Machine)
however, asserts that there are infinitely many (non-stuttering) Machine
steps. Thus ~[]<><<World>>_x
never holds and Spec => FiniteWorldStable
is vacuously true.
To approach this from a different angle, we will substitute for the identifier World
its definition x' \in TotalInterval
in FiniteWorldStable
to obtain ~[]<><<x' \in TotalInterval>>_x => Stable
. This immediately looks funny. But can we verify it too? We can use TLC to verify that [][x' \in TotalInterval]_x
(always) holds for all behaviors of spec (the type invariant TypeInvariant
confirms it too) and therefore ~[]<><<x' \in TotalInterval>>_x
has to be always false. To repeat, all steps of all behaviors satisfying Spec
are always World
steps regardless of whether a step "occurred" by taking a Machine
or World
action.
The property RareWorldResilient
suffers from the same flaw. By replacing World
with its definition we obtain <>[](x' \in TotalInterval => Safe]_x => []<>Safe
. By what we discussed in the previous paragraph, RareWorldResilient
thus equals <>[][TRUE => Safe]_x => []<>Safe
which can be reduced to <>[][Safe]_x => []<>Safe
. The formula <>[]P => []<>P
is a tautology in temporal logic. Semantically, if Safe
is eventually always true - in any behavior of Spec
- then it follows that Safe
is also repeatedly true.
How can we address this problem? We somehow have to make sure that Spec
no longer implies that Machine
steps are also World
steps.
How do we fix this?
One approach would be to modify the definition of World
to x' \in (TotalInterval \ Goal)
. This causes Machine
steps in the Goal
set to not be World
steps. For example x=2,x=3
would be a Machine
but not a World
step. We can verify this by checking that [][World]_x
is violated.
Unfortunately, this approach has drawbacks: The original definition of World
allows for behaviors (satisfying the spec) where the variable x
non-deterministically changes in the Goal
interval. Especially it allows behaviors with e.g. a World
step such as x=3,x=2
(all Machine
steps are World
steps but not vice versa). Semantically this models an adversary/environment that perturbs our Machine
but that we define to "play by the rules" in certain states. This is upside down because we want to make as few assumptions about the adversary as possible. On a tangent, we could even argue that x' \in TotalInterval
should be broadened to x' \in Nat
to model all (type-correct) values and - for model checking - be redefined with a Definition Override to a finite set of e.g. 0..6
.
There is a more subtly problem with changing the World
action though. Let us use TLC to generate all steps of Spec
that are not World
steps. We again check the property [][World]_x
but in order for TLC to print all violations (which correspond to the steps we are interested in), we use the -continue parameter[^1]. TLC reports the following five violations:
- x=2,x=3
- x=3,x=2
- x=4,x=3
- x=1,x=2
- x=5,x=4
As we can see, this is only a subset of the possible Machine
steps of the specification. In other words, by modifying the World
action to x' \in (TotalWorld \ Goal)
, the FiniteWorldStable
property only asserts that Spec
is guaranteed to reach a Goal
state if it already is in a Goal
state (1,2,3) or "on the brink" of it (4,5). The property still does not assert that - given no World
steps from some point forward - Spec
will eventually converge to stability when starting from any state in TotalInterval
. If we for example are in a state x=6
, the property FiniteWorldStable
does not assert that the Machine
will converge (we just happen to know it will because this is a toy example).
To make this more explicit, we will temporarily introduce a "bug" in our specification by changing the Machine
action to not converge to a stable state if the value of the variable x
is greater than 6 (the upper bound x < 10
prevents the model from being infinite).
ChangeX ==
/\ IF x < 3 THEN x' = x + 1 ELSE
IF x > 6 /\ x < 10 THEN x' = x + 1
ELSE x' = x - 1
The property FiniteWorldStable
still holds. Our "programming bug" above will go unnoticed!
One might also try and remove Machine
from the next-state relation to show that FiniteWorldState
does not reveal violations even with the TotalInteval \ Goal
modification. After all, if no Machine
steps are possible and World
no longer gets us into a stable state, FiniteWorldState
should be violated. This however does not work for reasons beyond what this post can cover (Lamport calls this a "weird spec").
Another fix?
In "Auxiliary Variables in TLA+" (section 3.4) Lamport and Merz discuss how history variables serve as a "helper" to state properties that we want to show are satisfied by a spec. While being primarily used in the scope of refinement mappings, "misusing" an auxiliary (history) variable will help us distinguish World
from Machine
steps. For Spec
we simply amend the invariant TypeInvariant
, the state predicate Init
and the two World
and Machine
actions:
VARIABLES x,h
TypeInvariant ==
/\ x \in TotalInterval
/\ h \in {"i","m","w"}
Machine ==
/\ IF x < 3 THEN x' = x + 1 ELSE x' = x - 1
/\ h' = "m"
Init ==
/\ x \in TotalInterval
/\ h = "i"
World ==
/\ x' \in TotalInterval
/\ h' = "w"
With this modification e.g. one behavior allowed by Spec
will be <x=7,h="i">, <x=6,h="m">, <x=0,h="w">, <x=1,h="m">, <x=0,h="w">, ...
where ....
represents infinite more steps. The first step of the behavior has to be an Init
step because <x=7,h="i">[[Init]]<x=6,h="m">
equals true and false for World
and Machine
. Likewise the pair <x=0,h="m">, <x=1,h="m">
is a Machine
and not a World
step and so on. With this change in place FiniteWorldStable
and RareWorldResilient
finally assert what is stated in their comments.
FiniteWorldStable == ~[]<><<(x' \in TotalInterval /\ h' = "w")>>_x => Stable
RareWorldResilient == <>[][(x' \in TotalInterval /\ h' = "w")) => Safe]_x => Resilient
Having learned from past mistakes, we now experiment with the properties by e.g. negating the antecedent and/or the consequent to gain some confidence in their correctness.
In general a history variables comes with an increase in the size of the state space. For the given spec the introduction of the history variable h
almost triples the size from 11 to 31 (distinct) states. While this is not a problem here, it can be for real-world specs. Since we are checking liveness properties, we are barred from using the symmetry reduction technique; liveness checking and symmetry reduction are - for now - incompatible (TLC and the Toolbox will emit a warning if we try this)[^2].
One might also argue that the introduction of a history variable has a second undesired consequence: How can we possibly expect an adversary to be so nice and properly "mark" his steps? Instead of assuming the adversary to mark steps, we can instead consider the history variable to represent message authentication codes i.e. Machine
"messages" include an (unforgeable) signature (modeled above as "m"
). We will stop here and ignore this problem because I wanted to constrain this post to leave the original properties untouched. The goal was to make the properties assert what the comments says. We might revisit modeling adversaries in a future post though to discuss alternative ways to state the properties.
The primary purpose of this post is to remind us of two things: First, we should always be suspicious of success! "Model checking [...] is most informative when it returns a counter-example, not when it says that the property holds, which may be vacuous". Second, decomposing a next-state relation into (sub) actions makes a spec more comprehensible for a human. It however does not give us a higher-level concept to reason about. Or as Lamport puts it: "Processes are in the eye of the beholder".
We end this post with a little puzzler: What hack does the linked BlockingQueue spec below (gist with better syntax highlighting) "pull" to make the NoPStarvation
property reveal that the spec does not guarantee starvation-freedom for producers? Why doesn't the same hack work for consumers, i.e. NoCStarvation
is not violated? Let me know what you find.
Thanks to Leslie Lamport and Ron Pressler for reviews of this post.
([^1]: The -continue parameter is not yet supported by the TLA Toolbox.)
([^2]: What does work for this spec is to use a view function <<x>>
to retain the original size of the state space even with a history variable. This is however due to the particular shape of the state space and does not generalize. A view is great to shoot yourself in the foot!)
--------------------------- MODULE BlockingQueueBlog ---------------------------
EXTENDS Naturals, Sequences, FiniteSets, TLC
C == {"a", "b"} \* Two consumers and producers...
P == {"c", "d"} \* ...which are usually better model as sets of model values.
K == 1 \* Fix queue size to 1 element
\* We assume the following properties even though the spec doesn't use CONSTANTS.
ASSUME /\ IsFiniteSet(C) /\ IsFiniteSet(P) \* Finite sets
/\ C \intersect P = {} \* A producer is no consumer and vice versa
ASSUME K \in (Nat \ {0}) \* K is a natural number without zero
------------------------------------------------------------------
\* Nondeterministically notify (wake) a process from the given waitset.
Notify(w) == IF w # {}
THEN \E i \in w: w' = w \ {i}
ELSE UNCHANGED << w >>
------------------------------------------------------------------
VARIABLES store, \* The (internal) storage for the queue
waitP, \* A waitset for producers
waitC \* A waitset for consumers
vars == << store, waitP, waitC >>
\* A type correctness invariant asserting that...
TypeOK == /\ waitP \subseteq P \* only producers are in waitP
/\ waitC \subseteq C \* only consumers are in waitC
/\ DOMAIN store \subseteq 0..K \* store is a sequence for which the
\* elements of the codomain are undefined.
(***************************************************************************)
(* Initially consumers and processes are alive and the queue is empty. *)
(***************************************************************************)
Init == /\ store = <<>>
/\ waitP = {}
/\ waitC = {}
(***************************************************************************)
(* A producer either waits if the queue is full or appends an element. *)
(***************************************************************************)
p1(self) == /\ Len(store) = K
/\ waitP' = (waitP \cup {self})
/\ UNCHANGED << store, waitC >>
p2(self) == /\ Len(store) # K
/\ Notify(waitC)
/\ store' = Append(store, self)
/\ UNCHANGED << waitP >>
(***************************************************************************)
(* A consumer either waits if the queue is empty (c1) or picks one element *)
(* (c2). c2 is the critical section from the perspective of the consumer. *)
(***************************************************************************)
c1(self) == /\ Len(store) = 0
/\ waitC' = (waitC \cup {self})
/\ UNCHANGED << store, waitP >>
c2(self) == /\ Len(store) # 0
/\ Notify(waitP)
/\ store' = Tail(store)
/\ UNCHANGED << waitC >>
(***************************************************************************)
(* Let the scheduler non-deterministically schedule consumer and *)
(* producers processes. *)
(* *)
(* Correct Next to not schedule waiting processes. Thanks to Hillel for *)
(* making me aware of this issue. *)
(***************************************************************************)
Next == \/ (\E self \in (P \ waitP): p1(self) \/ p2(self))
\/ (\E self \in (C \ waitC): c1(self) \/ c2(self))
Spec == Init /\ [][Next]_vars /\ WF_vars(Next)
THEOREM Spec => []TypeOK
------------------------------------------------------------------
\* Not all consumers and producers wait at once.
NoDeadLock == (waitP \cup waitC) # (C \cup P)
THEOREM Spec => []NoDeadLock
\* All consumers eventually dequeue elements.
NoCStarvation == \A c \in C: []<>(<<c2(c)>>_<<store>>)
\* All producers eventually enqueue elements.
NoPStarvation == \A p \in P: []<>(<<p2(p)>>_<<store>>)
=============================================================================