[]BeSuspiciousOfSuccess (Review of "Modeling Adversaries with TLA+")

Hillel Wayne recently published a blog post titled "Modeling Adversaries with TLA+". The post does a good job showing how adversaries and environmental effects can be modeled in TLA⁺. The post is also subtly incorrect when it comes to two of the liveness properties. In other words, "Modeling Adversaries with TLA+" not only shows how to model adversaries but also comes with a spec to discuss the intricacies of temporal logic. :-)

Before we get started, let me make clear that the spec indeed satisfies all of the properties described in prose. The error is that some of the specified properties do not match their description but are instead trivially true. This is why Lamport - in e.g. the TLA⁺ video lecture #5 - cautions us to "Always be suspicious of success". Reasoning about liveness is delicate business and easy to get wrong.

To quickly recap, the spec models a system that converges to a set of goal states (x \in Goal) and an environment World that perturbs the system by knocking it out of the goal states. Below is a slightly simplified excerpt of the original spec:

VARIABLES x

TotalInterval == 0..10
Goal == 2..4

TypeInvariant == x \in TotalInterval

Init == x \in TotalInterval

Machine == IF x < 3 THEN x' = x + 1 ELSE x' = x - 1

World == x' \in TotalInterval

Next == Machine \/ World

Spec == Init /\ [][Next]_x /\ WF_x(Machine)

Several invariants and liveness properties are discussed that are satisfied by Spec. Two properties however are incorrect; FiniteWorldStable and RareWorldResilient.

FiniteWorldStable

The comment for the FiniteWorldStable property says:

[...] If the world is only kicking things out of alignment a finite number of times, say one million, then it should still be stable. My argument is that after the millionth kick, we're now somewhere in TotalInterval and the spec is equivalent to one without the World. We can represent “finite World actions” by saying “it's not always eventually the case that World happens”, which we'd write in TLA+ as ~[]<><<World>>_x. Here the <<>> means “an action that changes x”, not sequence.

While Stable still doesn't hold, FiniteWorldStable does.

This is translated into the following TLA⁺ property:

Safe == x \in Goal
Stable == <>[]Safe

FiniteWorldStable == ~[]<><<World>>_x => Stable

The reasoning in the comment is correct. Checking FiniteWorldStable with TLC confirms that the property holds. Being suspicious of success though, let us alter FiniteWorldStable to ~[]<><<World>>_x => ~<>[](x \in Goal); we negated the implication's consequent Stable. Unfortunately, this also holds! Even substituting the consequent with 23=42 /\ 23#42 holds. The reason is that Spec => FiniteWorldStable is vacuously true. We will discuss why it is vacuously true later. First we will shift our attention to RareWorldResilient:

RareWorldResilient

[...] to get resilience we don't need to require World to only happen finite times. Instead, we only need to guarantee it happens finite times while we're out of equilibrium. If eventually the world only kicks x out of Goal when it's already in Goal, then we're giving our machine enough time to return x to Goal and we have resilience.

Another way of looking at it: if World happens rarely enough, say one-tenth as often as Machine, then we'll return to Goal before the next World action pushes us out again.

This property holds.

This is translated into RareWorldResilient:

Resilient == []<>Safe
RareWorldResilient == <>[][World => Safe]_x => Resilient

As with FiniteWorldStable above, the prose is convincing and TLC once again confirms that RareWorldResilient holds. But as we have now learned, implication can be tricky. Let us try and replace the consequent of the inner implication: <>[][World => 23 # 42]_x => Resilient. Checking the modified property with TLC should make us suspicious.

What is the problem?

At this point we will zoom out and discuss the underlying problem. For that we will first need Lamport's (see "The Temporal Logic of Actions" (section 4.1)) definition of what an action-step is. Let s,t be a pair of states (a state s and its successor state t such as x=8,x=7). The pair s,t is a World step iff s[[World]]t equals true. We obtain s[[World]]t by replacing each unprimed variable x in World with the value of x in state s and each primed occurrence of the variable x with the value of x in the successor state t. For example, the pair x=8,x=7 is a World step because 7 \in TotalInterval equals true. However, the same argument applies to s[[Machine]]t because 7 = 8 - 1. Thus, the pair x=8,x=7 is both a World and a Machine step. The pair x=8,x=7 is not the only pair which is a World and Machine step. Due to the definitions of the World and Machine actions, the spec implies that any Machine step is also a World step (but not vice versa)! The only behaviors satisfying the spec and the property ~[]<><<World>>_x are ones with a (finite) prefix of World steps followed by infinite stuttering. The spec's fairness conjunct WF_x(Machine) however, asserts that there are infinitely many (non-stuttering) Machine steps. Thus ~[]<><<World>>_x never holds and Spec => FiniteWorldStable is vacuously true.

To approach this from a different angle, we will substitute for the identifier World its definition x' \in TotalInterval in FiniteWorldStable to obtain ~[]<><<x' \in TotalInterval>>_x => Stable. This immediately looks funny. But can we verify it too? We can use TLC to verify that [][x' \in TotalInterval]_x (always) holds for all behaviors of spec (the type invariant TypeInvariant confirms it too) and therefore ~[]<><<x' \in TotalInterval>>_x has to be always false. To repeat, all steps of all behaviors satisfying Spec are always World steps regardless of whether a step "occurred" by taking a Machine or World action.

The property RareWorldResilient suffers from the same flaw. By replacing World with its definition we obtain <>[](x' \in TotalInterval => Safe]_x => []<>Safe. By what we discussed in the previous paragraph, RareWorldResilient thus equals <>[][TRUE => Safe]_x => []<>Safe which can be reduced to <>[][Safe]_x => []<>Safe. The formula <>[]P => []<>P is a tautology in temporal logic. Semantically, if Safe is eventually always true - in any behavior of Spec - then it follows that Safe is also repeatedly true.

How can we address this problem? We somehow have to make sure that Spec no longer implies that Machine steps are also World steps.

How do we fix this?

One approach would be to modify the definition of World to x' \in (TotalInterval \ Goal). This causes Machine steps in the Goal set to not be World steps. For example x=2,x=3 would be a Machine but not a World step. We can verify this by checking that [][World]_x is violated.

Unfortunately, this approach has drawbacks: The original definition of World allows for behaviors (satisfying the spec) where the variable x non-deterministically changes in the Goal interval. Especially it allows behaviors with e.g. a World step such as x=3,x=2(all Machine steps are World steps but not vice versa). Semantically this models an adversary/environment that perturbs our Machine but that we define to "play by the rules" in certain states. This is upside down because we want to make as few assumptions about the adversary as possible. On a tangent, we could even argue that x' \in TotalInterval should be broadened to x' \in Nat to model all (type-correct) values and - for model checking - be redefined with a Definition Override to a finite set of e.g. 0..6.

There is a more subtly problem with changing the World action though. Let us use TLC to generate all steps of Spec that are not World steps. We again check the property [][World]_x but in order for TLC to print all violations (which correspond to the steps we are interested in), we use the -continue parameter[^1]. TLC reports the following five violations:

x=2,x=3
x=3,x=2
x=4,x=3
x=1,x=2
x=5,x=4

As we can see, this is only a subset of the possible Machine steps of the specification. In other words, by modifying the World action to x' \in (TotalWorld \ Goal), the FiniteWorldStable property only asserts that Spec is guaranteed to reach a Goal state if it already is in a Goal state (1,2,3) or "on the brink" of it (4,5). The property still does not assert that - given no World steps from some point forward - Spec will eventually converge to stability when starting from any state in TotalInterval. If we for example are in a state x=6, the property FiniteWorldStable does not assert that the Machine will converge (we just happen to know it will because this is a toy example).

To make this more explicit, we will temporarily introduce a "bug" in our specification by changing the Machine action to not converge to a stable state if the value of the variable x is greater than 6 (the upper bound x < 10 prevents the model from being infinite).

ChangeX ==
   /\ IF x < 3 THEN x' = x + 1 ELSE 
                               IF x > 6 /\ x < 10 THEN x' = x + 1 
                               ELSE x' = x - 1

The property FiniteWorldStable still holds. Our "programming bug" above will go unnoticed!

One might also try and remove Machine from the next-state relation to show that FiniteWorldState does not reveal violations even with the TotalInteval \ Goal modification. After all, if no Machine steps are possible and World no longer gets us into a stable state, FiniteWorldState should be violated. This however does not work for reasons beyond what this post can cover (Lamport calls this a "weird spec").

Another fix?

In "Auxiliary Variables in TLA+" (section 3.4) Lamport and Merz discuss how history variables serve as a "helper" to state properties that we want to show are satisfied by a spec. While being primarily used in the scope of refinement mappings, "misusing" an auxiliary (history) variable will help us distinguish World from Machine steps. For Spec we simply amend the invariant TypeInvariant, the state predicate Init and the two World and Machine actions:

VARIABLES x,h

TypeInvariant ==
  /\ x \in TotalInterval
  /\ h \in {"i","m","w"}

Machine ==
  /\ IF x < 3 THEN x' = x + 1 ELSE x' = x - 1
  /\ h' = "m"

Init ==
  /\ x \in TotalInterval
  /\ h = "i"

World == 
  /\ x' \in TotalInterval
  /\ h' = "w"

With this modification e.g. one behavior allowed by Spec will be <x=7,h="i">, <x=6,h="m">, <x=0,h="w">, <x=1,h="m">, <x=0,h="w">, ... where .... represents infinite more steps. The first step of the behavior has to be an Init step because <x=7,h="i">[[Init]]<x=6,h="m"> equals true and false for World and Machine. Likewise the pair <x=0,h="m">, <x=1,h="m"> is a Machine and not a World step and so on. With this change in place FiniteWorldStable and RareWorldResilient finally assert what is stated in their comments.

FiniteWorldStable == ~[]<><<(x' \in TotalInterval /\ h' = "w")>>_x => Stable

RareWorldResilient == <>[][(x' \in TotalInterval /\ h' = "w")) => Safe]_x => Resilient

Having learned from past mistakes, we now experiment with the properties by e.g. negating the antecedent and/or the consequent to gain some confidence in their correctness.

In general a history variables comes with an increase in the size of the state space. For the given spec the introduction of the history variable h almost triples the size from 11 to 31 (distinct) states. While this is not a problem here, it can be for real-world specs. Since we are checking liveness properties, we are barred from using the symmetry reduction technique; liveness checking and symmetry reduction are - for now - incompatible (TLC and the Toolbox will emit a warning if we try this)[^2].

One might also argue that the introduction of a history variable has a second undesired consequence: How can we possibly expect an adversary to be so nice and properly "mark" his steps? Instead of assuming the adversary to mark steps, we can instead consider the history variable to represent message authentication codes i.e. Machine "messages" include an (unforgeable) signature (modeled above as "m"). We will stop here and ignore this problem because I wanted to constrain this post to leave the original properties untouched. The goal was to make the properties assert what the comments says. We might revisit modeling adversaries in a future post though to discuss alternative ways to state the properties.

The primary purpose of this post is to remind us of two things: First, we should always be suspicious of success! "Model checking [...] is most informative when it returns a counter-example, not when it says that the property holds, which may be vacuous". Second, decomposing a next-state relation into (sub) actions makes a spec more comprehensible for a human. It however does not give us a higher-level concept to reason about. Or as Lamport puts it: "Processes are in the eye of the beholder".

We end this post with a little puzzler: What hack does the linked BlockingQueue spec below (gist with better syntax highlighting) "pull" to make the NoPStarvation property reveal that the spec does not guarantee starvation-freedom for producers? Why doesn't the same hack work for consumers, i.e. NoCStarvation is not violated? Let me know what you find.

Thanks to Leslie Lamport and Ron Pressler for reviews of this post.

([^1]: The -continue parameter is not yet supported by the TLA Toolbox.)

([^2]: What does work for this spec is to use a view function <<x>> to retain the original size of the state space even with a history variable. This is however due to the particular shape of the state space and does not generalize. A view is great to shoot yourself in the foot!)

--------------------------- MODULE BlockingQueueBlog ---------------------------
EXTENDS Naturals, Sequences, FiniteSets, TLC

C == {"a", "b"}   \* Two consumers and producers...
P == {"c", "d"}   \* ...which are usually better model as sets of model values.
K == 1            \* Fix queue size to 1 element

\* We assume the following properties even though the spec doesn't use CONSTANTS.
ASSUME /\ IsFiniteSet(C) /\ IsFiniteSet(P) \* Finite sets
       /\ C \intersect P = {}              \* A producer is no consumer and vice versa
ASSUME K \in (Nat \ {0})                   \* K is a natural number without zero

------------------------------------------------------------------

\* Nondeterministically notify (wake) a process from the given waitset.
Notify(w) == IF w # {}
             THEN \E i \in w: w' = w \ {i}
             ELSE UNCHANGED << w >>

------------------------------------------------------------------

VARIABLES store, \* The (internal) storage for the queue
          waitP, \* A waitset for producers
          waitC  \* A waitset for consumers
vars == << store, waitP, waitC >>

\* A type correctness invariant asserting that...
TypeOK == /\ waitP \subseteq P \* only producers are in waitP
          /\ waitC \subseteq C \* only consumers are in waitC
          /\ DOMAIN store \subseteq 0..K \* store is a sequence for which the 
                                         \* elements of the codomain are undefined.

(***************************************************************************)
(* Initially consumers and processes are alive and the queue is empty.     *)
(***************************************************************************)
Init == /\ store = <<>>
        /\ waitP = {}
        /\ waitC = {}

(***************************************************************************)
(* A producer either waits if the queue is full or appends an element.     *)
(***************************************************************************)
p1(self) == /\ Len(store) = K
            /\ waitP' = (waitP \cup {self})
            /\ UNCHANGED << store, waitC >>

p2(self) == /\ Len(store) # K
            /\ Notify(waitC)
            /\ store' = Append(store, self)
            /\ UNCHANGED << waitP >>

(***************************************************************************)
(* A consumer either waits if the queue is empty (c1) or picks one element *)
(* (c2). c2 is the critical section from the perspective of the consumer.  *)
(***************************************************************************)
c1(self) == /\ Len(store) = 0
            /\ waitC' = (waitC \cup {self})
            /\ UNCHANGED << store, waitP >>

c2(self) == /\ Len(store) # 0
            /\ Notify(waitP)
            /\ store' = Tail(store)
            /\ UNCHANGED << waitC >>

(***************************************************************************)
(* Let the scheduler non-deterministically schedule consumer and           *)
(* producers processes.                                                    *)
(*                                                                         *)
(* Correct Next to not schedule waiting processes. Thanks to Hillel for    *)
(* making me aware of this issue.                                          *)
(***************************************************************************)
Next == \/ (\E self \in (P \ waitP): p1(self) \/ p2(self))
        \/ (\E self \in (C \ waitC): c1(self) \/ c2(self))

Spec == Init /\ [][Next]_vars /\ WF_vars(Next)
THEOREM Spec => []TypeOK

------------------------------------------------------------------

\* Not all consumers and producers wait at once.
NoDeadLock == (waitP \cup waitC) # (C \cup P)
THEOREM Spec => []NoDeadLock

\* All consumers eventually dequeue elements.
NoCStarvation == \A c \in C: []<>(<<c2(c)>>_<<store>>)

\* All producers eventually enqueue elements.
NoPStarvation == \A p \in P: []<>(<<p2(p)>>_<<store>>)
=============================================================================

[]BeSuspiciousOfSuccess (Review of "Modeling Adversaries with TLA+")

FiniteWorldStable

RareWorldResilient

What is the problem?

How do we fix this?

Another fix?

Published

Category

Contact