STAR

TLA*


 

How do you interact with Dr. Leslie Lamport, 2014 Turing award recipient, when you are not a Computer Scientist? The answer is that you probably don’t. Dr. Lamport did reply to me yesterday encouraging me to “clarify my thoughts and submit an article to a proper scientific journal”. I wish I could, I have not written a refereed paper for more than 15 years and I don’t see that changing any time soon, most likely ever.

So I’ll just try to clarify my thought process and post it here.

A couple of weeks ago, a friend of mine told me that he was impressed by TLA+ and they had been successful, like Amazon, using it for their project. I feel sorry not having come across the work of Dr. Lamport earlier, this is such an important foundation to Software Engineering and programming languages:

TLA stands for the Temporal Logic of Actions, but it has become a shorthand for referring to the TLA+ specification language and the PlusCal algorithm language, together with their associated tools.

TLA+ is based on the idea that the best way to describe things formally is with simple mathematics, and that a specification language should contain as little as possible beyond what is needed to write simple mathematics precisely. TLA+ is especially well suited for writing high-level specifications of concurrent and distributed systems.

PlusCal is an algorithm language that, at first glance, looks like a typical tiny toy programming language. However, a PlusCal expression can be any TLA+ expression, which means anything that can be expressed with mathematics.

The best introduction I found so far on TLA+ was written by Stephan Merz (INRIA). TLA+ comes with a toolbox and a full set of documentation as well.

As you can imagine, sentences like “TLA+ is especially well suited for writing high-level specifications of concurrent and distributed systems” resonate immediately with my interests.

The way I parse and analyze new elements of knowledge is by constructing their metamodel. In 2003 I was part of the WS-CDL (Choreography Definition Language) working group and several people in the group felt that we should create a specification that was in essence aligning itself with π-calculus. The discussions were difficult and I felt there was quite a bit of hand waving in the π-calculus camp. It was not a stupid proposal by any means, but it seemed widely incomplete: it’s semantics were too limited to deliver a “choreography definition language”. A couple of years earlier Intalio had shoehorned the π-Calculus theory into BPML (which later became WSCI) and I felt that had been quite a disaster. So I set out to create a representation of the π-Calculus semantics, which ended the discussions immediately. The semantics of Robin Milner’s theory were indeed too limited and could not account for many scenarios. In the end, the theory, though related, was never designed to support choreography or process definitions. Serendipity is such a romantic concept…

In the end, a theory is just that: a model, and without creating an objective representation of its semantics, you can pretty much argue anything you want such as “which means anything that can be expressed with mathematics” or the foundational predicate of TLA+ which states that: “a state is an assignment of values to a variable”

Let me say first that I really like TLA+ and that I wish I could be 1/10th of a fraction of someone capable of discussing with Dr. Lamport, but I am not and will never be.

So this is how I think, verbatim, and here is a digest, my digest, of what I wish I could discuss with Dr. Lamport.

First, I start by looking at how well, something like TLA+ aligns with what I call the M4 layer (mainly because I feel that UML is an M3 layer technology, and as such there can be more than one M3 layer). To my great surprise TLA+ is fully STAR compliant. It has a precise set of semantics for Action, State (Assignment) and Relationship (Tuples), and Types (Sets and Records). This is the first time I see a conceptual model so complete. It’s hard to contain my excitement.

Second, I try to construct the metamodel. In the present case, it is a bit more difficult because the documentation is simply too hard for me to interpret (and somewhat inconsistant -that’s what a metamodel surfaces instantly). The high level view of the metamodel states that a digital system exposes a collection of behaviors, behaviors are represented as a graph of states and events (aka steps). The specification of the digital system uses sets, functions and logic operators (using the BOLT notation) :

I claimed earlier that the metamodel was already inconsistent at the highest level, because when you analyze the description Dr. Lamport makes and the semantics of TLA+, they seem to diverge (but again, I am barely scratching the surface here):

For instance, as Dr. Lamport reduces a state as a mere variable assignment, it is not clear how a state relates to an expression from a TLA+ semantics point of view (next-state action).

So until I am able to create the metamodel with enough precision, I’d like to discuss what I feel are some important points such as “a state is an assignment of values to a variable”, because I feel strongly that it is not. Yes, as a first approximation model, it would be hard to disagree, say when you take the example of an hour clock. But, when you take a close look at the first metamodel (above), one can also infer that a state can be uniquely identified by the actions that lead to the state and the actions that are enabled when you are in that particular state. I prefer that definition because it is type independent (Please see my post on the “Die Hard” problem for more details). I do not view “hr = 1” and “hr = 2” … as different “states” of the clock because from an action perspective, they are strictly equivalent. I understand that there is a confusion in Software Engineering as to what state is (most software engineers would define state as the snapshot of a type instance), but that definition would be independent of the transitions into and out of that state.

This is how Dr. Lamport describes the Hour Clock system:

There is a basic problem with using a state machine as a specification. Consider an hour clock, a clock that displays only the hour. Ignoring the actual time that elapses between ticks, an hour clock is trivially specified by a state machine that increments the hour with each step

and this is how this system is modeled with TLA+:

This example is quite approachable and illustrates well the semantics of states and actions:

  • there are 24 possible initial states to the Hour Clock (HC) (as defined by the initial predicate: HCini)
  • the nex-action predicate (HCnxt) defines a single action which transitions from a given state to the next
  • the allowed states and their temporal relationships are specified by the HCsafe expression (start from any of the allowed initial states and apply the next-action for ever. 

Now, I don’t ever expect Dr. Lamport to agree with me but I believe that his definition of a state is physically incorrect, though mathematically sound: “a state is an assignment of values to a variable”. It introduces a strong coupling between the states, types and actions. Yes, indeed, the ultimate result of the action will be to increment a counter, but does that define a state? It’s a bit like saying the state of an electron is solely defined by the energy it took to reach that state. Not quite. Computer Scientists have a permanent bias towards the reification of the conceptual view (states and relationships) into the physical view (actions and types). I believe that bias is wrong and must be avoided: states and relationships must never be expressed in terms of actions and types (and vice versa). That reification is the root of all problems in Software Engineering.

States are specific entities (uniquely identified by a label, not properties): states are not variable assignments, though they can be reduced to it. The engine of a car can reach the “started” state once the starter is properly powered or when someone pushes it fast enough. And yes, I can relate the fact that each time the engine is in the started state, RPM > 0, but are all property assignments a state? Yes, it does look like somebody PUT a RPM value, but isn’t that a mere observation? When I set my car’s color to “blue”, did I change its “state”? That’s why I use the word “reduce” because the perspective of a variable assignment does not reflect properly the metamodel of how state, types and actions relate. In particular it couples (if not pushes) the responsibility of deciding whether a change of state has occured within the action, which IMHO, is the wrong way to look at distributed systems. What happens when you reify the concept of state behind the concept of type? You end up writing arguments like that one:

However, a naive state-machine specification of the hour clock asserts that the hour changes with every step, while an hour-minute clock changes the hour only on every 60th step. This problem is solved by requiring invariance under stuttering. The specification of the hour clock must allow any finite number of stuttering steps, ones that leave the hour unchanged, between successive changes to the hour. Steps of the hour-minute clock that change only the minute are then stuttering steps allowed by the hour clock’s specification

Of course, my limited knowledge of TLA+ does not entitle me to critique Dr. Lamport’s work, but it is difficult not to look at a more accurate representation of the state machine of a clock and see for instance that a “Daily Savings Time” action would reach the “ticked” state just as a normal “tick” action (in other words, the clock should reach the “ticked” state in both cases, though an observer of the “hour clock” might see no “change in state” if they simply look at the variable assignment.

I would also want to see in the TLA+ description a “started” state which can transition to a “stopped” state, which prevents the clock to further honor requests to “tick”. I would also question why would the “states” of a clock change based on its precision? It only does when you are coupling types and states.

So I don’t mean to be overly condescending, because I understand, this is exactly how it looks, but I feel that one more time, the reification of the conceptual view (states) into the physical view (types) is failing. If Dr. Lamport was a physicist, and not a Mathematician, he might understand the argument that Physics has four foundamental concepts: Light, Mass, Energy and Force. Not, 5, not 2: 4. Optics has been very successful at demonstrating that one can deliver a lot of valuable insights by ignoring, Mass, Energy and Force, but in the end, you hit a wall and you cannot satisfactorily describe a system by considering light alone and reifying all the other concepts as mere “assignments” (such as refractive indices).

The most constraining wall that we are hitting with this kind of reification is that we cannot define a common structure to problem statements. If the proper view of “state” is brought into perspective, one can then define a problem as:

  • either a missing transition between two states
  • or an unwanted (but possible) transition between to states

Dr. Lamport was kind enough to evaluate this definition and immediately spotted that it was incomplete. First he noted that this definition was limited to “engineering” problems. He also added that one condition was glaringly missing:

An incorrect behavior can also be the result of an incorrect initial state (this is rare) and as the result of an incorrect or missing liveness/fairness property. The latter as a distinct class of errors is often overlooked because one simply assumes a sufficiently strong fairness assumption, so that failure to satisfy a liveness property must be due to one or more missing state transitions. For example, consider a two-process program in which one process can loop forever and the second one can perform a single action, and correctness requires the second process?s action to be performed. Without a fairness condition to ensure that the second process takes a step, then there can be a behavior in which only the first process takes steps–a “problem” that is neither of the two you claim to be the only ones.

Why is formalizing the problem specification important? because except for obvious problems, software is built from a problem specification (a.k.a. requirements, user stories, …). TLA+ has created the foundation for a formal solution specification, which we could build on to create a formal problem specification. Once we have a formal problem and solution specification, we open the door to support formal verification as well.

I understand that I am playing way out of my league here, but I cannot help thinking that reifying the conceptual view into the physical is a monumental mistake, in particular, it makes it more difficult for a problem specification formalism (aligned with TLA+) to emerge.

Leave a Reply

Your email address will not be published. Required fields are marked *

+ 14 = 15