Designing a Reliable API Call Component with SAM

06/16/15 | by jdubray | Categories: MDE, STAR


The goal of this post is to demonstrate how the SAM Pattern (State-Action-Model) can be put into practice using a simple but practical end-to-end scenario.

Let’s start by building a reliable API call component whose purpose is to call an API while detecting timeouts and retrying several times in case the API is not available. It should also be able to detect error messages returned by the API call and return them as errors to the caller.

The development workflow is straightforward:

  • define the control states of the component,
  • its actions and guards
  • the corresponding Type(s) of the Model.

The semantics we use are based on the STAR metamodel. Perhaps something unusual worth noting is that STAR allows an action to have two possible outcomes (the resulting state being decided by the type, based on the property values). This is the case for the receive:response action.

[Figure: Designing a Reliable API Call Component with SAM]

The transitions out of the “errored” control state are guarded to reflect different types of errors. We could have used different states for each type of error, but control states are expensive to implement and you generally want to minimize the number of control states in a given system.

The corresponding actions are pretty straightforward:

send:      makes the API call
tick:      a system ticker that makes it possible to monitor for timeouts
receive:   provides the response
return:    prepares the response message to be returned to the requestor
error:     checks whether a retry is needed or prepares the error message
           to be returned to the requestor

The “Call” type should look like this:


  • data is the data provided by the requestor to call the API
  • request is the request prepared from the data to make the API call
  • requested is a control variable that turns true when the API is called
  • ticks is the number of ticks since the last API call
  • retryCount is the number of times the API was called for a given request
  • response is the data received from the API Call
  • error is the error received from the API call (if any)
  • responded is a control variable



The Call type relies on a couple of constants: retryCountMax, which specifies how many times the call should be retried in case of a timeout, and tickMax, the duration of the timeout expressed in ticks.
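As an illustration (not the actual CallType implementation of the sample), the Call model and its constants could be sketched in plain JavaScript as follows; the constant values are assumptions:

// A minimal sketch of the Call model; the property names come from the list above,
// the constant values are illustrative.
var tickMax = 10;        // timeout duration, expressed in ticks
var retryCountMax = 3;   // maximum number of retries on timeout

function Call(data) {
  this.data = data;        // data provided by the requestor
  this.request = null;     // request prepared from the data
  this.requested = false;  // control variable: true once the API has been called
  this.ticks = 0;          // number of ticks since the last API call
  this.retryCount = 0;     // number of times the API was called for this request
  this.response = null;    // data received from the API call
  this.error = null;       // error received from the API call, if any
  this.responded = false;  // control variable
  this.hash = null;        // hash of the dataset the caller currently holds (see below)
}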

You may also have noticed that I didn’t mention the hash parameter. Reactive architectures are all about achieving greater efficiencies, and if the Model doesn’t change, well, there is nothing interesting to propagate. So, when we return a response to the caller, we return a hash too, which can then be provided the next time a call is requested to give us an indication of which dataset the caller currently holds. When a response is returned by the API, the guard on the return action would then prevent that action from executing, and the API call would simply end without returning data or errors.

Now that we have the Call type defined, we can specify the ranges from which the control states are computed. The STAR semantics require that these ranges be designed such that, for any set of variable assignments, only one control state is returned. One way to compute them is sketched right after the list.

started  (data != null) && !(waiting || responded || errored || retried)
waiting (requested == true) && !(responded || errored || retried)
responded (response != null) && (error == null)
errored (error != null) || (ticks > tickMax) 
retried (retryCountMax > retryCount > 0) && (ticks == 0) && !(waiting || responded || errored)
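Here is that sketch: a rough, plain-JavaScript translation of the ranges into a state-derivation function, reusing the Call constructor and constants sketched earlier. The mutual exclusions are resolved by the order of evaluation; this is an illustration, not the framework’s code.

// Derives the control state of a Call from its current variable assignments.
function controlState(call) {
  var errored   = (call.error !== null) || (call.ticks > tickMax);
  var responded = (call.response !== null) && (call.error === null);
  var retried   = (call.retryCount > 0) && (call.retryCount < retryCountMax) &&
                  (call.ticks === 0) && !(responded || errored);
  var waiting   = (call.requested === true) && !(responded || errored || retried);
  var started   = (call.data !== null) && !(waiting || responded || errored || retried);

  if (errored)   return 'errored';
  if (responded) return 'responded';
  if (retried)   return 'retried';
  if (waiting)   return 'waiting';
  if (started)   return 'started';
  return undefined; // no range matched: the assignment is not a legal state
}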


The last step is implementing the actions and the guards. Actions are purely functional: the type properties are provided as an input and they return an output, which is then presented to the type to update its properties. Only the type can decide whether it accepts the values, whether it does so partially or atomically, and subsequently derive the new current state based on the new values.

request:data {
 call = new Call() ;
 call.data = data ;
 return call;
}
the request action instantiates the Call component

results in the “started” state, to which an automatic send: action is assigned

send:call {
 request' = f(data) ;
 write(request',outputStream) ;
 requested' = true ;
 response' = null ;
 error' = null ;
 ticks' = 0 ;
 return call';
}

action which prepares the request from the caller’s data (and metadata), via the function f(), and initiates the call to the API

results in the “waiting” state, which expects two externally triggered actions: tick: and receive:

tick:call {
 ticks' = ticks+1 ;
 return call';
}


action which is invoked by the system at a periodic interval. When ticks > tickMax, the component transitions to the “errored” state

retry:call {
 retryCount' = retryCount + 1 ;
 ticks' = -1 ;
 return call';
}


action which gets the component ready to resend the request

results in the “retried” state, which initiates an automatic send: action

We use a value of -1 to avoid a collision with the waiting state.
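To make this concrete, the tick and retry actions above could be written in JavaScript roughly like this; the functions simply return the values they propose, presenting them to the Call type and deriving the resulting state is left to the component:

// Pure functions: they compute new values and return them; the Call type
// decides whether to accept them and which control state results.
function tick(call) {
  return { ticks: call.ticks + 1 };       // once ticks > tickMax, the type derives "errored"
}

function retry(call) {
  return {
    retryCount: call.retryCount + 1,      // one more attempt
    ticks: -1                             // -1 avoids a collision with the waiting state
  };
}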

receive:response,error {
 response' = response ;
 error' = error ;
 return call';
}

the receive action is triggered when the system receives the API call response (or error); the action may result in the “responded” or “errored” states

return:response {
 response' = g(response) ;
 publish response' ;
}

the return action prepares the response for the caller and publishes the response

Several components may subscribe to the notification

error:error {
 error' = g(error) ;
 publish error' ;
}

the error action prepares the exception response for the caller and publishes it

Several components may subscribe to the notification


A couple of guard conditions need to be added to complete the definition of the component: 

ReturnGuard = (hash != hash(response)) 
HashGuard = !ReturnGuard
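As a hedged sketch, the two guards could be implemented like this in JavaScript (hashOf() is a placeholder for whatever digest function the implementation actually uses):

// Guards are predicates over the model: they enable or disable the return and error actions.
function returnGuard(call, hashOf) {
  return call.hash !== hashOf(call.response);  // the dataset changed: return it to the caller
}

function hashGuard(call, hashOf) {
  return !returnGuard(call, hashOf);           // the caller already holds this dataset
}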


That’s it! You can see the sample app in action here.

The behavior of the component is fully described as:

B = (Si,Aj,Gk,Tn)

[Figure: Designing a Reliable API Call Component with SAM]

To create an implementation of the pattern, we use the STAR framework (javascript edition) and node.js.

The sample code can be found here (src/samples/apicall); there is also a server-side implementation, using WebSocket, available here (src/samples/node-apicall):

  • the CallType class represents the model of the call component
  • the Call class is the behavior, whose implementation follows the SAM pattern
In the STAR programming model the Call class is called a "Behavior"; it is the class that owns the states, actions/guards and ranges (the mapping between the type assignments and the corresponding control states).
The trace of the execution is provided by the STAR framework as a PlantUML diagram:

SAM - the State-Action-Model pattern

06/13/15 | by jdubray | Categories: MDE



In the last six months I went on a journey that started with a discussion with Dr. Lamport on the semantics of TLA+ and continued with the realization of how significant and innovative Facebook’s React.js application architecture is. I spent some time learning AWS Lambda, and just like React, Lambda is so innovative that it has the potential to break every rule we know in Software Engineering.

I would like to share with you the place where I landed because I feel it may be relevant to the work of many people. After all, both TLA+ and React are used by big names in our industry to solve elegantly problems that had otherwise remained intractable, stuck with corny solutions that had not changed much since the 70s.


Computing is based on an approximation

The general theory of computation, and therefore the programming languages that were inferred from it, assume that a large class of problems can be solved by using deterministic state machines.

Even though we all understand that the true behavior of a computer is based on a State-Action metamodel, this approximation has resulted in avoiding the mention of states, because they are redundant, merely the shadow of every action. So when a computation transitions from state S1 to state S3 via the actions a and b:

       a    b
    S1 → S2 → S3

we only need to refer to the behavior of the system as [Lamport]:

       a    b
    .  → .  →  .


To be fair, it would be horrendous to have to specify every single state in our programs.  Most of them would have no value at all.

As a result of that approximation, programming languages were designed to work on a conceptual view of “state” which we can refer to as an "assignment of variables", which in turn gave any program the super power to potentially CRUD the assignment of any variable at any step (does that look like a description of the code you are writing? you are not alone). This approach works well in many cases, but there are a few cases where this approximation cannot be used.

The advent of structured programming put the final nail in the coffin of the State-Action behavior by removing any major reason to use something other than an Action behavior, and these days it is nearly impossible to find programmers who can think outside the Structured Programming box.

But truly, the most tragic consequence of that approximation is the reification of the concept of “control state” behind an assignment of variables. When we program with modern, so-called "structured" languages, we have no particular structure that would give us a hint as to:

  1. which control state we are currently in
  2. which actions are possible from that particular control state, automatic or expected

Everyone has to handcode these constructs without any support from frameworks and programming languages. Actually, a culture has grown in our industry which views "State" as evil and encourages everyone to prefer "stateless" (whatever that means) architectures.


Control States must be explicit


When building systems whose core function is to “wait” (such as business process or service orchestration engines), as opposed to systems that execute, you often face the excruciating pain of realizing a state-action behavior in an action-assignment-based programming language. The French designed a programming language that "waits": GRAFCET, which found its niche in programming automata and, not surprisingly, was based on the formalism of Petri Nets. However, programming languages which are based on a State-Action formalism remain a rarity.

To illustrate the importance of control states and their relationship to actions, let’s take the example of … a glass of water (please replace by your favorite beverage, if it makes it easier to grok the argument).

In an action-based behavior, this system would be described by a state variable v (volume of water in the glass) and two actions: add water, drink. 


Fig. 1 Types, Actions and States


On the other hand, the state-action behavior of the glass relies on three control states, and the corresponding next-actions:

  • full: drink
  • in-between: drink, add water
  • empty: add water

I apologize for the triviality of this example, but it is important to come to the realization that in an action-based model the concept of “control state” will have to be emulated in some fashion, or the system will not be able to respond to actions correctly (filling the glass infinitely, drinking from a negative value).
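A small sketch makes the point (plain JavaScript; the capacity value is an assumption): the control states are nothing more than ranges over v, and each control state enables its own set of next-actions.

// Control states derived from ranges over the single variable v (volume of water).
var CAPACITY = 250; // ml, illustrative value

function glassState(v) {
  if (v <= 0) return 'empty';
  if (v >= CAPACITY) return 'full';
  return 'in-between';
}

// Next-actions enabled in each control state.
var nextActions = {
  'empty':      ['add water'],
  'in-between': ['drink', 'add water'],
  'full':       ['drink']
};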

In structured programming, correctness is achieved with the use of conditional expressions: “if-then-else” or “switch-case”, which tend to focus the attention of the developer on the next-action predicate, without requiring the identification of control states, and consequently without explicitly specifying the actions that should be enabled in a particular control state.

Actually, even state-action behaviors are challenged to provide a formalism where “control states” are modeled properly: they suffer from an approximation of their own because the action's metamodel connects the action to the end state, as if the action of adding water to the glass would be able to know when the glass is full.

That second approximation is more subtle, but when you think of it, it has to be the type, i.e. the target of the action, which should decide which (control) state results from applying a given action. The action should have no knowledge whatsoever of the “current state” (let alone current control state) of its target. It should only present a series of variable assignments to the type which may or may not accept them, and as a result reach a new control state (or not, when the action fails or has no side effects). This seemingly mundane point is, I believe, why writing code is so error prone, because we either ignore the control states altogether and let actions assign variables directly, or, when control state is explicit, the formalism we use lets the action decide which control state should be reached (which is somewhat equivalent).

If we go back to our glass of water, either action, drink or add water, can potentially reach two different states; again, it cannot be the action which decides which end state is being reached.

At this point, it is important to realize that functional programming alone would not help much if the mechanism that transitions from one control state to another is flawed. First, control states are facts, not something you can optionally choose to ignore, and second the metamodel of how control states are reached must be defined with the greatest precision. 

The STAR Programming Model


STAR stands for (control) State, Type, Action and Relationship. A STAR-based programming model is a programming model where these four concepts are independent of each other, which is not the case, for instance, in OO, where relationships are reified behind the properties of a Type and there is no support for control state.

TLA+ is one of the formalisms that comes closest to being STAR compliant, and after my discussions with Dr. Lamport, I derived a tiny extension to TLA+ which makes control states explicit instead of reifying them behind the assignment of variables. As a matter of fact, Dr. Lamport consistently names the equivalent of control state variables with the “pc” moniker:


Fig. 2 TLA+ implementation of the Factorial Algorithm [Lamport]

What this fragment of TLA+ says is that if you are in the control state "mult" (multiply) you will transition automatically to the "test" control state and apply the action that will present a new set of values: f' and i' to the factorial type {f,i}. 

As you can see, TLA+ creates a strong coupling between the current state, the automatic action performed when it is reached and the resulting target state. Of course one could argue that since "test" is not exactly a control state per se but is used instead to compute the resulting control state, the coupling is not as strong as I might suggest. Nevertheless, the metamodel of a TLA+ predicate, as it stands, is based on the standard State-Action behavior with step tuples such as (Sn, A, Sn+1).

The reification of control states behind the variable assignments creates two challenges. First, as we have just discussed (and as discussed in this previous blog), it forces us to introduce synthetic control states (“test”) which have nothing to do with the correctness or liveness of the system, while introducing an unfortunate coupling of the true control states (“mult” and “done” in the condition expression). Control states are independent of each other.

Second, the reification hides the true nature of a “control state”, as explained in the blog post, which is far better represented as a range or ranges within all the possible assignments of variables (i.e. mathematical intervals) than by the use of conditional expressions, which are directly tied to action-based behavior: then and else predicates are not meant to be used for states but for actions.

A STAR-based programming model is different because:

  • Type (variables) and Control States are distinct
  • Actions are not linked to a target state, but to a target type
  • Actions “present” values to the type, they do not “assign variables”
  • Actions are therefore fully functional: f(in,out) with absolutely zero side effect
  • Types derive their current state based on ranges of variable assignments
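To make the last three bullets concrete, here is a minimal sketch (plain JavaScript, illustrative names, reusing the glass of water) of an action that presents values to a type, which alone decides whether to accept them and which control state results:

// The action is a pure function: it computes new values, it does not assign them.
function drink(v, sip) {
  return { v: v - sip };
}

// The type owns the variables, accepts (or rejects) the presented values,
// and derives its current control state from ranges over those values.
var glassType = {
  v: 100,
  present: function (values) {
    if (values.v < 0 || values.v > 250) return this.state(); // reject: out of range
    this.v = values.v;                                       // accept the assignment
    return this.state();
  },
  state: function () {
    if (this.v <= 0) return 'empty';
    if (this.v >= 250) return 'full';
    return 'in-between';
  }
};

// The initiator calls the action, then presents the result to the type.
var newState = glassType.present(drink(glassType.v, 30));    // 'in-between'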

But the most important implication of the STAR programming model is that Action-based behavior and State-Action behavior can coexist, because we introduced the concept of Type that connects the two. An action can and should be implemented with an action-based behavior. If anything, STAR reinforces the correctness of that approach.


Fig. 3 STAR-based programming model


Please note that in Fig. 3, the relationships (R in STAR) are not represented as the current discussion focuses on the relationships between States, Actions and Types.

I have also voluntarily left “blank” the initiator of the action, and the “response” mechanism which communicates the resulting state derived by the type (which, again, cannot be inferred from the action performed itself) which we'll discuss in the next section. 




React.js is truly a marvel of software engineering, first, because it finally sweeps away decades of failed GUI architectures under the banner of MVC, and of course, because of the amazing virtual DOM concept and its Diff algorithm.

What React.js can teach us is that the response of a model (type) to events (actions) must indeed be factored separately from the consumption of a model (by a view, or an external software agent). Showing us that it is not only possible, but also far superior to existing paradigms, is a major contribution to software architecture that will remain in our industry for decades to come.

Cherry on the cake for an old (state)ful guy like me, React.js components have both an explicit lifecycle with proper control states and an explicit mechanism to assign the value of variables (setState, distinct from properties).

With that in mind, however, React.js alone is too simplistic to deal with the design of complex systems and the team, not surprisingly, augmented the base Reactive pattern with … an action-based behavior: Flux. Yes, I know, some days you just want to cry: crossing control boundaries without the structure of control states to tame complexity and achieve greater correctness is at best naive. 

In Flux, all action requests are dispatched to “stores” which implement the changes to the model and then notify the views of the corresponding changes. While the Reactive foundation is key to achieving the decoupling of the consumer (of the model) from the model, Flux suffers from all the shortcomings of action-based formalisms, which couple the actions and the types while ignoring the control states.


Fig. 4 The Flux Pattern


The State-Action-Model Pattern


Considering that STAR now has a Javascript implementation, it is easy to expand the Flux pattern with a STAR-based programming model (as we'll see in my next post), or what I call SAM, the State-Action-Model pattern:



Fig. 5 Aligning Flux and STAR - The State-Action-Model pattern


With the SAM pattern we can accommodate any software agent, not just React components (views) and there lies another very important lesson that can be learned from React.js and its component lifecycle: it is indeed possible to replace Request/Response inter-actions (be they synchronous or asynchronous) with a Diff/Notify pattern. React.js gives us the unique opportunity to bury another antiquated pattern that has plagued our industry for decades: RPC. That, again, will become a tremendous legacy of React.js.

AWS Lambda


If you have not explored AWS Lambda, one of the latest Amazon Services, I would highly recommend you do. SAM may well become a key pattern for Lambda, because Lambdas are an ideal building block for implementing SAM:

  • Lambdas can be used for implementing Actions (Behavior)
  • A Lambda can also be associated with every Type to accept the variable assignments and compute the resulting current state (see the sketch after this list)
  • As the assignments are made persistent, say in DynamoDB, a change event can trigger the corresponding changeEvent to the requestor
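As a sketch only (the table name, event shape and state ranges are assumptions, not part of SAM or of any AWS sample), a Lambda playing the role of the Type could look like this in node.js:

// This Lambda accepts (or rejects) the values presented by an action,
// persists the accepted assignment in DynamoDB and derives the control state.
var AWS = require('aws-sdk');
var db = new AWS.DynamoDB.DocumentClient();

exports.handler = function (event, context, callback) {
  var proposed = event.proposedValues;                 // values "presented" by an action
  if (typeof proposed.count !== 'number' || proposed.count < 0) {
    return callback(null, { accepted: false });        // the type rejects the assignment
  }
  var item = { id: event.id, count: proposed.count };
  db.put({ TableName: 'CounterModel', Item: item }, function (err) {
    if (err) return callback(err);
    // The control state is derived from ranges over the accepted values.
    var state = item.count === 0 ? 'empty' : 'counting';
    callback(null, { accepted: true, state: state });
  });
};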


In this post, we have shown how the introduction of a State-Action-Model behavior (as opposed to State-Action behaviors) radically changes the way we can write software, without compromising the value of Action-based behaviors (and functional programming), while offering a close alignment with the semantics of TLA+. We have also shown that SAM can be used to augment key innovations such as TLA+ and React.js/Flux. Finally, we have looked at how SAM could become a key pattern for AWS Lambda.

It would be hard to conclude this post without mentioning my favorite quote from Barbara Liskov:

I believe that the fundamental role of programs is to modify state not just the bits

In my next blog post, I create an example: the reliable API Call component, where SAM is implemented with the STAR Component Model which is available in Java and Javascript.


ReaCall: a new Architecture for API Client SDKs

05/23/15 | by jdubray | Categories: News, SOA, android


I have published project ReaCall, a new Architecture for API Client SDKs. The project was developed in collaboration with VIMOC Technologies, a leader in Landscape Computing. 

API Client SDKs are a must have for API providers who are looking to grow a vibrant ecosystem of 3rd party application developers. However, they are generally difficult to build and maintain, especially when the API evolves rapidly over time. 

You can always push the responsibility of building client-side SDKs to the developers, but that approach creates friction, taking away precious resources from developers who end up building plumbing when they should be building solutions, and often breaking these solutions when the API changes.

Project ReaCall combines React.js and Swagger to make it easier for API providers to develop and evolve API Client SDKs (ACS).

ReaCall is a simple pattern enabled by React.js and Swagger's metadata. A ReaCall component is a simple React.js component which acts as a mediator between an application, its views and some 3rd party API. The application can either be Web (client-side or server-side) or Native, per React.js deployment models.
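To give an idea of the shape of such a mediator, here is a minimal, hypothetical sketch using the 2015-era React.createClass API; it is not the actual ReaCall code, and the prop names and the use of fetch are assumptions:

// A React.js component acting as a mediator between the application's views and an API.
var React = require('react');

var ApiComponent = React.createClass({
  getInitialState: function () {
    return { data: null, error: null };
  },
  componentDidMount: function () {
    // The operation URL would normally be derived from the API's Swagger description.
    var self = this;
    fetch(this.props.operationUrl)
      .then(function (res) { return res.json(); })
      .then(function (data) { self.setState({ data: data }); })
      .catch(function (error) { self.setState({ error: error }); });
  },
  render: function () {
    // Delegate rendering to the application's view via a render callback.
    return this.props.render(this.state);
  }
});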

API Components (React.js) are the core of the SDK. Here is how they are used in VIMOC's SDK:


 You can learn more on the project's page on BitBucket. 


Hypermedia: don't bother

04/30/15 | by jdubray | Categories: SOA


It is no secret that I am not a big fan of applying Roy's REST to anything other than the Web architecture. Hipsters have pushed "REST" to architect distributed systems and craft APIs, but after a decade of RESTfulness, the RESTafarians have little to show for it: the so-called "uniform interface" is not that uniform (it's rather verbose), "resource-orientation" is just a fancy name for CRUD, schema-less contracts lead to ... Swagger and ... back to code generation, and so forth. I can't think of a single recommendation from Tilkov, Vinoski, Bray, Burke... that turned out to be correct.

Since by definition a hipster would never admit defeat, the only strategy possible is the "fuite en avant" (headlong rush): everything else was wrong, no worries, we got it right this time: the correct way to do REST is "hypermedia". Right?

I find it fascinating that hardcore RESTafarians like Stefan Tilkov or Mike Amundsen are now using some of my arguments to "REST" correctly, as they introduce the concept of a state machine to identify the verbs, yes, you heard it right, the verbs of "the" once thought to be uniform interface.

In case you are not familiar with the role of state machines in computation, please I beg you to read this outstanding paper from Dr. Lamport, on Computation and State Machines. Anyone who writes code and designs interfaces must read it. 

With that in mind, let's explore the use of Hypermedia in distributed systems (M2M). Five years ago, I published this post detailing the metamodel behind REST to surface its semantics (interestingly RESTafarians are allergic to metamodels...):

There are three parts to REST:

  1. REST core (derived from Roy’s Thesis),
  2. What I call the practical REST which added query parameters, collections and properties of resources.
  3. What I call the “full REST” which of course includes states, transitions and actions.

If you disagree with me on the "full REST" and the connection of non-deterministic state machines with hypermedia, look no further than the great work of Ivan Zuzak et al on the “Formal Modeling of RESTful Systems using State Machines”.

I do agree that Hypermedia is the keystone of the architecture of the Web and without it, the Web, or what's left of it, would look very different, but unless you have never written a single line of code in your life, you’d quickly spot that Hypermedia works because there is a user in the loop, in other words a human, who is capable of interpreting the links and deciding which action to take next. Arbitrary strings of characters are as good as graffiti on the wall facing a computer’s “web”cam when it comes to helping a machine decide what to do next.

What can Hypermedia teach us about API design? Hypermedia clearly shows that the interface to a resource is not “uniform”: it never has been, and never will be. Duh!! At last, after nearly a decade of "nounsense", Stefan and Mike use verbs... how refreshing.

Is hypermedia hyped today with the same kind of bullshit intensity we heard when REST started? you bet! Look no further than this summary of RESTFest. Imagine... with Hypermedia you can build "bots", which, without the knowledge of chess, can play chess against each other... OMFG, Hypermedia can even do machine learning. I can't wait to see how Amazon's drones and Google's self-driving car use Hypermedia.

But, to be fair, when you take away the people who are financially and/or emotionally attached to REST, you get a reasonable set of arguments behind the use of hypermedia, or so it looks. For instance, Glenn Block wrote this post in 2012, where he explains:

In a hypermedia based system, links are generated based on a state machine. Depending on the state of the resource, different links apply. [This means that] you will have to deal with how to handle the generation [and interpretation] of links.

Model-driven engineering has been around for quite some time, and I have been building model-driven software since 1991 (thanks to NeXTStep and Interface Builder, and things like EOF…). Embedding metadata in a server’s response for the client to interpret has probably been around since the 60s; I would guess the first distributed system ever built was already using that "pattern".

So what do you gain by paying the penalty of inflating your responses with next-action metadata? Can you spontaneously teach a machine how to play chess that way? Well, no… From Roy’s thesis:

Machines can follow links when they understand the data format and relationship types.

Yes, a small detail that a lot of hipsters fail to surface is that machines need to form a shared understanding (out-of-band) of the data formats and the relationship types. Since they need to form a shared understanding in the first place, what could be the value of redundantly Hypermediating machine responses in M2M scenarios?

First, in complex non-deterministic state machines, it might be great to give some hints (a.k.a. constraints) to the client as to what is possible, in case the developer(s) of the client(s) can’t read the doc and create a client that works properly. Ah, but I almost forgot, do you remember what the Vinoskis of the world were telling us in 2007? That client SDKs were no longer needed with REST, that simple human-readable doc was enough? How did that pan out? Yep, the serious API providers have to hand-code at great cost what WSDL was generating for free. So it is unlikely that this kind of metadata will ever become a major driver for Hypermedia adoption, because the hypermedia will be buried behind the client SDK, and since the API providers write both the SDK and the APIs, they don't really need hypermedia to support the shared understanding between client and API.

Second, one could use the link metadata to construct UIs (with queries, actions…). RoFL!! You mean the RESTafarians could convince someone to rewrite a proprietary HTML engine, just for the privilege of using hypermedia? I am sure some people will pay top $$$ just for that privilege, just ask some of the people who paid north of $300/hr to learn how to convert verbs into nouns. After all they have already convinced our industry that it was cool to hand-code ESBs, client SDKs… So why not add an HTML engine, if they can find people who are ready to pay for learning how to do that?

The problem in finding value for M2M-Hypermedia, however, is not just shared understanding, it is shared state. Again, most of the Web’s architecture was built with the user in mind, when we talk about the “application” state, we talk about the user session, not much more. When we talk about M2M, we talk about many machines interacting with (and therefore changing the state of) each other (imagine a server booting, what does it mean to hold a representation of its state?). REST is essentially client/server, and therefore not an architecture suitable for distributed computing. REST's fundamental assumption is that the state you are handed off changes infrequently (as in never) and even if it changes, the stale state is still actionable (as in you can navigate to something that makes sense regardless). Web Apps have been architected with the constraint that the state is solely changed under the circumstances of the client. 

In M2M, state is likely to be stale, irrelevant, because another machine would have changed that state before any action could be taken on a given representation. IoT is not the WoT, otherwise it would be called the Web of Things. So, what is the point of having the knowledge of the "affordances" on a piece of state that is no longer relevant? Interesting question, isn't it?

So why are the RESTafarians pushing hypermedia so hard today when there is absolutely no value to show for?

The true reason is that REST has a flaw at its core, a major flaw, that the hipsters don’t know how to deal with, perhaps other than by pulling in the machinery of hypermedia. They won’t tell it to you that way, but I reckon most of the hypermedia you will see will be used to deal with that issue.

The problem with REST is that it couples access with identity.

When you write:


you are coupling the identity of the order (123) with the way you access its "state". That kind of URI can't be opaque to a machine/client (again, all this works nicely when a user is in the loop). So at a minimum, REST needs an identity relationship that provides an iURId (as in: immutable Uniform Resource Identity) that a client can rely on to retrieve that resource at a later time (generally via a mapping from their own internal identity of the resource).

When you don't have an iURId to rely on, a new version comes out and all the plain old URIs of the orders you hold on the client side become obsolete.

So, how do you know that these two URIs point to the same order?

This is also true of “views” of a resource: how do you know that these views point to the same order?


Well, you really don’t know they are the same, unless you have a shared understanding with the resource owner that 123 is the identity of the order and is (one of) the ways you access it.
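To illustrate the point with hypothetical URIs and payloads (placeholders, not taken from any real API): the only robust way out is for the representation to carry an identity that is independent of the way the resource is accessed.

// Two different access URIs (illustrative placeholders):
//   https://api.example.com/v1/orders/123
//   https://api.example.com/v2/customers/42/orders/123
// The client can only correlate them if the representation carries a stable identity:
var representation = {
  iURId: 'urn:example:order:123',           // immutable identity, shared understanding with the provider
  href: '/v2/customers/42/orders/123',      // how the state is accessed today (may change)
  status: 'shipped'
};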

Yep, you don’t hear the hipsters telling you much about these little rough edges of REST. My good friend Pete Williams does explain it, but in a different way:

A URI that is constructed by a client constitutes a permanent, potentially huge, commitment by the server. Any resource that may be addressed by the constructed URIs must forever live on that particular server (or set of servers) and the URI patterns must be supported forever

Effectively, you are trading a small one time development cost on the client side for an ongoing, and ever increasing, maintenance cost on the server side.

But who, who in their right mind, would let their client trust and follow a dynamic link based on a label (relationship name)? Are client-side URIs really such an acute problem? Isn’t compatible versioning enough to deal with these server-side costs?

The bottom line is that Hypermedia has zero value, especially in the context of M2M (again, I am not talking about the Web’s Architecture when a human is in the loop). I am certain that someone somewhere will find an application (or two) that shows some value, just like any interesting software pattern, beyond that, hypermedia will be a big waste of time for most people, just like the uniform interface, the coupling of identity with access, http caching, verbs vs nouns, human readable documentation… have been thus far.

Can we just turn the page (pun intended) on REST, once and for all? 


STAR-based Component Model with TLA+ semantics

04/12/15 | by jdubray | Categories: MDE


I am very excited to announce the SCM project, an open source STAR-based Component Model. SCM is derived from the TLA+ Specification Language and is the direct product of email conversations Dr. Lamport had the kindness to have with me during the month of January. 

The SCM project is open source (Apache 2.0) and is currently implemented in Java only. Javascript will be next. 

Formal methods are increasingly used in our industry to solve complex problems. For instance, the AWS team recently published this article summarizing their experience using TLA+:

Dangerous bugs turn out to be errors in design; the code faithfully implements the intended design, but the design fails to correctly handle a particular “rare” scenario. We have found that testing the code is inadequate as a method for finding subtle errors in design, as the number of reachable states of the code is astronomical. 

According to the AWS team, the key to solving that problem lies in increasing the precision of its designs. They found in the TLA+ Specification Language an ability to express all the legal behavior of the system. 

The SCM project delivers some of the semantics of TLA+ directly at the programming level (with a couple of noticeable deviations) and as such, enables a broad community of developers to harness the power of formal methods. What kind of power can you expect? Here is how the AWS teams sees the benefits of TLA+:

In industry, formal methods have a reputation for requiring a huge amount of training and effort to verify a tiny piece of relatively straightforward code, so the return on investment is justified only in safety-critical domains (such as medical systems and avionics). Our experience with TLA+ shows this perception to be wrong. At the time of this writing, Amazon engineers have used TLA+ on 10 large complex real-world systems. In each, TLA+ has added significant value, either finding subtle bugs we are sure we would not have found by other means, or giving us enough understanding and confidence to make aggressive performance optimizations without sacrificing correctness. Amazon now has seven teams using TLA+, with encouragement from senior management and technical leadership.

Just like in the case of TLA+, SCM allows you to express:

  • Safety. What the system is allowed to do. 
  • Liveness. What the system must eventually do.

Even though SCM amounts to writing code which could potentially be deployed in a running system (as a Component), that is not its primary intent. An SCM Component is optimized for rapidly converging to its most precise design, which in turn may not be the highest-performance implementation possible.

SCM is based on the STAR conceptual framework (State, Type, Action, Relation):


Fig. 1 - The STAR conceptual framework

STAR is a style where every concept we express is clearly positioned as a State, Type, Action or Relationship. STAR has applications in Strategy/Innovation, Problem/Solution Specification and today with SCM, in Programming/Verification as well. 

Combining the foundations of STAR and TLA+, I derived the following SCM Metamodel: 

Fig. 2 - The SCM Metamodel

What the model tells us is that an SCM component has a set of behaviors which is defined by a combination of States, Actions, Types (and Relationships). The model deviates somewhat from TLA+ in the sense that the semantics of State (in STAR) would be called "Control State" in TLA+. In TLA+, a state is simply an assignment of variables (which is embodied by the Type) and control states appear as a variable (generally named pc) which is assigned values that are the labels of the control states. However, unlike in TLA+, the Type, and not the Action, is responsible for the determination of the resulting state. That is why the second association between Actions and State is dashed: it is inferred, not specified. In SCM, Actions are purely functional and their results are presented to the Type, which may or may not accept them, based on internal constraints. Once the Type accepts the new values, a determination of the new state is made, based on the mapping between a set of Ranges of values and States. (Control) States are better seen as Ranges of values, not "assignments of values".

This is how the model operates:

Of course, these semantics can be implemented in TLA+, but with substantial complexity, even though the notion of "Range of Values" clearly belongs to mathematics. When the action is not in a position to deterministically determine the target state at design time (which is a very common case), people are forced to implement a synthetic state (generally called "test"), whose purpose is to make that determination. This can be seen in the Factorial example of Dr. Lamport's paper on "Computation and State Machines" (see page 12, TLA+ definition).

Let's explore now how we use SCM. A couple of examples are provided in the SCM repository (the Factorial and DieHard algorithms). 

Based on Dr. Lamport's article, the design of the Factorial algorithm can be summarized as (BOLT Notation):

Note that:

  • Start is a standard action of the component model which determines the initial state of the component based on the values
  • Multiply and Set are associated with an automatic transition, i.e. as long as the control state of the system is "mult", the multiply action is invoked automatically
  • We chose not to implement a Zero state and rather forbid any initial value which is less than one

The initialize action illustrates well the relationship between action, type and state, since the resulting state can only be determined once the action has completed. Of course the action could make that decision, but that would be an obvious and unwanted coupling between the type and the action, since the resulting state could be influenced by parameters outside the scope of the action.

It is that decoupling between actions and types which confers on SCM a balanced articulation between Functional Programming (Actions), Object Orientation (Types/Actions) and State Machines (States/Actions).

The design of the component is expressed in the Component's constructor; this is where the elements of the metamodel are assembled.

public Factorial(int i) throws Exception {
	fact = new FactorialType(i) ;
	multiply = new MultiplyAction("multiply()",fact) ;
	initialize = new InitializeAction("initialize()",fact) ;

	mult = new State("mult", multiply, true) ;
	def = new State("default", initialize) ;
	forbidden = new ForbiddenState("inputLessThanOne") ;
	desired = new DesiredState("lastMultState") ;
	//SCM comes with a couple of configurable ranges:
	//SimpleRange and Interval
	SimpleRange defaultRange = new SimpleRange("i", Operator.equals, new Integer(1)) ;
	SimpleRange multRange = new SimpleRange("i", Operator.greaterThan, new Integer(1)) ;
	//Concrete ranges are OK too
	LessThanOneRange fRange = new LessThanOneRange() ;
	EqualsTwoRange desiredRange = new EqualsTwoRange() ;
	//Map each range to the control state it corresponds to
	fact.addRange(mult, multRange)
	    .addRange(def, defaultRange)
	    .addRange(forbidden, fRange)
	    .addRange(desired, desiredRange) ;

	//Assemble the behavior from the states
	behavior = new FactorialBehavior(fact) ;
	behavior.add(mult)
	        .add(def) ;
	this.add(behavior) ;

	State currentState = fact.currentState() ;

	//Add trace
	trace = new Trace("fact_trace") ;
}

To facilitate the study of the Component's behavior, SCM generates PlantUML activity and state diagrams, either as traces of execution or after a "walk". For instance, the Factorial algorithm will generate this state diagram:

Factorial Algorithm State Diagram

Please note that the "d1" state is a desired state, not a control state. Here is the corresponding PlantUML descriptor (generated by SCM):




I'd be more than honored if you would give SCM a try. If you have any question, feel free to reach me at




