Wednesday, September 7, 2016

Monitoring Distributed Systems – Lessons from the RELEASE project

The observer effect tells us that we cannot observe a system without disturbing it; on the other hand, without observation we are unable to correct, evolve or optimise systems. In this post we first discuss the general question of observing distributed systems, and then we look at some of the ideas and prototypes for more effective monitoring that come out of the RELEASE project, which was focussed on building scalable distributed systems in Erlang.


We begin by exploring what it means to monitor a distributed system, how it is done, and the uses to which monitoring data can be put. Sloppily, we know, we will use the term “monitoring” to cover activities of monitoring, observation, tracing and visualisation: what they have in common is the fact that they involve extracting information from a system which goes beyond the information it is required to produce to function and gives information about how it is performing that function.

After setting out the various dimensions of monitoring, we conclude by discussing the ways in which monitoring can load a system. This sets the scene for a discussion of how to mitigate this loading, in the context of Erlang distributed systems.

Why monitor?

We monitor to make sure that a system is delivering what it should. This might be a matter of ensuring that it can maintain performance over time, as unread messages accumulate in mailboxes, perhaps. It could mean observing the way in which (eventual) consistency is managed, so that the system is able to respond appropriately despite there being disruptions to infrastructure. It could also have a more active purpose: to allow the system to adapt to changing circumstances, either through real-time reactive behaviour, or through longer term code evolution.

What are we monitoring?

In a distributed system two kinds of activity can be observed: first there are data which come from a single node, such as CPU usage, or trace messages to particular function calls; there are also data arising from the traffic between different nodes.

As the examples just given show, these data can be event-based, such as log messages about function calls or other system events; or they can be observations, such as CPU usage. While it is in the user’s control how often to observe a property of the system, trace messages will be generated according to how the system is behaving, and so the rate and scale of their production is not under user control.

Different levels of monitoring are possible, too. Simple data, such as CPU or memory usage, can be gathered for all systems, whereas in many cases the most useful data is application-aware, more directly reflecting the semantics of the system that is being observed.

When do we monitor?

Monitoring can be online: information about a system is available in real-time, so that problems can be identified and rectified as a system runs. A drawback of this approach is that data analysis is limited by the fact that it must produce results on the basis of what has happened up to a point during execution. It must also do this in a timely way, using a limited amount of data.

On the other hand, more complex and complete analyses can be produced post hoc using data generated during the system execution. The two are combined in a hybrid approach, in which  offline analyses are repeatedly generated for portions of an execution.

How do we react?

It may be that the results of monitoring are presented to an operator, and it is up to her to decide how to react to what she sees, reconfiguring or rebooting parts of a system for instance. It is also possible to design automated systems that use monitoring data: these range from a simple threshold trigger fired, for example, by CPU loading reaching a particular level, to a fully autonomic system.

The burden of monitoring

Monitoring requires data to be produced by the system, and for these data to be shipped around the system to a central monitoring point where storage, analysis and presentation can take place. We’ll look at these in more detail once we have discussed a particular concrete case: Erlang/OTP.


Erlang is a concurrent, distributed, fault-tolerant programming language that comes with the Open Telecom Platform (OTP) middleware library. In distributed Erlang each node runs a copy of the BEAM abstract machine and runtime: separate nodes can be run on the same or different hosts. The “out of the box” Erlang distribution model provides full connectivity between all nodes; but this can be modified by manual deployment and connection of “hidden” nodes, which are unconnected by default. Erlang nodes can also be partitioned into global groups, with each node in at most one group. (Note: there is current work on improving the scalability of distributed Erlang: link).

SD-Erlang is an adaptation of Distributed Erlang designed to be more scalable by limiting all-to-all connectivity to groups (called “s_groups”). These groups may overlap, thus providing indirect connectivity between nodes in different s_groups, SD Erlang also allows non-local spawning of processes on nodes determined by various attributes including measures of proximity.

Improving monitoring 

In this section we  look at a number of ways of improving monitoring of distributed systems; these are illustrated by means of tools built to monitor Erlang and SD Erlang systems, but the lessons are more generally applicable.

SD-Mon is a tool designed specifically to monitor SD-Erlang systems; this purpose is accomplished by means a “shadow” network of agents, which collect and transit monitoring data gathered from a running system. The agents run on separate nodes, and so interfere as little as possible with computations on system nodes; their interconnections are also separate from system interconnects, and so transmission of data gathered does not impose an additional load on the system network.

SD-Mon is configured automatically from a system description file to match the initial system configuration. It also monitors changes to the s_group structure of a system and adapts automatically to the changes, as illustrated here:

Filter data at source: BEAM changes

The Erlang BEAM virtual machine is equipped with a sophisticated tracing mechanism that allows trace messages to be generated by wide range of program events. Tracing in this way typically generates substantial data, which can be filtered out downstream. We were able instead to modify the tracing system itself to filter some messages “at source” and so avoid the generation and subsequent filtering of irrelevant data. These changes make the data gathered for Percept2 more compact.

Enhanced analytics: Percept2

Percept2 provides post hoc, process-level, visualisation of the working of the BEAM virtual machine on a single Erlang node. When a node runs on a multicore device, different processes can run on the different cores, and Percept2 enhances the analysis of the original Percept in a number of ways (Percept is a part of the Erlang distribution). For instance, it can distinguish between processes that are potentially runnable and those that are actually running, as well as collecting runtime information about the dynamic calling structure of the program, thus supporting application-level observation of the system. Multiple instances of Percept2 can be run to monitor multiple nodes.

Improved infrastructure: Percept2

The post hoc analysis of Erlang trace files for the visualisation in Percept2 is computationally intensive, but is substantially speeded up through parallel processing of the logs using  concurrent processes on a multicore machine. Logging of some trace information is also performed more efficiently using OS-level tracing in DTrace.

Network information: Devo / SD-Mon

Existing monitoring applications for Erlang concentrate on single nodes. In building Devo which visualises inter-node messaging intensity in real time, it was necessary to analyse Erlang trace information to find inter-node messages; Devo and SD-Mon also show information about messaging through nodes that lie in the intersection of two s_groups, thus providing the route for inter-group messages.

Monitoring and deployment: WombatOAM

WombatOAM provides support for Erlang and other distributed systems that run on the BEAM virtual machine. As well as providing a platform for deploying systems, it also provides a dashboard for observing metrics, alerts and alarms across a system, and can be integrated with other monitoring and deployment infrastructure. WombatOAM is a commercial product from Erlang Solutions Ltd, developed on the RELEASE project.


The aim of this post is to raise the general issue of monitoring of advanced distributed systems. After doing this we have been able, inter alia, to illustrate some of the challenges in the context of monitoring Erlang distributed systems.

We are very grateful to the European Commission for its support of this work through the RELEASE project: EU-ICT Specific targeted research project (STREP) ICT-2012-287510. Thanks also to Maurizio Di Stefano for his contributions to this post.

Tuesday, May 10, 2016

Trustworthy Refactoring project

Research Associate Positions in Refactoring Functional Programs and Formal Verification (CakeML)

The Trustworthy Refactoring project at the University of Kent is seeking to recruit postdoc research associates for two 3.5 year positions, to start in September this year.

The overall goal of this project is to make a step change in the practice of refactoring by designing and constructing of trustworthy refactoring tools. By this we mean that when refactorings are performed, the tools will provide strong evidence that the refactoring has not changed the behaviour of the code, built on a solid theoretical understanding of the semantics of the language. Our approach will provide different levels of assurance from the (strongest) case of a fully formal proof that a refactoring can be trusted to work on all programs, given some pre-conditions, to other, more generally applicable guarantees, that a refactoring applied to a particular program does not change the behaviour of that program. 

The project will make both theoretical and practical advances. We will build a fully-verified refactoring tool for a relatively simple, but full featured programming language (CakeML, and at the other we will build an industrial-strength refactoring tool for a related industrially-relevant language (OCaml). This OCaml tool will allow us to explore a range of verification techniques, both fully and partially automated, and will set a new benchmark for building refactoring tools for programming languages in general. 

The project, which is coordinated by Prof Simon Thompson and Dr Scott Owens, will support two research associates, and the four will work as a team. One post will focus on pushing the boundaries of trustworthy refactoring via mechanised proof for refactorings in CakeML, and the other post will concentrate on building an industrial strength refactoring tool for OCaml. The project has industrial support from Jane Street Capital, who will contribute not only ideas to the project but also host the second RA for a period working in their London office, understanding the OCaml infrastructure and their refactoring requirements.

You are encouraged to contact either of the project investigators by email (, if you have any further questions about the post, or if you would like a copy of the full research application for the project. We expect that applicants will have PhD degree in computer science (or a related discipline) or be close to completing one. For both posts we expect that applicants will have experience of writing functional programs, and for the verification post we also expect experience of developing (informal) proofs in a mathematical or programming context.

To apply, please go to the following web pages:

Research Associate in Formal Verification for CakeML (STM0682): Link

Research Associate in Refactoring Functional Programs (STM0683): Link

Simon and Scott