Modern C++ programming: Coroutines with Boost

(c) https://xkcd.com/303/

We’re rewriting our network stack in Icinga 2.11 in order to eliminate bugs with timeouts and connection problems, and to improve the overall performance. Last but not least, we want to use modern library code instead of many thousands of lines of custom-written code. More details can be found in this GitHub issue.

From a developer’s point of view, we evaluated different libraries and frameworks before deciding on a possible solution. Alex created several PoCs and already did a deep dive into several Boost libraries and modern application programming. Keeping up with the new standards and possibilities really is a challenge for me. Always learning, always improving – so I spent the weekend reading “Boost C++ Application Development Cookbook – Second Edition“.

One of the things that consumes quite a lot of resources in the Icinga 2 core is multi-threading with locks, waits and context switching. The more threads you spawn and manage, the more work needs to be done in the kernel, especially on (embedded) hardware with a single CPU core. Jean already shared insights into how Go solves this with goroutines; now I am looking into coroutines in C++.

 

Coroutine – what’s that?

Typically, a function in a thread runs, waits for locks, and later returns, freeing the locked resource. What if such a function could be suspended at certain points and resumed once resources are available again? Wait times for locks would be reduced as well.

The Boost.Coroutine library provides this functionality. Whenever a function is suspended, its stack frame is saved, and at a later point it is resumed. The kernel is not needed for these context switches, as only stack pointers and registers are stored. This is done with Boost’s Context library, which uses hardware registers directly and is therefore not fully portable – some architectures (like SPARC) don’t support it yet.

Boost.Context is a foundational library that provides a sort of cooperative multitasking on a single thread. By providing an abstraction of the current execution state in the current thread – including the stack (with local variables), the stack pointer, all registers and CPU flags, and the instruction pointer – an execution context represents a specific point in the application’s execution path. This is useful for building higher-level abstractions, like coroutines, cooperative threads (userland threads) or an equivalent of C#’s yield keyword in C++.

callcc()/continuation provides the means to suspend the current execution path and to transfer execution control, thereby permitting another context to run on the current thread. This stateful transfer mechanism enables a context to suspend execution from within nested functions and, later, to resume from where it was suspended. While the execution path represented by a continuation only runs on a single thread, it can be migrated to another thread at any given time.

A context switch between threads requires system calls (involving the OS kernel), which can cost more than a thousand CPU cycles on x86 CPUs. By contrast, transferring control via callcc()/continuation requires only a few CPU cycles because it does not involve system calls – everything happens within a single thread.

TL;DR – with this way of writing our code, we can suspend function calls and free resources for other functions requiring them, without the typical thread context switches enforced by the kernel. A deeper dive into coroutines, await and concurrency can be found in this presentation and this blog post.
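To make the callcc()/continuation mechanism concrete, here is a minimal sketch (assuming Boost 1.65 or later, where callcc() was introduced, and linking against boost_context):

#include <boost/context/continuation.hpp>
#include <iostream>

namespace ctx = boost::context;

int main()
{
        /* callcc() immediately enters the lambda in a new context. */
        ctx::continuation c = ctx::callcc([](ctx::continuation&& sink) {
                std::cout << "[context]: first part" << std::endl;
                sink = sink.resume(); /* suspend, transfer control back to main() */
                std::cout << "[context]: second part" << std::endl;
                return std::move(sink);
        });

        std::cout << "[main]: in between" << std::endl;

        c = c.resume(); /* resume the context right after its resume() call */

        std::cout << "[main]: done" << std::endl;
}

No system call is involved in either resume(); the transfers are plain register/stack-pointer swaps within one thread.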

 

A simple example

$ vim coroutine.cpp

#include <boost/coroutine/all.hpp>
#include <iostream>

using namespace boost::coroutines;

void coro(coroutine<void>::push_type &yield)
{
        std::cout << "[coro]: Helloooooooooo" << std::endl;
        /* Suspend here, wait for resume. */
        yield();
        std::cout << "[coro]: Just awesome, this coroutine " << std::endl;
}

int main()
{
        coroutine<void>::pull_type resume{coro};
        /* coro starts immediately and runs until the first yield(). */

        std::cout << "[main]: ....... " << std::endl; //flush here

        /* Now resume the coro. */
        resume();

        std::cout << "[main]: here at NETWAYS! :)" << std::endl;
}

 

Build it

On macOS, you can install Boost like this; Linux and Windows require some more effort (see the Icinga development docs). You’ll also need CMake and g++/clang as build tools.

brew install ccache boost cmake 

Add the following CMakeLists.txt file into the same directory:

$ vim CMakeLists.txt

cmake_minimum_required(VERSION 2.8.8)
set(BOOST_MIN_VERSION "1.66.0")

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++0x")

find_package(Boost ${BOOST_MIN_VERSION} COMPONENTS context coroutine date_time thread system program_options regex REQUIRED)

# Boost.Coroutine2 (the successor of Boost.Coroutine)
# (1) doesn't even exist in old Boost versions and
# (2) isn't supported by ASIO, yet.
add_definitions(-DBOOST_COROUTINES_NO_DEPRECATION_WARNING)

link_directories(${Boost_LIBRARY_DIRS})
include_directories(${Boost_INCLUDE_DIRS})

set(base_DEPS ${CMAKE_DL_LIBS} ${Boost_LIBRARIES})

set(base_SOURCES
  coroutine.cpp
)

add_executable(coroutine
        ${base_SOURCES}
)

target_link_libraries(coroutine ${base_DEPS})

set_target_properties(
        coroutine PROPERTIES
        FOLDER Bin
        OUTPUT_NAME boost-coroutine
)

Next, run CMake to check for the requirements and invoke make to build the project afterwards.

cmake .
make

 

Run and understand the program

$ ./boost-coroutine
[coro]: Helloooooooooo
[main]: .......
[coro]: Just awesome, this coroutine
[main]: here at NETWAYS! :)

Now, what exactly happened here? The Boost coroutine library allows us to declare a “push_type” parameter; calling it (our “yield()”) suspends the function at that point, and a subsequent call from the caller’s side is required to resume it.

void coro(coroutine<void>::push_type &yield)

Up until “yield()”, the function logs the first line to stdout.

The first call happens inside the “main()” function: constructing the “pull_type” immediately runs the function as a coroutine. The pull_type object, here named “resume” (free-form naming!), must then be invoked explicitly in order to resume the coroutine.

coroutine<void>::pull_type resume{coro};

After the first line is logged from the coroutine, it suspends at “yield()”. The main function logs the second line.

[coro]: Helloooooooooo
[main]: .......

Now comes the fun part – let’s resume the coroutine. It doesn’t start over; the function’s progress was stored as a stack pointer, pointing at “yield()”. Calling “resume()” continues execution exactly from there.

        /* Now resume the coro. */
        resume();

That being said, there’s more to log inside the coroutine.

[coro]: Just awesome, this coroutine

After that, it reaches the end and returns to the main function. That one logs the last line and terminates.

[main]: here at NETWAYS! :)

Without a coroutine, such synchronisation between functions and threads would need waits, condition variables and lock guards.
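For comparison, here is a rough sketch of the same ping-pong between main() and a worker implemented with plain threads – a mutex, a condition variable and an explicit state flag (illustrative only):

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
int step = 0; /* 0 = start, 1 = coro printed, 2 = main printed */

void coro()
{
        std::unique_lock<std::mutex> lock(m);
        std::cout << "[coro]: Helloooooooooo" << std::endl;
        step = 1;
        cv.notify_one();
        cv.wait(lock, [] { return step == 2; }); /* the "yield()" */
        std::cout << "[coro]: Just awesome, this coroutine " << std::endl;
}

int main()
{
        std::thread t(coro);

        {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [] { return step == 1; });
                std::cout << "[main]: ....... " << std::endl;
                step = 2;
        }
        cv.notify_one();

        t.join();
        std::cout << "[main]: here at NETWAYS! :)" << std::endl;
}

Same output, but with a real OS thread, kernel-level blocking and three pieces of synchronisation state to get wrong.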

 

Icinga and Coroutines

With Boost ASIO, the spawn() method wraps coroutines at a higher level and hides the required strand. This is used in the current code and binds a function into its scope. In most locations we’re using lambda functions, available since C++11.
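Stripped of all Icinga specifics, a minimal spawn() sketch looks like this (assuming Boost 1.66 and linking against boost_coroutine, boost_context, boost_system and boost_date_time):

#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <iostream>

namespace asio = boost::asio;

int main()
{
        asio::io_service io;

        /* spawn() runs the lambda as a coroutine; async operations taking
         * the yield_context suspend the coroutine instead of blocking. */
        asio::spawn(io, [&io](asio::yield_context yc) {
                asio::deadline_timer timer(io, boost::posix_time::seconds(1));
                timer.async_wait(yc); /* suspends here; io.run() keeps working */
                std::cout << "[coro]: timer expired, coroutine resumed" << std::endl;
        });

        io.run();
}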

The following example implements the server side of our API, waiting for new connections. An endless loop listens for incoming connections with “server->async_accept()”.

Then comes the tricky part:

  • 2.9 and before spawned a thread for each connection – lots of threads, context switches and memory leaks with stalled connections.
  • 2.10 implemented a thread pool managing the resources. Handling the client, including asynchronous TLS handshakes, was slower, and there were still many context switches between multiple connections until everything stalled.
  • 2.11 spawns a coroutine which handles the client connection. The yield_context is required to suspend/resume the function inside.

 

void ApiListener::ListenerCoroutineProc(boost::asio::yield_context yc, const std::shared_ptr<boost::asio::ip::tcp::acceptor>& server, const std::shared_ptr<boost::asio::ssl::context>& sslContext)
{
	namespace asio = boost::asio;

	auto& io (server->get_io_service());

	for (;;) {
		try {
			auto sslConn (std::make_shared<AsioTlsStream>(io, *sslContext));

			server->async_accept(sslConn->lowest_layer(), yc);

			asio::spawn(io, [this, sslConn](asio::yield_context yc) { NewClientHandler(yc, sslConn, String(), RoleServer); });
		} catch (const std::exception& ex) {
			Log(LogCritical, "ApiListener")
				<< "Cannot accept new connection: " << DiagnosticInformation(ex, false);
		}
	}
}

The client handling is done in “NewClientHandlerInternal()” which follows this flow:

  • Asynchronous TLS handshake using the yield context (Boost does the context switches for us); Boost ASIO internally suspends and resumes the function.
    • TLS shutdown if needed (again with the yield_context, handled by Boost ASIO)
  • JSON-RPC client
    • Send the hello message (again using the coroutine’s yield context)

And again, this is the IOBoundWork done here. For the more CPU-hungry tasks, we’re using a different CpuBoundWork pool, which again spawns another coroutine. For JSON-RPC clients this mainly affects syncing runtime objects, config files and the replay logs.

Generally speaking, we’ve replaced our custom thread pool and message queues for IO handling with the power of Boost ASIO, Coroutines and Context thus far.

 

What’s next?

After finalizing the implementation, testing and benchmarks are on the schedule – snapshot packages are already available for you at packages.icinga.com. Coroutines will certainly help embedded devices with low CPU power to run even faster with many network connections.

Boost ASIO is not yet compatible with Coroutine2. Once it is, the next shift in modernizing our code is planned. Up until then, there are more Boost features available with the move from 1.53 to 1.66. Our developers are hard at work with implementing bug fixes, features and learning all the good things.

There’s many cool things under the hood with Icinga 2 Core. If you want to learn more and become a future maintainer, join our adventure! 🙂

Michael Friedrich
Senior Developer

CGo. Half C – half Go.

In my last blog post, The way to Go, I already pointed out some advantages of the Go programming language. And as if it weren’t enough that switching to Go doesn’t mean throwing the baby out with the bathwater: a single program doesn’t even have to be written entirely in Go…

Going multilingual

I work for a German IT service provider, but when researching solutions to customer problems I don’t have to translate Russian sources into German first – I can work with them directly. In exactly the same way, a Go program can make direct use of C libraries – and vice versa. Provided the interfaces are limited to the lowest common denominator.

Economically, what a bottomless pit it would be if all the required libraries first had to be rewritten in Go. Which makes it all the more surprising to me that some database drivers were indeed rewritten from scratch…

Good programmers know what to write.
Great ones know what to rewrite (and reuse).

Eric Steven Raymond, The Cathedral and the Bazaar

Multilingual in practice

I’m no TV chef, but I’ve still prepared something in advance – an extremely stripped-down version of go-linux-sensors:

package go_linux_sensors

/*
#include <stdio.h>
#include <string.h>
#include <sensors/sensors.h>
#include <sensors/error.h>

typedef const char cchar;

#cgo LDFLAGS: -lsensors
*/
import "C"

import (
	"reflect"
	"unsafe"
)

// Error wraps a libsensors error code.
type Error struct {
	errnr C.int
}

// Error returns the human-readable message for the wrapped error code.
func (e Error) Error() string {
	return cStr2str(C.sensors_strerror(e.errnr))
}

// Init loads the default libsensors configuration.
func Init() error {
	if err := C.sensors_init((*C.FILE)(nil)); err != C.int(0) {
		return Error{err}
	}

	return nil
}

// Cleanup frees all resources allocated by Init.
func Cleanup() {
	C.sensors_cleanup()
}

// GetLibsensorsVersion reports the version of the underlying libsensors.
func GetLibsensorsVersion() string {
	return cStr2str(C.libsensors_version)
}

// cStr2str converts a NUL-terminated C string into a Go string.
func cStr2str(cStr *C.cchar) string {
	slen := int(C.strlen(cStr))

	// Reinterpret the C buffer as a Go byte slice of the same length;
	// the final string(...) conversion then copies it exactly once.
	bridge := reflect.SliceHeader{
		Data: uintptr(unsafe.Pointer(cStr)),
		Len:  slen,
		Cap:  slen,
	}

	return string(*(*[]byte)(unsafe.Pointer(&bridge)))
}

It builds a bridge to libsensors in order to read out the hardware sensors.

Right after the package clause comes the import of the mysterious package “C”. It contains all the C standard data types as well as every data type and function pulled in via the header files in the comment directly above it. Yes, really – this “magic” comment can hold almost arbitrary C code, which then becomes accessible from the Go code. Furthermore, “#cgo LDFLAGS:” automatically links the actual libraries. For larger amounts of code/includes I personally prefer a separate header file over a long comment.

A bit further down, Init() simply calls sensors_init() – as if it were an ordinary Go function in package C. The data types can likewise be used as if they were Go’s own – something Error makes diligent use of.

Only cStr2str() makes a somewhat unorthodox impression… but hey – it works. 😉

Every euro saved counts…

… and so does every reused line of code. We at NETWAYS know how to appreciate that and work accordingly efficiently – on your projects!

Alexander Klimov
Developer

On giving up and trying an IDE

I dislike IDEs – at least that’s what I tell myself and others. A 200-line .vimrc gives a lot more street cred than clicking on a colored icon and selecting some profile that mostly fits one’s workflow. So does typing out breakpoints in GDB compared to just clicking left of a line. Despite those very good reasons I went ahead and gave GoLand and CLion, two JetBrains products for Go and C/C++ respectively, a chance. The following details my experiences with a kind of software I never saw much use for, the problems I ran into, and how it changed my workflow.

Installation and Configuration

A picture of my IDE wouldn’t do much good – they all look the same. So here’s a baby seal.
Source: Ville Miettinen from Helsinki, Finland


The first step is always the installation. With JetBrains products being mostly proprietary, there are no repositories for easy installation and updating. But for the first time I had something to put in /opt. When going through the initial configuration wizard, one plugin caught my eye: “IdeaVim”. Of course I decided to install and activate said plugin, but quickly had to realize it does not work the same as simply running Vim in a window.
This “Vim emulation plug-in for IDEs based on the IntelliJ platform” sadly does not offer the full Vim experience, and its key bindings often clash with those of the IDE. Furthermore, I started getting bothered by having to manage the Vim modes when wanting to write code.
I sometimes miss the ability to easily edit and move text the way Vim allows – the seconds of my life wasted marking and copying text with the mouse are ones I’ll never get back!
Except for the underlying compiler and language-specific things, both IDEs work the same: identical layout, window options, and most plugins are shared, so I won’t talk about the tool chain too much. But it needs to be said that GoLand just works, possibly because building a Go project seems simpler than a C++ one. Getting CLion to work with CMake is tiresome; the only way to edit directives is by passing them to the command as you would on the shell. This would not bother me as much if CMake didn’t already have a great GUI.

Coding and Debugging

Yet I wouldn’t still be using those programs if there weren’t upsides. The biggest is the overview of the whole project, easily finding function declarations and splitting windows as needed. These are things Vim can be made to do, but it does not work as seamlessly as it does in the IntelliJ world. It made me realize how little time is spent on actually typing code; most of it is reading code, drawing things and staring at a prototype until your eyes bleed confusion (sometimes code is not well commented). The debuggers, again specifically the one in GoLand, work great! Sometimes I have to talk to GDB directly, since there are many functions but too few buttons, but the typical case of setting a breakpoint and stepping through to find some misplaced condition is simple and easy.

Alright, here it is.


There are a few features I have not found a use for yet, e.g. code generators, and I still manage my git repositories from the shell. The automatic formatting is cool, again especially in Go, where there is one standard and one tool for it. Other than that I run into a few bugs now and then; one that proved to be quite a hassle is the search/search-and-replace sometimes killing my entire window manager. Were it free software, there’d be a bug report. But for now I work around it. Maybe I’ll drop CLion, but I doubt I’ll be writing any Go code in Vim anytime soon.
If you think you have found the perfect IDE or just want to share Vim tips, meet me at the OSMC in November!

Diana Flach
Developer

The way to Go

For a long time, the contractors of the space industry were big defense companies with entrenched structures and products to match. The rockets, for example, were not exactly designed to be reused. Probably hardly any of the customers could imagine that things could work quite differently. And then Elon Musk came along and “just quickly” set up SpaceX… and suddenly many things went much better. It’s been similar in our industry…
For a long time there have been machine-level programming languages, but they were cumbersome to handle – especially when it came to executing several tasks in parallel. The constant size of thread stacks additionally limited the number of threads, so that, for example, Icinga 2, written in C++, currently has to distribute its I/O across just a few threads. Since 2009 there has at least been NodeJS, which happily executes many I/O tasks in parallel – but only those, as just one thread is available for computations. Moreover, its types and functions are dynamic and therefore not as machine-level and performant as, say, C++’s. So until 2012 programmers sat between these two stools. Then Google released the first stable version of Go… and suddenly many things went much better. By now that includes NETWAYS.

So what does this Go do better than all the others?

As my colleague Florian would say: “Quite a lot.” But joking aside…
Go is close to the machine – i.e. its types and functions are all static and, just as in C++, compiled to machine code in advance – you can’t get more performance than that.
Go is simple (although it is machine-level). The data types are static, i.e. explicit, but they only have to be spelled out as explicitly as necessary:

type IcingaStatus struct {
   Name, Description string
}
var IcingaStatusSet = map[uint8]IcingaStatus{
   0: {"OK",       "Everything in the green"},
   1: {"WARNING",  "The calm before the storm"},
   2: {"CRITICAL", "All infrastructure down the drain"},
   3: {"UNKNOWN",  "Don't ask me, I haven't a clue"},
}

In the example just shown, the data type of the map variable only has to be specified once. Neither the types of the contained values nor their fields have to be given – they are derived from the map’s type. Anyone who fears losing track can fall back on the GoLand IDE:

Go is relatively safe from accidents (although it is machine-level). On access to an array index that doesn’t exist, or on an invalid conversion between pointer data types, Go throws an error to avert damage caused by programming mistakes:

type Laptop struct {
    DvdDrive uint32
}
func (l *Laptop) DoSomethingUseful() {
}
type SmartPhone struct {
    SimSlot uint16
}
func (s *SmartPhone) DoSomethingUseful() {
}
type Computer interface {
    DoSomethingUseful()
}
func main() {
    var computers = []Computer{&Laptop{}, &Laptop{}, &Laptop{}}
    _ = computers[3]           // panics: index out of range
    var computer Computer = &Laptop{}
    _ = computer.(*SmartPhone) // panics: invalid type assertion
}

Go handles I/O tasks efficiently out of the box (although it is not NodeJS). In Go, tasks are not parallelized via threads but via so-called goroutines (much the same thing). Go itself distributes them across the actual threads. When a goroutine performs a blocking I/O operation, it is carried out transparently in the background while another goroutine occupies the thread in the meantime.
The sky is the limit of parallelization, thanks to the scheduler and dynamic stack sizes. The above-mentioned distribution of goroutines across threads is the responsibility of Go’s scheduler. This ensures that “too” many parallel tasks don’t get in each other’s way, or in the way of the rest of the system. In addition, each goroutine claims only as much RAM as it actually needs, i.e. in principle a programmer can start as many goroutines as they please (example). “In principle” is exactly the right word, because each individual case should still be considered on its own. Otherwise, at some point, the OS will do git push --feierabend, i.e. call it a day (example).
Go just goes (that’s probably where the name comes from). In contrast to established machine-level languages, I don’t have to make sure that libfoobar23.dll is in the right place in the correct version. The result of compiling a Go program is a single binary that isn’t even linked against libc:

root@576214afd7e6:/# cat example.go
package main
func main() {
}
root@576214afd7e6:/# go build -o example example.go
root@576214afd7e6:/# ./example
root@576214afd7e6:/# ldd ./example
not a dynamic executable
root@576214afd7e6:/#

Anything goes, nothing must. Go doesn’t have to find its way into all applications overnight. You can start with a single program that doesn’t have to be finished today, right now, or actually the day before yesterday. And even if not everything works out at the first attempt, we are happy to offer you tailored consulting.

Alexander Klimov
Developer

The Icinga Config Compiler: An Overview

The Icinga configuration format was designed to be easy to use for novice users while at the same time providing more advanced features to expert users. At first glance it might look like a simple key-value configuration format:

object Host "example.icinga.com" {
	import "generic-host"
	address = "203.0.113.17"
	address6 = "2001:db8::17"
}

However, it features quite a bit of functionality that elevates it to the level of scripting languages: variables, functions, control flow (if, for, while) and a whole lot more.

Icinga’s scripting language is used in several places:

  • configuration files
  • API filter expressions
  • auto-generated files used for keeping track of modified attributes

In this post I’d like to show how some of the config machinery works internally.

The vast majority of the config compiler is contained in the lib/config directory. It weighs in at about 4.5% of the entire code base (3059 out of 68851 lines of code as of July 2018). The compiler is made up of three major parts:

Lexer

The lexer (lib/config/config_lexer.ll, based on flex) takes the configuration source code in text form and breaks it up into tokens. In doing so the lexer recognizes all the basic building blocks that make up the language:

  • keywords (e.g. “object”, “for”, “break”) and operators (e.g. >, ==, in)
  • identifiers
  • literals (numbers, strings, booleans)
  • comments

However, it has no understanding of how these tokens fit together syntactically. For that it forwards them to the parser. All in all the lexer is actually quite boring.
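For example, the input address = "203.0.113.17" would be broken up into roughly these tokens (the names here are illustrative; the real ones are defined in config_lexer.ll):

T_IDENTIFIER(address)   T_SET(=)   T_STRING("203.0.113.17")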

Parser/AST

The parser (lib/config/config_parser.yy, based on GNU Bison) takes the tokens from the lexer and tries to figure out whether they represent a valid program. In order to do so it has production rules which define the language’s syntax. As an example here’s one of those rules for “for” loops:

| T_FOR '(' identifier T_FOLLOWS identifier T_IN rterm ')'
{
        BeginFlowControlBlock(context, FlowControlContinue | FlowControlBreak, true);
}
rterm_scope_require_side_effect
{
        EndFlowControlBlock(context);
        $$ = new ForExpression(*$3, *$5, std::unique_ptr<Expression>($7), std::unique_ptr<Expression>($10), @$);
        delete $3;
        delete $5;
}

Here’s a list of some of the terms used in the production rule example:

  • T_FOR – the literal text “for”.
  • identifier – a valid identifier.
  • T_FOLLOWS – the literal text “=>”.
  • T_IN – the literal text “in”.
  • BeginFlowControlBlock, EndFlowControlBlock – these functions enable the use of certain flow control statements which would otherwise not be allowed in code blocks. In this case the “continue” and “break” keywords can be used in the loop’s body.
  • rterm_scope_require_side_effect – a code block for which Icinga can’t prove that the last statement doesn’t modify the program state. An example of a side-effect-free code block would be { 3 }, because Icinga can prove that executing its last statement has no effect.

After matching the lexer tokens against its production rules the parser continues by constructing an abstract syntax tree (AST). The AST is an executable representation of the script’s code. Each node in the tree corresponds to an operation which Icinga can perform (e.g. “multiply two numbers”, “create an object”, “retrieve a variable’s value”). Here’s an example of an AST for the expression “2 + 3 * 5”:
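In text form, the tree looks like this:

AddExpression (+)
 ├── LiteralExpression (2)
 └── MultiplyExpression (*)
      ├── LiteralExpression (3)
      └── LiteralExpression (5)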

Note how the parser supports operator precedence by placing the AddExpression AST node on top of the MultiplyExpression node.

Icinga’s AST supports a total of 52 different AST node types. Here are some of the more interesting ones:

  • ArrayExpression – an array definition, e.g. [ varName, "3", true ]. Its interior values are also AST nodes.
  • BreakpointExpression – spawns the script debugger console if Icinga is started with the -X command-line option.
  • ImportExpression – corresponds to the import keyword which can be used to import another object or template.
  • LogicalOrExpression – implements the || operator. This is one of the AST node types which don’t necessarily evaluate all of their interior AST nodes; e.g. for true || func() the function call never happens.
  • SetExpression – essentially the = operator in its various forms, e.g. host_name = "localhost".

On their own the AST nodes just describe the semantic structure of the script. They don’t actually do anything in terms of performing real actions.

VM

Icinga contains a virtual machine (in the language sense) for executing AST expressions (mostly lib/config/expression.cpp and lib/config/vmops.hpp). Given a reference to an AST node – such as the root node from the example in the previous section – it attempts to evaluate that node. What that means exactly depends on the kind of AST node:

The LiteralExpression class represents bare values (e.g. strings, numbers and boolean literals). Evaluating a LiteralExpression AST node merely yields the value that is stored inside of that node, i.e. no calculation of any kind takes place. On the other hand, the AddExpression and MultiplyExpression AST nodes each have references to two other AST nodes. These are the operands which are used when asking an AddExpression or MultiplyExpression AST node for “their” value.
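A minimal sketch of this evaluation scheme – with hypothetical classes, not Icinga’s actual Expression hierarchy – could look like this (MultiplyExpression would be analogous to AddExpression):

#include <memory>

class Expression
{
public:
	virtual ~Expression() = default;

	/* Evaluating a node yields its value – whatever that
	 * means for the specific node type. */
	virtual double Evaluate() const = 0;
};

class LiteralExpression final : public Expression
{
public:
	explicit LiteralExpression(double value) : m_Value(value) { }

	/* A literal merely yields its stored value – no calculation. */
	double Evaluate() const override { return m_Value; }

private:
	double m_Value;
};

class AddExpression final : public Expression
{
public:
	AddExpression(std::unique_ptr<Expression> lhs, std::unique_ptr<Expression> rhs)
		: m_Lhs(std::move(lhs)), m_Rhs(std::move(rhs)) { }

	/* An operator node first asks its operand nodes for "their" values. */
	double Evaluate() const override { return m_Lhs->Evaluate() + m_Rhs->Evaluate(); }

private:
	std::unique_ptr<Expression> m_Lhs, m_Rhs;
};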

Some AST nodes require additional context in order to run. For example a script function needs a way to access its arguments. The VM provides these – and a way to store local variables for the duration of the script’s execution – through an instance of the StackFrame (lib/base/scriptframe.hpp) class.

Future Considerations

All in all the Icinga scripting language is actually fairly simple – at least when compared to other more sophisticated scripting engines like V8. In particular Icinga does not implement any kind of optimization.
A first step would be to get rid of the AST and implement a bytecode interpreter. This would most likely result in a significant performance boost – because it allows us to use the CPU cache much more effectively than with thousands, if not hundreds of thousands, of AST nodes scattered around the address space. It would also decrease memory usage, both directly and indirectly, by avoiding memory fragmentation.
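To sketch the idea (illustrative only, not an actual Icinga design): a stack-based bytecode interpreter stores the program as a flat, cache-friendly array of instructions instead of pointer-linked AST nodes:

#include <cstdint>
#include <vector>

enum OpCode : std::uint8_t { OP_PUSH, OP_ADD, OP_MUL };

struct Instruction
{
	OpCode Op;
	double Operand; /* only used by OP_PUSH */
};

double Run(const std::vector<Instruction>& code)
{
	std::vector<double> stack;

	for (const auto& insn : code) {
		switch (insn.Op) {
			case OP_PUSH:
				stack.push_back(insn.Operand);
				break;
			case OP_ADD: {
				double rhs = stack.back();
				stack.pop_back();
				stack.back() += rhs;
				break;
			}
			case OP_MUL: {
				double rhs = stack.back();
				stack.pop_back();
				stack.back() *= rhs;
				break;
			}
		}
	}

	return stack.back();
}

/* "2 + 3 * 5" becomes: PUSH 2, PUSH 3, PUSH 5, MUL, ADD – which yields 17. */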
However, for now the config compiler seems to be doing its job just fine and is probably one of the most solid parts of the Icinga codebase.

Flapping in Icinga 2.8.0

The author viewing the code for the first time


Flapping detection is a feature many monitoring suites offer. It is mainly used to detect unfortunately chosen thresholds, but can also help in detecting network issues or similar. In most cases two thresholds are used, a high and a low one. If the flapping value – the percentage of state changes over a set time – rises above the high threshold, the object is considered flapping. It then stays flapping until the value drops below the low threshold.
Naturally Icinga 2 had such a feature; it just implemented a different approach and didn’t work. For 2.8.0 we decided it was time to finally fix flapping, so I went to investigate. As I said, flapping worked differently from Icinga 1, Shinken, etc.: instead of two thresholds there was just one, and instead of one flapping value there were two, which changed based on the time since the last check. Broken down it looks like this:

positive; //value for state changes
negative; //value for stable results
FLAPPING_INTERVAL; //compile-time constant to smoothen the values
OnCheckResult() {
  if (positive + negative > FLAPPING_INTERVAL) {
    pct = (positive + negative - FLAPPING_INTERVAL) / FLAPPING_INTERVAL;
    positive -= pct * positive;
    negative -= pct * negative;
  }
  weight = now - timeOfLastCheck;
  if (stateChange)
    positive += weight;
  else
    negative += weight;
}
IsFlapping() {
  return 100 * positive / (negative + positive);
}

The idea was that every check result increases one of the two flapping values (positive for state changes, negative for results that were not state changes) by the time since the last check result. The first problem arises here: while in most cases the check interval is relatively stable, after a prolonged Icinga outage one of the values could be extremely inflated. Another problem is the slow degradation of the result; in my tests it took 17 consecutive stable results for a flapping value to calm down.
After some tweaking here and there, I decided it would be wisest to go with the old and proven style Icinga 1 was using: save the last 20 check results, count the state changes and divide them by 20. I took inspiration from the way Shinken handles flapping and added weight to the state changes, with the most recent one having a value of 1.2 and the 20th (oldest) one of 0.8. The issue of possibly wasting memory on saving the state changes could be resolved by using one integer as a bit array. This way we are actually using slightly less memory now \o/

The above example would then have a value of 39.1%, flapping in the case of the default thresholds. More details on the usage and calculation of flapping in Icinga 2 can be found in the documentation once version 2.8.0 is released.
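For illustration, a minimal sketch of that calculation – hypothetical names, not Icinga’s actual implementation – could look like this:

#include <cstdint>

class FlappingHistory
{
public:
	/* Shift the bit array and record the newest result in bit 0
	 * (1 = state change); only the last 20 results are kept. */
	void AddResult(bool stateChange)
	{
		m_History = ((m_History << 1) | (stateChange ? 1 : 0)) & 0xFFFFF;
	}

	/* Weighted percentage of state changes: the most recent result
	 * counts 1.2, the oldest (20th) 0.8, linearly in between. */
	double FlappingValue() const
	{
		double weighted = 0;

		for (int i = 0; i < 20; i++) {
			if (m_History & (1u << i))
				weighted += 1.2 - 0.4 * i / 19.0;
		}

		return 100 * weighted / 20;
	}

private:
	std::uint32_t m_History = 0;
};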

Diana Flach
Developer