From Monitoring to Observability: Distributed Tracing with Jaeger

From Monitoring to Observability: Distributed Tracing with Jaeger

Modern cloud environments and microservice architectures need a changed mindset when it comes to monitoring. Classic host/service object relations are not always applicable, containers run in Kubernetes are short lived, and application performance within distributed networks is hard to monitor. Especially with applications running millions of operations, where to start the root cause analysis for slow client responses in your web shop? Is it just slow connections to MySQL, or does the application’s debug log slow down the entire fleet?

This is where observability with tracing comes into play. In the cloud native space, OpenTracing evolved as vendor neutral standardized API including client instrumentation. Famous tools are Zipkin and Jaeger which was contributed from Uber to the CNCF.

Let’s have a look into Jaeger today.


Getting Started

The easiest way to try Jaeger is with using the Docker container explained in the documentation.

docker run -d --name jaeger \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 9411:9411 \

Navigate to http://localhost:16686 to get greeted by Jaeger.


Try it

A sample application is available as container. I’m using a modified port mapping with 8081-8084 here since port 8080 is already assigned.

docker run --rm -it \
  --link jaeger \
  -p8081-8084:8080-8083 \
  -e JAEGER_AGENT_HOST="jaeger" \
  jaegertracing/example-hotrod:1.16 \

Navigate to http://localhost:8081 and click the buttons to order some cars.

Within Jaeger itself, start analyzing the traces and for example learn that Redis runs into timeouts quite often delaying the HTTP responses to the clients.


Traces for applications

Jaeger provides officially supported client libraries for Go, Java, Python, NodeJS, C/C++, C#/.NET, others are in the making. Let’s see if we can add it into Icinga 2 and do some tracing here.

First off, clone the repository, build and install the client libraries. You’ll need CMake, a C++ compiler, etc. – basically everything which is required for Icinga 2 too and documented here. In this example, I’m compiling on my Macbook. There are additional libraries and headers required. Hint: Boost 1.72 has a bug which needs a patch.

brew install yaml-cpp thrift 
git clone && cd opentracing-cpp
# 1.6.0 doesn't work atm
git checkout v1.5.1
mkdir -p build && cd build
cmake ..
make && make install
cd ..

Then clone, cmake, make, install.

git clone && cd jaeger-client-cpp
git checkout v0.5.0
# regenerate thrift headers for 0.11.0
find idl/thrift/ -type f -name \*.thrift -exec thrift -gen cpp -out src/jaegertracing/thrift-gen {} \;
mkdir -p build && cd build
cmake ..
make && make install
cd ..

In order to add Jaeger into Icinga 2, there are the following steps necessary:

  • Add CMake functions to lookup yaml-cpp, opentracing, jaeger headers and libraries
  • Optionally enable Jaeger tracing code, link the icinga2 binary against it
  • Add a new tracer into the Config Compiler CLI command to measure the timing points

The majority of development time today was to resolve compile and linking issues, adding spans and traces is really simple. You can follow my progress in this Icinga PR – developers, get started wtih the client libraries and help your colleagues with enhanced observability!



Tracing application performance, cluster messages, end2end tests and any sort of event span provides valuable insights for both, devs and ops. With the new cloud native landscape evolving fast, we have additional possibilities to analyze our environments. Next to the now standardized tools for parsing and ingesting logs with Elastic Stack or Graylog, tracing has found its place in the observability stack.

Jaeger Tracing also is part of the GitLab observability suite which will be moved to the free core edition in 2020. Metrics, logging, alerts and tracing are key elements in modern cloud environments. Prometheus monitors everything from classic load checks to Kubernetes containers, Jaeger provides application performance insights and on top of that, Grafana combines the view on problems and trends. You can learn more about GitLab’s vision here.

Exploring these cool new features in GitLab are our mission in future trainings and workshops, watch this space in 2020!

DevOpsDays Ghent: Celebrating 10 years DevOps culture

DevOpsDays Ghent: Celebrating 10 years DevOps culture

Ten years ago the DevOps movement was started by Patrick Debois and Kris Buytaert in Ghent. Who would have guessed that it could become such an innovative movement and community. Today we’ve seen more than 80 DevOpsDays all over the world on each continent. All with the love and help from the core organizers team.

The NETWAYS family made their way to the 10th DevOpsdays anniversary in Ghent to participate and celebrate with the community. This time it also was really special with attending many DevOpsDays organizers from all over the world.

From a historical view, DevOpsDays started out small. In 2015 they’ve started to document how to organize and create communities and it all spread around the globe. More than 50 different countries and organizers met on the first day sharing their experiences and have a good time together. As with anything else in the Open Source world, good documentation is key for the first success to move on and spread the love.


DevOps is a culture, not a job title

Patrick shared his journey with making new friends in a new company after leaving the DevOpsDays organizer team five years ago. He’s identified the bottlenecks and silos in every department, and they all solved it with learning from each other. DevOps also is about valueing others work and understand their feelings and emotions.

After 10 years with many DevOpsDays held all over the world, DevOps as a term still is used wrong and needs improvements. It isn’t just creating a new team called “DevOps Engineers” now replacing the ops team. Neither is it about putting devs on call letting them eat their own dog food. Anyone trying to sell you the perfect DevOps world from a marketing slide with certifications and job titles is just plain wrong. There are many great tools in the wild which can help with bringing the DevOps mindset into the enterprise environment. Not every company is able to immediately dive into the culture, sometimes it takes months or even years to encourage for a change.

DevOpsDays is about sharing these thoughts and emotions, care about diversity and tell you not the ordinary tech story but something to think about. Raise awareness that DevOps is about culture and finding the harmony in your daily workflows. Achieve goals and visions in a shorter amount of time, combine tools and be a role model with sharing your expertise. The DevOps movement is not only a place to sharing experiences with tools and best practices but also talking about work ethics, communication styles and soft skills. With overemphasizing on the technical toolchain, sociotechnity seems to balance this as well in many environments. Emotions are a thing, software is not only code.

From the full-stack engineers not writing device drivers to the most important message: Our definition of the “full stack” only covers what we understand, and not what’s actually required to run the application. Stop pretending that things are easy, being on call 24/7 doesn’t burn out and new products will solve old problems. Care about high performing teams with the need of psychological safety. Now wait for the recordings, these talks are really interesting to learn from.


Wait for it

Slides changing every 15 seconds, 5 minutes time to pitch your story. I like them a lot being entertaining and sometimes you just feel the speaker’s pain with “waiting for slide”. The tradition says that this actually was a malfunction of Kris’ Linux notebook 😉

That way we’ve heard stories about hot takes, myths and false hoods about DevOps and the danger of DevOps certifications. Before Jason could start his ignite, Kris jumped in announcing that ConfigMgmtCamp registrations are now open. In return, he announced DeliveryConf and jumped right into the meta story being ignites. Watch the recordings when available, it literally made my day. Since announcing confs in ignites was now a thing, Blerim did so too with IcingaConf next year before diving into “Why monitoring is NOT killing observability”. Monversability combing both is a good idea with moving from traditional blackbox monitoring to applications providing metrics and insights. On the other hand, watching graphs all days still requires business process dashboards to immediately visualize failure with alerts. Last but not least, learning about Kubernetes in 5 minutes really nailed it.


Move on

We really enjoyed meeting friends old and new and aside from the talks, exploring beautiful Ghent with Belgium waffles and beer. One thing to note – the half marathon from the ground up to the ball room with many stairs was hard in the beginning. On the other hand, our fitness trackers were very happy 😉

Thanks to the organizers and sponsors for the great event – onwards to the next decade!

PS: DevOpsDays Berlin is happening soon. And many more great events near your city. If you don’t have one, kindly contact the core organizers, they are all in with helping to kick-off an amazing event and culture.


GitLab CI Runners with Auto-scaling on OpenStack


With migrating our CI/CD pipelines from Jenkins to GitLab CI in the past months, we’ve also looked into possible performance enhancements for binary package builds. GitLab and its CI functionality is really really great in this regard, and many things hide under the hood. Did you know that “Auto DevOps” is just an example template for your CI/CD pipeline running in the cloud or your own Kubernetes cluster? But there’s more, the GitLab CI runners can run jobs in different environments with using different hypervisors and the power of docker-machine.

One of them is OpenStack available at NWS and ready to use. The following examples are from the Icinga production environment and help us on a daily basis to build, test and release Icinga products.



Install the GitLab Runner on the GitLab instance or in a dedicated VM. Follow along in the docs where this is explained in detail. Install the docker-machine binary and inspect its option for creating a new machine.

curl -L | sudo bash
apt-get install -y gitlab-runner
curl -L`uname -s`-`uname -m` -o /usr/local/bin/docker-machine
chmod +x /usr/local/bin/docker-machine
docker-machine create --driver openstack --help

Next, register the GitLab CI initially. Note: This is just to ensure that the runner is up and running in the GitLab admin interface. You’ll need to modify the configuration in a bit.

gitlab-runner register \
  --non-interactive \
  --url \
  --tag-list docker \
  --registration-token SUPERSECRETKEKSI \
  --name "docker-machine on OpenStack" \
  --executor docker+machine \
  --docker-image alpine


Docker Machine with OpenStack Deployment

Edit “/etc/gitlab-runner/config.toml” and add/modify the “[[runners]]” section entry for OpenStack and Docker Machine. Ensure that the MachineDriver, MachineName and MachineOptions match the requirements. Within “MachineOptions”, add the credentials, flavors, network settings just as with other deployment providers. All available options are explained in the documentation.

vim /etc/gitlab-runner/config.toml

    IdleCount = 4
    IdleTime = 3600
    MaxBuilds = 100
    MachineDriver = "openstack"
    MachineName = "customer-%s"
    MachineOptions = [
      "openstack-image-name=Debian 10.1",
      "openstack-keypair-name=GitLab Runner"

The runners cache can be put onto S3 granted that you have this service available. NWS luckily provides S3 compatible object storage.

    Type = "s3"
    Shared = true
      ServerAddress = "s3provider.domain.localdomain"
      AccessKey = "supersecretaccesskey"
      SecretKey = "supersecretsecretkey"
      BucketName = "openstack-gitlab-runner"

Bootstrap Docker in the OpenStack VM

Last but not least, these VMs need to be bootstrapped with Docker inside a small script. Check the “–engine-install-url” parameter in the help output:

root@icinga-gitlab:/etc/gitlab-runner# docker-machine create --help
  --engine-install-url ""							Custom URL to use for engine installation 

You can use the official way of doing this, but putting this into a small script also allows customizations like QEMU used for Raspbian builds. Ensure that the script is available via HTTP e.g. from a dedicated GitLab repository 😉

# This script helps us to prepare a Docker host for the build system
# It is used with Docker Machine to install Docker, plus addons
# See --engine-install-url at docker-machine create --help

set -e

run() {
  (set -x; "$@")

echo "Installing Docker via"
run curl -LsS -o /tmp/
run sh /tmp/

echo "Installing QEMU and helpers"
run sudo apt-get update
run sudo apt-get install -y qemu-user-static binfmt-support

Once everything is up and running, the GitLab runners are ready to fire the jobs.



Jobs and builds are not run all the time, and especially with cloud resources, this should be a cost-efficient thing. When building Icinga 2 for example, the 20+ different distribution jobs generate a usage peak. With the same resources assigned all the time, this would tremendously slow down the build and release times. In that case, it is desirable to automatically spin up more VMs with Docker and let the GitLab runner take care of distributing the jobs. On the other hand, auto-scaling should also shut down resources in idle times.

By default, one has 4 VMs assigned to the GitLab runner. These builds run non-privileged in Docker, the example below also shows another runner which can run privileged builds. This is needed for Docker-in-Docker to create Docker images and push them to GitLab’s container registry.

root@icinga-gitlab:~# docker-machine ls
NAME                                               ACTIVE   DRIVER      STATE     URL                      SWARM   DOCKER     ERRORS
runner-privileged-icinga-1571900582-bed0b282       -        openstack   Running   tcp://           v19.03.4
runner-privileged-icinga-1571903235-379e0601       -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904408-5bb761b5   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904408-52b9bcc4   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904408-97bf8992   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904408-97bf8992   -        openstack   Running   tcp://           v19.03.4

Once it detects a peak in the pending job pipeline, the runner is allowed to start additional VMs in OpenStack.

root@icinga-gitlab:~# docker-machine ls
NAME                                               ACTIVE   DRIVER      STATE     URL                      SWARM   DOCKER     ERRORS
runner-privileged-icinga-1571900582-bed0b282       -        openstack   Running   tcp://           v19.03.4
runner-privileged-icinga-1571903235-379e0601       -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904408-5bb761b5   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904408-52b9bcc4   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904408-97bf8992   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904408-97bf8992   -        openstack   Running   tcp://           v19.03.4


runner-non-privileged-icinga-1571904534-0661c396   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904543-6e9622fd   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904549-c456e119   -        openstack   Running   tcp://           v19.03.4
runner-non-privileged-icinga-1571904750-8f6b08c8   -        openstack   Running   tcp://           v19.03.4


In order to achieve this setting, modify the runner configuration and increase the limit.

vim /etc/gitlab-runner/config.toml

  name = "docker-machine on OpenStack"
  limit = 24
  output_limit = 20480
  url = ""
  token = "supersecrettoken"
  executor = "docker+machine"

This would result in 24 OpenStack VMs after a while, and all are idle 24/7. In order to automatically decrease the deployed VMs, use the OffPeak settings. This also ensures that resources are available during workhours while spare time and weekend are considered “off peak” with shutting down unneeded resources automatically.

    OffPeakTimezone = "Europe/Berlin"
    OffPeakIdleCount = 2
    OffPeakIdleTime = 1800
    OffPeakPeriods = [
      "* * 0-8,22-23 * * mon-fri *",
      "* * * * * sat,sun *"

Pretty neat functionality 🙂


Troubleshooting & Monitoring

“docker-machine ls” provides the full overview and tells whenever e.g. a connection to OpenStack did not work, or if the VM is currently unavailable.

root@icinga-gitlab:~# docker-machine ls
NAME                                               ACTIVE   DRIVER      STATE     URL                      SWARM   DOCKER     ERRORS
runner-privileged-icinga-1571900582-bed0b282       -        openstack   Error                                      Unknown    Expected HTTP response code [200 203] when accessing [GET], but got 404 instead

In case you have deleted the running VMs to start fresh, provisioning might take a while and the above can be a false positive. Check the OpenStack management interface to see whether the VMs booted correctly. You can also remove a VM with “docker-machine rm <id>” and run “gitlab-runner restart” to automatically provision it again.

Whenever the VM provisioning fails, a gentle look into the syslog (or runner log) unveils what’s the problem. Lately we had used a wrong OpenStack flavor configuration which was fixed after investigating in the logs.

Oct 18 07:08:48 3 icinga-gitlab gitlab-runner[30988]:  #033[31;1mERROR: Error creating machine: Error in driver during machine creation: Unable to find flavor named 1234-customer-id-4-8#033[0;m  #033[31;1mdriver#033[0;m=openstack #033[31;1mname#033[0;m=runner-non-privilegued-icinga-1571375325-3f8176c3 #033[31;1moperation#033[0;m=create

Monitoring your GitLab CI runners is key, and with the help of the REST API, this becomes a breeze with Icinga checks. You can inspect the runner state and notify everyone on-call whenever CI pipelines are stuck.



Developers depend on fast CI feedback these days, speeding up their workflow – make them move fast again. Admins need to understand their requirements, and everyone needs a deep-dive into GitLab and its possibilities. Join our training sessions for more practical exercises or immediately start playing in NWS!

GitLab Commit London Recap

GitLab Commit London Recap

A while ago, when GitHub announced CI/CD for the upcoming actions feature, I’ve been sharing with Priyanka on Twitter how we use GitLab. The GitLab stack with the runners, Docker registry and all-in-one interface not only speeds up our development process and packaging pipelines, it also scales our infrastructure deployments even better. With the love for GitLab, we’ve also created our GitLab training sharing all the knowledge about this great tool stack.


GitLab, GitLab, GitLab

Priyanka was so kind to invite us over to GitLab Commit in London, GitLab’s first European user conference. At first glance, Bernd and I didn’t know what would happen – turns out, this wasn’t a product conference. Instead, meeting new people and learning how they use GitLab in their environments was put into focus. Sid Sijbrandij, GitLab’s CEO, kicked off the event in the morning and after sharing the roots of GitLab being in Europe, he asked the audience to “meet your neighbour and connect”. Really an icebreaker for the coming talks and sessions.

The next keynote was presented by engineers from Porsche sharing their move to GitLab. With Java Boot Spring and iOS application development, and the requirement of deeper collaboration between teams, they took the challenge and are using GitLab since ~1 year. Interesting to learn was that nearly everyone at GitLab Commit uses Terraform for deployments. Matt did so too in his live coding session with a full blown web application Kubernetes container setup all managed and deployed with GitLab and Terraform. After 20 minutes of talking way too fast, it worked. What a great way of showing what’s possible with today’s tool stack!


DevOps for everyone

One thing I’ve also recognized – everyone seems to be moving to Kubernetes and Terraform. Rancher, Jenkins and other tools in the same ecosystem seem to be falling short in modern DevOps environments. I really liked the security panel where ideas like automated dependency scanning in merge requests have been shared. Modern days with easy to use libraries typically pull in lots of unforeseen dependencies and who really knows about all the vulnerabilities? Blocking the merge request in case of emergency is a killer feature for current development workflows.

In terms of the product roadmap, GitLab has a huge vision which is not easy to summarize. On the other hand, having a maybe-not-reachable vision empowers a great team to work even harder. The short term improvements for CI are for example Directed Acyclic Graphs allowing parallel pipelines to continue faster. This will greatly enhance our package pipelines in the future. While tweeting about this, Jason was so kind to share the build matrix feature known from Travis coming soon with GitLab 12.6. Spot on, testing e.g. different PHP versions for the same job is greatly missed being as easy as Travis. GitLab Runners will receive support for ARM soon, and also Vault integration is coming. GitLab also announced their startup Meltano, an open source data to dashboard workflow platform – looks really promising.

The afternoon sessions were split into 3 tracks each, with even more user stories. Moving along from Delta with their many of thousands of repositories, we’ve also learned more about VMware’s cloud architects and how they incorporate GitLab & Terraform for deployments. Last but not least we’ve joined Philipp sharing his story on migrating from Jenkins to GitLab CI. Since we struggled from the same problems (XML config, plugins breaking upgrades, etc.) we were enlightened to see that he even developed a GitHub to GitLab issue migrator, fully open source. Moving to a central platform and away from 5+ browser tabs really is a key argument in stressful (development) times. Avoiding context switches for developers improves quality and ensures better releases from my experience.


Get the party started

The evening event took place at Swingers, a bar which had a Mini Golf playground built-in. We were going there with the iconic London Bus, and the nice people from GitLab even ensured that Gin&Tonic made this a great starter. Right after arriving, the fire alarm rang off and we had to move outside. Party like NETWAYS 😉 And finally we met Priyanka to say hi, lovely memory. We also met Brian, proud Irish, challenging us with funny stories and finding out that Germans do not know everything. Really charming and much to laugh.

Thanks GitLab for this top notch event and see you next year!

DEV stories: Icinga Core trainees in the making

DEV stories: Icinga Core trainees in the making

When my dev leads approached me with the idea to guide a trainee in the Icinga core topic, I was like … wow, sounds interesting and finally a chance to share my knowledge.

But where should I start and how can it be organized with my ongoing projects?


Prepare for the unexpected

My view on the code and how things are organized changed quite a bit since then. You cannot expect things, nor should you throw everything you know into the pool. While working on Icinga 2.11, I’ve collected ideas and issues for moving Henrik into these topics. In addition to that, we’ve improved on ticket and documentation quality, including technical concepts and much more.

My colleagues know me as the “Where is the test protocol?” PR reviewer. Also, reliable configuration and steps on reproducing the issues and problems are highly encouraged. Why? The past has proven that little to zero content in ticket makes debugging and problem analysis really hard. Knowledge transfer is an investment in the future of both NETWAYS and Icinga.

Some say, that documentation would replace their job. My mission is to document everything for my colleagues to make their life easier. Later on, they contribute to fixing bugs and implement new features while I’m moving into project management, architecture and future trainees. They learn what I know, especially “fresh” trainees can be challenged to learn new things and don’t necessarily need to change habits.


Clear instructions?

At the beginning, yes. Henrik started with the C++ basics, a really old book from my studies in 2002. C++11 is a thing here, still, the real “old fashioned” basics with short examples and feedback workshops prove the rule. Later on, we went for an online course and our own requirements.

Since Icinga 2 is a complex tool with an even more complex source code, I decided to not immediately throw Henrik into it. Instead, we had the chance to work with the Tinkerforge weather station. This follows the evaluation from our Startupdays and new product inside the NETWAYS shop. The instructions were simple, but not so detailed:

  • Put the components together and learn about the main functionality. This is where the “learn by playing” feeling helps a lot.
  • Explore the online documentation and learn how to use the API bindings to program the Tinkerforge bricks and sensors.
  • Use the existing check_tinkerforge plugin written in Python to see how it works
  • Write C++ code which talks to the API and fetches sensor data

Documentation, a blog post and keeping sales updated in an RT ticket was also part of the project. Having learned about the requirements, totally new environment and communication with multiple teams, this paves the way for future development projects.



In order to debug and analyse problems or implement new features, we need to first understand the overall functionality. Starting a new project allows for own code, experiences, feedback, refactoring and what not. Icinga 2 as core has grown since early 2012, so it is key to understand the components and how everything is put together.

Where to start? Yep, visit the official Icinga trainings for a sound base. Then start with some Icinga cluster scenarios, with just pointing to the docs. This takes a while to understand, so Henrik was granted two weeks to fully install, test and prepare his findings.

With the freedom provided, and the lessons learned about documentation and feedback, I was surprised with a Powerpoint presentation on the Icinga cluster exercises. Essentially we discussed everything in the main area in our new NETWAYS office. A big flat screen and the chance that colleagues stop by and listen or even add to the discussion. Henrik was so inspired to write a blogpost on TLS.


Focus on knowledge

In the latest session, I decided to prove things and did throw a lot of Icinga DSL exercises at Henrik, also with the main question – what’s a DSL anyways?

Many things in the Icinga DSL are hidden gems, with the base parts documented, but missing the bits on how to build them together. From my experience, you cannot explain them in one shot, specific user and customer questions or debugging sessions enforce you to put them together. At the point when lambda functions with callbacks were on the horizon, a 5 hour drive through the DSL ended. Can you explain the following snippet? 😘

object HostGroup "hg1" { assign where host.check_command == "dummy" }
object HostGroup "hg2" {
  assign where true
//  assign where in Cities

object Host "runtime" {
  check_command = "dummy"
  check_interval = 5s
  retry_interval = 5s

  vars.dummy_text = {{
    var mygroup = "hg2"
    var mylog = "henrik"
  //  var nodes = get_objects(Host).filter(node => mygroup in node.groups)

    f = function (node) use(mygroup, mylog) {
      log(LogCritical, "Filterfunc", )
      return mygroup in node.groups
    var nodes = get_objects(Host).filter(f)

    var nodenames = =>
    return nodenames.join(",")
   // return Json.encode( =>


We also did some live coding in the DSL, this is now a new howto on the Icinga community channels: “DSL: Count check plugin usage from service checks“. Maybe we’ll offer an Icinga DSL workshop in the future. This is where I want our trainees become an active part, since it also involves programming knowledge and building the Icinga architecture.



Henrik’s first PR was an isolated request by myself, with executing a check in-memory instead of forking a plugin process. We had drawn lots of pictures already how check execution generally works, including the macro resolver. The first PR approved and merged. What a feeling.

We didn’t stop there – our NETWAYS trainees are working together with creating PRs all over the Icinga project. Henrik had the chance to review a PR from Alex, and also merge it. Slowly granting responsibility and trust is key.

Thanks to trainees asking about this, Icinga 2 now also got a style guide. This includes modern programming techniques such as “auto”, lambda functions and function doc headers shown below.

 * Main interface for notification type to string representation.
 * @param type Notification type enum (int)
 * @return Type as string. Returns empty if not found.
String Notification::NotificationTypeToString(NotificationType type)
	auto typeMap = Notification::m_TypeFilterMap;

	auto it = std::find_if(typeMap.begin(), typeMap.end(),
		[&type](const std::pair<String, int>& p) {
			return p.second == type;

	if (it == typeMap.end())
		return Empty;

	return it->first;



Learn and improve

There are many more things in Icinga: The config compiler itself with AST expressions, the newly written network stack including the REST API parts, feature integration with Graphite or Elastic and even more. We’ll cover these topics with future exercises and workshops.

While Henrik is in school, I’m working on Icinga 2.11 with our core team. Thus far, the new release offers improved docs for future trainees and developers:

This also includes evaluating new technologies, writing unit tests and planning code rewrites and/or improvements. Here’s some ideas for future pair programming sessions:

  • Boost.DateTime instead of using C-ish APIs for date and time manipulation. This blocks other ideas with timezones for TimePeriods, etc.
  • DSL methods to print values and retrieve external data
  • Metric enhancements and status endpoints


Trainees rock your world

Treat them as colleagues, listen to their questions and see them “grow up”. I admit it, I am sometimes really tired in the evening after talking all day long. On the other day, it makes me smile to see a ready-to-merge pull request or a presentation with own ideas inspired by an old senior dev. This makes me a better person, every day.

I’m looking forward to September with our two new DEV trainees joining our adventure. We are always searching for passionate developers, so why not immediately dive into the above with us? 🙂 Promise, it will be fun with #lifeatnetways and #drageekeksi ❤️

OSDC 2019: Buzzwo…erm…DevOps, Agile & YAML programmers

OSDC 2019: Buzzwo…erm…DevOps, Agile & YAML programmers

Cheers from Berlin, Moabit, 11th round for OSDC keeping you in the loop with everything with and around DevOps, Kubernetes, log event management, config management, … and obviously magnificent food and enjoyable get-together.


Goooood mooooorning, Berlin!

DevOps neither is the question, nor the answer … Arnold Bechtoldt from inovex kicked off OSDC with a provocative talk title. After diving through several problems and scenarios in common environments, we learned to fail often and fail hard, and improve upon. DevOps really is a culture, and not a job title. Also funny – the CV driven development, or when you propose a tool to prepare for the next job 🙂 One key thing I’ve learned – everyone gets the SAME permissions, which is kind of hard with the learned admin philosophy. Well, and obviously we are YAML programmers now … wait … oh, that’s truly inspired by Mr. Buytaert, isn’t it? 😉

Next up, Nicolas Frankel took us on a journey into logs and scaling at Exoscale. Being not the only developer in the room, he showed off that debug logging with computed results actually eats a lot of resources. Passing the handles/pointers to lazy log function is key here, that reminds me of rewriting the logging backend for Icinga 2 😉 Digging deeper, he showed a UML diagram with the log flow – filebeat collects logs, logstash parses the logs into JSON and Elasticsearch stores that. If you want to go fast, you don’t care about the schema and let ES do the work. Running a query then will be slow, not really matching the best index then – lesson learned. To conclude with, we’ve learned that filebeat actually can parse the log events into JSON already, so if you don’t need advanced filtering, remove Logstash from your log event stream for better performance.

Right before the magnificent lunch, Dan Barker being a chief architect at RSA Security for the Archer platform shared stories from normal production environments to actually following the DevOps spirit. Or, to avoid these hard buzzwords, just like “agile”, and to quote “A former colleague told me: ‘I’ve now understood agile – it’s like waterfall but with shorter steps.'”. He’s also told about important things – you’re not alone, praise your team members publicly.


Something new at OSDC: Ignites

Ignite time after lunch – Werner Fischer challenged himself with a few seconds per slide explaining microcode debugging to the audience, while Time Meusel shared their awesome work within the Puppet community with logs of automation involved (modulesync, etc) at Voxpupuli. Dan Barker really talked fast about monitoring best practices, whereas one shouldn’t put metrics into log aggregation tools and use real business metrics.


The new hot shit

Demo time – James “purpleidea” Shubin showed the latest developments on mgmt configuration, including the DSL similar to Puppet. Seeing the realtime changes and detecting combined with dynamic processing of e.g. setting the CPU counts really looks promising. Also the sound exaggeration tests with the audience where just awesome. James not only needs hackers, docs writers, testers, but also sponsors for more awesome resource types and data collectors (similar to Puppet facts).

Our Achim “AL” Ledermüller shared the war stories on our storage system, ranging from commercial netApp to GlusterFS (“no one uses that in production”) up until the final destination with Ceph. Addictive story with Tim mimicking the customer asking why the clusterfuck happened again 😉

Kedar Bidarkar from Red Hat told us more about KubeVirt which extends the custom resource definitions available from k8s with the VM type. There are several components involved: operator, api, handler, launcher in order to actually run a virtual machine. If I understand that correctly, this combines Kubernetes and Libvirt to launch real VMs instead of containers – sounds interesting and complicated in the same sentence.

Kubernetes operators the easy way – Matt Jarvis from Mesosphere introduced Kudo today. Creating native Kubernetes operators can become really complex, as you need to know a lot about the internals of k8s. Kudo aims to simplify creating such operators with a universal declarative operator configured via YAML.


Oh, they have food too!

The many coffee breaks with delicious Käsekuchen (or: Kaiser Torte ;)) also invite to visit our sponsor booths too. Keep an eye on the peeps from Thomas-Krenn AG, they have #drageekeksi from Austria with them. We’re now off for the evening event at the Spree river, chatting about the things learnt thus far with a G&T or a beer 🙂

PS: Follow the #osdc stream and NetwaysEvents on Twitter for more, and join us next year!

Modern C++ programming: Coroutines with Boost


We’re rewriting our network stack in Icinga 2.11 in order to to eliminate bugs with timeouts, connection problems, improve the overall performance. Last but not least, we want to use modern library code instead of many thousands of lines of custom written code. More details can be found in this GitHub issue.

From a developer’s point of view, we’ve evaluated different libraries and frameworks before deciding on a possible solution. Alex created several PoCs and already did a deep-dive into several Boost libraries and modern application programming. This really is a challenge for me, keeping up with the new standards and possibilities. Always learning, always improving, so I had a read on the weekend in “Boost C++ Application Development Cookbook – Second Edition“.

One of things which are quite resource consuming in Icinga 2 Core is multi threading with locks, waits and context switching. The more threads you spawn and manage, the more work needs to be done in the Kernel, especially on (embedded) hardware with a single CPU core. Jean already shared insights how Go solves this with Goroutines, now I am looking into Coroutines in C++.


Coroutine – what’s that?

Typically, a function in a thread runs, waits for locks, and later returns, freeing the locked resource. What if such a function could be suspended at certain points, and continue once there’s resources available again? The benefit would also be that wait times for locks are reduced.

Boost Coroutine as library provides this functionality. Whenever a function is suspended, its frame is put onto the stack. At a later point, it is then resumed. In the background, the Kernel is not needed for context switching as only stack pointers are stored. This is done with Boost’s Context library which uses hardware registers, and is not portable. Some architectures don’t support it yet (like Sparc).

Boost.Context is a foundational library that provides a sort of cooperative multitasking on a single thread. By providing an abstraction of the current execution state in the current thread, including the stack (with local variables) and stack pointer, all registers and CPU flags, and the instruction pointer, a execution context represents a specific point in the application’s execution path. This is useful for building higher-level abstractions, like coroutinescooperative threads (userland threads) or an equivalent to C# keyword yield in C++.

callcc()/continuation provides the means to suspend the current execution path and to transfer execution control, thereby permitting another context to run on the current thread. This state full transfer mechanism enables a context to suspend execution from within nested functions and, later, to resume from where it was suspended. While the execution path represented by a continuation only runs on a single thread, it can be migrated to another thread at any given time.

context switch between threads requires system calls (involving the OS kernel), which can cost more than thousand CPU cycles on x86 CPUs. By contrast, transferring control vias callcc()/continuation requires only few CPU cycles because it does not involve system calls as it is done within a single thread.

TL;DR – in the way we write our code, we can suspend function calls and free resources for other functions requiring it, without typical thread context switches enforced by the Kernel. A more deep-dive into Coroutines, await and concurrency can be found in this presentation and this blog post.


A simple Example

$ vim coroutine.cpp

#include <boost/coroutine/all.hpp>
#include <iostream>

using namespace boost::coroutines;

void coro(coroutine::push_type &yield)
        std::cout << "[coro]: Helloooooooooo" << std::endl;
        /* Suspend here, wait for resume. */
        std::cout << "[coro]: Just awesome, this coroutine " << std::endl;

int main()
        coroutine::pull_type resume{coro};
        /* coro is called once, and returns here. */

        std::cout << "[main]: ....... " << std::endl; //flush here

        /* Now resume the coro. */

        std::cout << "[main]: here at NETWAYS! :)" << std::endl;


Build it

On macOS, you can install Boost like this, Linux and Windows require some more effort listed in the Icinga development docs). You’ll also need CMake and g++/clang as build tool.

brew install ccache boost cmake 

Add the following CMakeLists.txt file into the same directory:

$ vim CMakeLists.txt

cmake_minimum_required(VERSION 2.8.8)
set(BOOST_MIN_VERSION "1.66.0")

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++0x")

find_package(Boost ${BOOST_MIN_VERSION} COMPONENTS context coroutine date_time thread system program_options regex REQUIRED)

# Boost.Coroutine2 (the successor of Boost.Coroutine)
# (1) doesn't even exist in old Boost versions and
# (2) isn't supported by ASIO, yet.


set(base_DEPS ${CMAKE_DL_LIBS} ${Boost_LIBRARIES})



target_link_libraries(coroutine ${base_DEPS})

        coroutine PROPERTIES
        FOLDER Bin
        OUTPUT_NAME boost-coroutine

Next, run CMake to check for the requirements and invoke make to build the project afters.

cmake .


Run and understand the program

$ ./boost-coroutine
[coro]: Helloooooooooo
[main]: .......
[coro]: Just awesome, this coroutine
[main]: here at NETWAYS! :)

Now, what exactly happened here? The Boost coroutine library allows us to specify the “push_type” where this functions should be suspended, after reaching this point, a subsequent call to “yield()” is required to resume this function.

void coro(coroutine::push_type &yield)

Up until “yield()”, the function logs the first line to stdout.

The first call happens inside the “main()” function, by specifying the pull_type and directly calling the function as coroutine. The pull_type called “resume()” (free form naming!) must then be explicitly invoked in order to resume the coroutine.

coroutine::pull_type resume{coro};

After the first line is logged from the coroutine, it stops before “yield()”. The main function logs the second line.

[coro]: Helloooooooooo
[main]: .......

Now comes the fun part – let’s resume the coroutine. It doesn’t start again, but the function’s progress is stored as stack pointer, targeting “yield()”. Exactly this resume function is called with “resume()”.

        /* Now resume the coro. */

That being said, there’s more to log inside the coroutine.

[coro]: Just awesome, this coroutine

After that, it reaches the end and returns to the main function. That one logs the last line and terminates.

[main]: here at NETWAYS! :)

Without a coroutine, such synchronisation between functions and threads would need waits, condition variables and lock guards.


Icinga and Coroutines

With Boost ASIO, the spawn() method wraps coroutines on a higher level and hides the strand required. This is used in the current code and binds a function into its scope. We’re using lambda functions available with C++11 in most locations.

The following example implements the server side of our API waiting for new connections. An endless loop listens for incoming connections with “server->async_accept()”.

Then comes the tricky part:

  • 2.9 and before spawned a thread for each connection. Lots of threads, context switches and memory leaks with stalled connections.
  • 2.10 implemented a thread pool, managing the resources. Handling the client including asynchronous TLS handshakes are slower, and still many context switches ahead between multiple connections until everything stalls.
  • 2.11 spawns a coroutine which handles the client connection. The yield_context is required to suspend/resume the function inside.


void ApiListener::ListenerCoroutineProc(boost::asio::yield_context yc, const std::shared_ptr& server, const std::shared_ptr& sslContext)
	namespace asio = boost::asio;

	auto& io (server->get_io_service());

	for (;;) {
		try {
			auto sslConn (std::make_shared(io, *sslContext));

			server->async_accept(sslConn->lowest_layer(), yc);

			asio::spawn(io, [this, sslConn](asio::yield_context yc) { NewClientHandler(yc, sslConn, String(), RoleServer); });
		} catch (const std::exception& ex) {
			Log(LogCritical, "ApiListener")
				<< "Cannot accept new connection: " << DiagnosticInformation(ex, false);

The client handling is done in “NewClientHandlerInternal()” which follows this flow:

  • Asynchronous TLS handshake using the yield context (Boost does context switches for us), Boost ASIO internally suspends functions.
    • TLS Shutdown if needed (again, yield_context handled by Boost ASIO)
  • JSON-RPC client
    • Send hello message (and use the context for coroutines)

And again, this is the IOBoundWork done here. For the more CPU hungry tasks, we’re using a different CpuBoundWork pool which again spawns another coroutine. For JSON-RPC clients this mainly affects syncing runtime objects, config files and the replay logs.

Generally speaking, we’ve replaced our custom thread pool and message queues for IO handling with the power of Boost ASIO, Coroutines and Context thus far.


What’s next?

After finalizing the implementation, testing and benchmarks are on the schedule – snapshot packages are already available for you at Coroutines will certainly help embedded devices with low CPU power to run even faster with many network connections.

Boost ASIO is not yet compatible with Coroutine2. Once it is, the next shift in modernizing our code is planned. Up until then, there are more Boost features available with the move from 1.53 to 1.66. Our developers are hard at work with implementing bug fixes, features and learning all the good things.

There’s many cool things under the hood with Icinga 2 Core. If you want to learn more and become a future maintainer, join our adventure! 🙂

Ich habe einen iX-Artikel für Dich: GitLab, GitLab, GitLab

Servus zusammen,

Dinge die man selbst gelernt hat, anderen Leuten beizubringen und helfend beiseite zu stehen, ist ein echt gutes Gefühl. Bei mir zieht sich das seit vielen Jahren durch die Icinga Community, einer der schönsten weil überraschensten Momente war wohl das Foto als “Danke” auf dem Icinga Camp Berlin 2019. Dann kommt noch dazu, dass ich sehr gerne Dokumentation schreibe, oder einfach alles aufschreibe, was ich irgendwann mal brauchen könnte. Und vielleicht jemand anders, der mal meinen Job macht, und ich mich neuen Aufgaben widmen kann. Nach den ersten Gehversuchen mit der Icinga-Schulung (2.x natürlich ;)) haben das nunmehr meine Kollegen übernommen, und meistern die Wissensvermittlung mit Bravour. Wir Entwickler sorgen dann in unseren Releases dafür, dass auch ihnen nicht langweilig wird 🙂

Ich für mich habe aber auch festgestellt, dass man nicht nur “das eine” machen soll und auch kann, sondern immer “über den Tellerrand” schauen sollte. Und so kams, dass ich auf meiner ersten OSDC 2013 keinen Dunst von Puppet, Elastic, Graphite, Container-Plattformen oder CI/CD hatte. Auch die Jahre danach waren hart, und meine Kollegen durften mir viel erklären, etwa Ceph und OpenStack. Jetzt nach vielen Jahren hilft mir dieses Wissen in meiner tagtäglichen Arbeit, und auf eine gewisse Art und Weise bin ich stolz, wenn mich meine Kollegen und Freunde nach Themen fragen, die nicht unmittelbar mit Icinga zu tun haben.

Dann gibts da noch Git, die schwarze Magie der Entwickler. 2004 in Hagenberg hab ich meinen VHDL-Code noch in CVS eingecheckt, 2009 .at-DNS-Zonen-Files nach SVN geschoben und irgendwann dank Icinga auch Git gesehen. Um gleich mal mit “force push” den Master zu zerstören – aller Anfang ist schwer. Seitdem ist viel passiert, und irgendwie hat jeder einen Git-Kniff, der gerne ausgetauscht wird. Die Nachfrage nach einer Schulung, seitens DEV (Kurzform für unsere Development-Abteilung), wurde immer größer und so wurde vor knapp 2,5 Jahren die Git-Schulung aus dem Boden gestampft.

Seither hat sich einiges getan, und wir haben unsere Open-Source-Entwicklung vollständig auf GitHub migriert, sowohl Icinga als auch NETWAYS. Aus dem vormaligen self-hosted Gitorious wurde dann mal ein GitLab, und mit jedem Release kam etwas neues dazu. GitLab verwenden wir an vielen Stellen – intern fürs Infrastrukturmanagement, betreut von MyEngineer im Hosting, als App in NWS und natürlich für Kunden und interne Projekte auf und Die Möglichkeiten, die einem CI mit den Runnern bietet, sowie den Merge-Request-Workflow haben wir seitdem bei uns stetig etabliert und ausgebaut.

All diese Erfahrungen aus der Praxis, und die tagtägliche Arbeit lassen wir in die neu gestaltete GitLab-Schulung einfliessen. Im Vortrag von Nicole und Gabriel auf der OSDC 2018 habe ich dann auch endlich mal Auto-DevOps verstanden und die Web IDE besser kennen gelernt. All das und noch viel mehr erzähle ich Schulungsteilnehmern im Kesselhaus und freu mich über die gemeinsamen Lernerfolge.

© 2019 Heise Medien GmbH & Co. KG

Doch damit hats nicht aufgehört – nachdem ich letztes Jahr für die IX einen Artikel zu IoT-Monitoring rund um Icinga, Elastic, Graylog und MQTT schreiben durfte, hab ich auch GitLab mit Golang in den Raum geworfen. Es ist ein bisserl Zeit ins Land gegangen, und ich hab dank IcingaDB auch mehr Golang gelernt. Im neuen Jahr hab ich eine GitLab-Schulung gehalten, und mich am Wochenende drauf hingesetzt und für die aktuelle iX 04/19 einen Artikel über GitLab und CI/CD geschrieben. Und auch vorab die GitHub Actions evaluiert, wo ich netterweise einen Invite habe 🙂

Wer mich kennt, weiss, dass ich endlos schreiben und reden kann über Dinge, die mir Spass machen. So empfehle ich Dir zur Lektüre auch einige Kaffee-Tassen (und falls vorhanden: Dragee-Keksi). Soferns dann noch offene Fragen gibt, komm einfach auf uns zu – egal ob Workshops, Schulungen oder Consulting, wir kriegen das hin, dass Dein GitLab genauso schnurrt wie unseres 🙂

Bevor ich es vergesse, auf der OSMC 2019 mach ich einen GitLab-Workshop rund um DevOps-Workflows und CI. Die Zeit vergeht eh so schnell – gleich anmelden 😉

Wir lesen uns – Icinga 2.11 wartet und nächste Woche ist Henrik aus der Schule wieder da. “Mein” Azubi der in die Welt von Icinga Core eintauchen darf, ich werd alt ❤️



GitLab Training v2.5.0 released

We have released v2.5.0 of our GitLab training today. Based on the feedback from previous trainings, and many things learned together with the students, we aim for the next classes already.

Dive deep into Git rebase, merge, squash, cherry-pick, get to know real-life development workflows and explore the possibilities of CI/CD pipelines and even more fancy GitLab features. Check our training schedule and register now!

Request Tracker: Highlight tickets based on due dates

A while ago we’ve announced a new extension for Request Tracker which allows to highlight tickets in search results even better.

Next to “last updated by” and custom field conditions, we’ve now added a requirement from production:

  • light red coloring for tickets with a due date in 3 days
  • dark red coloring for everything where the due date already passed

When there’s a ticket which matches one of the other conditions, due date wins over “last updated by” which itself wins over custom field conditions.

v2.0.0 is available on GitHub. Consider asking our sales engineers when building a new RT instance in NWS 🙂