NETWAYS Blog

From Monitoring to Observability: Distributed Tracing with Jaeger

by Michael Friedrich | Jan 9, 2020 | DevOps, Monitoring & Observability, GitLab

Modern cloud environments and microservice architectures need a changed mindset when it comes to monitoring. Classic host/service object relations are not always applicable, containers run in Kubernetes are short lived, and application performance within distributed networks is hard to monitor. Especially with applications running millions of operations, where to start the root cause analysis for slow client responses in your web shop? Is it just slow connections to MySQL, or does the application’s debug log slow down the entire fleet?

This is where observability with tracing comes into play. In the cloud native space, OpenTracing evolved as vendor neutral standardized API including client instrumentation. Famous tools are Zipkin and Jaeger which was contributed from Uber to the CNCF.

Let’s have a look into Jaeger today.

Getting Started

The easiest way to try Jaeger is with using the Docker container explained in the documentation.

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \
  -p 5775:5775/udp \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 14268:14268 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.16

Navigate to http://localhost:16686 to get greeted by Jaeger.

Try it

A sample application is available as container. I’m using a modified port mapping with 8081-8084 here since port 8080 is already assigned.

docker run --rm -it \
  --link jaeger \
  -p8081-8084:8080-8083 \
  -e JAEGER_AGENT_HOST="jaeger" \
  jaegertracing/example-hotrod:1.16 \
  all

Navigate to http://localhost:8081 and click the buttons to order some cars.

Within Jaeger itself, start analyzing the traces and for example learn that Redis runs into timeouts quite often delaying the HTTP responses to the clients.

Traces for applications

Jaeger provides officially supported client libraries for Go, Java, Python, NodeJS, C/C++, C#/.NET, others are in the making. Let’s see if we can add it into Icinga 2 and do some tracing here.

First off, clone the repository, build and install the client libraries. You’ll need CMake, a C++ compiler, etc. – basically everything which is required for Icinga 2 too and documented here. In this example, I’m compiling on my Macbook. There are additional libraries and headers required. Hint: Boost 1.72 has a bug which needs a patch.

brew install yaml-cpp thrift

git clone https://github.com/opentracing/opentracing-cpp && cd opentracing-cpp
# 1.6.0 doesn't work atm
git checkout v1.5.1
mkdir -p build && cd build
cmake ..
make && make install
cd ..

Then clone, cmake, make, install.

git clone https://github.com/jaegertracing/jaeger-client-cpp && cd jaeger-client-cpp
git checkout v0.5.0
# regenerate thrift headers for 0.11.0
find idl/thrift/ -type f -name \*.thrift -exec thrift -gen cpp -out src/jaegertracing/thrift-gen {} \;
mkdir -p build && cd build
cmake ..
make && make install
cd ..

In order to add Jaeger into Icinga 2, there are the following steps necessary:

Add CMake functions to lookup yaml-cpp, opentracing, jaeger headers and libraries
Optionally enable Jaeger tracing code, link the icinga2 binary against it
Add a new tracer into the Config Compiler CLI command to measure the timing points

The majority of development time today was to resolve compile and linking issues, adding spans and traces is really simple. You can follow my progress in this Icinga PR – developers, get started wtih the client libraries and help your colleagues with enhanced observability!

jaeger_tracing_icinga2_config_compiler_400k_services

Conclusion

Tracing application performance, cluster messages, end2end tests and any sort of event span provides valuable insights for both, devs and ops. With the new cloud native landscape evolving fast, we have additional possibilities to analyze our environments. Next to the now standardized tools for parsing and ingesting logs with Elastic Stack or Graylog, tracing has found its place in the observability stack.

Jaeger Tracing also is part of the GitLab observability suite which will be moved to the free core edition in 2020. Metrics, logging, alerts and tracing are key elements in modern cloud environments. Prometheus monitors everything from classic load checks to Kubernetes containers, Jaeger provides application performance insights and on top of that, Grafana combines the view on problems and trends. You can learn more about GitLab’s vision here.

Exploring these cool new features in GitLab are our mission in future trainings and workshops, watch this space in 2020!

DevOpsDays Ghent: Celebrating 10 years DevOps culture

by Michael Friedrich | Oct 31, 2019 | NETWAYS, DevOps, Events

Ten years ago the DevOps movement was started by Patrick Debois and Kris Buytaert in Ghent. Who would have guessed that it could become such an innovative movement and community. Today we’ve seen more than 80 DevOpsDays all over the world on each continent. All with the love and help from the core organizers team.

The NETWAYS family made their way to the 10th DevOpsdays anniversary in Ghent to participate and celebrate with the community. This time it also was really special with attending many DevOpsDays organizers from all over the world.

From a historical view, DevOpsDays started out small. In 2015 they’ve started to document how to organize and create communities and it all spread around the globe. More than 50 different countries and organizers met on the first day sharing their experiences and have a good time together. As with anything else in the Open Source world, good documentation is key for the first success to move on and spread the love.

DevOps is a culture, not a job title

Patrick shared his journey with making new friends in a new company after leaving the DevOpsDays organizer team five years ago. He’s identified the bottlenecks and silos in every department, and they all solved it with learning from each other. DevOps also is about valueing others work and understand their feelings and emotions.

After 10 years with many DevOpsDays held all over the world, DevOps as a term still is used wrong and needs improvements. It isn’t just creating a new team called “DevOps Engineers” now replacing the ops team. Neither is it about putting devs on call letting them eat their own dog food. Anyone trying to sell you the perfect DevOps world from a marketing slide with certifications and job titles is just plain wrong. There are many great tools in the wild which can help with bringing the DevOps mindset into the enterprise environment. Not every company is able to immediately dive into the culture, sometimes it takes months or even years to encourage for a change.

DevOpsDays is about sharing these thoughts and emotions, care about diversity and tell you not the ordinary tech story but something to think about. Raise awareness that DevOps is about culture and finding the harmony in your daily workflows. Achieve goals and visions in a shorter amount of time, combine tools and be a role model with sharing your expertise. The DevOps movement is not only a place to sharing experiences with tools and best practices but also talking about work ethics, communication styles and soft skills. With overemphasizing on the technical toolchain, sociotechnity seems to balance this as well in many environments. Emotions are a thing, software is not only code.

From the full-stack engineers not writing device drivers to the most important message: Our definition of the “full stack” only covers what we understand, and not what’s actually required to run the application. Stop pretending that things are easy, being on call 24/7 doesn’t burn out and new products will solve old problems. Care about high performing teams with the need of psychological safety. Now wait for the recordings, these talks are really interesting to learn from.

Wait for it

Slides changing every 15 seconds, 5 minutes time to pitch your story. I like them a lot being entertaining and sometimes you just feel the speaker’s pain with “waiting for slide”. The tradition says that this actually was a malfunction of Kris’ Linux notebook 😉

That way we’ve heard stories about hot takes, myths and false hoods about DevOps and the danger of DevOps certifications. Before Jason could start his ignite, Kris jumped in announcing that ConfigMgmtCamp registrations are now open. In return, he announced DeliveryConf and jumped right into the meta story being ignites. Watch the recordings when available, it literally made my day. Since announcing confs in ignites was now a thing, Blerim did so too with IcingaConf next year before diving into “Why monitoring is NOT killing observability”. Monversability combing both is a good idea with moving from traditional blackbox monitoring to applications providing metrics and insights. On the other hand, watching graphs all days still requires business process dashboards to immediately visualize failure with alerts. Last but not least, learning about Kubernetes in 5 minutes really nailed it.

Move on

We really enjoyed meeting friends old and new and aside from the talks, exploring beautiful Ghent with Belgium waffles and beer. One thing to note – the half marathon from the ground up to the ball room with many stairs was hard in the beginning. On the other hand, our fitness trackers were very happy 😉

Thanks to the organizers and sponsors for the great event – onwards to the next decade!

PS: DevOpsDays Berlin is happening soon. And many more great events near your city. If you don’t have one, kindly contact the core organizers, they are all in with helping to kick-off an amazing event and culture.

GitLab CI Runners with Auto-scaling on OpenStack

by Michael Friedrich | Oct 24, 2019 | GitLab, Container, OpenStack, NETWAYS, DevOps

With migrating our CI/CD pipelines from Jenkins to GitLab CI in the past months, we’ve also looked into possible performance enhancements for binary package builds. GitLab and its CI functionality is really really great in this regard, and many things hide under the hood. Did you know that “Auto DevOps” is just an example template for your CI/CD pipeline running in the cloud or your own Kubernetes cluster? But there’s more, the GitLab CI runners can run jobs in different environments with using different hypervisors and the power of docker-machine.

One of them is OpenStack available at NWS and ready to use. The following examples are from the Icinga production environment and help us on a daily basis to build, test and release Icinga products.

Preparations

Install the GitLab Runner on the GitLab instance or in a dedicated VM. Follow along in the docs where this is explained in detail. Install the docker-machine binary and inspect its option for creating a new machine.

curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh | sudo bash
apt-get install -y gitlab-runner
  
curl -L `uname -s`-`uname -m` -o /usr/local/bin/docker-machine
chmod +x /usr/local/bin/docker-machine
  
docker-machine create --driver openstack --help

Next, register the GitLab CI initially. Note: This is just to ensure that the runner is up and running in the GitLab admin interface. You’ll need to modify the configuration in a bit.

gitlab-runner register \
  --non-interactive \
  --url https://git.icinga.com/ \
  --tag-list docker \
  --registration-token SUPERSECRETKEKSI \
  --name "docker-machine on OpenStack" \
  --executor docker+machine \
  --docker-image alpine

Docker Machine with OpenStack Deployment

Edit “/etc/gitlab-runner/config.toml” and add/modify the “[[runners]]” section entry for OpenStack and Docker Machine. Ensure that the MachineDriver, MachineName and MachineOptions match the requirements. Within “MachineOptions”, add the credentials, flavors, network settings just as with other deployment providers. All available options are explained in the documentation.

vim /etc/gitlab-runner/config.toml

  [runners.machine]
    IdleCount = 4
    IdleTime = 3600
    MaxBuilds = 100
    MachineDriver = "openstack"
    MachineName = "customer-%s"
    MachineOptions = [
      "openstack-auth-url=https://cloud.netways.de:5000/v3/",
      "openstack-tenant-name=1234-openstack-customer",
      "openstack-username=customer-login",
      "openstack-password=sup3rS3cr3t4ndsup3rl0ng",
      "openstack-flavor-name=s1.large",
      "openstack-image-name=Debian 10.1",
      "openstack-domain-name=default",
      "openstack-net-name=customer-network",
      "openstack-sec-groups="mine",
      "openstack-ssh-user=debian",
      "openstack-user-data-file=/etc/gitlab-runner/user-data",
      "openstack-private-key-file=/etc/gitlab-runner/id_rsa",
      "openstack-keypair-name=GitLab Runner"
    ]

The runners cache can be put onto S3 granted that you have this service available. NWS luckily provides S3 compatible object storage.

  [runners.cache]
    Type = "s3"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3provider.domain.localdomain"
      AccessKey = "supersecretaccesskey"
      SecretKey = "supersecretsecretkey"
      BucketName = "openstack-gitlab-runner"

Bootstrap Docker in the OpenStack VM

Last but not least, these VMs need to be bootstrapped with Docker inside a small script. Check the “–engine-install-url” parameter in the help output:

root@icinga-gitlab:/etc/gitlab-runner# docker-machine create --help
  ...
  --engine-install-url "https://get.docker.com"							Custom URL to use for engine installation

You can use the official way of doing this, but putting this into a small script also allows customizations like QEMU used for Raspbian builds. Ensure that the script is available via HTTP e.g. from a dedicated GitLab repository 😉

#!/bin/sh
#
# This script helps us to prepare a Docker host for the build system
#
# It is used with Docker Machine to install Docker, plus addons
#
# See --engine-install-url at docker-machine create --help

set -e

run() {
  (set -x; "$@")
}

echo "Installing Docker via get.docker.com"
run curl -LsS https://get.docker.com -o /tmp/get-docker.sh
run sh /tmp/get-docker.sh

echo "Installing QEMU and helpers"
run sudo apt-get update
run sudo apt-get install -y qemu-user-static binfmt-support

Once everything is up and running, the GitLab runners are ready to fire the jobs.

Auto-Scaling

Jobs and builds are not run all the time, and especially with cloud resources, this should be a cost-efficient thing. When building Icinga 2 for example, the 20+ different distribution jobs generate a usage peak. With the same resources assigned all the time, this would tremendously slow down the build and release times. In that case, it is desirable to automatically spin up more VMs with Docker and let the GitLab runner take care of distributing the jobs. On the other hand, auto-scaling should also shut down resources in idle times.

By default, one has 4 VMs assigned to the GitLab runner. These builds run non-privileged in Docker, the example below also shows another runner which can run privileged builds. This is needed for Docker-in-Docker to create Docker images and push them to GitLab’s container registry.

root@icinga-gitlab:~# docker-machine ls
NAME                                               ACTIVE   DRIVER      STATE     URL                      SWARM   DOCKER     ERRORS
runner-privileged-icinga-1571900582-bed0b282       -        openstack   Running   tcp://10.10.27.10:2376           v19.03.4
runner-privileged-icinga-1571903235-379e0601       -        openstack   Running   tcp://10.10.27.11:2376           v19.03.4
runner-non-privileged-icinga-1571904408-5bb761b5   -        openstack   Running   tcp://10.10.27.20:2376           v19.03.4
runner-non-privileged-icinga-1571904408-52b9bcc4   -        openstack   Running   tcp://10.10.27.21:2376           v19.03.4
runner-non-privileged-icinga-1571904408-97bf8992   -        openstack   Running   tcp://10.10.27.22:2376           v19.03.4
runner-non-privileged-icinga-1571904408-97bf8992   -        openstack   Running   tcp://10.10.27.22:2376           v19.03.4

Once it detects a peak in the pending job pipeline, the runner is allowed to start additional VMs in OpenStack.

root@icinga-gitlab:~# docker-machine ls
NAME                                               ACTIVE   DRIVER      STATE     URL                      SWARM   DOCKER     ERRORS
runner-privileged-icinga-1571900582-bed0b282       -        openstack   Running   tcp://10.10.27.10:2376           v19.03.4
runner-privileged-icinga-1571903235-379e0601       -        openstack   Running   tcp://10.10.27.11:2376           v19.03.4
runner-non-privileged-icinga-1571904408-5bb761b5   -        openstack   Running   tcp://10.10.27.20:2376           v19.03.4
runner-non-privileged-icinga-1571904408-52b9bcc4   -        openstack   Running   tcp://10.10.27.21:2376           v19.03.4
runner-non-privileged-icinga-1571904408-97bf8992   -        openstack   Running   tcp://10.10.27.22:2376           v19.03.4
runner-non-privileged-icinga-1571904408-97bf8992   -        openstack   Running   tcp://10.10.27.23:2376           v19.03.4

...

runner-non-privileged-icinga-1571904534-0661c396   -        openstack   Running   tcp://10.10.27.24:2376           v19.03.4
runner-non-privileged-icinga-1571904543-6e9622fd   -        openstack   Running   tcp://10.10.27.25:2376           v19.03.4
runner-non-privileged-icinga-1571904549-c456e119   -        openstack   Running   tcp://10.10.27.27:2376           v19.03.4
runner-non-privileged-icinga-1571904750-8f6b08c8   -        openstack   Running   tcp://10.10.27.29:2376           v19.03.4

In order to achieve this setting, modify the runner configuration and increase the limit.

vim /etc/gitlab-runner/config.toml

[[runners]]
  name = "docker-machine on OpenStack"
  limit = 24
  output_limit = 20480
  url = "https://git.icinga.com/"
  token = "supersecrettoken"
  executor = "docker+machine"

This would result in 24 OpenStack VMs after a while, and all are idle 24/7. In order to automatically decrease the deployed VMs, use the OffPeak settings. This also ensures that resources are available during workhours while spare time and weekend are considered “off peak” with shutting down unneeded resources automatically.

    OffPeakTimezone = "Europe/Berlin"
    OffPeakIdleCount = 2
    OffPeakIdleTime = 1800
    OffPeakPeriods = [
      "* * 0-8,22-23 * * mon-fri *",
      "* * * * * sat,sun *"
    ]

Pretty neat functionality 🙂

Troubleshooting & Monitoring

“docker-machine ls” provides the full overview and tells whenever e.g. a connection to OpenStack did not work, or if the VM is currently unavailable.

root@icinga-gitlab:~# docker-machine ls
NAME                                               ACTIVE   DRIVER      STATE     URL                      SWARM   DOCKER     ERRORS
runner-privileged-icinga-1571900582-bed0b282       -        openstack   Error                                      Unknown    Expected HTTP response code [200 203] when accessing [GET https://cloud.netways.de:8774/v2.1/servers/], but got 404 instead

In case you have deleted the running VMs to start fresh, provisioning might take a while and the above can be a false positive. Check the OpenStack management interface to see whether the VMs booted correctly. You can also remove a VM with “docker-machine rm <id>” and run “gitlab-runner restart” to automatically provision it again.

Whenever the VM provisioning fails, a gentle look into the syslog (or runner log) unveils what’s the problem. Lately we had used a wrong OpenStack flavor configuration which was fixed after investigating in the logs.

Oct 18 07:08:48 3 icinga-gitlab gitlab-runner[30988]:  #033[31;1mERROR: Error creating machine: Error in driver during machine creation: Unable to find flavor named 1234-customer-id-4-8#033[0;m  #033[31;1mdriver#033[0;m=openstack #033[31;1mname#033[0;m=runner-non-privilegued-icinga-1571375325-3f8176c3 #033[31;1moperation#033[0;m=create

Monitoring your GitLab CI runners is key, and with the help of the REST API, this becomes a breeze with Icinga checks. You can inspect the runner state and notify everyone on-call whenever CI pipelines are stuck.

Conclusion

Developers depend on fast CI feedback these days, speeding up their workflow – make them move fast again. Admins need to understand their requirements, and everyone needs a deep-dive into GitLab and its possibilities. Join our training sessions for more practical exercises or immediately start playing in NWS!

GitLab Commit London Recap

by Michael Friedrich | Oct 10, 2019 | GitLab, DevOps

A while ago, when GitHub announced CI/CD for the upcoming actions feature, I’ve been sharing with Priyanka on Twitter how we use GitLab. The GitLab stack with the runners, Docker registry and all-in-one interface not only speeds up our development process and packaging pipelines, it also scales our infrastructure deployments even better. With the love for GitLab, we’ve also created our GitLab training sharing all the knowledge about this great tool stack.

GitLab, GitLab, GitLab

Priyanka was so kind to invite us over to GitLab Commit in London, GitLab’s first European user conference. At first glance, Bernd and I didn’t know what would happen – turns out, this wasn’t a product conference. Instead, meeting new people and learning how they use GitLab in their environments was put into focus. Sid Sijbrandij, GitLab’s CEO, kicked off the event in the morning and after sharing the roots of GitLab being in Europe, he asked the audience to “meet your neighbour and connect”. Really an icebreaker for the coming talks and sessions.

The next keynote was presented by engineers from Porsche sharing their move to GitLab. With Java Boot Spring and iOS application development, and the requirement of deeper collaboration between teams, they took the challenge and are using GitLab since ~1 year. Interesting to learn was that nearly everyone at GitLab Commit uses Terraform for deployments. Matt did so too in his live coding session with a full blown web application Kubernetes container setup all managed and deployed with GitLab and Terraform. After 20 minutes of talking way too fast, it worked. What a great way of showing what’s possible with today’s tool stack!

DevOps for everyone

One thing I’ve also recognized – everyone seems to be moving to Kubernetes and Terraform. Rancher, Jenkins and other tools in the same ecosystem seem to be falling short in modern DevOps environments. I really liked the security panel where ideas like automated dependency scanning in merge requests have been shared. Modern days with easy to use libraries typically pull in lots of unforeseen dependencies and who really knows about all the vulnerabilities? Blocking the merge request in case of emergency is a killer feature for current development workflows.

In terms of the product roadmap, GitLab has a huge vision which is not easy to summarize. On the other hand, having a maybe-not-reachable vision empowers a great team to work even harder. The short term improvements for CI are for example Directed Acyclic Graphs allowing parallel pipelines to continue faster. This will greatly enhance our package pipelines in the future. While tweeting about this, Jason was so kind to share the build matrix feature known from Travis coming soon with GitLab 12.6. Spot on, testing e.g. different PHP versions for the same job is greatly missed being as easy as Travis. GitLab Runners will receive support for ARM soon, and also Vault integration is coming. GitLab also announced their startup Meltano, an open source data to dashboard workflow platform – looks really promising.

The afternoon sessions were split into 3 tracks each, with even more user stories. Moving along from Delta with their many of thousands of repositories, we’ve also learned more about VMware’s cloud architects and how they incorporate GitLab & Terraform for deployments. Last but not least we’ve joined Philipp sharing his story on migrating from Jenkins to GitLab CI. Since we struggled from the same problems (XML config, plugins breaking upgrades, etc.) we were enlightened to see that he even developed a GitHub to GitLab issue migrator, fully open source. Moving to a central platform and away from 5+ browser tabs really is a key argument in stressful (development) times. Avoiding context switches for developers improves quality and ensures better releases from my experience.

Get the party started

The evening event took place at Swingers, a bar which had a Mini Golf playground built-in. We were going there with the iconic London Bus, and the nice people from GitLab even ensured that Gin&Tonic made this a great starter. Right after arriving, the fire alarm rang off and we had to move outside. Party like NETWAYS 😉 And finally we met Priyanka to say hi, lovely memory. We also met Brian, proud Irish, challenging us with funny stories and finding out that Germans do not know everything. Really charming and much to laugh.

Thanks GitLab for this top notch event and see you next year!

DEV stories: Icinga Core trainees in the making

by Michael Friedrich | Jul 18, 2019 | Icinga, Team, Development, C++

When my dev leads approached me with the idea to guide a trainee in the Icinga core topic, I was like … wow, sounds interesting and finally a chance to share my knowledge.

But where should I start and how can it be organized with my ongoing projects?

Prepare for the unexpected

My view on the code and how things are organized changed quite a bit since then. You cannot expect things, nor should you throw everything you know into the pool. While working on Icinga 2.11, I’ve collected ideas and issues for moving Henrik into these topics. In addition to that, we’ve improved on ticket and documentation quality, including technical concepts and much more.

My colleagues know me as the “Where is the test protocol?” PR reviewer. Also, reliable configuration and steps on reproducing the issues and problems are highly encouraged. Why? The past has proven that little to zero content in ticket makes debugging and problem analysis really hard. Knowledge transfer is an investment in the future of both NETWAYS and Icinga.

Some say, that documentation would replace their job. My mission is to document everything for my colleagues to make their life easier. Later on, they contribute to fixing bugs and implement new features while I’m moving into project management, architecture and future trainees. They learn what I know, especially “fresh” trainees can be challenged to learn new things and don’t necessarily need to change habits.

Clear instructions?

At the beginning, yes. Henrik started with the C++ basics, a really old book from my studies in 2002. C++11 is a thing here, still, the real “old fashioned” basics with short examples and feedback workshops prove the rule. Later on, we went for an online course and our own requirements.

Since Icinga 2 is a complex tool with an even more complex source code, I decided to not immediately throw Henrik into it. Instead, we had the chance to work with the Tinkerforge weather station. This follows the evaluation from our Startupdays and new product inside the NETWAYS shop. The instructions were simple, but not so detailed:

Put the components together and learn about the main functionality. This is where the “learn by playing” feeling helps a lot.
Explore the online documentation and learn how to use the API bindings to program the Tinkerforge bricks and sensors.
Use the existing check_tinkerforge plugin written in Python to see how it works
Write C++ code which talks to the API and fetches sensor data

Documentation, a blog post and keeping sales updated in an RT ticket was also part of the project. Having learned about the requirements, totally new environment and communication with multiple teams, this paves the way for future development projects.

Freedom

In order to debug and analyse problems or implement new features, we need to first understand the overall functionality. Starting a new project allows for own code, experiences, feedback, refactoring and what not. Icinga 2 as core has grown since early 2012, so it is key to understand the components and how everything is put together.

Where to start? Yep, visit the official Icinga trainings for a sound base. Then start with some Icinga cluster scenarios, with just pointing to the docs. This takes a while to understand, so Henrik was granted two weeks to fully install, test and prepare his findings.

With the freedom provided, and the lessons learned about documentation and feedback, I was surprised with a Powerpoint presentation on the Icinga cluster exercises. Essentially we discussed everything in the main area in our new NETWAYS office. A big flat screen and the chance that colleagues stop by and listen or even add to the discussion. Henrik was so inspired to write a blogpost on TLS.

Focus on knowledge

In the latest session, I decided to prove things and did throw a lot of Icinga DSL exercises at Henrik, also with the main question – what’s a DSL anyways?

Many things in the Icinga DSL are hidden gems, with the base parts documented, but missing the bits on how to build them together. From my experience, you cannot explain them in one shot, specific user and customer questions or debugging sessions enforce you to put them together. At the point when lambda functions with callbacks were on the horizon, a 5 hour drive through the DSL ended. Can you explain the following snippet? ?

object HostGroup "hg1" { assign where host.check_command == "dummy" }
object HostGroup "hg2" {
  assign where true
//  assign where host.name in Cities
 }

object Host "runtime" {
  check_command = "dummy"
  check_interval = 5s
  retry_interval = 5s

  vars.dummy_text = {{
    var mygroup = "hg2"
    var mylog = "henrik"
  //  var nodes = get_objects(Host).filter(node => mygroup in node.groups)

    f = function (node) use(mygroup, mylog) {
      log(LogCritical, "Filterfunc", mylog+node.name )
      return mygroup in node.groups
    }
    var nodes = get_objects(Host).filter(f)

    var nodenames = nodes.map(n => n.name)
    return nodenames.join(",")
   // return Json.encode(nodes.map(n => n.name))
  }}

}

We also did some live coding in the DSL, this is now a new howto on the Icinga community channels: “DSL: Count check plugin usage from service checks“. Maybe we’ll offer an Icinga DSL workshop in the future. This is where I want our trainees become an active part, since it also involves programming knowledge and building the Icinga architecture.

Code?

Henrik’s first PR was an isolated request by myself, with executing a check in-memory instead of forking a plugin process. We had drawn lots of pictures already how check execution generally works, including the macro resolver. The first PR approved and merged. What a feeling.

We didn’t stop there – our NETWAYS trainees are working together with creating PRs all over the Icinga project. Henrik had the chance to review a PR from Alex, and also merge it. Slowly granting responsibility and trust is key.

Thanks to trainees asking about this, Icinga 2 now also got a style guide. This includes modern programming techniques such as “auto”, lambda functions and function doc headers shown below.

/**
 * Main interface for notification type to string representation.
 *
 * @param type Notification type enum (int)
 * @return Type as string. Returns empty if not found.
 */
String Notification::NotificationTypeToString(NotificationType type)
{
	auto typeMap = Notification::m_TypeFilterMap;

	auto it = std::find_if(typeMap.begin(), typeMap.end(),
		[&type](const std::pair<String, int>& p) {
			return p.second == type;
	});

	if (it == typeMap.end())
		return Empty;

	return it->first;
}

Learn and improve

There are many more things in Icinga: The config compiler itself with AST expressions, the newly written network stack including the REST API parts, feature integration with Graphite or Elastic and even more. We’ll cover these topics with future exercises and workshops.

While Henrik is in school, I’m working on Icinga 2.11 with our core team. Thus far, the new release offers improved docs for future trainees and developers:

Overhauled development docs with dev environments on macOS, Linux, Windows and a code style.
Technical concepts including cluster messages. Helps developers and support.
Service monitoring with writing good plugins and a better integration.
Enhanced troubleshooting docs with TLS ciphers, sslscan is a really nifty tool for example.

This also includes evaluating new technologies, writing unit tests and planning code rewrites and/or improvements. Here’s some ideas for future pair programming sessions:

Boost.DateTime instead of using C-ish APIs for date and time manipulation. This blocks other ideas with timezones for TimePeriods, etc.
DSL methods to print values and retrieve external data
Metric enhancements and status endpoints

Trainees rock your world

Treat them as colleagues, listen to their questions and see them “grow up”. I admit it, I am sometimes really tired in the evening after talking all day long. On the other day, it makes me smile to see a ready-to-merge pull request or a presentation with own ideas inspired by an old senior dev. This makes me a better person, every day.

I’m looking forward to September with our two new DEV trainees joining our adventure. We are always searching for passionate developers, so why not immediately dive into the above with us? 🙂 Promise, it will be fun with #lifeatnetways and #drageekeksi ❤️