NETWAYS Blog

OSMC 2023 | Know your Data: The Stats Behind your Alerts

by Tobias Bauriedel | Feb 16, 2024 | Uncategorized

At the last OSMC we had the honour of welcoming Dave McAllister from America. Dave works at NGINX and has been working in the world of observability and monitoring for a long time.
Although he works at NGINX, his talk was not about NGINX as you might expect. Rather, he gave us an insight into data and how to process and read it.

When you are in the observability space, you are usually overwhelmed by a lot of data. In order to get meaningful results from this amount of data, it needs to be processed (maybe even sampled) and we need to understand what this data is telling us before we process it.

So, How is Data Processed?

Much of the data is aggregated using the average. But even here there are different methods.
Most statistics are based on the mean, median, or mode. But what are they?

The ‘mean’ is the actual average of a set of values. For a data set of ‘1,5,8,7,1’, the mean would be 4.4. This is calculated by adding up the data points and dividing the sum by the number of data points. In our example, (1+5+8+7+1)/5 = 4.4. The ‘median’ is the value in the middle of a set of data in an ordered sequence. In our example, it would be 5 (1,1,5,7,8).
The ‘mode’ is not calculated using a formula (unlike the previous ones); instead, it is simply the most common value. In our example this would be the number 1, which occurs twice in our set, while all other points are unique.
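To make the three values from the example tangible, here is a minimal sketch using Python's standard `statistics` module. The variable names are my own and not taken from the talk.

```python
import statistics

# The example data set from the text above.
data = [1, 5, 8, 7, 1]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value of the sorted data: 1, 1, 5, 7, 8
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean}, median={median}, mode={mode}")
```

The same five numbers yield three different "typical" values, which is exactly why you need to know which aggregation sits behind a dashboard or an alert.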

You can see from the above examples how important it is to know how to treat your data. The “wrong” use of algorithms can lead to undesirable results. It is therefore important to know what you want to determine before aggregating the data.

Data Sampling

As a monitoring and observability engineer, you are usually overwhelmed by the amount of data that needs to be managed, aggregated and processed. Dave McAllister talks about a customer he visited who was dealing with 42TB of data per hour. No one can handle that amount of data.
As the amount of data increases, so does the difficulty of analysing it. Sampling provides a solution to this problem. In sampling, large amounts of data are converted into smaller amounts of data, or collected in a sampled fashion.
As you can imagine, there are many different ways of doing sampling. The basic ones are ‘head-based’ and ‘tail-based’.

With head-based sampling, the decision whether or not to keep a trace is made before the trace is collected. Because this decision does not depend on how the trace turns out, it does not affect the validity of the data: both good and bad traces are collected.
Tail-based sampling is the exact opposite. With tail-based sampling, you let a trace finish first and then decide whether to keep it or not. In this respect, it is more like “filtering” the data than random sampling.
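The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Trace` class, the 10% sampling rate and the 500 ms "slow" threshold are hypothetical and do not come from the talk or from any particular tracing product.

```python
import random
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical, heavily simplified trace record."""
    trace_id: int
    duration_ms: float
    error: bool


def head_sample(rate: float, rng: random.Random) -> bool:
    """Head-based: decide up front, without knowing how the trace will end."""
    return rng.random() < rate


def tail_sample(trace: Trace, slow_ms: float = 500.0) -> bool:
    """Tail-based: decide after the trace has finished, based on its outcome."""
    return trace.error or trace.duration_ms >= slow_ms


rng = random.Random(42)  # fixed seed so the sketch is reproducible
traces = [
    Trace(i, duration_ms=rng.uniform(10, 1000), error=(i % 50 == 0))
    for i in range(1000)
]

head_kept = [t for t in traces if head_sample(0.1, rng)]
tail_kept = [t for t in traces if tail_sample(t)]

print(f"head-based kept {len(head_kept)} of {len(traces)} traces")
print(f"tail-based kept {len(tail_kept)} of {len(traces)} traces")
```

The head-based sample is a random slice of everything, while the tail-based sample contains only the traces that matched the filter (here: errors and slow requests).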

Calculate probabilities

In addition to the important aspects of aggregation and sampling, Dave McAllister also gave us an insight into the calculation of probabilities.

If you have a large amount of meaningful data, you can use it to predict probabilities.
A classic example from IT is predicting whether a hard drive will fail within a certain period of time.

Again, there are different ways of making predictions. Dave’s talk gives us a small but nice and detailed insight into stochastics.
The Weibull distribution can be used to predict when a hard drive is likely to fail; it is mainly used to model “time to failure”. Another option is the exponential distribution, which can be used to model the time between two events.
These two methods are just a small glimpse into stochastics.
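As a rough sketch of how such a prediction could look, the following Python snippet evaluates the cumulative distribution functions of the Weibull and the exponential distribution. The parameter values (shape, scale, mean interval) are made up for illustration; real values would have to be fitted to your own failure data.

```python
import math


def weibull_failure_probability(t: float, shape: float, scale: float) -> float:
    """Probability of a failure by time t: 1 - exp(-(t / scale) ** shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def exponential_event_probability(t: float, mean_interval: float) -> float:
    """Probability that the next event occurs within t: 1 - exp(-t / mean_interval)."""
    return 1.0 - math.exp(-t / mean_interval)


# Hypothetical parameters: a drive population with shape 1.5 and scale 6 years.
p_fail_3y = weibull_failure_probability(t=3.0, shape=1.5, scale=6.0)

# Hypothetical parameters: events arrive on average every 4 hours.
p_event_1h = exponential_event_probability(t=1.0, mean_interval=4.0)

print(f"P(drive fails within 3 years) = {p_fail_3y:.3f}")
print(f"P(next event within 1 hour)   = {p_event_1h:.3f}")
```

A Weibull shape parameter above 1 describes a failure rate that grows with age, and with a shape of exactly 1 the Weibull distribution reduces to the exponential one.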

Summary

As you have seen from the few examples in this blog post, statistics are the way to analyse data.
With the correct use of data, good and meaningful statistics and even predictions and probabilities can be calculated. Only very few principles are behind most of these decisions, so everyone can use them.

By now you should have realized how important it is to use data correctly and what is possible with it.
If you want an even deeper and more detailed insight into the topic, I can only recommend the video of Dave McAllister’s talk.

Tobias Bauriedel

Assistant Manager Operations

Tobias is an open and easygoing person for whom enjoying his work matters most. He completed his apprenticeship as an IT specialist for system integration with us and now works in the NETWAYS Professional Services Operations team, developing projects for NPS on the side. In his free time he volunteers with the volunteer fire brigade as a breathing apparatus wearer and engine operator, travels the world and enjoys spending time with friends.

Read more from Tobias and meet the Team

Our Year in Review 2023

by Katja Kotschenreuther | Dec 19, 2023 | Uncategorized

Faster than you might think, 2023 has drawn to a close. Many great happenings, numerous new faces, unique events and a wealth of know-how shaped our blog activities this year. Let’s take a look back together at all the special moments of this year that we like to remember.

#Life@NETWAYS

At the beginning of the year we enjoyed a short skiing holiday together in the Bavarian Forest. Skiing, snowboarding, hiking and cooking together made our trip an unforgettable experience. Shortly afterwards, our apprentices had their annual project week. Together they spent a week bringing an arcade machine to life. A result that is definitely worth seeing! We were also always up for parties and other internal events. Whether at the “Tanz in den Mai” (dance into May), the summer party or, most recently, our joint Christmas party: there is always a reason to celebrate at our company!

Our Events

Since we are on the subject of events, let’s move straight on to the two most important conferences that NETWAYS Event Services organised this year: stackconf and OSMC. Both open source conferences were a great success, which we owe first and foremost to our event team, Markus and Lukas. But a big thank you also goes to all other employees who supported the events through moderation, social media, photography or video recording!

New Faces

This year, too, we were delighted to welcome a large number of new faces who brought a breath of fresh air into the company. Ingrida and Sebastian have been supporting our marketing department as Marketing Specialists since this year, Lucy has joined the NETWAYS Professional Services team as a Consultant, and Noé and Alvar are based at Icinga as Developers. We have also gained six new apprentices.

Hip Hip Hooray!

Special thanks this year go to Marius and Sefan G., who have already been part of the team for 20 years, as well as Nadja, Markus W. and Thomas W., who celebrated their 10-year anniversaries. Congratulations once again, and thank you for your many years of service!

News from NETWAYS Web Services

Our NETWAYS Managed Services team also had some noteworthy news this year. The NWS Customer Interface was extended with NWS-ID, which means that you can create organisations in your own account, grant specific permissions to user groups and add the corresponding users. Whether you have set up a cloud project or a Kubernetes cluster, both are now integrated with NWS-ID! In addition, there was a new release for all Kubernetes users: Cilium, an advanced CNI that offers sophisticated networking and security features. Managed Bookstack, a powerful, user-friendly wiki software, was added to the SaaS portfolio. We also expanded our tutorial series with several Kubernetes and cloud tutorials.

News from NETWAYS Professional Services

We expanded our product portfolio this year as well. NETWAYS Professional Services has recently started offering support contracts for the open source automation tool Ansible. Our Support Engineers will be happy to assist you with operating your Ansible environment, for your success!

A big thank you at this point also goes to our colleagues in the sales team, who advise our customers all year round on our open source products as well as our consulting and support services!

Highlights from the Blog:

To round things off, we have put together a ranking of the best and most popular blog posts of this year. If you haven’t read them yet, now is the best opportunity to do so:

Enjoy browsing!

We wish you all a Merry Christmas and a Happy New Year. We are looking forward to 2024 and to once again sharing many great moments, inspiring projects, know-how and insights into our #Life@NETWAYS with you!

Katja Kotschenreuther

Manager Marketing

Katja has been part of the marketing team since October 2020. As Manager Marketing she takes care of the marketing for the conferences stackconf and OSMC, DevOpsDays Berlin, the Open Source Camps and our trainings. In her free time she enjoys travelling, crafting and baking, and in summer she also tends her far too large vegetable garden.

Read more from Katja and meet the Team

OSMC 2023 | Day 3 Recap

by Lorenz Kästle | Nov 9, 2023 | OSMC, Uncategorized

Day two of the OSMC 2023 started rather quietly, but with an interesting set of talks. The following is a summary and review of some talks I watched and was interested in. Therefore not all of the talks are mentioned here, and this should not be interpreted as a judgement of their quality or significance.

Automated update management with Renovate

Sebastian Gumprich describes his journey of introducing Renovate at scale at his workplace. Renovate is a tool for updating dependencies in software projects, which can be self-hosted and is therefore applicable in practically every environment.

Renovate analyses the software project it is run against, detects its dependencies, fetches data about the available versions of those and then, if updates are available and it is configured to do so, applies them.

To integrate it better into the existing development process and to avoid putting more load on the developers, Renovate was set up to run as a GitLab pipeline. This approach also scaled across a huge number of different projects and repositories.

To work correctly (and do anything) Renovate needs some configuration, which is written as JSON and is, in most cases, rather small and easy to create.

The presentation was partly about the technical ideas and problems, but also, arguably more importantly, about the human part, which I found most interesting. Part of this was, unsurprisingly, structured and extensive documentation of the relevant steps, procedures and common problems. But some programmatic features were introduced as well, for example automatically opening issues in GitLab for faulty Renovate configurations.

To further lower the hurdles to applying Renovate to a specific project, the “Onboarding” Merge Request applying the relevant changes was quite verbose about what it would do, what the consequences would be, and where and whom to ask in case of open questions.

These points may seem obvious or even trivial, but, and this is the opinion of the author, organizing different people and groups of people and communicating in a constructive and efficient way is one of the biggest hurdles in the business, and approaches to this set of problems are often quite interesting and helpful.

Replacing NSClient++ for Windows Monitoring

The second talk I want to highlight here is Sven Nierlein’s presentation of a replacement for NSClient++.

The talk started with the expected review of NSClient++, a monitoring agent which was quite common in different availability and status monitoring setups in the past, especially on Windows operating systems. Sadly, development is progressing more slowly nowadays than in the past, and some problems which were not fixed are increasingly becoming dealbreakers. The lack of support for current TLS protocols is especially problematic.

Writing a new agent was not really the first choice, but a comparison of current alternatives did not yield a good solution, since introducing a completely new configuration, new protocols and different workflows was not feasible. The resolution was therefore to write a completely new, but compatible monitoring agent.

This offered some freedom regarding the choice of tools. The choice then fell on the Go language and the related toolchain. The new agent was called SNClient+ (where SN stands for Secure Naemon) and supports multiple protocols from the side of the monitoring system.

These include the NRPE protocol, for compatibility reasons, and, as the preferred method, an HTTP-based protocol, which can be used with check_nsc_web.

Additionally, to add more features, a general Prometheus exporter was integrated, which exposes the general operating system exporters of the Prometheus ecosystem. Therefore, SNClient+ can also be used as the default node exporter.

To stay compatible and enhance the functionality further, there are not only built-in plugins to test different properties on the host machine; generic functionality to execute third-party plugins is included as well.

A self-updating functionality is also built-in to make updates as easy as possible.

In summary, this is a promising new solution for an old problem and is likely worth a try.

Running the Infra at FOSDEM

Rather spontaneously, Sebastian Schubert gave a presentation about the infrastructure at FOSDEM, one of the largest Free and Open Source Software events in the world. The event takes place every year at the beginning of February in Brussels, and the organisers expect around 10,000 visitors per day with around 20,000 devices which need to be connected to the internet. This would be a challenging task by itself, but it is a totally different scenario to deploy that kind of infrastructure for just a few days, with no paid professionals, just volunteers who might turn up with no idea what to do, where and how.

The astonishing fact that this kind of organization actually works (repeatedly and successfully) can probably not be admired enough.

In addition to providing network access (and some services there), there is also the video and streaming setup for the hundreds of different talks, which must not only be recorded but also, ideally, be live-streamed to the internet (currently via third parties).

For this purpose, self-designed hardware boxes were used in the past to re-encode the video and audio on site in a first step; these are increasingly being replaced by more common laptops. These serve as a kind of “render farm” to prepare the material for the viewer.

This was followed by a short introduction to the tools used in the network setup and especially to some problems of running an IPv6-only network in the 2020s, when some parts of the internet are still only reachable via IPv4. One example here was the use of CoreDNS as a replacement for bind9 (for resource usage reasons).

A generally good idea mentioned next was the introduction of on-site and off-site monitoring, with data being replicated so that it was still available when an incident took the FOSDEM crew’s equipment at the university offline.

Another interesting point was that practically all relevant material is available to the public, which allows interested parties to get an idea of how everything works there and perhaps to adapt it to other purposes.

openITCOCKPIT Community Edition – Einfache Konfiguration, Module, API und mehr

In this talk, Jens Michelsons presented the openITCOCKPIT monitoring system, which is one of the “Nagios-like” monitoring systems and was created at the it-novum company.

The focus is on creating an easily usable web-based system in which everything is integrated. A powerful and well-documented HTTP API serves as the main interface for all the different components. This allows small-scale configuration via the web interface as well as more automated setups with other tools.

A speciality of openITCOCKPIT is probably its own monitoring agent for remote hosts and the strong integration of other tools, including the CheckMK agent, into the system. Migrating an existing setup to openITCOCKPIT or extending one with other tools is therefore less painful than it could be.

Also remarkable was the extended live demo (always a risk in a presentation), which showed a typical but not simple workflow for adding some systems to the monitoring, including logic that combines different tests.

Zabbix – Powerful enterprise grade monitoring driven by Open Source

Appropriately, the following talk was about Zabbix, a system quite similar in many regards to openITCOCKPIT. Wolfgang Alper described the working principles of Zabbix and what the main concepts and functionalities are.

The direct comparison was quite interesting, as one can recognize common ideas and components, but also where philosophies and ideas differ and how different problems were addressed.

One of the most important ideas in Zabbix is the separation of concerns, where gathering of data, storage, problem detection, alarming and escalation are split up programmatically and can be treated individually. The definition of these steps and their interfaces allows developers to focus on a specific part without having to worry about the whole.

Another part of the talk was dedicated to how Zabbix handles large-scale and distributed setups. At this point, a Zabbix software component called “Proxy” comes into play, which relays instructions from the central system to remote locations and data the other way round.

All in all, Zabbix is probably a capable tool for the classic network monitoring task, but of course it is not limited to that.

Lorenz Kästle

Systems Engineer

Lorenz completed his bachelor’s degree in computer science at FAU, most recently focusing on operating systems there. In his free time he tinkers a bit with XMPP and the Erlang programming language.

Read more from Lorenz and meet the Team

OSMC 2022 | Unifying Observability: Weaving Prometheus, Jaeger, and Open Source Together to Win

by Matthias Döhler | Mar 7, 2023 | OSMC, Uncategorized

In his talk at the Open Source Monitoring Conference 2022 (OSMC), Jonah Kowall, who has more than 15 years of experience in the fields of Ops, network, security, and performance engineering under his belt, tells us a lot about observability in the open source market. He also focusses on possible problems regarding licensing.

In the following I will give you a brief overview of the topics and the concepts behind them.

What is Observability?

First things first, what is observability? And how does it differ from monitoring?

To greatly simplify:

Monitoring is used to track specific criteria of given hosts/devices across your infrastructure. Thus, monitoring means having an eye on specific metrics such as CPU load or RAM usage. This enables you to notice problems as they occur and act accordingly.
Observability on the other hand means collecting “all” data. Based on the inputs a system receives and its respective outputs you are meant to be able to draw conclusions about your system’s state.

Sticking with the RAM example, monitoring can show you that your system runs low on memory, while observability can tell you why that is. This “why” is also helpful in order to act appropriately before the “that” happens. So, monitoring effectively follows a reactive approach and observability follows a proactive one.
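To make the monitoring half of this comparison concrete, here is a minimal sketch of a classic threshold check for the RAM example. The function name, the thresholds and the message format are my own assumptions; only the numeric states (0 = OK, 1 = WARNING, 2 = CRITICAL) follow the common Nagios-style plugin convention.

```python
from enum import IntEnum


class Status(IntEnum):
    """Plugin-style states, following the common Nagios exit code convention."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


def check_memory(used_percent: float, warn: float = 80.0, crit: float = 90.0):
    """Compare a single metric against fixed thresholds (hypothetical defaults)."""
    if used_percent >= crit:
        status = Status.CRITICAL
    elif used_percent >= warn:
        status = Status.WARNING
    else:
        status = Status.OK
    return status, f"MEMORY {status.name} - {used_percent:.1f}% used"


status, message = check_memory(93.5)
print(message)
```

A check like this tells you that memory is running low. It says nothing about which process, deployment or request pattern caused it, which is the gap observability is meant to fill.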

Now let’s let his presentation give us an explanation.

Commercial vs. Open Source solutions

As Jonah goes on to explain, commercial tools for observability tend to be more coherent and complete out of the box when it comes to the user interface (UI).

Meanwhile – due to the nature of the open source world – open source solutions are oftentimes highly fragmented, requiring a combination of multiple tools to fill in the complete picture. This in turn leads to more complexity due to multiple different underlying architectures. As an example he brings up the ELK stack (Elasticsearch + Logstash + Kibana), which is just three parts of a more extensive system.

But even though probably nobody likes complexity itself, open source solutions still seem to be vastly popular with companies and make up the majority of the observability landscape. In Jonah’s opinion this trend is also “the future of where things are going”.

Licensing

Many of us are used to at least seeing a license every once in a while. MIT, Apache and GPL are common terms to encounter when dealing with open source products.
You yourself might not have to deal with licenses directly but in one way or another you could be affected as well.

Imagine finding a new open source project or code snippets that help you with building your own project. Maybe those fix something that you just could not do or didn’t have time to do. Now licensing is important. Can I use this code? In what way can I use it? Could it backfire? The last question is especially important, according to Jonah.

There seems to be a trend with so-called “copyleft licenses”. In this context copyleft effectively means: If you use that code in your own project, you need to open source your own code within that project as well. This is certainly something most companies don’t want to do or simply cannot afford to do. After all, companies are still about making money.

But not only do companies have to deal with such issues. Communities surrounding open source projects also have to be careful what they bring into projects. Amongst other disagreements – for example about the current path of a project – licensing is also a contributing factor when it comes to forks popping up.

If you want to know a bit more about a certain fork in the open source observability world that might potentially achieve unified observability, be sure to give Jonah Kowall a few minutes of your time.

The recording and slides of this talk and all other OSMC talks can be found in our Archives. Check it out!

The next OSMC takes place from November 7 – 9, 2023 in Nuremberg. Early Bird tickets are already on sale!

Matthias Döhler

Consultant

Since successfully completing his apprenticeship as an IT specialist for system integration at NETWAYS, Matthias has been contributing as a Consultant. He is particularly interested in Icinga and Ansible (he even takes them home with him). Besides his interest in IT, he is also passionate about horror films, no matter how bad they are. He pursues classic pastimes such as meeting friends and enjoying the sun outdoors just as much as diving into various video game worlds. Mamma mia!

Read more from Matthias and meet the Team

Let me introduce: NWS-ID

by Achim Ledermueller | Dec 1, 2022 | Web Services, Uncategorized

We’re really excited to share an enhancement with you that takes your NWS Customer Interface experience to a whole new level! NWS-ID – our new core for managing your personal identity and access to nws.netways.de! Even if identity management sounds a bit dull to some, NWS-ID enables us to bring some new features to you. But what are these new features, you wonder? Okay, let’s get right into it and answer some questions you might have. First:

What is NWS-ID?

NWS-ID is the future home for your personal user profile and a much-desired integration into the current customer interface. Here, you can update your password, configure 2FA and edit your profile data, although we rarely save any of it. The introduction of personal accounts allows us to provide new features to the NWS Customer Interface and the associated products – including user and group management.

User and Group Management

The first and probably biggest thing is the integration of NWS-ID with our Customer Interface at nws.netways.de, which enables us to release user and group management – a feature many customers requested and that we’re now thrilled to provide. It basically allows you to give your team access to your account and products. The role-based approach allows you to easily create user groups with appropriate permissions and invite your colleagues with their own personal NWS-ID. Thanks to fine-grained authorization settings, you decide who can access and manage your projects or even the whole organisation!

Managing multiple organisations at NWS?

No problem with NWS-ID! It’s never been easier. If you are in charge of managing several organisations at NWS you will love NWS-ID. Your user can be associated with multiple organisations and it’s easy to switch between them with a single click! You no longer have to log in again or use multiple browsers.

When will NWS-ID be available?

We will release NWS-ID in two weeks, on December 14th. All existing accounts will be migrated automatically – if you are a current NWS customer, you will receive an e-mail to renew your password on that day. That’s it! From then on, your NWS-ID is active and the user and group management is available! Don’t forget to enable two-factor authentication! It not only sounds easy, it is easy! We can’t wait for you to use NWS-ID in your everyday life and to see and hear what benefits it brings to you.

What does the future hold?

With NWS-ID as the new core for our identity management, not only do you benefit from this enhancement, but so do our products, which you’ll be able to access more effortlessly. Our portfolio will be gradually integrated, which simplifies the access to products and projects for your whole team. SSO is the buzzword here. Give us a little time to implement the integrations and we will of course come back to you as soon as possible!

I hope you are looking forward to the new home for your user profile! I am sure that NWS-ID complements our portfolio well and is the base for simple and good authentication and authorisation. If you have any questions along the way, please feel free to contact us – we’re always there to help answer any open questions.

Achim Ledermüller

Senior Manager Cloud

The Regensburg expat joined NETWAYS in 2012 after finishing his business informatics degree there. In the Managed Services department he is responsible for operating and further developing our cloud platform.

Read more from Achim and meet the Team