
NETWAYS Blog

OSMC 2023 | Journey to Observability: Tracking every Function Execution in Production

In his talk at OSMC 2023, Lucas Copi, Kubernetes Expert at IBM Cloud, tells us about their journey to observability in their modern cloud environment based on Red Hat OpenShift.

First of all, let’s look at the differences between observability and monitoring.

  • Monitoring means tracking things happening on your infrastructure. It helps you to detect issues as they occur and to take action in order to counter them.
  • Observability, on the other hand, involves the collection of data. Analyzing this data gives you insights into the system’s overall state.

Lucas and his team at IBM Cloud faced issues with their old infrastructure, a big monolith, so they decided to split it into many smaller parts – you could call them microservices. They integrated a large number of tests, including about 50k regression cases, and refactored many parts of their infrastructure’s code to make it easier to unit test. All of that taught them one lesson: testing in pre-production environments is not always enough.

Not testing in prod is like not practicing with the full orchestra because your solo sounded fine at home.

Usually, even the best pre-prod environment is much smaller than the actual prod environment and therefore not suitable for certain tests. Still, testing in production does not mean testing only in production.

Another lesson they learned: it is not always possible to fix issues in your environment if you do not have enough metrics and logs. There are four golden pillars for every operation: latency, throughput, errors and saturation. Some existing solutions are great at adding observability to the interactions between services, among them Grafana, OpenTelemetry, Istio and Honeycomb. But none of them satisfied all the needs of Lucas’ team. As a solution, they built a custom tool in Go, called “The Observability context”. Basically, it provides consistency throughout execution flows and across the observability pillars. They use the new tool to measure code performance.
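
The recap does not show what the tool's API looks like, so the following is only a minimal Python sketch of the general idea (the original tool is written in Go): every function execution is wrapped so that call count, errors and latency are recorded in one shared context. All names here (`ObservabilityContext`, `track`, `summary`) are hypothetical, and saturation is left out for brevity.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ObservabilityContext:
    """Hypothetical helper, NOT the IBM tool: collects stats per function name."""

    def __init__(self):
        self.calls = defaultdict(int)     # executions per name (throughput)
        self.errors = defaultdict(int)    # failed executions per name
        self.latency = defaultdict(list)  # durations in seconds per name

    @contextmanager
    def track(self, name):
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors[name] += 1
            raise  # never swallow the caller's error
        finally:
            # Runs for successful and failed executions alike.
            self.calls[name] += 1
            self.latency[name].append(time.perf_counter() - start)

    def summary(self, name):
        durations = self.latency[name]
        return {
            "calls": self.calls[name],
            "errors": self.errors[name],
            "max_latency_s": max(durations) if durations else 0.0,
        }


ctx = ObservabilityContext()

with ctx.track("load_user"):
    time.sleep(0.01)  # stand-in for real work

try:
    with ctx.track("load_user"):
        raise ValueError("simulated failure")
except ValueError:
    pass

print(ctx.summary("load_user"))
```

Because the same context object sees every execution, the numbers for one function stay consistent across metrics, which is roughly the kind of consistency the talk describes.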

Observability changed their mindset. Now it is not only about features and “Does everything run?”, but more about “How well does it run?”. Introducing observability actually decreased the number of problems customers are facing, in addition to overcoming the limitations of pre-production testing. Observability emerges as a key catalyst for continuous improvement and reliability in modern cloud environments.

Björn Berg
Junior Consultant

After finishing school in 2019, Björn studied data protection and IT security in Ansbach. After a few semesters, he decided to switch to an apprenticeship as an IT specialist for system integration and started at NETWAYS Professional Services in September 2021. In his free time, he also spends a lot of time at his PC and enjoys various games, experiments with different Linux distributions, and likes to go camping in summer.

OSMC 2023 | What’s new with Grafana Labs’s Open Source Observability stack

I already mentioned in my recap of this year’s OSMC that I would go into more detail about Sebastian Schubert’s talk giving an update on Grafana Labs’s Open Source Observability stack. In fact, I was so interested in the topic that I volunteered for this blog post and had our Event team assign me the talk.
If you wonder why, you are very likely one of those who know Grafana very well but have not heard of all the other tools Grafana Labs has added to its stack over the last years. I only started digging deeper into it a while ago myself, and it feels like there are some gold nuggets to be found down there. So I want to spread the word and perhaps cause a gold rush! 😉

 

Grafana

Sebastian started with a short introduction of himself and by asking the audience who knows Grafana. He was surely pleased that everyone in the crowd did. So let’s start with the updates on Grafana. With Grafana having been around for quite a while and having become the go-to dashboard solution for most people, it is no wonder that most improvements are small but helpful convenience features. His examples were how the empty dashboard and the panel editor were improved to help users get the best representation of their data. Another improvement is that the UI now specifically helps with writing TraceQL queries, instead of just accepting an already written statement, which could be hard to come up with.

A completely new thing is the Visual Studio Code integration for editing and previewing dashboards. Looking at the number of colleagues using it, seeing it integrated into many other tools as their web IDE (integrated development environment), and personally thinking it is the best solution developed by Microsoft, I expect such an integration to make many people happy and grow the user base further.

But Grafana Labs does not only want to grow the user base; they also want to make developers’ lives easier by working on a Developer Portal which combines all related information in one place. Please take this as a lesson if you work on a project where it is hard to find all the information needed to get started!

Screencapture of the talk at the beginning

Mimir

While Mimir has also been around for some time, the metrics solution of the stack needs some introduction, especially compared to its better-known competitors. Mimir is the (or one) successor of Cortex and the Open Source equivalent of Grafana Enterprise Metrics. This is something Grafana Labs did for the complete stack: providing the community with a true Open Source equivalent of each Enterprise solution.

Mimir was extended and improved over the last year. I would summarize most improvements as performance enhancements in one way or another. But new features were added as well, such as sending alerts to Webex, support for Redis as a caching solution, and HashiCorp Vault for more secure credential storage.

 

Loki

Loki has a similar problem to Mimir. It has more established competitors among log management solutions, so fewer people know it, but I think it has some advantages you should be aware of. Sebastian compared it more to Prometheus than to its competitors, as it uses a similar label-based design. His format slide explained this very well: an entry in Loki consists of a timestamp in nanoseconds, labels, which get indexed to speed up queries, and the content, which is not indexed. This allows for post-processing at query time, whereas all other solutions require you to optimize your data for the expected queries before storing it.

Loki format
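
To make the format more tangible, here is a minimal Python sketch that builds one such entry in the JSON shape accepted by Loki's HTTP push endpoint (`/loki/api/v1/push`). The label names, the log line and the helper `build_push_payload` are made up for illustration; nothing is actually sent to a server.

```python
import json
import time


def build_push_payload(labels, line, ts_ns=None):
    """Hypothetical helper: wrap one log line in Loki's push JSON structure."""
    if ts_ns is None:
        ts_ns = time.time_ns()  # nanoseconds since the Unix epoch
    return {
        "streams": [
            {
                # Labels identify the stream and are the only indexed part.
                "stream": dict(labels),
                # Each value is [timestamp as string in ns, raw log line].
                # The line itself is stored without being indexed.
                "values": [[str(ts_ns), line]],
            }
        ]
    }


payload = build_push_payload(
    {"job": "webshop", "level": "error"},
    "payment failed for order 4711",
    ts_ns=1700000000000000000,
)
print(json.dumps(payload, indent=2))
```

A LogQL query such as `{job="webshop"} |= "payment failed"` would then use the indexed labels to select the stream and filter the unindexed content only at query time.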

As you may guess, post-processing could be the bottleneck in such a design, but Loki already handles this quite well, and Grafana Labs is constantly reducing the resource consumption, which made me happy to hear.

 

Tempo and all the other components

When he got to Tempo, Sebastian had to pick up the pace, realizing at this point of the talk that he could barely fit a year of updates for the complete stack into one talk. So starting with the solution for traces, he went into less detail. Tempo is comparable to the other solutions mentioned earlier, but for traces, and for this it needs a lot of data.

Beyla is another tool for tracing, with a release pending and very likely to be shown to the public in detail in the near future.

Faro adds frontend/browser monitoring to the stack, providing details on the real user experience.

And last but not least, Pyroscope adds profiling, which makes the stack cover a very large amount of data. All of it is visualized in the end as dashboards in Grafana.

Screencapture of the talk at the end

So I really recommend at least having a look at the stack. Watching the recording of Sebastian Schubert’s talk “What’s new with Grafana Labs’s Open Source Observability stack” is a good starting point for this! Another starting point could be our training on InfluxDB & Grafana.

We hope to see you around at OSMC 2024! Stay in touch and subscribe to our Newsletter!

Dirk Götz
Principal Consultant

Dirk is a Red Hat specialist and works at NETWAYS in consulting for Icinga, Puppet, Ansible, Foreman and other systems management solutions. Previously, he worked as a senior administrator at a statutory pension insurance provider, where he was also responsible for training the apprentices, as he is now at NETWAYS.

OSMC 2023 | Take a Walk Down Memory Lane!

Exciting news – the OSMC 2023 archives are now online! Whether you attended or missed out, you can now catch up on all the talks, speaker slides, and awesome photos from the conference.

 

Video Recordings

Dive back into the insightful talks of OSMC 2023. Our archives provide all video recordings, allowing you to immerse yourself in the expertise and engaging discussions shared by industry experts and leaders. From expert insights to cool demos and case studies, there’s a diverse range of content. Perfect for both beginners and pros.

 

Speaker Slides

Follow along with the presentations using our speakers’ slides. It’s a great way to review key points and get a deeper understanding of the topics discussed. Whether you’re a visual learner or simply want to revisit the material at your own pace, the inclusion of speaker slides enhances your learning experience.

 

Event Photos

A picture is worth a thousand words, and our collection of photos from OSMC 2023 tells a story of its own. Take a look back and relive the conference, capturing the vibrant atmosphere, engaged participants, and memorable moments.

So, what are you waiting for? Visit our event website, head to the Archives section, and enjoy!

Katja Kotschenreuther
Manager Marketing

Katja has been part of the marketing team since October 2020. As Manager Marketing, she mainly takes care of the marketing for the stackconf and OSMC conferences as well as our trainings. She also supports the Icinga team with various social media campaigns and the promotion of the Icinga Camps. She is responsible for SEO on all our websites and spends a lot of time on our blog. In her free time, she likes to travel, do crafts, bake, and volunteer for Foodsharing. In summer, she also tends to her far too large vegetable garden.

OSMC 2023 | Behind the Scenes Part 2/2

As a trainee in marketing, I had the opportunity to attend OSMC 2023 on the 8th of November. Today, I will tell you about my first-hand experience at the event and give you a few insights into what happened behind the scenes.

Insights on the Eventee App

To enhance the OSMC experience, we provided our attendees with the Eventee App. It was used to create a personal agenda, get updates, connect with other attendees, and stay informed. After scanning the QR code on the back of their conference badge, participants signed up easily and then networked like on Tinder: they swiped through profiles, chatted, and stayed in the loop with the feed’s updates. Planning their individual schedule was also an easy task with the app!

So many Chances to win a Prize!

We also used the app to send out some tips regarding our OSMC raffle. The giveaway was a glass safe with a Lego set inside. Participants had to crack a four-digit code to access the prize. Hints for the number combination were sent through the app. I have to say, you all impressed me with how quickly and skillfully you solved the challenges, earning some fantastic Lego sets. Impressive!

NETWAYS WEB SERVICES, our Silver Sponsor, brought in a fun guessing game. People tried to guess how many M&M’s were in a bottle. The prize was a cool Baby Yoda Lego set. Guessing the number of sweets was tricky and engaging.

Our Silver Sponsor, Elastic, also had a thrilling prize competition. By having the QR code on the front of their conference badge scanned, attendees got a chance to win a Lego set as well. Cheers to the winner whose luck was on their side!

Food, Catering, and more Food

Now let’s talk about the food. Brace yourselves because the buffet at the conference hall was a nice feast. From breakfast menus with pretzels to sandwiches, delectable desserts, and vegan options, the choices were endless. And just when you thought you couldn’t possibly eat more, the catering team swooped in with their never-ending supply of delicious treats.

Our Camera Team

A big thanks to our camera team – Cecilia, Tatevik, Björn, and Saeid. They recorded all the presentations, enabling you to revisit them at your convenience on YouTube and in the conference archives. In case you missed this year’s OSMC, it’s a great way to level up your skills.

The Evening Event at Korn´s

And let’s not forget the Dinner & Drinks event at KORN´S. It was the perfect place for deep conversations, establishing valuable networks, and, of course, having a great time together. Grabbing a drink of choice, diving into discussions with the amazing community, and following up with live music – the evening could not have gone better. Even as a non-technical person, I soaked up so much knowledge on monitoring. Thank you all for sharing your expertise with me!

Our Thanks to You

We couldn’t let you leave without showing our appreciation. So, as you headed home, we handed out lunch boxes and a personal OSMC cup, your souvenir to cherish this memory with us.

We hope to see you again at OSMC 2024, where we can continue this journey together. I had great fun attending OSMC 2023, and I hope I was able to give you a glimpse into my view of this event. Thank you for joining us, stay tuned!

Irene Hahn
Junior Account Manager

Irene started her apprenticeship at NETWAYS in September 2023. She is curious to see how varied and extraordinary her upcoming tasks will be. In her free time, she either paints pictures or plays on her Switch.

OSMC 2023 | Day 3 Recap

Day two of OSMC 2023 started rather quietly, but with an interesting set of talks. The following is a summary and review of some talks I watched and was interested in. Therefore, not all of the talks are mentioned here, and this should not be interpreted as a judgement of their quality or significance.

 

Automated update management with Renovate

Sebastian Gumprich described his journey of introducing Renovate at scale at his workplace. Renovate is a tool for updating dependencies in software projects. It can be self-hosted and is therefore applicable in practically every environment.

Renovate analyses the software project it is run against, detects the dependencies, fetches data about their available versions and then applies updates, if any are available and it is configured to do so.

To integrate it better into the existing development process and to avoid putting more load on the developers, Renovate was set up to run as a GitLab pipeline. This approach also scaled across a huge number of different projects and repositories.

To work correctly (and do anything at all), Renovate needs some configuration, which is written in JSON and, in most cases, rather small and easy to create.
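
To give an impression of how small such a configuration can be, here is a Python sketch that generates a minimal `renovate.json`. This is not the configuration shown in the talk; it assumes only Renovate's published JSON schema URL and its `config:recommended` preset, and real projects would typically add their own rules on top.

```python
import json

# Assumed minimal Renovate configuration (not taken from the talk):
# a schema reference for editor support plus Renovate's recommended preset.
renovate_config = {
    "$schema": "https://docs.renovatebot.com/renovate-schema.json",
    "extends": ["config:recommended"],
}

# This text would be saved as renovate.json in the repository root.
rendered = json.dumps(renovate_config, indent=2)
print(rendered)
```

Generating the file from a script like this is one conceivable way to roll out the same baseline to many repositories, although the talk relied on onboarding Merge Requests for that.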

The presentation was partly about the technical ideas and problems, but also, arguably more importantly, about the human part, which I found most interesting. Part of this was, unsurprisingly, structured and extensive documentation of the relevant steps, procedures and common problems. But some programmatic features were introduced as well, for example automatically opening issues in GitLab for faulty Renovate configurations.

To further lower the hurdles of applying Renovate to a specific project, the “Onboarding” Merge Request applying the relevant changes was quite verbose about what it would do, what the consequences would be, and where and whom to ask in case of open questions.

These points may seem obvious or even trivial, but, and this is the opinion of the author, organizing different people and groups of people and communicating in a constructive and efficient way is one of the biggest hurdles in the business, and approaches to this set of problems are often quite interesting and helpful.

 

Replacing NSClient++ for Windows Monitoring

The second talk I want to advertise here is Sven Nierlein’s presentation of a replacement for the NSClient++.

The talk started with the expected review of NSClient++, a monitoring agent which used to be quite common in availability and status monitoring setups, especially on Windows operating systems. Sadly, development is progressing more slowly nowadays than in the past, and some unfixed problems are increasingly becoming dealbreakers, in particular the lack of support for current TLS protocols.

Writing a new agent was not really the first choice, but a comparison of current alternatives did not yield a good solution, since introducing a completely new configuration, new protocols and different workflows was not feasible. The resolution was therefore to write a completely new, but compatible monitoring agent.

This offered some freedom regarding the choice of tools. The choice fell on the Go language and the related toolchain. The new agent is called SNClient+ (where SN stands for Secure Naemon) and supports multiple protocols on the side of the monitoring system.

Among them are the NRPE protocol, for compatibility reasons, and, as the preferred method, an HTTP-based one which can be used with check_nsc_web.

Additionally, to add more features, a general Prometheus exporter was integrated, which exposes the general operating system exporters of the Prometheus ecosystem. Therefore, SNClient+ can also be used as the default node exporter.

To stay compatible and enhance the functionality further, there are not only built-in plugins to test different properties on the host machine, but also a generic functionality to execute third-party plugins.

A self-updating functionality is also built in to make updates as easy as possible.

In summary, this is a promising new solution for an old problem and is likely worth a try.

 

Running the Infra at FOSDEM

Rather spontaneously, Sebastian Schubert gave a presentation about the infrastructure at FOSDEM, one of the largest Free and Open Source Software events in the world. The event takes place every year at the beginning of February in Brussels, and the organizers expect around 10,000 visitors per day with around 20,000 devices which need to be connected to the internet. This would be a challenging task by itself, but it is a totally different scenario when that kind of infrastructure is deployed for just a few days and there are no paid professionals, just volunteers who might turn up with no idea of what, where and how.

The astonishing fact that this kind of organization actually works (repeatedly and successfully) can probably not be admired enough.

In addition to providing network access (and some services there), there is also the video and streaming setup for the hundreds of different talks, which must not only be recorded but also, ideally, be live-streamed to the internet (currently via third parties).

For this purpose, self-designed hardware boxes were used in the past to re-encode the video and audio on site in a first step; these are increasingly being replaced by more common laptops. They serve as a kind of “render farm” to prepare the material for the viewer.

This was followed by a short introduction to the tools used in the network setup, and especially to some problems with running an IPv6-only network in the 2020s, when some parts of the internet are still only reachable via IPv4. One example here was the use of CoreDNS as a replacement for BIND 9 (for resource usage reasons).
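
The recap does not go into the mechanism, but one common way to reach IPv4-only hosts from an IPv6-only network is NAT64 combined with DNS64, where the resolver synthesizes an IPv6 address by embedding the IPv4 address into a dedicated prefix (RFC 6052). Whether the setup presented in the talk works exactly like this is an assumption; the Python sketch below only illustrates the address mapping itself, using the well-known prefix `64:ff9b::/96` and a hypothetical helper `synthesize_aaaa`.

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052; a real deployment may use its own.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_aaaa(ipv4, prefix=WELL_KNOWN_PREFIX):
    """Hypothetical helper: embed an IPv4 address in the low 32 bits of a /96."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


# 192.0.2.33 is a documentation address, used here as the IPv4-only host.
print(synthesize_aaaa("192.0.2.33"))  # 64:ff9b::c000:221
```

An IPv6-only client that receives this synthesized address sends its traffic to the NAT64 gateway, which extracts the last 32 bits again and forwards the connection over IPv4.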

A generally good idea mentioned next was monitoring both on- and off-site, with data being replicated so that it was still available when an incident took the FOSDEM crew’s equipment at the university offline.

Another interesting point was that practically all relevant material is available to the public, which allows interested parties to get an idea of how everything works there and may allow them to adapt it to other purposes.

 

openITCOCKPIT Community Edition – Easy Configuration, Modules, API and More

In this talk, Jens Michelsons presented the openITCOCKPIT monitoring system, one of the “Nagios-like” monitoring systems, created at the company it-novum.

The focus is on creating an easy-to-use web-based system in which everything is integrated. A powerful, well-documented HTTP API serves as the main interface for all the different components. This allows small-scale configuration via the web interface as well as more automated setups with other tools.

A speciality of openITCOCKPIT is probably its own monitoring agent for remote hosts and the strong integration of other tools, including the Checkmk agent, into the system. Migrating an existing setup to openITCOCKPIT or extending one with other tools is therefore less painful than it could be.

Also remarkable was the extended live demo (always a risk in a presentation), which showed a typical but not simple workflow for adding some systems to the monitoring, including logic combining different tests.

 

Zabbix – Powerful enterprise grade monitoring driven by Open Source

Appropriately, the following talk was about Zabbix, a system quite similar in many regards to openITCOCKPIT. Wolfgang Alper described the working principles of Zabbix and what the main concepts and functionalities are.

The direct comparison was quite interesting, as one can recognize common ideas and components, but also where philosophies and ideas differ and how different problems were addressed.

One of the most important ideas in Zabbix is the separation of concerns, where gathering of data, storage, problem detection, alarming and escalation are split up programmatically and can be treated individually. The definition of these steps and their interfaces allows developers to focus on a specific part without having to worry about the whole.

Another part of the talk was dedicated to how Zabbix handles large scale and distributed setups. At this point, a part of the Zabbix software components which is called “Proxy” comes into play, and relays directions from the central system to outliers and data the other way round.

All in all, Zabbix is probably a capable tool for the classic network monitoring task, but of course not limited to that.

 

Lorenz Kästle
Consultant

Lorenz completed his bachelor’s degree in computer science at FAU, where he most recently worked on operating systems. In his free time, he dabbles in XMPP and the Erlang programming language.