
OSMC 2023 | Journey to Observability: Tracking every Function Execution in Production

by Björn Berg | Jan 30, 2024 | OSMC

In his talk at OSMC 2023, Lucas Copi, Kubernetes Expert at IBM Cloud, tells us about his team’s journey to observability in their modern cloud environment based on Red Hat OpenShift.

First of all, let’s look at the differences between observability and monitoring.

  • Monitoring means tracking what is happening on your infrastructure. It helps you detect issues as they occur and take action to counter them.
  • Observability, on the other hand, involves collecting data. Analyzing that data gives you insight into the system’s overall state.

Lucas and his team at IBM Cloud faced issues with their old infrastructure, a big monolith, so they decided to split it into many smaller parts, which you could call microservices. They added a large number of tests, including about 50,000 regression cases, and refactored many parts of their infrastructure’s code to make it easier to unit test. All of that taught them one lesson: testing in pre-production environments is not always enough.

Not testing in prod is like not practicing with the full orchestra because your solo sounded fine at home.

Usually, even the best pre-prod environment is much smaller than the actual prod environment and is therefore not suitable for certain tests. Testing in production does not mean testing only in production: it complements pre-production tests rather than replacing them.

Another lesson they learned: it is not always possible to fix issues in your environment if you do not have enough metrics and logs. There are four golden pillars for every operation: Latency, Throughput, Errors and Saturation.

Some existing solutions are great at adding observability to the interactions between services, among them Grafana, OpenTelemetry, Istio and Honeycomb. But none of them could satisfy all the needs of Lucas’ team. As a solution, they built a custom tool in Go, called “The Observability context”. Basically, it provides consistency throughout execution flows and across the observability pillars. They are using the new tool to measure code performance.

Observability changed their mindset. Now the question is not only about features and “Does everything run?”, but rather “How well does it run?”. Introducing observability actually decreased the number of problems customers are facing. The shift helps overcome the limitations of testing, and observability emerges as a key catalyst for continuous improvement and reliability in modern cloud environments.

Björn Berg
Junior Consultant

After finishing his Abitur in 2019, Björn studied Data Protection and IT Security in Ansbach. After a few semesters he decided to switch to an apprenticeship as an IT specialist for system integration and started at NETWAYS Professional Services in September 2021. In his free time he also spends a lot of time at his PC, enjoys playing various games, experiments with different Linux distributions and likes to go camping in summer.

