
OSMC 2023 | Journey to Observability: Tracking every Function Execution in Production

by Björn Berg | Jan 30, 2024 | OSMC

In his talk at OSMC 2023, Lucas Copi, Kubernetes expert at IBM Cloud, describes his team's journey to observability in their modern cloud environment based on Red Hat OpenShift.

First of all, let’s look at the differences between observability and monitoring.

  • Monitoring means tracking what happens on your infrastructure. It helps you detect issues as they occur and take action to counter them.
  • Observability, on the other hand, is about collecting data from the system. Analyzing that data gives you insight into the system's overall state.

Because Lucas and his team at IBM Cloud ran into problems with their old infrastructure, a large monolith, they decided to split it into many smaller parts, which you could call microservices. They added a huge number of tests, around 50k regression cases, and refactored many parts of the code to make unit testing easier. All of this taught them one lesson: testing in pre-production environments is not always enough.

Not testing in prod is like not practicing with the full orchestra because your solo sounded fine at home.

Usually, even the best pre-production environment is much smaller than the actual production environment and therefore unsuitable for certain tests. Testing in production does not mean testing only in production; pre-production tests remain part of the process.

Another lesson they learned: without enough metrics and logs, it is not always possible to fix issues in your environment. There are four golden pillars for every operation: latency, throughput, errors and saturation. Some existing solutions, such as Grafana, OpenTelemetry, Istio and Honeycomb, are great at adding observability to the interactions between services. However, none of them could cover all the needs of Lucas' team. Their solution was a custom tool written in Go, called “The Observability context”. It provides consistency throughout execution flows and across the observability pillars, and the team uses it to measure code performance.

Observability changed their mindset. The question is no longer just about features and “Does everything run?”, but rather “How well is it working?”. Introducing observability reduced the number of problems customers face: it not only overcomes the limits of testing but also minimizes customer-facing issues. Observability thus becomes a key driver of continuous improvement and reliability in modern cloud environments.

Björn Berg
Consultant

As a consultant, Björn advises on all kinds of topics around magic, that is, Monitoring, Automation, Graphing and Information/Event management. He is particularly fond of the Grafana stack, Ansible, Icinga and Prometheus. Outside of IT, he is passionate about bouldering and cycling, but also about cinema and collecting films.

