By now, Prometheus has become the de facto standard for monitoring containerised applications, especially when you are using orchestration tools like Kubernetes or Consul. However, when it comes to monitoring multiple Kubernetes clusters through a single pane of glass, additional tools are required. In his talk at the Open Source Monitoring Conference 2022 (OSMC), Pascal Fries showed how to set up a production monitoring landscape based on Prometheus and Thanos that spans several Kubernetes clusters, focusing on examples and best practices. He also explained how the individual components can communicate securely.
Pascal Fries works as an IT consultant at ATIX AG in Garching (near Munich), Germany. As a specialist in cloud native technologies, he focuses mainly on configuring and administering multi-cluster Kubernetes environments, including monitoring them with Prometheus and Thanos.
To open the talk, Pascal asked the audience whether they were using Kubernetes (in a single- or multi-cluster setup) or Prometheus. Prometheus works very well for monitoring multiple Kubernetes clusters, but there are several problem areas you should be aware of, such as long-term storage, high availability and redundancy. To illustrate this, he described a customer scenario in which multiple teams use Kubernetes, with more than 20 clusters in total. Shared, managed and owned clusters all coexist in that environment, and all teams need a single endpoint for retrieving their metrics. Further requirements are long-term storage, high availability and push-based monitoring. How can we make sure our monitoring setup meets all of these requirements?
Thanos, a storage layer for Prometheus
To meet these requirements, he introduced Thanos, which is essentially a storage layer for Prometheus.
To set it up, you can start from sample configurations. Thanos is also more secure, as its components communicate via gRPC with TLS authentication instead of REST with basic authentication. Whenever Thanos components talk to each other, TLS authentication is used; for all other connections, basic authentication is still used.
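The talk did not show the TLS configuration itself, so the following is only a minimal sketch, using Python's standard `ssl` module, of what TLS authentication between two components means compared with basic auth: the server side refuses any client that cannot present a certificate signed by a trusted CA, and the client side verifies the server's certificate and hostname. The function names and file-path parameters are hypothetical; in a real Thanos deployment this is configured through each component's gRPC TLS settings, not through Python code.

```python
import ssl
from typing import Optional


def make_server_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None,
                        client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical TLS context for a component that accepts gRPC connections.

    Clients must present a certificate signed by the CA in client_ca_file.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Require a client certificate: this is what replaces username/password.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)   # server's own identity
    if client_ca_file:
        ctx.load_verify_locations(client_ca_file)  # CA trusted for clients
    return ctx


def make_client_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None,
                        ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical TLS context for a component that connects to others.

    Verifies the server certificate and hostname, and optionally presents
    its own client certificate.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)   # client's own identity
    return ctx
```

The design point is that identity is proven by certificates on both ends of the connection, so no shared password has to be distributed to every cluster.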
Long-term storage is implemented by storing the data in S3-compatible object storage, which can be self-hosted or provided by a cloud service such as NETWAYS Web Services. For push-based monitoring, Thanos provides the receiver and compactor components. The receiver acts as a short-term database and shards and replicates the time series based on their labels. Unfortunately, this replication results in duplicate data in S3. The compactor, in turn, deduplicates the data in S3 and downsamples it for faster queries.
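To make the roles of the receiver and the compactor more concrete, here is a deliberately simplified Python sketch of the ideas described above: hashing a series' label set to choose which receivers store it (with a replication factor), merging the resulting duplicates by ignoring a replica label, and downsampling raw samples into fixed time windows. This is not the Thanos implementation. The SHA-256 hash, the function names, the `replica` label name and the in-memory data structures are all assumptions for illustration; Thanos works on TSDB blocks in object storage, and its downsampled data also keeps a counter aggregate that this sketch omits.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]
Sample = Tuple[int, float]  # (timestamp in ms, value)


def series_key(labels: Dict[str, str]) -> Labels:
    """Canonical, order-independent form of a label set."""
    return tuple(sorted(labels.items()))


def pick_receivers(labels: Dict[str, str], receivers: List[str],
                   replication_factor: int) -> List[str]:
    """Hash the label set onto the list of receivers (hypothetical scheme).

    Returns the primary receiver plus (replication_factor - 1) successors.
    """
    if not 0 < replication_factor <= len(receivers):
        raise ValueError("replication factor must be 1..len(receivers)")
    raw = ",".join(f"{k}={v}" for k, v in series_key(labels)).encode()
    h = int.from_bytes(hashlib.sha256(raw).digest()[:8], "big")
    start = h % len(receivers)
    return [receivers[(start + i) % len(receivers)]
            for i in range(replication_factor)]


def deduplicate(blocks: List[Dict[Labels, List[Sample]]],
                replica_label: str) -> Dict[Labels, List[Sample]]:
    """Merge replicated series by dropping the replica label.

    Keeps one sample per timestamp (the first replica seen wins).
    """
    merged: Dict[Labels, Dict[int, float]] = defaultdict(dict)
    for block in blocks:
        for labels, samples in block.items():
            key = tuple((k, v) for k, v in labels if k != replica_label)
            for ts, value in samples:
                merged[key].setdefault(ts, value)
    return {k: sorted(v.items()) for k, v in merged.items()}


def downsample(samples: List[Sample],
               resolution_ms: int) -> List[Tuple[int, float, float, float, int]]:
    """Aggregate samples per window into (start, min, max, sum, count)."""
    windows: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        windows[ts - ts % resolution_ms].append(value)
    return [(start, min(vs), max(vs), sum(vs), len(vs))
            for start, vs in sorted(windows.items())]
```

For example, with three hypothetical receivers and a replication factor of 2, each series lands on two receivers. Each receiver tags its copy with its own `replica` label, and `deduplicate(blocks, "replica")` collapses both copies back into a single series. Keeping min, max, sum and count per window (rather than a single average) lets queries over long time ranges still answer questions such as peaks and rates on the coarser data.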
All in all, Thanos is a great tool for redundant, highly available long-term storage for Prometheus monitoring.
And the OSMC is a great conference with many interesting talks every year, especially if you want to learn more about everything related to open source monitoring. There were talks about monitoring, automation and open source in general, as well as many interesting conversations with other attendees. Hearing and discussing different opinions on technology and its use cases is exciting for me. I have been working at NETWAYS as a Junior Consultant for two years now, and this was my second OSMC. I am already looking forward to next year!