Current State of Icinga by Bernd Erk | OSMC 2019

This entry is part of the series OSMC 2019 | Recap

At the Open Source Monitoring Conference (OSMC) 2019 in Nuremberg, Bernd Erk presented the "Current State of Icinga". Did you miss his talk? We have got something for you: watch the video of Bernd's presentation and read a summary (below).

The OSMC is the annual meeting of international monitoring experts, where future trends and objectives are set. Since 2006 the event has taken place every autumn in Nuremberg, Germany. Leading specialists present the full scope of Open Source monitoring and are ready to answer your hardest questions. Learn new techniques, exchange knowledge and discuss with top developers.

In-depth workshops the day prior to the conference and a Hackathon provide further possibilities to extend your skills and deepen your knowledge in IT monitoring and management.

The next OSMC takes place November 16 – 19, 2020 in Nuremberg.

More information and tickets at osmc.de.


Current State of Icinga

In the talk "Current State of Icinga" Bernd Erk briefly introduces himself and the team behind Icinga. His presentation first gives a quick overview of Icinga, followed by a really funny part told in emojis (long story short: Icinga makes you happy!). After that Bernd talks about the blog, the ongoing user survey, IcingaConf, Icinga Camp and Icinga partners all around the world. He also explains why we don't know most of the Icinga users: as it is an open-source product, anyone can download it and use it for free.

In the main part Bernd gives a product update for Icinga and everything connected to it. He reports on what happened in development during the summer of 2019 and then goes deeper into the new features and innovations of Icinga 2.11, the current major version as of November 2019. In the second part of the main presentation he covers Icinga Web 2, the new features of its current version 2.7 (November 2019) and its accessibility features. The vSphere module in version 1.1 is the third main theme, and Bernd touches on its new features and improvements. The topic after vSphere is Icinga Director and its current version 1.7 (November 2019); Bernd presents the Director's new features and what it is capable of. He completes his talk with Icinga Business Process Monitoring in its newest version 2.2 (November 2019), where the main highlights are the drag & drop feature, export & import, and usability. Bernd demonstrates these features and innovations in a short live demo.

After the live demo Bernd introduces Icinga for Windows, and a short video of the installation of Icinga on Windows is shown. Christian Stein, a long-standing member of the Icinga team, is the main developer of this outstanding Icinga innovation. After Windows monitoring, the focus moves on to Icinga for AWS (Amazon Web Services) version 1.0 (November 2019) and its possibilities. Thereafter Bernd goes into depth on the Icinga module for JIRA in version 1.0 (November 2019) and how it works.

Icinga DB is the last topic covered. For that, a few of last year's slides are revisited, followed by a funny video about why it took so long. After this hilarious video, a live demo of all the new features and innovations is given.

Bernd concludes his talk with a summary:

  • Icinga Director 1.7.2 is out now
  • Icinga Module for vSphere 1.1.0 is out now
  • Icinga Module for JIRA 1.0.1 is out now
  • Icinga 2.12 RC will be ready later

And that is basically Bernd’s entire talk, though highly compressed. If you are interested in the full talk with all its details and funny moments I recommend watching the whole video. It is worth every second, entertaining and highly informative.

Nathaniel Donahue
Junior Consultant

Nathaniel graduated from business school in 2019. Because of his interest in IT, he decided to do an apprenticeship as an IT specialist for system integration and started at NETWAYS Professional Services in September 2019. In his free time he also likes to sit at the computer, though mostly for gaming, or he spends time with his friends.

Monitoring Kubernetes with Prometheus

This entry is part 5 of 6 in the series Kubernetes - so startest Du durch!

Monitoring – for many a love-hate relationship. Some like it, others demonize it. I belong to those who tend to demonize it, only to complain when certain metrics and information cannot be viewed. Regardless of one's personal inclination on the topic, the consensus is clear: monitoring is important, and a setup is only as good as the monitoring that goes with it.

Anyone who wants to develop and operate their applications on top of Kubernetes will inevitably, sooner or later, ask themselves how to monitor these applications and the Kubernetes cluster. One option is to use the monitoring solution Prometheus; more precisely, by using the Kubernetes Prometheus Operator. This blog post shows an exemplary, working solution.

Kubernetes Operator

In short, Kubernetes Operators are extensions that let you create your own resource types. In addition to the standard Kubernetes resources such as Pods, DaemonSets, Services and so on, an operator lets you use custom resources as well. In our example the following are added: Prometheus, ServiceMonitor and others. Operators are particularly useful when your application requires special manual tasks in order to be operated properly – for example database schema updates on version upgrades, special backup jobs, or controlling events in distributed systems. As a rule, operators run – just like ordinary applications – as containers inside the cluster.

How does it work?

The basic idea is that the Prometheus Operator starts one or many Prometheus instances, which in turn are configured dynamically via ServiceMonitors. That means a ServiceMonitor can be attached to an ordinary Kubernetes Service; it reads the Service's endpoints and configures the associated Prometheus instance accordingly. If the Service or its endpoints change – for example in number, or because the endpoints get new IPs – the ServiceMonitor detects this and reconfigures the Prometheus instance each time. In addition, manual configuration is possible via ConfigMaps.

Prerequisites

The prerequisite is a working Kubernetes cluster. For the following example I use an NWS Managed Kubernetes cluster, version 1.16.2.

Installing the Prometheus Operator

First, the Prometheus Operator is deployed. This defines a Deployment, the required ClusterRole with its ClusterRoleBinding, a ServiceAccount and a Service.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.0
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator
subjects:
- kind: ServiceAccount
  name: prometheus-operator
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.0
  name: prometheus-operator
rules:
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
- apiGroups:
  - apiextensions.k8s.io
  resourceNames:
  - alertmanagers.monitoring.coreos.com
  - podmonitors.monitoring.coreos.com
  - prometheuses.monitoring.coreos.com
  - prometheusrules.monitoring.coreos.com
  - servicemonitors.monitoring.coreos.com
  - thanosrulers.monitoring.coreos.com
  resources:
  - customresourcedefinitions
  verbs:
  - get
  - update
- apiGroups:
  - monitoring.coreos.com
  resources:
  - alertmanagers
  - alertmanagers/finalizers
  - prometheuses
  - prometheuses/finalizers
  - thanosrulers
  - thanosrulers/finalizers
  - servicemonitors
  - podmonitors
  - prometheusrules
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - delete
- apiGroups:
  - ""
  resources:
  - services
  - services/finalizers
  - endpoints
  verbs:
  - get
  - create
  - update
  - delete
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
  - list
  - watch
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.0
  name: prometheus-operator
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/name: prometheus-operator
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/name: prometheus-operator
        app.kubernetes.io/version: v0.38.0
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=jimmidyson/configmap-reload:v0.3.0
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.38.0
        image: quay.io/coreos/prometheus-operator:v0.38.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          allowPrivilegeEscalation: false
      nodeSelector:
        beta.kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: prometheus-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.0
  name: prometheus-operator
  namespace: default
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
    app.kubernetes.io/version: v0.38.0
  name: prometheus-operator
  namespace: default
spec:
  clusterIP: None
  ports:
  - name: http
    port: 8080
    targetPort: http
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/name: prometheus-operator
$ kubectl apply -f 00-prometheus-operator.yaml
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
serviceaccount/prometheus-operator created
service/prometheus-operator created

Role Based Access Control

In addition, corresponding Role Based Access Control (RBAC) policies are required. The Prometheus instances (StatefulSets) started by the Prometheus Operator run their containers under a ServiceAccount of the same name, "prometheus". This account needs read access to the Kubernetes API in order to later read the information about Services and Endpoints.

ClusterRole

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
$ kubectl apply -f 01-clusterrole.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created

 

ClusterRoleBinding

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
$ kubectl apply -f 01-clusterrolebinding.yaml
clusterrolebinding.rbac.authorization.k8s.io/prometheus created

 

ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
$ kubectl apply -f 01-serviceaccount.yaml
serviceaccount/prometheus created

Monitoring Kubernetes cluster nodes

There are various metrics that can be read from a Kubernetes cluster. In this example we will initially only look at the system metrics of the Kubernetes nodes. For monitoring the Kubernetes cluster nodes, the "Node Exporter", also provided by the Prometheus project, is a good fit. It reads all metrics about CPU, memory and I/O and exposes these values for retrieval under /metrics. Prometheus itself later scrapes these metrics at regular intervals. A DaemonSet ensures that exactly one container/pod is started on each Kubernetes node. The Service groups all endpoints under one cluster IP.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
      - image: quay.io/prometheus/node-exporter:v0.18.1
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
        resources:
          requests:
            memory: 30Mi
            cpu: 100m
          limits:
            memory: 50Mi
            cpu: 200m
        volumeMounts:
        - name: proc
          readOnly:  true
          mountPath: /host/proc
        - name: sys
          readOnly: true
          mountPath: /host/sys
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'
  name: node-exporter
spec:
  ports:
  - name: metrics
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
$ kubectl apply -f 02-exporters.yaml
daemonset.apps/node-exporter created
service/node-exporter created

Service Monitor

With the so-called third-party resource "ServiceMonitor", provided by the Prometheus Operator, it is possible to register the previously started Service – in our case node-exporter – for monitoring. The resource itself gets a label team: frontend, which is later used as the selector for the Prometheus instance.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: node-exporter
  endpoints:
  - port: metrics
$ kubectl apply -f 03-service-monitor-node-exporter.yaml
servicemonitor.monitoring.coreos.com/node-exporter created

Prometheus instance

A Prometheus instance is defined, which now collects all services based on the labels and pulls the metrics from their endpoints.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: false
$ kubectl apply -f 04-prometheus-service-monitor-selector.yaml
prometheus.monitoring.coreos.com/prometheus created

Prometheus Service

The started Prometheus instance is exposed with a Service object. After a short wait, a cloud load balancer is started that is reachable from the Internet and forwards requests to our Prometheus instance.

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: LoadBalancer
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus
$ kubectl apply -f 05-prometheus-service.yaml
service/prometheus created
$ kubectl get services
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
prometheus   LoadBalancer   10.254.146.112   <pending>     9090:30214/TCP   58s

As soon as the external IP address is available, it can be opened at http://x.x.x.x:9090/targets, and you will see all of your Kubernetes nodes, whose metrics are now pulled at regular intervals. If further nodes are added later, they are automatically included – or removed again when they disappear.
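
As a quick sanity check – a small sketch using the Prometheus HTTP API, with x.x.x.x again standing in for the external IP – a node-exporter metric can also be queried directly; the JSON response should contain one result per cluster node:

$ curl 'http://x.x.x.x:9090/api/v1/query?query=node_load1'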

Visualization with Grafana

The collected metrics can be visualized easily and attractively with Grafana. Grafana is an analytics tool that supports various data backends.

apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
#  type: LoadBalancer
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    app: grafana
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  replicas: 1
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - image: grafana/grafana:latest
        name: grafana
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        env:
          - name: GF_AUTH_BASIC_ENABLED
            value: "false"
          - name: GF_AUTH_ANONYMOUS_ENABLED
            value: "true"
          - name: GF_AUTH_ANONYMOUS_ORG_ROLE
            value: Admin
          - name: GF_SERVER_ROOT_URL
            value: /api/v1/namespaces/default/services/grafana/proxy/
$ kubectl apply -f grafana.yaml
service/grafana created
deployment.apps/grafana created
$ kubectl proxy
Starting to serve on 127.0.0.1:8001

As soon as the proxy connection provided by kubectl is available, the started Grafana instance can be opened in the browser via http://localhost:8001/api/v1/namespaces/default/services/grafana/proxy/. Only a few more steps are needed to display the metrics stored in Prometheus in a visually appealing way. First, a new data source of type Prometheus is created. Thanks to Kubernetes' own internal DNS, the URL is http://prometheus.default.svc:9090. The schema is servicename.namespace.svc. Alternatively, the cluster IP can of course be used as well.
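
To double-check that this in-cluster DNS name resolves, a quick optional test – a sketch assuming the default cluster domain cluster.local and a throwaway busybox pod – looks like this:

$ kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup prometheus.default.svc.cluster.local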

For the metrics collected by the node-exporter there is already a very complete Grafana dashboard, which can be imported via the import function. The ID of the dashboard is 1860.

After the dashboard has been imported successfully, the metrics can now be examined.

Monitoring other applications

Besides these rather technical statistics, further metrics from your own applications are possible as well, for example HTTP requests, SQL queries, business logic and much more. Thanks to the very flexible data format, there are hardly any limits here. As always, there are several approaches to collecting your own metrics. One of them is to equip your application with a /metrics endpoint; some frameworks, such as Ruby on Rails, already have usable extensions for this. Another approach is so-called sidecars. A sidecar is an additional container that runs alongside the actual application container. Together they form a pod that shares namespace, network and so on. The sidecar then runs code that checks the application and exposes the results as values Prometheus can parse. Essentially, both approaches can be wired up with the Prometheus Operator just like the example shown above; a minimal sidecar sketch follows below.
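
As an illustration – a minimal sketch, not taken from the setup above; the container names, images and ports are assumptions for the example – a Deployment could pair an application container with such a sidecar exporter:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      # the actual application container (hypothetical image and port)
      - name: webapp
        image: example/webapp:latest
        ports:
        - containerPort: 8080
      # sidecar that checks the application and exposes the results
      # as Prometheus-parsable metrics on its own port (hypothetical image)
      - name: metrics-sidecar
        image: example/webapp-exporter:latest
        ports:
        - containerPort: 9102
          name: metrics

A Service selecting app: webapp with a metrics port, plus a matching ServiceMonitor as in the node-exporter example above, would then let the operator configure Prometheus to scrape the sidecar.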

Sebastian Saemann
Head of Managed Services

Sebastian came to NETWAYS from a large German hosting provider because he was getting bored there. With us he can now realize his potential better, as he leads the Managed Services team. When he is not patching cloud components, he tries to set a new lap record on his motorcycle.

Tick Tock: What the heck is time-series data? by Tanay Pant | OSDC 2019

This entry is part 6 of 6 in the series OSDC 2019 | Recap

The rise of IoT and smart infrastructure has led to the generation of massive amounts of complex data. In his talk at the Open Source Data Center Conference (OSDC) 2019, Tanay Pant addressed exactly this question: Tick Tock: What the heck is time-series data? See the video of Tanay's presentation and read a summary (below).

The former OSDC will be held for the first time in 2020 under the new name stackconf. With the changes in modern IT in recent years, the focus of the conference has increasingly shifted from a mainly static infrastructure approach to a broader spectrum that includes agile methods, continuous integration, containers, hybrid and cloud solutions. This development is taken into account by renaming the conference and opening up the range of topics to further innovations.

Due to concerns around the coronavirus (COVID-19), the decision was made to hold stackconf 2020 as an online conference. The online event will now take place from June 16 – 18, 2020. Join us, live online! Save your ticket now at: stackconf.eu/ticket/


Tick Tock: What the heck is time-series data?

Today we are going to talk about what time-series data is, how this kind of workload differs from others, and the use cases where time series are frequently used. Then we'll talk about how CrateDB helps to work with machine data.

What are time series?

To answer this question, imagine a sensor that sends data over a period of time. When we want to read or display this data, time becomes one of the axes. In contrast to other workloads, this data is not written to the database as updates; time-series data is appended as inserts, and that is the primary access pattern. Time-series databases essentially introduce efficiencies through the temporal treatment of data, which allows us to intuitively monitor all aspects of our operation over time.

Now that we have a view on time series, we can take a step back and look at the different use cases and the way the data is generated. You can categorize them in two ways. The first is IT and monitoring, which can be described as the traditional use of time-series databases. Looking at its properties, there are tens or hundreds of metrics or sensors as well as a lot of complex data and queries, with data sets often larger than several gigabytes. InfluxDB is a good example in this category.

The second is industrial sensor data, an emerging sector that has not been talked about much. Here there are also hundreds or thousands of sensors or metrics, and real-time queries are under pressure because they must be able to access all the gigabytes of data. CrateDB is a good example in this case.

We then turn to the core technology and see what exactly CrateDB is and how it differs from other databases in this segment. CrateDB is a new kind of distributed SQL database that is well suited for handling industrial sensor data, due to its ease of use and its ability to handle many different kinds of data as well as thousands of different sensor feeds. CrateDB supports distributed SQL with full-text search and data queries, and it coordinates the different nodes in a cluster seamlessly with one another. In addition, write and query operations are automatically distributed across the nodes of the cluster. CrateDB uses columnar caches for time-series SQL performance; purely in-memory approaches normally require all data to fit in main memory, which limits the amount of data that can be managed at any given time.

One solution for time-series performance without data volume restrictions is memory-resident columnar caches on each node, so that the caches tell the query engine whether there are any matching records on a node and where those records are. Distributed query processing also contributes to fast performance, together with a query planner that makes wise decisions about which nodes are best suited for execution. CrateDB also offers machine-data functions and is cloud native, which makes it seamless to run in the cloud. Finally, a few advantages of CrateDB: the installation is simple – you can create a CrateDB instance with a single line on the terminal or in Docker. It has a distributed query engine that supports full-text queries, it handles commodity hardware and instances well, and the architecture is easy to scale.
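
To illustrate that single-line install – a quick sketch, not part of the talk itself – a local CrateDB instance can be started with Docker; the admin UI is then available on port 4200:

$ docker run -p 4200:4200 crate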

Saeid Hassan-Abadi
Junior Consultant

Saeid started his apprenticeship as an IT specialist for system integration in September 2019. Born in Persia, he studied industrial engineering and management in his home country of Iran. He is passionate about working with computers and enjoys acquiring new knowledge. His hobbies are listening to music, doing sports and spending time with his friends.

Bringing Bulk Editing to a new Level with Icinga DB

Those of you who have already tried out the new Icinga DB Web module might have noticed a new button next to the search bar in list views.

With Icinga DB we are proud to introduce a powerful new feature for bulk editing multiple objects.

You might already be familiar with the existing multiselect feature, which enables you to select multiple list objects by Ctrl- (Cmd on Mac) or Shift-clicking and perform bulk actions on them. This was helpful, but reached its limits – usability- and performance-wise – when you were dealing with a large number of objects.

Continue With Filter goes beyond that and brings bulk editing to a whole new level. Setting downtimes, comments or acknowledgements now works a lot more efficiently, because you can apply them to a filtered set of objects.

Setting downtimes to a large number of objects

So, let’s say you want to set downtimes on functional services on a certain host.

1) Create a filter with the filter editor
2) Select all resulting objects by clicking “Continue with filter”
3) You can now execute actions on all of the objects

See it in action

 

To realise this feature, we made multiselect URLs handle filter parameters. And because this is also more performant, we use it in various places throughout the new web module:

* The new badges in the host and service groups view
* The new service widget in the host detail view

You can now reach a new bulk object detail view by passing the filter syntax via the URL.

`/icingadb/hosts?([filter syntax])`

It can also be easily added among other parameters, e.g.

`/icingaweb2/icingadb/hosts?state.soft_state=0&([filter syntax])`

To learn more about the new UI Features of the Icinga DB Web module, you can have a look at this post.

Happy bulk editing!

Florian Strohmaier
Senior UX Designer

With his specialties of UI design, prototyping and frontend development, Florian supports the dev team at NETWAYS. Despite his design background, he also feels at home on the technical side. It is precisely the combination of the two that holds a special appeal for him.

HP controller firmware issues to check

Just about a week ago I posted a short blog post introducing a new check to verify the firmware of HPE SSD disks. Since our customer informed us about another bulletin they have to take care of, we extended the check to support RAID controllers and to verify whether a problematic firmware needs to be patched.

HPE says about the issue in bulletin a00097210:

HPE Smart Array SR Gen10 Controller Firmware Version 2.65 (or later) provided in the Resolution section of this document is required to prevent a potential data inconsistency on select RAID configurations with Smart Array Gen10 Firmware Version 1.98 through 2.62, based on the following scenarios. HPE strongly recommends performing this upgrade at the customer’s earliest opportunity per the “Action Required” in the table located in the Resolution section. Neglecting to perform the recommended resolution could result in potential subsequent errors and potential data inconsistency.

Important: Please read the full document and verify with your used hardware.

For controllers, the check will alert you with a CRITICAL when the firmware is in the affected range:

  • if you have RAID 1/10/ADM – update immediately!
  • if you have RAID 5/6/50/60 – update immediately!

It will also add a short note when the firmware is older than the affected range or has already been updated. At the moment the plugin does not verify configured logical drives, but we believe you should update in any case.

Please see the repository and README on GitHub for all details; you can download the binaries from the releases page.

All information about affected disks can be found on GitHub or the previous blogpost.

OK - All 2 controllers and 33 drives seem fine
[OK] controller (0) model=p816i-a serial=XXX firmware=1.65 - firmware older than affected
[OK] controller (4) model=p408e-p serial=XXX firmware=1.65 - firmware older than affected
[OK] (0.9 ) model=MO003200JWFWR serial=XXX firmware=HPD2 hours=8086
[OK] (0.11) model=EK000400GWEPE serial=XXX firmware=HPG0 hours=8086
[OK] (0.12) model=EK000400GWEPE serial=XXX firmware=HPG0 hours=8086
[OK] (0.14) model=MO003200JWFWR serial=XXX firmware=HPD2 hours=8086
[OK] (4.0 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.1 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.2 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.3 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.4 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.5 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.6 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.24) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.25) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.26) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.27) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.28) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.29) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.30) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.31) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.50) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.51) model=MO003200JWFWR serial=XXX firmware=HPD2 hours=7568
...
Markus Frosch
Principal Consultant

Markus works at NETWAYS as a Principal Consultant and supports customers in implementing Nagios, Icinga and other open source systems management tools. Outside of his professional work, Markus is an active contributor to the Debian project.

HPE SSD drives vulnerable to uptime counter bug

With two bulletins published by Hewlett Packard Enterprise (HPE), several solid state disks (SSDs) were declared vulnerable to a software bug which causes the counter for uptime hours to overflow after 32,768 or 40,000 hours and renders the disk completely inaccessible. A quote from the vendor:

This … firmware is considered a critical fix and is required to address the issue detailed below. HPE strongly recommends immediate application of this critical fix. Neglecting to update to SSD Firmware Version … will result in drive failure and data loss at 40,000 (or 32,768) hours of operation and require restoration of data from backup if there is no fault tolerance, such as RAID 0 or even in a fault tolerance RAID mode if more SSDs fail than can be supported by the fault tolerance of the RAID mode on the logical drive. Example: RAID 5 logical drive with two failed SSDs.

One of our customers asked us for help with identifying the affected drives, since they noticed some of their servers being affected. We have written a custom Icinga plugin to check for affected drives and to identify where firmware updates are required. The only requirement is SNMP access to the servers or devices that need to be checked. The plugin lists all found drives, compares them against a list of affected models and compares the firmware version against the recommended fix by HPE.

When everything is fine, you should see something like this:

OK - All 2 controllers and 33 drives seem fine
[OK] controller (0) model=p816i-a serial=XXX firmware=1.65 - firmware older than affected
[OK] controller (4) model=p408e-p serial=XXX firmware=1.65 - firmware older than affected
[OK] (0.9 ) model=MO003200JWFWR serial=XXX firmware=HPD2 hours=8086
[OK] (0.11) model=EK000400GWEPE serial=XXX firmware=HPG0 hours=8086
[OK] (0.12) model=EK000400GWEPE serial=XXX firmware=HPG0 hours=8086
[OK] (0.14) model=MO003200JWFWR serial=XXX firmware=HPD2 hours=8086
[OK] (4.0 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.1 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.2 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.3 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.4 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.5 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied

You can find the plugin on GitHub under check_hp_firmware where the release page provides the built binaries for Linux.

Feedback or questions are welcome as GitHub issues, directly in the project.

Please make sure you have also read the official documents from HPE.

Update 2020-04-09: The plugin was enhanced to check for controller firmware vulnerabilities as well, and is now named check_hp_firmware. See the new blog post.

Markus Frosch
Principal Consultant

Markus works at NETWAYS as a Principal Consultant and supports customers in implementing Nagios, Icinga and other open source systems management tools. Outside of his professional work, Markus is an active contributor to the Debian project.