Seite wählen

NETWAYS Blog

OSDC 2019: Buzzwo…erm…DevOps, Agile & YAML programmers

Cheers from Berlin, Moabit, 11th round for OSDC keeping you in the loop with everything with and around DevOps, Kubernetes, log event management, config management, … and obviously magnificent food and enjoyable get-together.

 

Goooood mooooorning, Berlin!

DevOps neither is the question, nor the answer … Arnold Bechtoldt from inovex kicked off OSDC with a provocative talk title. After diving through several problems and scenarios in common environments, we learned to fail often and fail hard, and improve upon. DevOps really is a culture, and not a job title. Also funny – the CV driven development, or when you propose a tool to prepare for the next job 🙂 One key thing I’ve learned – everyone gets the SAME permissions, which is kind of hard with the learned admin philosophy. Well, and obviously we are YAML programmers now … wait … oh, that’s truly inspired by Mr. Buytaert, isn’t it? 😉

Next up, Nicolas Frankel took us on a journey into logs and scaling at Exoscale. Being not the only developer in the room, he showed off that debug logging with computed results actually eats a lot of resources. Passing the handles/pointers to lazy log function is key here, that reminds me of rewriting the logging backend for Icinga 2 😉 Digging deeper, he showed a UML diagram with the log flow – filebeat collects logs, logstash parses the logs into JSON and Elasticsearch stores that. If you want to go fast, you don’t care about the schema and let ES do the work. Running a query then will be slow, not really matching the best index then – lesson learned. To conclude with, we’ve learned that filebeat actually can parse the log events into JSON already, so if you don’t need advanced filtering, remove Logstash from your log event stream for better performance.

Right before the magnificent lunch, Dan Barker being a chief architect at RSA Security for the Archer platform shared stories from normal production environments to actually following the DevOps spirit. Or, to avoid these hard buzzwords, just like „agile“, and to quote „A former colleague told me: ‚I’ve now understood agile – it’s like waterfall but with shorter steps.'“. He’s also told about important things – you’re not alone, praise your team members publicly.

 

Something new at OSDC: Ignites

Ignite time after lunch – Werner Fischer challenged himself with a few seconds per slide explaining microcode debugging to the audience, while Time Meusel shared their awesome work within the Puppet community with logs of automation involved (modulesync, etc) at Voxpupuli. Dan Barker really talked fast about monitoring best practices, whereas one shouldn’t put metrics into log aggregation tools and use real business metrics.

 

The new hot shit

Demo time – James „purpleidea“ Shubin showed the latest developments on mgmt configuration, including the DSL similar to Puppet. Seeing the realtime changes and detecting combined with dynamic processing of e.g. setting the CPU counts really looks promising. Also the sound exaggeration tests with the audience where just awesome. James not only needs hackers, docs writers, testers, but also sponsors for more awesome resource types and data collectors (similar to Puppet facts).

Our Achim „AL“ Ledermüller shared the war stories on our storage system, ranging from commercial netApp to GlusterFS („no one uses that in production“) up until the final destination with Ceph. Addictive story with Tim mimicking the customer asking why the clusterfuck happened again 😉

Kedar Bidarkar from Red Hat told us more about KubeVirt which extends the custom resource definitions available from k8s with the VM type. There are several components involved: operator, api, handler, launcher in order to actually run a virtual machine. If I understand that correctly, this combines Kubernetes and Libvirt to launch real VMs instead of containers – sounds interesting and complicated in the same sentence.

Kubernetes operators the easy way – Matt Jarvis from Mesosphere introduced Kudo today. Creating native Kubernetes operators can become really complex, as you need to know a lot about the internals of k8s. Kudo aims to simplify creating such operators with a universal declarative operator configured via YAML.

 

Oh, they have food too!

The many coffee breaks with delicious Käsekuchen (or: Kaiser Torte ;)) also invite to visit our sponsor booths too. Keep an eye on the peeps from Thomas-Krenn AG, they have #drageekeksi from Austria with them. We’re now off for the evening event at the Spree river, chatting about the things learnt thus far with a G&T or a beer 🙂

PS: Follow the #osdc stream and NetwaysEvents on Twitter for more, and join us next year!

Modern C++ programming: Coroutines with Boost

(c) https://xkcd.com/303/

We’re rewriting our network stack in Icinga 2.11 in order to to eliminate bugs with timeouts, connection problems, improve the overall performance. Last but not least, we want to use modern library code instead of many thousands of lines of custom written code. More details can be found in this GitHub issue.

From a developer’s point of view, we’ve evaluated different libraries and frameworks before deciding on a possible solution. Alex created several PoCs and already did a deep-dive into several Boost libraries and modern application programming. This really is a challenge for me, keeping up with the new standards and possibilities. Always learning, always improving, so I had a read on the weekend in „Boost C++ Application Development Cookbook – Second Edition„.

One of things which are quite resource consuming in Icinga 2 Core is multi threading with locks, waits and context switching. The more threads you spawn and manage, the more work needs to be done in the Kernel, especially on (embedded) hardware with a single CPU core. Jean already shared insights how Go solves this with Goroutines, now I am looking into Coroutines in C++.

 

Coroutine – what’s that?

Typically, a function in a thread runs, waits for locks, and later returns, freeing the locked resource. What if such a function could be suspended at certain points, and continue once there’s resources available again? The benefit would also be that wait times for locks are reduced.

Boost Coroutine as library provides this functionality. Whenever a function is suspended, its frame is put onto the stack. At a later point, it is then resumed. In the background, the Kernel is not needed for context switching as only stack pointers are stored. This is done with Boost’s Context library which uses hardware registers, and is not portable. Some architectures don’t support it yet (like Sparc).

Boost.Context is a foundational library that provides a sort of cooperative multitasking on a single thread. By providing an abstraction of the current execution state in the current thread, including the stack (with local variables) and stack pointer, all registers and CPU flags, and the instruction pointer, a execution context represents a specific point in the application’s execution path. This is useful for building higher-level abstractions, like coroutinescooperative threads (userland threads) or an equivalent to C# keyword yield in C++.

callcc()/continuation provides the means to suspend the current execution path and to transfer execution control, thereby permitting another context to run on the current thread. This state full transfer mechanism enables a context to suspend execution from within nested functions and, later, to resume from where it was suspended. While the execution path represented by a continuation only runs on a single thread, it can be migrated to another thread at any given time.

context switch between threads requires system calls (involving the OS kernel), which can cost more than thousand CPU cycles on x86 CPUs. By contrast, transferring control vias callcc()/continuation requires only few CPU cycles because it does not involve system calls as it is done within a single thread.

TL;DR – in the way we write our code, we can suspend function calls and free resources for other functions requiring it, without typical thread context switches enforced by the Kernel. A more deep-dive into Coroutines, await and concurrency can be found in this presentation and this blog post.

 

A simple Example

$ vim coroutine.cpp

#include <boost/coroutine/all.hpp>
#include <iostream>

using namespace boost::coroutines;

void coro(coroutine::push_type &yield)
{
        std::cout << "[coro]: Helloooooooooo" << std::endl;
        /* Suspend here, wait for resume. */
        yield();
        std::cout << "[coro]: Just awesome, this coroutine " << std::endl;
}

int main()
{
        coroutine::pull_type resume{coro};
        /* coro is called once, and returns here. */

        std::cout << "[main]: ....... " << std::endl; //flush here

        /* Now resume the coro. */
        resume();

        std::cout << "[main]: here at NETWAYS! :)" << std::endl;
}

 

Build it

On macOS, you can install Boost like this, Linux and Windows require some more effort listed in the Icinga development docs). You’ll also need CMake and g++/clang as build tool.

brew install ccache boost cmake 

Add the following CMakeLists.txt file into the same directory:

$ vim CMakeLists.txt

cmake_minimum_required(VERSION 2.8.8)
set(BOOST_MIN_VERSION "1.66.0")

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++0x")

find_package(Boost ${BOOST_MIN_VERSION} COMPONENTS context coroutine date_time thread system program_options regex REQUIRED)

# Boost.Coroutine2 (the successor of Boost.Coroutine)
# (1) doesn't even exist in old Boost versions and
# (2) isn't supported by ASIO, yet.
add_definitions(-DBOOST_COROUTINES_NO_DEPRECATION_WARNING)

link_directories(${Boost_LIBRARY_DIRS})
include_directories(${Boost_INCLUDE_DIRS})

set(base_DEPS ${CMAKE_DL_LIBS} ${Boost_LIBRARIES})

set(base_SOURCES
  coroutine.cpp
)

add_executable(coroutine
        ${base_SOURCES}
)

target_link_libraries(coroutine ${base_DEPS})

set_target_properties(
        coroutine PROPERTIES
        FOLDER Bin
        OUTPUT_NAME boost-coroutine
)

Next, run CMake to check for the requirements and invoke make to build the project afters.

cmake .
make

 

Run and understand the program

$ ./boost-coroutine
[coro]: Helloooooooooo
[main]: .......
[coro]: Just awesome, this coroutine
[main]: here at NETWAYS! :)

Now, what exactly happened here? The Boost coroutine library allows us to specify the „push_type“ where this functions should be suspended, after reaching this point, a subsequent call to „yield()“ is required to resume this function.

void coro(coroutine::push_type &yield)

Up until „yield()“, the function logs the first line to stdout.

The first call happens inside the „main()“ function, by specifying the pull_type and directly calling the function as coroutine. The pull_type called „resume()“ (free form naming!) must then be explicitly invoked in order to resume the coroutine.

coroutine::pull_type resume{coro};

After the first line is logged from the coroutine, it stops before „yield()“. The main function logs the second line.

[coro]: Helloooooooooo
[main]: .......

Now comes the fun part – let’s resume the coroutine. It doesn’t start again, but the function’s progress is stored as stack pointer, targeting „yield()“. Exactly this resume function is called with „resume()“.

        /* Now resume the coro. */
        resume();

That being said, there’s more to log inside the coroutine.

[coro]: Just awesome, this coroutine

After that, it reaches the end and returns to the main function. That one logs the last line and terminates.

[main]: here at NETWAYS! :)

Without a coroutine, such synchronisation between functions and threads would need waits, condition variables and lock guards.

 

Icinga and Coroutines

With Boost ASIO, the spawn() method wraps coroutines on a higher level and hides the strand required. This is used in the current code and binds a function into its scope. We’re using lambda functions available with C++11 in most locations.

The following example implements the server side of our API waiting for new connections. An endless loop listens for incoming connections with „server->async_accept()“.

Then comes the tricky part:

  • 2.9 and before spawned a thread for each connection. Lots of threads, context switches and memory leaks with stalled connections.
  • 2.10 implemented a thread pool, managing the resources. Handling the client including asynchronous TLS handshakes are slower, and still many context switches ahead between multiple connections until everything stalls.
  • 2.11 spawns a coroutine which handles the client connection. The yield_context is required to suspend/resume the function inside.

 

void ApiListener::ListenerCoroutineProc(boost::asio::yield_context yc, const std::shared_ptr& server, const std::shared_ptr& sslContext)
{
	namespace asio = boost::asio;

	auto& io (server->get_io_service());

	for (;;) {
		try {
			auto sslConn (std::make_shared(io, *sslContext));

			server->async_accept(sslConn->lowest_layer(), yc);

			asio::spawn(io, [this, sslConn](asio::yield_context yc) { NewClientHandler(yc, sslConn, String(), RoleServer); });
		} catch (const std::exception& ex) {
			Log(LogCritical, "ApiListener")
				<< "Cannot accept new connection: " << DiagnosticInformation(ex, false);
		}
	}
}

The client handling is done in „NewClientHandlerInternal()“ which follows this flow:

  • Asynchronous TLS handshake using the yield context (Boost does context switches for us), Boost ASIO internally suspends functions.
    • TLS Shutdown if needed (again, yield_context handled by Boost ASIO)
  • JSON-RPC client
    • Send hello message (and use the context for coroutines)

And again, this is the IOBoundWork done here. For the more CPU hungry tasks, we’re using a different CpuBoundWork pool which again spawns another coroutine. For JSON-RPC clients this mainly affects syncing runtime objects, config files and the replay logs.

Generally speaking, we’ve replaced our custom thread pool and message queues for IO handling with the power of Boost ASIO, Coroutines and Context thus far.

 

What’s next?

After finalizing the implementation, testing and benchmarks are on the schedule – snapshot packages are already available for you at packages.icinga.com. Coroutines will certainly help embedded devices with low CPU power to run even faster with many network connections.

Boost ASIO is not yet compatible with Coroutine2. Once it is, the next shift in modernizing our code is planned. Up until then, there are more Boost features available with the move from 1.53 to 1.66. Our developers are hard at work with implementing bug fixes, features and learning all the good things.

There’s many cool things under the hood with Icinga 2 Core. If you want to learn more and become a future maintainer, join our adventure! 🙂

Ich habe einen iX-Artikel für Dich: GitLab, GitLab, GitLab

Servus zusammen,

Dinge die man selbst gelernt hat, anderen Leuten beizubringen und helfend beiseite zu stehen, ist ein echt gutes Gefühl. Bei mir zieht sich das seit vielen Jahren durch die Icinga Community, einer der schönsten weil überraschensten Momente war wohl das Foto als „Danke“ auf dem Icinga Camp Berlin 2019. Dann kommt noch dazu, dass ich sehr gerne Dokumentation schreibe, oder einfach alles aufschreibe, was ich irgendwann mal brauchen könnte. Und vielleicht jemand anders, der mal meinen Job macht, und ich mich neuen Aufgaben widmen kann. Nach den ersten Gehversuchen mit der Icinga-Schulung (2.x natürlich ;)) haben das nunmehr meine Kollegen übernommen, und meistern die Wissensvermittlung mit Bravour. Wir Entwickler sorgen dann in unseren Releases dafür, dass auch ihnen nicht langweilig wird 🙂

Ich für mich habe aber auch festgestellt, dass man nicht nur „das eine“ machen soll und auch kann, sondern immer „über den Tellerrand“ schauen sollte. Und so kams, dass ich auf meiner ersten OSDC 2013 keinen Dunst von Puppet, Elastic, Graphite, Container-Plattformen oder CI/CD hatte. Auch die Jahre danach waren hart, und meine Kollegen durften mir viel erklären, etwa Ceph und OpenStack. Jetzt nach vielen Jahren hilft mir dieses Wissen in meiner tagtäglichen Arbeit, und auf eine gewisse Art und Weise bin ich stolz, wenn mich meine Kollegen und Freunde nach Themen fragen, die nicht unmittelbar mit Icinga zu tun haben.

Dann gibts da noch Git, die schwarze Magie der Entwickler. 2004 in Hagenberg hab ich meinen VHDL-Code noch in CVS eingecheckt, 2009 .at-DNS-Zonen-Files nach SVN geschoben und irgendwann dank Icinga auch Git gesehen. Um gleich mal mit „force push“ den Master zu zerstören – aller Anfang ist schwer. Seitdem ist viel passiert, und irgendwie hat jeder einen Git-Kniff, der gerne ausgetauscht wird. Die Nachfrage nach einer Schulung, seitens DEV (Kurzform für unsere Development-Abteilung), wurde immer größer und so wurde vor knapp 2,5 Jahren die Git-Schulung aus dem Boden gestampft.

Seither hat sich einiges getan, und wir haben unsere Open-Source-Entwicklung vollständig auf GitHub migriert, sowohl Icinga als auch NETWAYS. Aus dem vormaligen self-hosted Gitorious wurde dann mal ein GitLab, und mit jedem Release kam etwas neues dazu. GitLab verwenden wir an vielen Stellen – intern fürs Infrastrukturmanagement, betreut von MyEngineer im Hosting, als App in NWS und natürlich für Kunden und interne Projekte auf git.netways.de und git.icinga.com. Die Möglichkeiten, die einem CI mit den Runnern bietet, sowie den Merge-Request-Workflow haben wir seitdem bei uns stetig etabliert und ausgebaut.

All diese Erfahrungen aus der Praxis, und die tagtägliche Arbeit lassen wir in die neu gestaltete GitLab-Schulung einfliessen. Im Vortrag von Nicole und Gabriel auf der OSDC 2018 habe ich dann auch endlich mal Auto-DevOps verstanden und die Web IDE besser kennen gelernt. All das und noch viel mehr erzähle ich Schulungsteilnehmern im Kesselhaus und freu mich über die gemeinsamen Lernerfolge.

© 2019 Heise Medien GmbH & Co. KG

Doch damit hats nicht aufgehört – nachdem ich letztes Jahr für die IX einen Artikel zu IoT-Monitoring rund um Icinga, Elastic, Graylog und MQTT schreiben durfte, hab ich auch GitLab mit Golang in den Raum geworfen. Es ist ein bisserl Zeit ins Land gegangen, und ich hab dank IcingaDB auch mehr Golang gelernt. Im neuen Jahr hab ich eine GitLab-Schulung gehalten, und mich am Wochenende drauf hingesetzt und für die aktuelle iX 04/19 einen Artikel über GitLab und CI/CD geschrieben. Und auch vorab die GitHub Actions evaluiert, wo ich netterweise einen Invite habe 🙂

Wer mich kennt, weiss, dass ich endlos schreiben und reden kann über Dinge, die mir Spass machen. So empfehle ich Dir zur Lektüre auch einige Kaffee-Tassen (und falls vorhanden: Dragee-Keksi). Soferns dann noch offene Fragen gibt, komm einfach auf uns zu – egal ob Workshops, Schulungen oder Consulting, wir kriegen das hin, dass Dein GitLab genauso schnurrt wie unseres 🙂

Bevor ich es vergesse, auf der OSMC 2019 mach ich einen GitLab-Workshop rund um DevOps-Workflows und CI. Die Zeit vergeht eh so schnell – gleich anmelden 😉

Wir lesen uns – Icinga 2.11 wartet und nächste Woche ist Henrik aus der Schule wieder da. „Mein“ Azubi der in die Welt von Icinga Core eintauchen darf, ich werd alt ❤️

 

 

GitLab Training v2.5.0 released

We have released v2.5.0 of our GitLab training today. Based on the feedback from previous trainings, and many things learned together with the students, we aim for the next classes already.

Dive deep into Git rebase, merge, squash, cherry-pick, get to know real-life development workflows and explore the possibilities of CI/CD pipelines and even more fancy GitLab features. Check our training schedule and register now!

Request Tracker: Highlight tickets based on due dates

A while ago we’ve announced a new extension for Request Tracker which allows to highlight tickets in search results even better.

Next to „last updated by“ and custom field conditions, we’ve now added a requirement from production:

  • light red coloring for tickets with a due date in 3 days
  • dark red coloring for everything where the due date already passed

When there’s a ticket which matches one of the other conditions, due date wins over „last updated by“ which itself wins over custom field conditions.

v2.0.0 is available on GitHub. Consider asking our sales engineers when building a new RT instance in NWS 🙂