  
 


 


Implementing a large monitoring infrastructure with Nagios and Ganglia

When the number of hosts and services we wanted to monitor grew past a few thousand, we started to hit limitations with our single-head Nagios installation. After evaluating common techniques like NSCA and NRPE, and the overhead of maintaining a multi-head Nagios setup, we decided to look into leveraging the Ganglia infrastructure we already had in place, which is designed to scale and could offer us a great data collection and transport medium. We then wrote some glue code (in Python) that allows Nagios to interface with Ganglia. To date it has proven very reliable and able to scale to thousands of nodes without adding any maintenance overhead.
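To make the idea of glue code concrete, here is a minimal sketch of a Nagios-style check that reads a metric from Ganglia's XML dump rather than polling the host directly. It is not the code presented in the talk. The function names, the sample host `web01`, the thresholds, and the choice of reading from gmetad's XML port (8651 by default) are all assumptions made for illustration.

```python
import socket
import xml.etree.ElementTree as ET

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical sample of the XML that gmetad emits, trimmed to one host.
SAMPLE = """<GANGLIA_XML VERSION="3.1" SOURCE="gmetad">
<GRID NAME="example">
<CLUSTER NAME="web">
<HOST NAME="web01" IP="10.0.0.1">
<METRIC NAME="load_one" VAL="7.5" TYPE="float" UNITS=""/>
</HOST>
</CLUSTER>
</GRID>
</GANGLIA_XML>"""


def fetch_xml(address="localhost", port=8651, timeout=10.0):
    """Read the full XML dump from a gmetad XML port (assumed setup)."""
    chunks = []
    with socket.create_connection((address, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the server closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_metric(xml_text, host, metric):
    """Return the metric value as a float, or None if it is missing."""
    root = ET.fromstring(xml_text)
    for host_el in root.iter("HOST"):
        if host_el.get("NAME") != host:
            continue
        for metric_el in host_el.iter("METRIC"):
            if metric_el.get("NAME") == metric:
                try:
                    return float(metric_el.get("VAL"))
                except (TypeError, ValueError):
                    return None
    return None


def check(xml_text, host, metric, warn, crit):
    """Map a Ganglia metric onto a Nagios (exit code, message) pair."""
    value = find_metric(xml_text, host, metric)
    if value is None:
        return UNKNOWN, f"UNKNOWN - {metric} not found for {host}"
    if value >= crit:
        return CRITICAL, f"CRITICAL - {metric}={value}"
    if value >= warn:
        return WARNING, f"WARNING - {metric}={value}"
    return OK, f"OK - {metric}={value}"
```

A wrapper script would call `check(fetch_xml(), host, metric, warn, crit)`, print the message, and exit with the returned code, which is the contract Nagios expects from a plugin. Because one XML dump covers every host Ganglia knows about, a single fetch can serve many checks, which is the kind of scaling benefit the abstract describes.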

 

Spike presents on the various considerations during the analysis phase, the current infrastructure, its properties and TCO, and the glue code.




Spike Morelli

As the Linden Lab Monitoring project leader, Spike Morelli has spent the last year and a half redesigning the Lab's monitoring infrastructure to meet the ever-growing demand for more control over our systems, migrating from a centralized to a distributed design, all based on free software (primarily Nagios and Ganglia). Along with that, he is involved with the systems and configuration management automation project, developing new solutions to aid with the deployment and maintenance of a large network (~10K nodes). Prior to Linden, Spike spent 5+ years working on similar topics at a smaller scale for a number of different customers across Europe, carrying out several Nagios deployments.
