Ian Unruh



Site Reliability Engineer at LinkedIn. Former K-State Wildcat. Craft beer and coffee enthusiast. Voluntaryist.

Based in Silicon Valley, California



Sensu and Flapjack

25 May 2014


In the previous post, I started on collection and visualization of metrics using Sensu, Graphite, and Grafana. In this post, I'll cover Sensu service checks and integrating Sensu with Flapjack, an alert notification router.

Sensu service checks

In the last post, I covered using Sensu to collect metrics. Now I'll cover how to use Sensu in a Nagios-like fashion with service checks. On the client node that we set up in the previous post, install Nginx to stand in for a production web server.

apt-get install -y nginx

Then install a Sensu plugin that can be used to check if Nginx is running. Since we're doing that, we might as well gather some metrics from Nginx too! Isn't Sensu awesome?

gem install sensu-plugin --no-ri --no-rdoc

git clone git://github.com/sensu/sensu-community-plugins.git
cd sensu-community-plugins/plugins

cp nginx/nginx-metrics.rb /etc/sensu/plugins
cp processes/check-procs.rb /etc/sensu/plugins

Now add a subscription called nginx to the Sensu client configuration.

{
  "client": {
    "name": "app1",
    "address": "192.168.12.11",
    "subscriptions": ["nginx"]
  }
}

Restart the Sensu client with service sensu-client restart. Over on the Sensu server, create /etc/sensu/conf.d/nginx.json with the following contents.

{
  "checks": {
    "nginx_check": {
      "command": "check-procs.rb -p nginx",
      "interval": 30,
      "subscribers": ["nginx"],
      "handlers": []
    },
    "nginx_metrics": {
      "type": "metric",
      "command": "nginx-metrics.rb",
      "interval": 10,
      "subscribers": ["nginx"],
      "handlers": ["relay"]
    }
  }
}
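
Before restarting, it can be worth sanity-checking the file, since a stray brace or a missing key is easy to overlook. The following is a minimal sketch, not part of Sensu: find_check_problems and REQUIRED_KEYS are hypothetical names, and the list of required keys is only an assumption based on the checks defined in this post.

```python
import json

# Keys that every check in this post defines. This is an assumption for the
# sketch, not Sensu's own schema.
REQUIRED_KEYS = ("command", "interval", "subscribers")


def find_check_problems(config_text):
    """Return a list of human-readable problems found in a checks config."""
    try:
        config = json.loads(config_text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return ["invalid JSON: %s" % exc]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    for name, check in sorted(config.get("checks", {}).items()):
        for key in REQUIRED_KEYS:
            if key not in check:
                problems.append("%s: missing '%s'" % (name, key))
        if "interval" in check:
            interval = check["interval"]
            if not isinstance(interval, int) or interval <= 0:
                problems.append("%s: interval must be a positive integer" % name)
    return problems


example = """
{
  "checks": {
    "nginx_check": {
      "command": "check-procs.rb -p nginx",
      "interval": 30,
      "subscribers": ["nginx"],
      "handlers": []
    }
  }
}
"""
print(find_check_problems(example) or "no problems found")
```

To check the real file, pass in its contents instead, for example find_check_problems(open("/etc/sensu/conf.d/nginx.json").read()).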

Restart Sensu with service sensu-server restart.

Now, metrics will be gathered from Nginx and relayed to Graphite every 10 seconds and Sensu will check if Nginx is running every 30 seconds. The Nginx check was configured without any handlers, so at the moment, the only way to see if the check failed is through the Sensu dashboard.
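
Service checks report their result through the exit status, following the Nagios plugin convention: 0 is OK, 1 is WARNING, 2 is CRITICAL, and anything else is treated as UNKNOWN. The sketch below illustrates that convention for a process check. It is not how check-procs.rb is implemented; evaluate_process_count and its critical_under parameter are hypothetical names used only for illustration.

```python
# Exit statuses shared by Nagios plugins and Sensu checks.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_process_count(count, critical_under=1):
    """Map a count of matching processes to (exit_status, output_line)."""
    if not isinstance(count, int) or count < 0:
        return UNKNOWN, "UNKNOWN: could not count processes"
    if count < critical_under:
        return CRITICAL, "CRITICAL: found %d matching processes, expected at least %d" % (
            count,
            critical_under,
        )
    return OK, "OK: found %d matching processes" % count


# Nginx stopped: no matching processes were found.
status, output = evaluate_process_count(0)
print(status, output)
```

A real check script would print the output line and then exit with the status, for example with sys.exit(status), so that Sensu can pick both up.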

Sensu already has guides on adding handlers for check failures, so instead I'll try to cover something more advanced. Instead of using Sensu's handlers to deliver alerts, I'm going to offload that onto a tool called Flapjack.

Flapjack

While researching for this blog series, I came across Flapjack, an alert notification router that works with check execution engines like Nagios and Sensu. Although Sensu has built-in notification and rollup features, they come with drawbacks that aren't immediately apparent. The latest iteration of Flapjack was built from the ground up to focus on alert routing. The Unix philosophy of doing one thing and doing it well really appeals to me, so I decided to look into this tool more.

There is an excellent Speaker Deck presentation covering the benefits of using Flapjack.

Flapjack screenshot
Flapjack architectural diagram

In my case, I'm integrating Flapjack with Sensu instead of Nagios, so there is no need for the Nagios receiver. Instead, Sensu will be configured to feed check events directly into the Redis queue that is processed by Flapjack.
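
To make "feeding check events into the Redis queue" more concrete, here is a rough sketch of the kind of JSON document a handler might build from a Sensu check result. The field names, the state names, and the queue name mentioned in the comments are assumptions for illustration; the Flapjack documentation and the handler source are the authority on the real format. to_flapjack_event is a hypothetical helper, not part of either project.

```python
import json

# Sensu exit status -> Flapjack state name (assumed mapping).
STATES = {0: "ok", 1: "warning", 2: "critical"}


def to_flapjack_event(client_name, check_name, status, output, executed):
    """Build a dict describing one check result in a Flapjack-style shape."""
    return {
        "entity": client_name,  # the Sensu client the check ran on
        "check": check_name,
        "type": "service",
        "state": STATES.get(status, "unknown"),
        "summary": output.strip(),
        "time": executed,  # Unix timestamp of the check execution
    }


# Sample values only; the output text and timestamp are made up.
event = to_flapjack_event("app1", "nginx_check", 2, "nginx is not running\n", 1401000000)
payload = json.dumps(event, sort_keys=True)
print(payload)
# A handler would then push `payload` onto a Redis list (assumed here to be
# named "events") on the Redis instance that Flapjack reads from.
```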

Install Flapjack using the following script, taken from my monitoring scripts project.

echo "deb http://packages.flapjack.io/deb precise main" > /etc/apt/sources.list.d/flapjack.list

apt-get update
# Ignore unauthenticated package prompt with --force-yes
apt-get install -y --force-yes flapjack

The Flapjack omnibus package includes its own instance of Redis on port 6380. You'll need to adjust /etc/flapjack/flapjack_config.yaml to fit your needs.

On the Sensu server, we need to install the Flapjack handler.

git clone git://github.com/sensu/sensu-community-plugins.git
cp sensu-community-plugins/extensions/handlers/flapjack.rb /etc/sensu/extensions/handlers

Create /etc/sensu/conf.d/flapjack.json with the following contents. It should match the configuration used by Flapjack to connect to Redis.

{
  "flapjack": {
    "host": "localhost",
    "port": 6380,
    "db": "0"
  }
}

Let's take the Nginx check from earlier and make some adjustments to it.

{
  "checks": {
    "nginx_check": {
      "type": "metric",
      "command": "check-procs.rb -p nginx",
      "interval": 30,
      "subscribers": ["nginx"],
      "handlers": ["flapjack"]
    }
  }
}

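Notice that I added the metric type to this check. By default, Sensu only passes a check result to handlers when it indicates a problem. Marking the check as a metric makes Sensu hand every result to the handler, which is what we want here: Flapjack should receive all check results, including successful ones.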
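
The effect of the metric type can be summed up in a few lines. This is a simplified model of the behavior, not Sensu's actual code; reaches_handlers and had_open_event are hypothetical names.

```python
def reaches_handlers(check_type, status, had_open_event=False):
    """Simplified model of whether a check result is passed to handlers."""
    if check_type == "metric":
        # Metric checks forward every result, successful or not.
        return True
    if status != 0:
        # Standard checks forward failures.
        return True
    # A successful standard check only matters when it resolves an
    # earlier failure.
    return had_open_event


for check_type in ("standard", "metric"):
    print(check_type, reaches_handlers(check_type, 0))
```

With a standard check, Flapjack would only hear about Nginx when something is wrong. With the metric type, it also receives the steady stream of OK results.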

After restarting the Sensu server, you should be able to see check results flowing into the Flapjack web interface on port 3080. If you want to test a failure, you can either use the simulate-failed-check tool included with Flapjack or create some havoc yourself with service nginx stop.

Wrap-up

In this post, I introduced Sensu service checks and integrated them with Flapjack. There is a lot more configuration that needs to be done to make Flapjack useful, but I won't go too in-depth on that topic. There is already a wiki containing more instructions. The next post will be about configuring the Etsy Kale stack and StatsD.


