Ian Unruh


The State of Highly-Available Syslog

02 Aug 2014


Corrections have been made based on feedback from commenters on the Reddit post.

So, you've played with Logstash or one of the multitude of other log aggregation tools (Flume, Fluentd, rsyslog, syslog-ng, etc.). Now you want to start shipping logs from your networking and legacy devices to your logging infrastructure. For most situations, you can just turn on a syslog receiver on your indexer and point your shipper at it. However, when you have to perform maintenance or your indexer experiences downtime, log loss may occur.

There are a number of factors that affect which solution you should use for minimizing or eliminating log loss:

  • The shipper uses a reliable protocol like TCP
  • The shipper uses an unreliable protocol like UDP
  • The shipper can fan out to multiple receivers
  • The shipper has receiver failover support
  • The receiver can journal incoming events to disk
  • The receiver can perform de-duplication of events

There are a few things I've noticed about the current landscape of networking devices and logging:

  • Network devices generally support both TCP and UDP syslog
  • Business-class devices (SANs, firewalls, L3 switches, routers, etc.) generally support fanout to multiple syslog receivers
  • I don't know of any network devices that support receiver failover

When it comes to journaling incoming events to disk, some receivers come close, but none go all the way:

  • The commercial version of syslog-ng can journal enough to protect against itself crashing, but not the OS crashing.
  • Flume supports durable channels, but it is unclear if the sources themselves are durable.
  • Heka definitely does not journal at the time of this writing, as evidenced by conversations I've had in the Heka IRC channel.
  • Fluentd has decided that the tradeoff between durability and speed should fall entirely on the side of speed, opting for at-most-once delivery semantics.
  • Logstash makes no claims about durability that I could find.
  • Graylog2 makes no claims about durability that I could find.

Given all of this, it appears that there is no easy way to achieve exactly-once delivery. We have to settle for one of the following solutions.

  • At-most-once delivery - Have two receivers in active/standby mode with a shared VIP
  • At-least-once delivery - Have two receivers in active/active mode (with de-duplication it can become exactly-once delivery)

Active/standby

If your goal is just to minimize log loss, then you can stand up two instances of your log receiver of choice and use keepalived. Use this with TCP or UDP syslog, depending on your network's reliability.

VIP failover with keepalived

On each indexer, it's trivial to set up keepalived. If you're using Ubuntu 14.04, just install it with apt-get install keepalived and replace /etc/keepalived/keepalived.conf with the following contents. Adjust it as needed for your environment.

# This is the primary indexer; on the second node, use "state BACKUP" and a
# lower priority (for example, 100) so it only takes over the VIP on failover.
vrrp_instance vi1 {
    state MASTER
    interface eth1              # interface the VIP is bound to
    virtual_router_id 51        # must match on both nodes
    priority 101
    advert_int 1                # advertisement interval, in seconds
    authentication {
        auth_type PASS
        auth_pass supers3cret
    }
    virtual_ipaddress {
        192.168.5.30/24         # the shared VIP that shippers point at
    }
}

Point your shipper at the shared IP address and now you have nearly-free failover with at-most-once semantics.
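If you want to sanity-check the failover from an application host, something like the following Python sketch works. It is only an illustration: network and legacy devices would point their built-in syslog client at the VIP through their own configuration. The address and port match the keepalived example above; everything else (logger name, message) is an assumption.

import logging
import logging.handlers
import socket

# Illustrative application-side shipper pointed at the shared VIP from the
# keepalived example above.
handler = logging.handlers.SysLogHandler(
    address=('192.168.5.30', 514),   # the keepalived VIP
    socktype=socket.SOCK_STREAM,     # TCP; drop this argument for UDP
)

logger = logging.getLogger('example-app')   # hypothetical application logger
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info('this event lands on whichever indexer currently holds the VIP')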

Note that this topology will result in event loss if the indexer crashes, but the failure window should be minimal. When performing maintenance, simply stop keepalived on one of the nodes and gracefully stop the indexer. Once you're finished, restart the indexer and keepalived. This should result in zero event loss.

Active/active

Sometimes log loss is unacceptable, such as when dealing with strict auditing requirements. In this case, you need to stand up multiple receivers and have your shipper fan out events to all of them. Of course, you'll then need to perform some kind of de-duplication on the log stream.

There are different places in your pipeline you can perform de-duplication:

  • In a message queue shared between the first line of receivers
  • In a downstream indexer
  • In the database that the logs are persisted to

Obviously, the earlier in the pipeline you can perform de-duplication, the more efficient the remaining pipeline elements can be.

To perform de-duplication after the first line of receivers, you will need to have them push all the events into a queue on RabbitMQ. You can then have a set of workers perform de-duplication and push the unique events into a queue that your downstream indexers use. I describe this solution further down in the post.

When it comes to the log indexers themselves, I couldn't find any that directly support de-duplication. However, if your log indexer has a filter that can hash each event to produce a unique identifier, you can combine that with a de-duplicating database. I was informed of this approach by ragzilla on Reddit. Logstash provides a checksum filter that can be used with an Elasticsearch output: the checksum becomes the document ID, which means Elasticsearch will only accept the first copy of each event. This is by far the easiest solution, especially if your pipeline consists of only the front-line receivers and Elasticsearch.
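To make the mechanism concrete, here is a minimal Python sketch of the same idea, done by hand rather than through the Logstash checksum filter: hash the raw event, use the digest as the Elasticsearch document ID, and create the document with op_type=create so any later copy is rejected with a 409. The index name, document type, and endpoint are assumptions.

import hashlib
import json

import requests  # any HTTP client works; requests is just an assumption

ES_URL = 'http://localhost:9200'  # assumed Elasticsearch endpoint

def index_once(raw_event):
    """Index a syslog event exactly once by keying it on its own hash."""
    doc_id = hashlib.sha256(raw_event.encode('utf-8')).hexdigest()
    resp = requests.put(
        '{0}/syslog/event/{1}?op_type=create'.format(ES_URL, doc_id),
        data=json.dumps({'message': raw_event}),
    )
    if resp.status_code == 409:
        return False  # a copy of this event was already indexed
    resp.raise_for_status()
    return True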

De-duplicating workers

If your pipeline is more complex, you may need to introduce your own de-duplication solution earlier in the logging pipeline. This solution should scale to high volumes of logs, but will take a bit of thought to implement.

De-duplication with multiple receivers

Basically, your shipper will fan out the logs to multiple receivers. These receivers push directly into a "new events" queue in a RabbitMQ cluster. One or more workers pull from this queue and perform the following on each event:

  1. Hash the event or grab a unique identifier from the event
  2. Atomically check and set the key in a shared Redis instance (with an expiration time, to prune events that are outside of your duplication window)
  3. If the event has been "seen" before, discard it
  4. If the event is new, place it into the "processed" queue in the RabbitMQ cluster

The rest of your logging pipeline can then consume the de-duplicated stream of events. You can scale out the de-dup workers as much as needed because they use Redis as a shared, concurrent set.
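Here is a minimal sketch of such a worker in Python, using the pika and redis client libraries. The queue names, Redis key prefix, hosts, prefetch count, and the one-hour de-duplication window are all assumptions for illustration.

import hashlib

import pika   # RabbitMQ client
import redis  # Redis client

NEW_QUEUE = 'events.new'              # assumed queue fed by the receivers
PROCESSED_QUEUE = 'events.processed'  # assumed queue for downstream indexers
DEDUP_TTL = 3600                      # assumed de-duplication window, in seconds

r = redis.StrictRedis(host='localhost')
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue=NEW_QUEUE, durable=True)
channel.queue_declare(queue=PROCESSED_QUEUE, durable=True)
channel.basic_qos(prefetch_count=100)  # prefetch, process, then ACK

def handle(ch, method, properties, body):
    key = 'seen:' + hashlib.sha256(body).hexdigest()
    # SET with NX and EX is an atomic check-and-set: it returns True only for
    # the first copy of an event seen within the de-duplication window.
    if r.set(key, 1, nx=True, ex=DEDUP_TTL):
        ch.basic_publish(
            exchange='',
            routing_key=PROCESSED_QUEUE,
            body=body,
            properties=pika.BasicProperties(delivery_mode=2),  # persistent
        )
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=NEW_QUEUE, on_message_callback=handle)
channel.start_consuming()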

Some notes about this solution:

  • The de-dup worker will prefetch some number of events, process them, then ACK them. If a de-dup worker crashes, another de-dup worker can pick up the un-acked events.
  • If you have a single Redis instance, this can introduce downtime into your de-dup worker pool. Consider using some sort of automatic failover with multiple Redis instances.
  • To get better performance out of Redis, you can turn off persistence entirely. This keeps your "seen" event set purely in memory, which is only acceptable if you can tolerate a Redis crash resulting in duplicate events.
  • There is a period of time where the de-dup worker could place the event into the "seen" set and crash before it puts the event into the "processed" queue. To combat this, it would be possible to keep track of the number of times an event has been seen (using the INCR operation, for example). If the count exceeds the expected number of duplicates, that would indicate something funky going on, and the duplicate should be allowed through. This would eliminate event loss; a sketch of this variant follows the list.
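A minimal sketch of that counting variant, again with redis-py. The expected_copies parameter (the number of receivers the shipper fans out to) and the key scheme are assumptions.

def should_forward(r, key, expected_copies, ttl=3600):
    """Decide whether to forward an event, using a seen-counter in Redis.

    The first copy is forwarded, copies 2..expected_copies are treated as the
    normal fanout duplicates and discarded, and any copy beyond that suggests
    a worker crashed after recording the event but before forwarding it, so
    the event is forwarded again instead of being lost.
    """
    pipe = r.pipeline()
    pipe.incr(key)          # atomically count this sighting
    pipe.expire(key, ttl)   # keep the counter inside the de-dup window
    count, _ = pipe.execute()
    return count == 1 or count > expected_copies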

This hasn't been implemented yet, but it would be easy enough to do in Go, Python or Ruby. If there is enough interest, I would even be willing to take a crack at it. This seems to be the most promising solution for durable, exactly-once log delivery.

UPDATE: One possible project to use for this solution is duplog (blog post). It appears to be geared more towards Splunk, but might fit in perfectly in some logging pipelines. It has the ability to degrade gracefully in the face of a Redis failure, but also appears to be able to lose events if a worker were to crash directly after recording the event.

Wrap-up

In this post, I discussed some of the barriers to highly-available logging from network devices that use the syslog protocol, and presented some solutions based on which delivery semantics the reader finds desirable. In my personal opinion, it should never be a hard requirement that logs are never lost. Durable logging is a tough path to travel down, one which introduces a lot of complexity into a system that should otherwise be simple to maintain.


