StatsD + Graphite Cluster on Kubernetes

<TL;DR>
You can find a deployment-ready StatsD & Graphite cluster for Kubernetes 1.5.X here:

</TL;DR>

Who doesn’t need monitoring? We all do.
Today, Graphite is one of the most common picks as a monitoring backend for obvious reasons: It is very easy to set up and start feeding your data into it. Many companies have developed great tools for the Graphite eco-system. A few examples would be Etsy’s StatsD for aggregation and shipping, Grafana for visualization and Cabot for alerting. All of these reasons make Graphite a solid choice as your monitoring backend.

While Graphite is pretty easy to setup up as a single instance, things become a little bit more complicated once your data rate increases, you need replication to make sure your data isn’t lost and a single instance is just not enough anymore.
Luckily, Graphite was designed with clustering in mind. You just have to be familiar with the different parts of it, how they operate and the relationships between them. Creating a robust Graphite cluster on top of Kubernetes presents some difficulties but we’ll show how we overcame them using some of the new features introduced in Kubernetes 1.5.

I’m assuming you have a basic knowledge of the following:

  1. How statsd works, what is a flush interval and how does it relate to graphite’s storage-schemas (2 mins read)
  2. What are kubernetes deployments ,services and the new StatefulSetfeature in kubernetes 1.5.

What Are We Going To Build?

This is how our cluster is going to look like when we’re done:

metrics data flow

Let’s go over each of the layers, understand their role and how we implemented them.

StatsD Proxy

The first station metrics from our system arrive is the statsd proxy. Why do we need a proxy? that’s a good question.
Let’s assume we have two statsd daemon instances: statsd-1 and statsd-2 and 100 different services increment a counter called purchases.completed once in a flush interval. If each increment gets to a random statsd daemon we will end up with two different aggregations for the purchases.completed metric: stastd-1 will hold X and statsd-2 100-X. At the end of a flush interval only one of the values will persist in Graphite meaning we lost approximately half of the counter’s value. This is why it is crucial that the same metric arrives to the same statsd daemon instance. statsd proxy uses consistent hashing to make sure the same metric always arrive to the same StatsD Instance to avoid data loss.

Since statsd Proxies are stateless, we can just create a Kubernetes deployment and service to spread the metrics between the proxies.

StatsD Daemon

The next step of our metrics is the statsd daemon which aggregates the metrics and flushes them over each flush interval. Unfortunately, we can’t use the same Kubernetes deployment/service we used for the proxies since each of the daemons needs to have a unique network identity to enable the Proxies’ consistent hashing. If we would have set the daemons behind a regular service the daemon a metric arrived to would have been controlled by Kubernetes and not by the statsd Proxies themselves as consistent hashing requires.

Kubernetes 1.5 presented the StatefulSet construct (PetSet in 1.4) which allows exactly that — Every daemon instance gets its own resolvable constant DNS record. That means we will have daemon-0.statsd, daemon-1.statsd,…, daemon-n.statsd DNS records which we can set as the statsd proxy nodes configuration. This setup makes sure a certain metric is always delivered to the same daemon instance. I won’t get into the specific of how StatefulSets work but I’ll just mention that we have to create the StatefulSet itself and a headless service to control the DNS domain.

Note: we could still achieve the same setup without StatefulSets but that would be cumbersome: We’d have to create a separate deployment and a service for each daemon instance in order to have a resolvable DNS record that points to each of the instances. Ugly but possible.

Carbon Relay

So we have a distributed statsd cluster which aggregates metrics correctly after we made sure each metric arrives to the same statsd daemon instance.
Our next stop is a carbon relay service.

Carbon relay is part of the carbon daemons (along with carbon-cache and carbon-aggregator) which is responsible for distributing and replicating metrics across several carbon-cache daemons. It has two methods for distributing metrics: relay-rules and consistent-hashing ( remember that one? 🙂 )

Note: We’re going to ignore the relay-rules option here but if you have a specific use case it’s certainly worth checking.

Why consistent hashing again? The reason is the same — we need a specific metric to be stored on a single graphite node instance because of the way graphite clusters work: when graphite receives a request for data in a cluster mode, it sends the request to each of its cluster nodes. Then, for each graphite series (metric), it takes the first answer it received for that series. That means that if data points of the same series are distributed between different graphite nodes we will receive partial series data when querying the cluster.

The 3 interesting settings for the relay’s carbon.conf are:

  1. RELAY_METHOD = consistent-hashing
  2. REPLICATION_FACTOR = 2
  3. DESTINATIONS = <A comma separated list of our graphite data nodes>

Similar to statsd proxies, carbon relays have no state so a simple kubernetes deployment and service are a good solution here.

Graphite Data Nodes

Graphite data nodes are actually composed from 3 main processes:

  1. Carbon-Cache daemon: responsible for receiving metrics from carbon-relays, persisting them on the disk and serving them to the graphite web application (via CACHE_QUERY_PORT).
  2. Graphite Django web application: responsible for receiving data requests, querying the carbon-cache and disk storage (Whisper files) and responding.
  3. NginX server: serves static assets and delegates queries to the graphite web application.

Note: we could spare the NginX here since they serve as data-only nodes and no-one accesses them via the web interface to receive the static assets.

Since carbon-relays pass metrics to graphite data nodes using consistent hashing each of the nodes must have its own unique and consistent network identity. This is why again, we must use a StatefulSet here.

Another benefit of StatefulSets which we did not utilize on the statsd daemons setup is persistent volumes. Kubernetes can dynamically allocate a persistent EBS volume for each of the graphite data node instances. If one of the instances fail, Kubernetes knows to attach each instance its exact volume. Persistent volumes free us from the worries of failing instances since data is never lost.

Graphite Query Node

The graphite query node (or at least that’s the way I call it) is the last piece of the puzzle. It is mainly responsible for querying the cluster data nodes as it does not receive any time series data itself.

The only interesting configuration here is CLUSTER_SERVERS in local_settings.py telling the graphite web app where it should query the data from.

Conclusion

Let’s sum up the Kubernetes resources we created:

  1. A statsd proxy deployment and service
  2. A statsd daemon StatefulSet and a headless service for it
  3. A carbon relay deployment and service
  4. A graphite data node StatefulSet, a headless service and a presistent volume claim for each of the nodes.
  5. A graphite query node deployment and service

The three main advantages of this setup are:

  1. Every component is replicated so a failure in a single machine should not interfere with our metric collection and/or serving.
  2. Metrics data is replicated across more than one graphite data node so we have access to the full data set even when a node is not operational
  3. Persistent storage allows us to persist data in case of a node failure
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s