Grafana is a visualization and dashboarding tool that works against a Graphite or InfluxDB backend. If you are using Graphite without a visualization tool, go through this Grafana Tutorial to see how it can enhance your monitoring experience.
I’m going to assume you know the basics: how to create a new dashboard and how to add a simple graph that shows a Graphite metric.
One of Grafana’s most useful features is templating. It lets you define a variable and then use its value in your graphs and Graphite metrics. Suppose, for example, that I count server hits from mobile and desktop platforms, so we have stats.server_hits.desktop and stats.server_hits.mobile metrics on our Graphite backend. I can define a $platform variable that accepts the values desktop and mobile, and then create a graph that shows the following Graphite metric: stats.server_hits.$platform. Now, when users open the dashboard, they can select a platform and view the corresponding server hits.
Templating becomes especially useful when you have multiple graphs on the dashboard and want to switch all of them between the mobile and desktop contexts at once.
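To make the substitution idea concrete, here is a minimal Python sketch. It only illustrates the concept: Grafana performs its own substitution internally, and the resolve_target helper is a hypothetical name used for this example, not part of Grafana.

```python
from string import Template


def resolve_target(target, variables):
    """Replace $name placeholders in a metric target with the selected values.

    Illustration only: this mimics the idea of template variables,
    not Grafana's actual implementation.
    """
    return Template(target).substitute(variables)


# The same target resolves to a different metric for each selected platform.
for platform in ("desktop", "mobile"):
    print(resolve_target("stats.server_hits.$platform", {"platform": platform}))
```

Every graph that references $platform would be resolved the same way, which is why one selection can update the whole dashboard.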
So now that we all agree templating is a cool and useful feature, let’s see how we can use it to solve another problem: graphs often contain too many points. If Graphite stores data at a 5-second resolution with 6 hours of retention, then a 6-hour graph has 4320 points. That’s a lot more than we need and more than our eyes can make sense of.
Ideally, we would like to see between 30 and 60 points, which means roughly one point every 8 minutes for a 6-hour timeframe (45 points). To achieve this, we can use Graphite’s summarize function.
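The arithmetic behind these numbers can be checked with a short Python sketch. The function names are made up for this example, and the figures are the ones used above (5-second resolution, 6-hour timeframe).

```python
def raw_point_count(timeframe_seconds, resolution_seconds):
    """Number of stored points that fall inside the timeframe."""
    return timeframe_seconds // resolution_seconds


def interval_for_points(timeframe_seconds, wanted_points):
    """Interval (in seconds) that yields the wanted number of points."""
    return timeframe_seconds / wanted_points


SIX_HOURS = 6 * 60 * 60  # 21600 seconds

print(raw_point_count(SIX_HOURS, 5))  # 4320 raw points

# 30 to 60 points correspond to one point every 12 to 6 minutes;
# 8 minutes sits in the middle and gives 45 points.
for wanted in (30, 45, 60):
    minutes = interval_for_points(SIX_HOURS, wanted) / 60
    print(f"{wanted} points -> one point every {minutes:g} minutes")
```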
For example: summarize(stats.server_hits.$platform, '8min', 'sum') generates a graph with a point every 8 minutes, where each point holds the sum of all server hits within that 8-minute interval.
This is useful and works well, but that 8min constant looks ugly. What happens when I change the timeframe from 6 hours to 15 minutes? I will get only 1 or 2 points on my graphs. A better solution is to tell Graphite how many points (or steps) we want to see on our graphs, and let the summarize interval adjust automatically. Setting the number of steps to 30 generates a point every minute for a 30-minute timeframe, and a point every 2 minutes for a 60-minute timeframe. Thus, the interval changes to fit the timeframe according to the number of steps we defined.
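The difference between a fixed interval and a fixed number of steps is easy to see in a small Python sketch. The helper names are hypothetical, and the sketch ignores any rounding that Grafana applies when it picks the final interval.

```python
MINUTE = 60


def points_for_fixed_interval(timeframe_seconds, interval_seconds):
    """How many points a fixed summarize interval leaves in the timeframe."""
    return timeframe_seconds / interval_seconds


def interval_for_steps(timeframe_seconds, steps):
    """Interval (in seconds) needed to show the requested number of steps."""
    return timeframe_seconds / steps


# A fixed 8-minute interval: fine for 6 hours, far too coarse for 15 minutes.
print(points_for_fixed_interval(6 * 60 * MINUTE, 8 * MINUTE))  # 45.0
print(points_for_fixed_interval(15 * MINUTE, 8 * MINUTE))      # 1.875

# A fixed number of steps: the interval follows the timeframe instead.
print(interval_for_steps(30 * MINUTE, 30))  # 60.0 seconds, i.e. 1 minute
print(interval_for_steps(60 * MINUTE, 30))  # 120.0 seconds, i.e. 2 minutes
```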
Luckily, Grafana supports exactly this. Let’s see how we can make it happen:
1. First let’s open a new dashboard and add a graph with some arbitrary metric to it.
2. Now click on the wheel at the top right to open the dashboard’s settings panel, choose the features tab and check templating. Hit the close button to go back to the dashboard.
3. A new wheel has appeared in the top left corner of the dashboard. Click on it and choose templating from the drop-down list. This is the place where all the dashboard variables are defined.
4. Click the Add tab, enter interval as the variable name and choose interval as the variable type.
5. You can see that the Values field was auto-populated by Grafana once we chose the interval variable type. These are the values that will be available to us as intervals. We can change them or add to them as we see fit.
6. Below the Values field, there is an “Include auto interval” checkbox. This is exactly what we’re looking for. Once we check it, a new “Auto interval steps” drop-down appears, which lets us choose how many steps each graph on the dashboard will have. If we choose 10 steps, for example, and the timeframe of the dashboard is 10 minutes, Grafana will resolve the $interval variable to 1min. If we choose 30 steps for a 10-minute timeframe, the exact result is 20 seconds, and Grafana will pick the closest interval, which is 30 seconds.
7. After you have picked the number of steps, click the Add button and close the templating settings panel. The $interval variable appears at the top left, and when we click it we can select either “auto” or any other time interval.
8. All that is left to do is to use summarize in all of our graphs with the $interval variable, which Grafana adjusts automatically. For example: summarize(stats.server_hits.$platform, $interval, 'sum')
Let’s enjoy this a little bit:
1. Change the dashboard timeframe at the top right and see how $interval changes accordingly while your graphs keep the same number of steps.
2. To see the data in finer detail, you can lower $interval. This automatically generates more steps in all of the graphs.
Just a little note about the summarize function: I used the 'sum' option because I am summarizing counts. When we summarize counts, we want the sum of all points within the interval. This is not always the right choice, though. If we summarize a metric that is an average of something, the summarize function should use 'avg' and not 'sum'. There are also 'max', 'min' and 'last' options to get the corresponding values from each interval. Use them wisely.
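To see why the choice matters, here is a simplified Python model of what summarizing does. It is a sketch under stated assumptions, not Graphite's implementation: points are (timestamp, value) pairs already sorted by time, buckets start at multiples of the interval, and missing values are not handled.

```python
def summarize(points, interval_seconds, func="sum"):
    """Group (timestamp, value) points into fixed buckets and aggregate each.

    Simplified model for illustration; assumes points are sorted by time.
    """
    aggregators = {
        "sum": sum,
        "avg": lambda values: sum(values) / len(values),
        "max": max,
        "min": min,
        "last": lambda values: values[-1],
    }
    aggregate = aggregators[func]

    buckets = {}
    for timestamp, value in points:
        # Align each point to the start of its bucket.
        bucket_start = timestamp - timestamp % interval_seconds
        buckets.setdefault(bucket_start, []).append(value)

    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


# Hypothetical hit counts sampled every 5 seconds.
hits = [(0, 2), (5, 4), (10, 6), (15, 8)]

print(summarize(hits, 10, "sum"))   # [(0, 6), (10, 14)]
print(summarize(hits, 10, "avg"))   # [(0, 3.0), (10, 7.0)]
print(summarize(hits, 10, "last"))  # [(0, 4), (10, 8)]
```

With counts, 'sum' preserves the total number of hits (20 in this example), while 'avg' would understate it. For a metric that is already an average, the opposite holds.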
Enjoy your new Grafana dashboard 🙂