Note: If you’re not familiar with Logstash, please watch the introduction video to get the hang of inputs, filters and outputs and how they operate within Logstash. I assume basic knowledge of Logstash in this tutorial.
Logstash was a real game changer for our monitoring capabilities at FTBpro.com. We use it to aggregate and analyse every system that produces log files in our infrastructure. The setup may take some time, but it is certainly worth your while.
Among all the logs we aggregate using Logstash are our NginX logs, which I’m sure many of you would like to aggregate too but don’t know exactly how. I wanted to share our configuration and exactly how we set things up to make it work. It has been running for a good few months now with incredible results.
So this is a general sketch of our system:
The server has 4 main components which make it work:
- Redis: Used to receive and store events from the web servers
- Logstash Indexer: Takes events from Redis and pushes them into ElasticSearch
- ElasticSearch: Receives events from the indexer and stores them
- Kibana Web: A very cool web application which nicely presents the data stored in ElasticSearch
You will have to install Java, Redis and ElasticSearch to have it ready for action.
Afterwards, just follow this 10-minute walkthrough to get the Logstash agents up.
Note that you will run one Logstash agent to move events from Redis to ElasticSearch (the Indexer), and another Logstash web process which serves the Kibana Web interface.
After you have everything installed, the server.conf file is really simple:
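A minimal sketch of such a server.conf, reconstructed from the description below (treat this as an outline rather than our exact file; option names vary slightly between Logstash versions):

```conf
input {
  # Read events that the web servers pushed onto the "logstash" list
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "logstash"
  }
}

output {
  # Index events into the local ElasticSearch
  elasticsearch {
    host => "127.0.0.1"
  }
  # Print events to stdout so we can inspect the agent's log if things mess up
  stdout {
    codec => rubydebug
  }
}
```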
As an input, we have redis, running on localhost. We read its “logstash” key, which holds a list of events.
As outputs, we have ElasticSearch, plus rubydebug so we can watch the agent’s log files if things mess up.
The clients (Web servers) have only a Logstash Agent which reads the NginX access log and sends it directly to the Redis on the Logstash Server.
The Logstash agent here is the same one we installed on the Server earlier, so go back to that 10-minute tutorial and install the Logstash agent on your web server. Note that Logstash Web (Kibana) is not needed on the Web servers, so don’t run it there.
After getting the agent installed, we need to set up NginX logging to emit all the data we care about. We found that the default NginX log format does not give us all the data we’d like to have, so we altered it. This is the log format definition on our NginX servers:
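A log_format definition along these lines captures everything described in the points that follow (the exact field order here is an assumption on my part):

```nginx
log_format logstash '$http_host '
                    '$remote_addr [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent" '
                    '$request_time '
                    '$upstream_response_time';

access_log /var/log/nginx/access.log logstash;
```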
The differences from the default log are:
- We added $http_host to the log. One NginX server may serve multiple hosts, and we’d like to know which one the request hit.
- We omitted the $remote_user variable from the default log, as we saw no use for it.
- We added $request_time to capture the response time of static files served by NginX.
- We added $upstream_response_time to capture the response time of our application server (Unicorn).
Now that we have all the needed data in the log file, all that’s needed is to ship it right to Redis on the Logstash server.
This is how the agent.conf file looks on our Web Servers:
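In outline, it is something like this (a sketch: the Redis hostname is a placeholder, and the Grok patterns are a reconstruction matching the log format described above, so adjust them to your own format):

```conf
input {
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx"
  }
}

filter {
  grok {
    # Two patterns: the first catches requests that went to the upstream
    # (and therefore have $upstream_response_time), the second catches
    # static files served directly by NginX
    match => [
      "message", "%{IPORHOST:http_host} %{IPORHOST:client_ip} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:response:int} %{NUMBER:bytes:int} %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float} %{NUMBER:upstream_time:float}",
      "message", "%{IPORHOST:http_host} %{IPORHOST:client_ip} \[%{HTTPDATE:timestamp}\] \"%{WORD:http_verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:http_version}\" %{NUMBER:response:int} %{NUMBER:bytes:int} %{QS:referrer} %{QS:agent} %{NUMBER:request_time:float}"
    ]
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "client_ip"
  }
}

output {
  redis {
    host      => "logstash-server.example.com"  # placeholder: your Logstash server
    data_type => "list"
    key       => "logstash"
  }
  stdout {
    codec => rubydebug
  }
}
```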
A few points worth mentioning about it:
The input, as expected, is the NginX access log file.
The Grok filter is one of the most useful filters included in Logstash. It allows you to analyse each log line and extract fields like client_ip, request and response from it, by matching the log line against regular expressions. We use it here to analyse the NginX log line and extract all the fields we’d like to get out of it into ElasticSearch.
Logstash ships with a bunch of regular expressions set up for you, so you can easily start matching log lines without writing regular expressions yourself. You can find the list of given regular expressions here. For example, the first part of the string, `%{IPORHOST:http_host}`, means we’re matching the pre-defined Grok pattern `IPORHOST` and want to have it as the `http_host` field in ElasticSearch. So simple.
Getting the hang of Grok is really a matter of practice. There is also a great web tool called Grok Debugger which allows you to try Grok Patterns on inputs you define.
The reason there are two Grok patterns is that Grok tries to match the first and, failing that, the second. If NginX serves a static file, the log line won’t have $upstream_response_time. So we need one Grok pattern to “catch” requests that went to the upstream (the first pattern) and one Grok pattern for static requests served directly (the second one).
Inside the Grok pattern we explicitly declare data types on fields we would like to analyse as numbers. For example, writing `%{NUMBER:request_time:float}` means the `request_time` field will be indexed as a `float` in ElasticSearch, so we can run numeric functions on it (mean, average, etc.) and show it on a graph. This is extremely important: otherwise the response time would have been indexed as a string rather than a number, and we couldn’t properly display it on a graph. `NUMBER` is, again, a pre-defined Grok pattern.
The date filter allows us to set the timestamp on the event sent to ElasticSearch. By default, Logstash uses the time the agent read the log line as the timestamp. That is not good for us, because NginX writes the log line after the request has been processed, so the time the line was written is not the time the request arrived at our server.
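In the filter section this looks roughly like the following (the pattern matches NginX’s default $time_local format; the `timestamp` field name assumes the Grok filter extracted it under that name):

```conf
date {
  # Use the time NginX logged the request,
  # not the time the agent read the line
  match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
}
```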
The geoip filter adds geographical fields to the event. This will later allow us to ask questions like “From which country did I get the most requests?”
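The filter itself is tiny: it looks up the client IP in a bundled GeoIP database and attaches fields such as the country name to the event (here `client_ip` assumes the field name used by the Grok filter):

```conf
geoip {
  # client_ip is the field extracted by the Grok filter above
  source => "client_ip"
}
```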
The output is pretty self explanatory: Send the events to the Logstash Server Redis and to STDOUT for debugging purposes.
The top left graph shows us HTTP responses over time. Each response type (200 OK, redirects, not-found, errors) gets its own color so we can distinguish between them. Dragging the mouse from left to right inside the chart will drill down into the dragged time frame on all of the graphs on the dashboard.
The top right graph shows us only HTTP error (5xx) responses over time. If we see a bump on this graph, it usually means something went wrong. Again, you can drag your mouse to drill down into the time frame you wish to focus on.
The two bottom left graphs show top URLs by hits and top user agents. Clicking the magnifying glass next to each entry will filter all the graphs on the dashboard by that URL or user agent.
The bottom right pie chart shows us the distribution of response codes. Clicking a section of the pie will filter all graphs on the dashboard to show only requests with the chosen response code.
The left chart simply shows us mean response time.
The right map shows us geo data of the requests. The more requests from a country, the darker its color on the map. Clicking a country will filter all the graphs on the dashboard to show only requests from that country. So simple, yet so powerful.
So what do we get out of it? Endless possibilities to analyse and drill down into our web server activity logs. We are able to instantly answer so many questions we couldn’t before. Questions like:
1. What are the top URLs in terms of hits? 404 responses?
2. What agents get the most errors?
3. From what countries are we getting most traffic?
4. What are the slowest responses and to which URLs?
The limit is your imagination, and immediate results are just a click away.
So yes, go get Logstash now. It is a great project, really well maintained and, most importantly, will easily provide you with answers you had no way to get before.