May 13, 2019 By Sumo Logic

Apache Server Traffic Monitor: Analyzing Insights

Gain deep insights from your Apache traffic

Apache access log analytics is about getting different kinds of visibility into your web operations. When it comes to traffic, we’re primarily concerned with two metrics: the number of HTTP requests (hits) and the total bytes served (volume). You can find all sorts of actionable insights by plotting these values against the other information in your access logs.

For example, comparing traffic to request URL identifies your most popular content, while visualizing hits and volume against referrer URL helps you pinpoint hotlinked media resources.

Analyzing Apache traffic metrics with Sumo Logic

Total Traffic by Hits

Let’s start by getting a high-level look at how much content you’re serving:

_sourceCategory=Apache/Access | timeslice 5m | count as hits by _timeslice

Sumo Logic adds a _sourceCategory metadata field to logs as it collects them. This lets us limit the scope of queries to access logs, error logs, or custom log files that you’ve configured in httpd.conf. The timeslice operator groups log messages into 5-minute intervals and stores each interval’s start time in the _timeslice field.

Running this query returns a table counting the number of hits every 5 minutes. You can visualize this information by clicking the Line Chart button in the Aggregates tab. This gives you a much clearer view of traffic spikes.

Visualizing total hits over time
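To sanity-check what the timeslice query computes, here is a rough Python equivalent that runs against raw log lines. It is a minimal sketch that assumes Apache’s common or combined log format, with timestamps like [13/May/2019:10:01:12 +0000]. The function names are hypothetical, and bucket sizes should divide evenly into 60 minutes.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes Apache common/combined log format, where the timestamp looks like
# [13/May/2019:10:01:12 +0000].
TIME_RE = re.compile(r"\[([^\]]+)\]")


def timeslice(ts: datetime, minutes: int = 5) -> datetime:
    """Round a timestamp down to the start of its N-minute bucket (N should divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def hits_per_slice(lines, minutes: int = 5) -> dict:
    """Count log lines (hits) per time bucket, similar to `timeslice 5m | count by _timeslice`."""
    counts = Counter()
    for line in lines:
        match = TIME_RE.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[timeslice(ts, minutes)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [13/May/2019:10:01:12 +0000] "GET / HTTP/1.1" 200 5120',
        '203.0.113.6 - - [13/May/2019:10:04:59 +0000] "GET /logo.png HTTP/1.1" 200 20480',
        '203.0.113.5 - - [13/May/2019:10:05:00 +0000] "GET / HTTP/1.1" 200 5120',
    ]
    for start, hits in hits_per_slice(sample).items():
        print(start.strftime("%H:%M"), hits)
```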

You can save this chart to a dashboard by clicking the Create Panel button in the Aggregates tab. Dashboards are updated automatically in real time, so you’ll always know exactly what’s going on in your Apache infrastructure.

Alternatively, if you have a Sumo Logic Professional account, you can set up a real-time alert to receive an email when hits pass a certain threshold. In either case, the idea is to keep tabs on key performance indicators with Sumo Logic’s monitoring capabilities, then dig deeper with ad-hoc queries when something needs attention.

Total Traffic by Volume

Hits are only half the story when it comes to analyzing web server traffic. A single hit could result in anywhere from a few kilobytes to hundreds of megabytes or more. To get a complete picture of our Apache traffic, we need to modify the above query to display both hits and volume side-by-side:

_sourceCategory=Apache/Access | parse regex "HTTP/1.1\"\s+\d+\s+(?<size>\d+)" | where size != "-" | timeslice 5m | (size/1024) as kbytes | count as hits, sum(kbytes) as kbytes by _timeslice

In addition to counting hits, this query also extracts the response size from each access log and adds them up to get the total volume for each 5-minute interval. With a few tweaks to the column chart settings, we have an at-a-glance view of both traffic metrics:

Visualizing both total volume and total hits over time
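The same idea extends to volume. This Python sketch mirrors the hits-and-volume query under the same assumptions: common or combined log format, and a hypothetical function name. It skips responses whose size is logged as "-", sums response sizes in kilobytes, and reports both metrics for each 5-minute bucket.

```python
import re
from collections import defaultdict
from datetime import datetime

# Captures the timestamp and the response-size field that follows the quoted
# request line and the 3-digit status code, e.g.
# [13/May/2019:10:01:12 +0000] "GET / HTTP/1.1" 200 5120
LINE_RE = re.compile(r'\[([^\]]+)\] "[^"]*" \d{3} (\d+|-)')


def traffic_per_slice(lines, minutes: int = 5) -> dict:
    """Return {bucket_start: (hits, kbytes)}, ignoring responses logged with size '-'."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.search(line)
        if not match or match.group(2) == "-":
            continue  # same effect as `where size != "-"`
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        start = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        totals[start][0] += 1                            # hits
        totals[start][1] += int(match.group(2)) / 1024   # volume in KB
    return {start: (hits, kb) for start, (hits, kb) in sorted(totals.items())}
```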

This gives you more insight into what kind of content you’re serving. If a small change in hits is accompanied by a big shift in volume, you’re serving a small number of large responses. If your web application processes many small HTTP requests, you’ll see a closer correlation between hits and volume.

These charts can help guide optimization efforts. In the former case, reducing response size will have the biggest impact on your web application’s performance. On the other hand, if you’re handling unnecessary HTTP requests, you’re better off refactoring your content to reduce the number of requests.
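One way to tell these two cases apart numerically is to divide volume by hits for each interval. The Python function below is a hypothetical helper, not a Sumo Logic feature, and the numbers in the usage example are made up for illustration. In that example, hits rise by 10% while the average response size jumps tenfold, which suggests a few large responses.

```python
def kb_per_hit(traffic: dict) -> dict:
    """Map {interval: (hits, kbytes)} to {interval: average KB per hit}, skipping empty intervals."""
    return {interval: kb / hits for interval, (hits, kb) in traffic.items() if hits}


if __name__ == "__main__":
    # Made-up numbers for two consecutive intervals.
    traffic = {"10:00": (100, 500.0), "10:05": (110, 5500.0)}
    print(kb_per_hit(traffic))
```

A steady average alongside rising hits points to more requests of the same kind. A rising average points to larger responses.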

Hits and Volume by Server

If you’re maintaining multiple Apache servers, you’re probably also interested in how traffic is distributed amongst your servers. We can break down hits and volume by server using the _sourceHost metadata field. Like _sourceCategory, this value is set while configuring a source.

_sourceCategory=Apache/Access | parse regex "HTTP/1.1\"\s+\d+\s+(?<size>\d+)" | where size != "-" | timeslice 5m | (size/1024) as kbytes | count as hits, sum(kbytes) as kbytes by _timeslice, _sourceHost | transpose row _timeslice column _sourceHost

There are two important additions to the above query. First, we’ve grouped hits and volume not only by _timeslice, but also by _sourceHost. Second, the transpose operator acts as a pivot, which lets us display each server as a section in a stacked column chart:

Separating traffic by server
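To see what the transpose step does to the data, here is a small Python sketch of the same pivot. It assumes you already have (timeslice, host, hits) rows like the ones the query groups by. Filling missing host/interval combinations with 0 is a choice made for this sketch, and the host names are placeholders.

```python
from collections import defaultdict


def transpose(rows) -> dict:
    """Pivot (timeslice, host, hits) rows into {timeslice: {host: hits}}.

    Every host gets a column in every row; missing combinations are filled with 0.
    """
    rows = list(rows)
    hosts = sorted({host for _, host, _ in rows})
    table = defaultdict(dict)
    for ts, host, hits in rows:
        table[ts][host] = table[ts].get(host, 0) + hits
    return {ts: {host: table[ts].get(host, 0) for host in hosts} for ts in sorted(table)}


if __name__ == "__main__":
    rows = [("10:00", "web1", 120), ("10:00", "web2", 80), ("10:05", "web1", 90)]
    for ts, columns in transpose(rows).items():
        print(ts, columns)
```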

This visualization provides a clear window into your entire web application infrastructure. For instance, if you’re monitoring a load-balanced cluster, this panel immediately tells you whether traffic is distributed correctly. Or, if you’re reacting to a denial-of-service attack, all you have to do is glance at your dashboard to see which servers require attention.
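If you’d rather check the load-balancing question programmatically, a simple approach is to compare each server’s share of hits with an even split. This is a minimal sketch that assumes a balancer where equal shares are expected, such as round robin. The 10-percentage-point tolerance and the host names are arbitrary illustrations.

```python
def imbalanced_hosts(hits_by_host: dict, tolerance: float = 0.10) -> list:
    """Return hosts whose share of total hits deviates from an even split by more than `tolerance`."""
    total = sum(hits_by_host.values())
    if total == 0:
        return []  # no traffic, nothing to compare
    expected = 1 / len(hits_by_host)  # even share per server
    return sorted(
        host for host, hits in hits_by_host.items()
        if abs(hits / total - expected) > tolerance
    )


if __name__ == "__main__":
    print(imbalanced_hosts({"web1": 500, "web2": 300, "web3": 200}))
```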

Hits and Volume by URL

So far, we’ve only been examining hits and volume over time. To get another perspective on your web traffic, we can also plot traffic against request URLs:

_sourceCategory=Apache/Access | parse regex "[A-Z]+ (?<url>/[^\ ]+?) HTTP/1.1\"\s+\d+\s+(?<size>\d+)" | where size != "-" | (size/1024) as kbytes | count as hits, sum(kbytes) as kbytes by url | sort hits | limit 20

This calculates the volume and hit count for the top 20 URLs, which tells us which content is worth optimizing. To reduce HTTP requests, you can combine small images with high hit counts into spritesheets or cache dynamically generated content. To reduce bandwidth usage, you can compress high-volume media resources, gzip your HTML, or minify your CSS and JavaScript.

Inspecting the highest-traffic URLs
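For offline analysis of a log file, the top-URLs query translates to a few lines of Python. This sketch assumes the common or combined log format and, like the Sumo Logic query, skips responses whose size is "-". The name top_urls is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Request line plus status and size, e.g. "GET /logo.png HTTP/1.1" 200 2048
REQUEST_RE = re.compile(r'"[A-Z]+ (/\S*) HTTP/1\.[01]" \d{3} (\d+)')


def top_urls(lines, n: int = 20) -> list:
    """Return up to n (url, hits, kbytes) tuples, most-requested URLs first."""
    hits = Counter()
    kbytes = defaultdict(float)
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # malformed line, or size logged as "-"
        url = match.group(1)
        hits[url] += 1
        kbytes[url] += int(match.group(2)) / 1024
    return [(url, count, kbytes[url]) for url, count in hits.most_common(n)]
```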

Note that this is the same type of information we saw in the last two sections, but breaking it down by URL provides a new set of actionable insights.

Identifying Hotlinked Resources

Comparing traffic metrics to referrer URL identifies a different optimization opportunity: hotlinked resources. The following query will identify the top 20 websites that are direct-linking to your image files:

_sourceCategory=Apache/Access | parse regex "\"[A-Z]+ .+\.(?<type>jpg|jpeg|png|gif) HTTP/1\.1\" \d+ (?<size>\d+) \"(?<referrer>\S+)\"" | where !(referrer matches "*example.com*") and referrer != "-" | (size/1024) as kbytes | count as hits, sum(kbytes) as kbytes by referrer | sort hits | limit 20

Note that the parse regex expression automatically drops all logs that don’t have a .jpg, .jpeg, .png, or .gif extension. In addition, the where clause filters out your own web pages (be sure to put your own domain in the matches pattern). Visualizing the results as a bar chart quickly tells you which sites are the culprits.

Identifying hotlinking referrers
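Here is the same hotlink report sketched in Python for a local log file. It assumes the combined log format, which records the referrer, and treats example.com as a placeholder for your own domain. A plain substring test stands in for the wildcard match. The sketch also skips the "-" that Apache logs when no referrer was sent.

```python
import re
from collections import Counter

# Image request, status, size and referrer from a combined-format log line, e.g.
# "GET /img/cat.jpg HTTP/1.1" 200 4096 "http://other-site.test/page"
IMAGE_RE = re.compile(r'"[A-Z]+ \S+\.(?:jpe?g|png|gif) HTTP/1\.[01]" \d{3} (\d+) "([^"]*)"')


def hotlinkers(lines, own_domain: str = "example.com", n: int = 20) -> list:
    """Return up to n (referrer, hits, kbytes) tuples for image requests from other sites."""
    hits = Counter()
    kbytes = Counter()
    for line in lines:
        match = IMAGE_RE.search(line)
        if not match:
            continue  # not an image request (or size logged as "-")
        referrer = match.group(2)
        if referrer == "-" or own_domain in referrer:
            continue  # no referrer, or one of your own pages
        hits[referrer] += 1
        kbytes[referrer] += int(match.group(1)) / 1024
    return [(ref, count, kbytes[ref]) for ref, count in hits.most_common(n)]
```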

Depending on the type of web application you’re running, you might want to block these referrers. Or, you can simply disallow image hotlinking altogether by adding the following lines to your .htaccess file:

RewriteEngine On
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?example\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ - [F]

Again, replace example.com with your own domain so your own web pages aren’t blocked from retrieving image files.
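To reason about which requests those rules refuse, it can help to model them in Python. This is only an approximation for illustration. Apache ANDs consecutive RewriteCond lines, and the [F] flag answers 403 Forbidden. In this model, example.com is a placeholder, referrers from your site are accepted over both http:// and https://, and other mod_rewrite details are ignored.

```python
import re

# Hypothetical model of the .htaccess hotlink rules; example.com is a placeholder domain.
OWN_SITE = re.compile(r"^https?://(.+\.)?example\.com/", re.IGNORECASE)  # [NC] = case-insensitive
IMAGE = re.compile(r".*\.(jpe?g|gif|bmp|png)$")


def is_forbidden(path: str, referrer: str) -> bool:
    """True when the modeled rules would answer 403 Forbidden."""
    return (
        not OWN_SITE.match(referrer)   # referrer is not one of your own pages
        and referrer != ""             # RewriteCond %{HTTP_REFERER} !^$ (empty referrer allowed)
        and bool(IMAGE.match(path))    # RewriteRule only applies to image files
    )


if __name__ == "__main__":
    print(is_forbidden("/img/cat.jpg", "http://other-site.test/"))
```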

Identifying Malicious Users

Finally, you can analyze traffic against client IP to find abusive users, misbehaving bots, and potential denial-of-service attacks.

_sourceCategory=Apache/Access | parse regex "(?<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" | parse regex "HTTP/1\.1\" \d+ (?<size>\d+)" | (size/1024) as kbytes | sum(kbytes) as kbytes by client_ip | sort kbytes | limit 20

Visualizing the results as a pie chart makes it easy to identify IPs that are using up abnormal amounts of bandwidth:

Traffic distribution by IP address
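This is a rough Python equivalent of the bandwidth-by-IP query. It also computes each client’s percentage of all bandwidth in the log, which is roughly what a pie chart displays. As with the query, it takes the first IPv4 address on each line as the client IP and skips responses logged with size "-". The function name is hypothetical.

```python
import re
from collections import Counter

IP_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
SIZE_RE = re.compile(r'HTTP/1\.[01]" \d{3} (\d+)')


def bandwidth_by_ip(lines, n: int = 20) -> list:
    """Return up to n (client_ip, kbytes, percent_of_total) tuples, heaviest clients first."""
    kbytes = Counter()
    for line in lines:
        ip = IP_RE.search(line)       # first IPv4 address on the line = client IP
        size = SIZE_RE.search(line)   # fails for sizes logged as "-"
        if ip and size:
            kbytes[ip.group(0)] += int(size.group(1)) / 1024
    total = sum(kbytes.values()) or 1.0  # avoid dividing by zero when nothing matched
    return [(ip, kb, 100 * kb / total) for ip, kb in kbytes.most_common(n)]
```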

Note that this is only the first step towards finding malicious clients. Deeper analysis is required to determine if these IPs are scraping copious amounts of information from your application, trying to make your site unresponsive, or if they’re simply your best customers.

The next article in this series explains more sophisticated ways to identify misbehaving robots. For more information about identifying and blocking malicious users, see Analyzing System-Critical Errors.

Apache Traffic Monitor Summary

This article was about reducing traffic, measured in either hits or volume. We discussed ways to optimize application resources, prevent other websites from hotlinking content, and identify malicious users. One topic we didn’t cover was analyzing response time, which requires a custom Apache access log format.

For any of these activities, the role of an Apache log analyzer is monitoring and root cause analysis, not remediation. A tool like Sumo Logic can provide real-time insights about your web traffic, but it’s still up to your IT and development teams to implement solutions.
