
May 10, 2019 By Sumo Logic

Apache Error Log Analysis

Gain Deep Insight into Your Apache Server Environment

Your Apache access and error logs contain a wealth of actionable insights about potential server configuration and web application issues. The problem is, this information is hidden within millions of log messages. The goal of Apache log analytics is to efficiently extract these insights so you can respond to problems before they impact your users.

Apache log analysis revolves around two activities: monitoring and troubleshooting.

First, you need to track key performance indicators in real-time dashboards so you can identify abnormal behavior as it’s happening. Then, when these dashboards indicate that something has gone wrong, you need a powerful query language to dig deeper into relevant log messages.

Together, the monitoring and troubleshooting features of an Apache log analyzer result in faster root cause analysis, increased uptime, and fewer headaches.

Extracting specific Apache log messages with a custom query.


Monitoring system-critical errors in Sumo Logic.

Search, Aggregate, Visualize & Identify

Sumo Logic provides all the search, aggregation, and visualization tools you need to quickly identify the root cause of your website’s Apache errors. To follow along with the example queries, sign up for a free Sumo Logic account.

Apache System-Critical Error Log Analysis

Apache error log analysis makes it easier to monitor problems in real time and troubleshoot critical issues when they occur. Your server’s error logs contain all the information you need to do these things, but extracting useful insights from millions of log entries can be tricky without a dedicated tool.

Isolating Apache System-Critical Error Logs

Depending on your LogLevel directive, Apache error logs can contain verbose details about the inner workings of your servers. A good place to start your error log analysis is to strip away this noise by isolating serious errors.

In Sumo Logic, you can extract emergency-, alert-, and critical-level error messages with the following query:

_sourceCategory=Apache/Error | parse regex "\[.*:(?<log_level>[a-z]+)\]" | where log_level in ("emerg", "alert", "crit")

Sumo Logic is designed to record all of your log data, which is why we need to select Apache error logs with the _sourceCategory metadata field. Also note that the regular expression that parses log_level assumes the default Apache 2.4 error log format.
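If you want to check the parsing logic locally, here is a minimal Python sketch of the same filter. It assumes the default Apache 2.4 error log format. The sample lines and the function names log_level and critical_lines are made up for illustration. Python writes named groups as (?P<name>...) instead of (?<name>...).

```python
import re

# Sketch only: the article's regex in Python syntax. It captures the last
# ":<lowercase word>]" in the line, which in the default Apache 2.4 format is
# the "[module:level]" field.
LEVEL_RE = re.compile(r"\[.*:(?P<log_level>[a-z]+)\]")
CRITICAL_LEVELS = {"emerg", "alert", "crit"}


def log_level(line):
    """Return the severity parsed from an Apache 2.4 error log line, or None."""
    match = LEVEL_RE.search(line)
    return match.group("log_level") if match else None


def critical_lines(lines):
    """Keep only emergency-, alert-, and critical-level entries."""
    return [line for line in lines if log_level(line) in CRITICAL_LEVELS]


# Made-up sample lines in the Apache 2.4 error log layout.
sample = [
    "[Fri May 10 10:42:29.902022 2019] [core:error] [pid 35708:tid 4328636416] "
    "[client 203.0.113.5:51234] File does not exist: /var/www/html/favicon.ico",
    "[Fri May 10 10:43:01.100000 2019] [mpm_event:crit] [pid 35708:tid 4328636416] "
    "example critical message",
]
print(critical_lines(sample) == [sample[1]])  # True
```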

System-Critical Apache Errors
Matching Apache error log entries.

Running this query will list all of the matching error log entries in the Messages tab, as shown above. This gives you a lot of debugging info, but it’s nothing you couldn’t find with a text editor. The real power of Apache log analytics is the ability to aggregate and visualize these error logs.

Monitoring System-Critical Errors in Apache

To stay on top of system-critical errors, we can set up a live panel that displays the number of errors in real time. First, we group logs into 5-minute intervals with the timeslice operator. Then we count the logs in each interval with the count operator:

_sourceCategory=Apache/Error | parse regex "\[.*:(?<log_level>[a-z]+)\]" | where log_level in ("emerg", "alert", "crit") | timeslice 5m | count by _timeslice

Visualizing the results as an area chart gives us a clear picture of how many errors our Apache system is generating. We can then save this chart as a panel by adding it to a dashboard. Sumo Logic periodically re-executes the underlying query and updates the panel automatically.

Visualizing the number of serious errors in Apache system logs.
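You can sketch the timeslice and count steps in Python to see what the panel computes. This illustrative version floors each entry’s timestamp to the start of its 5-minute interval and then counts the entries in each interval. That is roughly what timeslice 5m | count by _timeslice does. The sketch makes three assumptions:

* timestamps follow the Apache 2.4 layout, for example [Fri May 10 10:01:12.000001 2019];
* day and month names use an English (C) locale;
* the interval length divides 60 evenly.

The helper names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Sketch only: assumes every Apache 2.4 line starts with a timestamp such as
# "[Fri May 10 10:01:12.000001 2019]" and an English (C) locale.
TIMESTAMP_RE = re.compile(r"^\[([^\]]+)\]")


def parse_time(line):
    """Parse the leading Apache 2.4 timestamp of an error log line."""
    return datetime.strptime(TIMESTAMP_RE.match(line).group(1), "%a %b %d %H:%M:%S.%f %Y")


def timeslice(ts, minutes=5):
    """Floor a timestamp to the start of its N-minute interval (N must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)


def count_by_timeslice(lines, minutes=5):
    """Count log lines per interval, in chronological order."""
    counts = Counter(timeslice(parse_time(line), minutes) for line in lines)
    return sorted(counts.items())


lines = [
    "[Fri May 10 10:01:12.000001 2019] [core:crit] [pid 1:tid 2] example",
    "[Fri May 10 10:04:59.500000 2019] [core:crit] [pid 1:tid 2] example",
    "[Fri May 10 10:07:03.000000 2019] [core:alert] [pid 1:tid 2] example",
]
for bucket, n in count_by_timeslice(lines):
    print(bucket.strftime("%H:%M"), n)
# 10:00 2
# 10:05 1
```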

This kind of real-time window into your Apache servers is the perfect complement to continuous integration environments. If an update to your web application causes serious problems, this panel will let you know immediately. You can then roll back the update and fix those issues before they affect too many of your visitors.

Monitoring Apache Error Reasons

Knowing how many errors are occurring is a great first step towards making sense of our error logs, but it’s also useful to know what kinds of errors are occurring. Using the exact same process, we can create another panel that displays the most common error reasons. First, we need to form a query that extracts the information we’re after:

_sourceCategory=Apache/Error | parse regex "\[.*:(?<log_level>\w+)\] .*\] (?<reason>.*)$" | where log_level in ("emerg", "alert", "crit") | count reason | sort _count | limit 10

Then we can save the resulting table as a live panel:

Display your top Apache error reasons in real-time.

The idea is to build up dashboards that contain all the metrics you’ll need when your system crashes and you have to switch into troubleshooting mode.
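You can also approximate the top-reasons table offline. This Python sketch reuses the article’s reason regex, rewritten in Python syntax, and counts only emerg, alert, and crit entries. It assumes the Apache 2.4 layout. The sample lines and the top_reasons name are invented.

```python
import re
from collections import Counter

# Sketch only: the article's reason regex with Python's (?P<name>...) groups;
# assumes the Apache 2.4 error log layout.
REASON_RE = re.compile(r"\[.*:(?P<log_level>\w+)\] .*\] (?P<reason>.*)$")
CRITICAL_LEVELS = {"emerg", "alert", "crit"}


def top_reasons(lines, limit=10):
    """Count critical-level error reasons, most common first."""
    reasons = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match and match.group("log_level") in CRITICAL_LEVELS:
            reasons[match.group("reason")] += 1
    return reasons.most_common(limit)


lines = [
    "[Fri May 10 10:01:12.000001 2019] [core:crit] [pid 1:tid 2] reason A",
    "[Fri May 10 10:02:12.000001 2019] [core:crit] [pid 1:tid 2] reason A",
    "[Fri May 10 10:03:12.000001 2019] [core:alert] [pid 1:tid 2] reason B",
    "[Fri May 10 10:04:12.000001 2019] [core:error] [pid 1:tid 2] reason C",
]
print(top_reasons(lines))  # [('reason A', 2), ('reason B', 1)]
```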

Identifying Malicious Client IPs in Apache Logs

Real-time monitoring lets you know that errors are occurring, but you also need to understand why they’re occurring. After your dashboards tell you that something has gone wrong, the next step is to look for more specific information with custom queries. This is the troubleshooting aspect of Apache log analytics.

We already know the most common error reasons from our panel in the previous section. Now, it’s time to ask deeper questions like who is causing system-critical errors:

_sourceCategory=Apache/Error | parse regex "(?<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" | parse regex "\[.*:(?<log_level>[a-z]+)\]" | where client_ip !="" AND log_level in ("emerg", "alert", "crit") | count client_ip | top 10 client_ip by _count

Any users causing a disproportionate number of errors will be immediately apparent after visualizing the results as a pie chart.

Distribution of serious errors among client IPs.
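A Python sketch of the client IP breakdown looks much the same. Like the query, it takes the first IPv4-looking token in each line. As a result, an address that appears in the message text could be miscounted, and IPv6 clients are ignored. The function name is hypothetical, and the sample addresses come from documentation ranges.

```python
import re
from collections import Counter

# Sketch only: first dotted-quad in the line, as in the article's query (IPv4 only).
IP_RE = re.compile(r"(?P<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")
LEVEL_RE = re.compile(r"\[.*:(?P<log_level>[a-z]+)\]")
CRITICAL_LEVELS = {"emerg", "alert", "crit"}


def top_client_ips(lines, limit=10):
    """Rank client IPs by how many critical-level error lines they appear in."""
    counts = Counter()
    for line in lines:
        ip = IP_RE.search(line)
        level = LEVEL_RE.search(line)
        # Skip lines without an IP (the query's client_ip != "" check) or of lower severity.
        if ip and level and level.group("log_level") in CRITICAL_LEVELS:
            counts[ip.group("client_ip")] += 1
    return counts.most_common(limit)


lines = [
    "[Fri May 10 10:01:12.000001 2019] [core:crit] [pid 1:tid 2] [client 203.0.113.5:51234] reason A",
    "[Fri May 10 10:02:12.000001 2019] [core:crit] [pid 1:tid 2] [client 203.0.113.5:51240] reason A",
    "[Fri May 10 10:03:12.000001 2019] [core:alert] [pid 1:tid 2] [client 198.51.100.7:40000] reason B",
    "[Fri May 10 10:04:12.000001 2019] [core:crit] [pid 1:tid 2] reason C",
]
print(top_client_ips(lines))  # [('203.0.113.5', 2), ('198.51.100.7', 1)]
```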

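If you find that a particular user or IP range is crashing your system, you can block those addresses with a Deny from directive in your .htaccess file. This Order/Deny/Allow syntax comes from Apache 2.2. On Apache 2.4 it requires mod_access_compat, and the native equivalent is a Require not ip rule inside a RequireAll block.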

order allow,deny
deny from 142.181.34.9
allow from all

Identifying Apache Server Starts/Stops

Troubleshooting often involves looking for specific kinds of errors. For example, the data in your panels might suggest that a server is rebooting too often. You can perform a custom query to get a clearer picture of server start and stop events:

_sourceCategory=Apache/Error | parse regex ".*\] (?<reason>.*)$" | if(reason matches "*caught SIGTERM, shutting down", 1, 0) as server_stop | if(reason matches "*-- resuming normal operations", 1, 0) as server_start | timeslice 5m | sum(server_stop) as server_stops, sum(server_start) as server_starts by _timeslice

This query inspects each log entry for the specific messages that Apache generates every time it starts or stops. Both patterns begin with a wildcard because Apache 2.4 prefixes most error log messages with an AHnnnnn message ID. Any abnormal behavior is easy to see after graphing the results as a stacked column chart:

Apache server start and stop events visually displayed over time.
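Here is a rough Python equivalent of the start/stop query, useful for testing the patterns against a copy of your own logs. It uses fnmatch wildcards as a stand-in for the matches operator, which only approximates Sumo Logic’s semantics. It also assumes Apache 2.4 timestamps. The sample lines are invented, and their AH message IDs are illustrative. The leading wildcard in each pattern lets the match succeed despite such a prefix.

```python
import re
from collections import defaultdict
from datetime import datetime
from fnmatch import fnmatchcase

# Sketch only: leading timestamp, then everything after the last "] " as the reason.
LINE_RE = re.compile(r"^\[(?P<time>[^\]]+)\].*\] (?P<reason>.*)$")
# fnmatch-style wildcards stand in for Sumo Logic's matches operator.
STOP_PATTERN = "*caught SIGTERM, shutting down"
START_PATTERN = "*-- resuming normal operations"


def starts_and_stops(lines, minutes=5):
    """Sum server start and stop events per N-minute interval (N must divide 60)."""
    buckets = defaultdict(lambda: {"server_starts": 0, "server_stops": 0})
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group("time"), "%a %b %d %H:%M:%S.%f %Y")
        bucket = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
        row = buckets[bucket]  # every interval with log lines gets a row, even 0/0
        reason = match.group("reason")
        if fnmatchcase(reason, STOP_PATTERN):
            row["server_stops"] += 1
        elif fnmatchcase(reason, START_PATTERN):
            row["server_starts"] += 1
    return dict(sorted(buckets.items()))


lines = [
    "[Fri May 10 10:01:12.000001 2019] [mpm_event:notice] [pid 1:tid 2] "
    "AH00491: caught SIGTERM, shutting down",
    "[Fri May 10 10:01:15.000001 2019] [mpm_event:notice] [pid 3:tid 4] "
    "AH00489: Apache/2.4.29 (Ubuntu) configured -- resuming normal operations",
    "[Fri May 10 10:06:00.000001 2019] [core:error] [pid 3:tid 4] something else",
]
for bucket, row in starts_and_stops(lines).items():
    print(bucket.strftime("%H:%M"), row)
# 10:00 {'server_starts': 1, 'server_stops': 1}
# 10:05 {'server_starts': 0, 'server_stops': 0}
```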

But this is only the beginning of the troubleshooting process. To find the root cause of the restart events, you’ll need to run more custom queries around the time frames indicated by the above results.

The actual queries involved in Apache log analytics aren’t generally all that complicated. The hard part is figuring out which questions to ask and how to find the answer in your log data. As we saw in this section, effective analysis requires intimate knowledge of the log messages produced by your server.

A good way to approach error log analysis is peacetime preparation followed by wartime troubleshooting. During peacetime, you’re getting ready for when things go wrong by configuring panels that contain all the metrics you’re interested in. During wartime, these panels guide your debugging efforts and help you write custom queries that identify the root cause of the problem.

Analyzing Apache Status Code Response Errors

Unlike system-critical errors, Apache 400- and 500-level status codes usually point to content and linking issues rather than problems with your server configuration. In this section, we’ll learn how a dedicated Apache access log analyzer makes it much easier to monitor and troubleshoot status code errors.

Monitoring Apache status code errors in a Sumo Logic dashboard.

Isolating Apache Status Code Errors

To isolate access logs that contain 400- and 500-level status codes, we need to extract the status code from each log using the parse operator. Then, it’s easy to constrain the query to find status code errors with a where clause:

_sourceCategory=Apache/Access | parse "HTTP/1.1\" * " as status_code | where num(status_code) >= 400

_sourceCategory is a metadata field that Sumo Logic attaches to each log message as it’s collected. Apache/Access is the source category assigned to the Apache access logs in this example. If you used a different value when setting up your source, be sure to change your query accordingly.

Apache access log entries with status code errors.

Even moderately busy websites produce millions of Apache access logs. The first step toward identifying useful trends in all this data is to filter out the logs we’re not interested in. We can then perform calculations on the relevant entries and visualize the results, which makes monitoring potential problems much easier than sifting through Apache logs with grep.
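To make the comparison with grep concrete, the filtering step fits in a few lines of Python. This sketch assumes entries in Apache’s Combined Log Format. Like the query, it only recognizes requests logged as HTTP/1.1. The entries and the status_code and error_entries names are made up for illustration.

```python
import re

# Sketch only: mirrors parse "HTTP/1.1\" * " as status_code by taking the
# three-digit token after the protocol (HTTP/1.1 requests only).
STATUS_RE = re.compile(r'HTTP/1\.1" (?P<status_code>\d{3}) ')


def status_code(line):
    """Return the HTTP status as an int, or None if the line doesn't match."""
    match = STATUS_RE.search(line)
    return int(match.group("status_code")) if match else None


def error_entries(lines):
    """Keep access log entries with 400- and 500-level status codes."""
    return [line for line in lines if (status_code(line) or 0) >= 400]


# Made-up entries in Combined Log Format.
access_log = [
    '203.0.113.5 - - [10/May/2019:10:01:12 +0000] "GET /about-us HTTP/1.1" 404 209 '
    '"https://example.com/links" "Mozilla/5.0"',
    '203.0.113.6 - - [10/May/2019:10:01:13 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/May/2019:10:02:00 +0000] "POST /api HTTP/1.1" 500 0 "-" "curl/7.64.0"',
]
print([status_code(line) for line in error_entries(access_log)])  # [404, 500]
```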

Monitoring Apache Status Code Errors

For example, we can graph our status code errors over time with the following query:

_sourceCategory=Apache/Access | parse "HTTP/1.1\" * " as status_code | where num(status_code) >= 400 | timeslice 5m | count as count by _timeslice, status_code | transpose row _timeslice column status_code as *

After adding the timeslice and count operators, Sumo Logic automatically enables its graphing capabilities. All it takes is a few clicks to display these results as a stacked column chart. This gives us an at-a-glance view of every status code error in our Apache system.

Visualizing Apache status code errors in a live dashboard.
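The count by _timeslice, status_code | transpose step produces one row per time slice and one column per status code. The minimal Python sketch below builds that pivot. It assumes Combined Log Format timestamps such as [10/May/2019:10:01:12 +0000], HTTP/1.1 requests, and an interval length that divides 60. The errors_by_timeslice name is hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Sketch only: timestamp plus status code from a Combined Log Format entry (HTTP/1.1 only).
ACCESS_RE = re.compile(r'\[(?P<time>[^\]]+)\] "[^"]*HTTP/1\.1" (?P<status_code>\d{3}) ')


def errors_by_timeslice(lines, minutes=5):
    """Return {interval_start: {status_code: count}} for 4xx/5xx responses."""
    table = defaultdict(Counter)
    for line in lines:
        match = ACCESS_RE.search(line)
        if not match or int(match.group("status_code")) < 400:
            continue
        ts = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        # Floor to the start of the N-minute interval (assumes N divides 60).
        bucket = ts.replace(minute=ts.minute - ts.minute % minutes, second=0)
        table[bucket][match.group("status_code")] += 1
    # Rows in chronological order; each row maps status code -> count.
    return {bucket: dict(codes) for bucket, codes in sorted(table.items())}


access_log = [
    '203.0.113.5 - - [10/May/2019:10:01:12 +0000] "GET /about-us HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '203.0.113.6 - - [10/May/2019:10:03:40 +0000] "GET /old HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/May/2019:10:06:00 +0000] "POST /api HTTP/1.1" 500 0 "-" "curl/7.64.0"',
    '198.51.100.8 - - [10/May/2019:10:07:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
for bucket, codes in errors_by_timeslice(access_log).items():
    print(bucket.strftime("%H:%M"), codes)
# 10:00 {'404': 2}
# 10:05 {'500': 1}
```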

But the monitoring capabilities of Sumo Logic revolve around live dashboards, not one-off searches. Dashboards consist of multiple panels that track different key performance indicators (KPIs) in real time. The idea is to save our chart as a panel so we always have a clear window into our Apache web server’s operations.

We now have a lot of status code information at our fingertips. If a PHP script starts to hang, we’ll see a spike in 500 errors. If a referring site contains broken links, we’ll see 404 errors go up. Even obscure errors like 503s caused by an overloaded server will be readily apparent.

Monitoring Apache 404 URLs

Configuring live dashboards is all about preparing for when your server breaks. To this end, we should probably include another panel that displays which URLs are generating 404 errors.

_sourceCategory=Apache/Access | parse regex "[A-Z]+ (?<url>.+) HTTP/1\.1\" (?<status_code>\d+) " | where num(status_code)=404 | count as count by url | sort count | limit 10

Just like our other panel, we can save this query in a dashboard so the information is readily accessible.

Panel listing top URLs causing 404 errors on an Apache server.
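For comparison, here is the 404 URL ranking as a Python sketch. It uses the same regex as the query above, rewritten with Python’s (?P<name>...) groups, and assumes Combined Log Format entries. The sample entries and the function name are illustrative.

```python
import re
from collections import Counter

# Sketch only: the article's regex for the request URL and status code, in Python syntax.
URL_RE = re.compile(r'[A-Z]+ (?P<url>.+) HTTP/1\.1" (?P<status_code>\d+) ')


def top_404_urls(lines, limit=10):
    """Rank URLs by how many 404 responses they produced."""
    counts = Counter()
    for line in lines:
        match = URL_RE.search(line)
        if match and int(match.group("status_code")) == 404:
            counts[match.group("url")] += 1
    return counts.most_common(limit)


access_log = [
    '203.0.113.5 - - [10/May/2019:10:01:12 +0000] "GET /about-us HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '203.0.113.6 - - [10/May/2019:10:03:40 +0000] "GET /about-us HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/May/2019:10:04:02 +0000] "GET /old-post HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '198.51.100.8 - - [10/May/2019:10:07:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(top_404_urls(access_log))  # [('/about-us', 2), ('/old-post', 1)]
```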

Of course, you’ll likely have more sophisticated dashboards set up for production monitoring, but even these two panels give us a realistic glimpse into the utility of Apache access log analytics. A common scenario might be:

  • You see that 404s are spiking in the first panel.
  • So, you look at the second panel to see which URLs are causing 404s.
  • It turns out one particular URL is causing most of them, which means it’s time to dig deeper to find the root cause.

This is where you switch into troubleshooting mode and start running custom queries that investigate the data in your pre-configured panels.

Identifying 404 Referrers

Odds are, these 404 errors are coming from a broken link. To figure out where this broken link is, we need to find all the referrers that are pointing people to the missing page. If the URL is /about-us, the following query will do just that:

_sourceCategory=Apache/Access | parse regex "[A-Z]+ (?<url>.+) HTTP/1\.1\" (?<status_code>\d+)\s\S+\s\"(?<referrer>\S+)\"" | where num(status_code)=404 | where url matches "*about-us*" | count as count by referrer | sort count | limit 10

The matches operator recognizes asterisk wildcards, making it easier to search for slugs in a URL. This query then tallies up how many times each referrer sent someone to the missing resource.

Referrers causing 404 errors in an Apache server.
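You can sketch the referrer query the same way. In this Python version, fnmatchcase stands in for the matches operator, and the /about-us slug is the example from above. The sample entries, hostnames, and function name are invented.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

# Sketch only: the article's referrer regex in Python syntax
# (URL, status, bytes, then the quoted referrer).
REFERRER_RE = re.compile(
    r'[A-Z]+ (?P<url>.+) HTTP/1\.1" (?P<status_code>\d+)\s\S+\s"(?P<referrer>\S+)"'
)


def top_404_referrers(lines, url_pattern="*about-us*", limit=10):
    """Count referrers that sent visitors to a missing page matching url_pattern."""
    counts = Counter()
    for line in lines:
        match = REFERRER_RE.search(line)
        if (
            match
            and int(match.group("status_code")) == 404
            # fnmatch-style wildcards approximate Sumo Logic's matches operator.
            and fnmatchcase(match.group("url"), url_pattern)
        ):
            counts[match.group("referrer")] += 1
    return counts.most_common(limit)


access_log = [
    '203.0.113.5 - - [10/May/2019:10:01:12 +0000] "GET /about-us HTTP/1.1" 404 209 '
    '"https://partner.example.org/links" "Mozilla/5.0"',
    '203.0.113.6 - - [10/May/2019:10:03:40 +0000] "GET /about-us HTTP/1.1" 404 209 '
    '"https://partner.example.org/links" "Mozilla/5.0"',
    '203.0.113.7 - - [10/May/2019:10:04:02 +0000] "GET /about-us HTTP/1.1" 404 209 '
    '"https://www.example.com/blog" "Mozilla/5.0"',
    '203.0.113.8 - - [10/May/2019:10:05:02 +0000] "GET /old-post HTTP/1.1" 404 209 '
    '"https://www.example.com/" "Mozilla/5.0"',
]
print(top_404_referrers(access_log))
# [('https://partner.example.org/links', 2), ('https://www.example.com/blog', 1)]
```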

If you find external websites in this list, it probably means you changed a URL and forgot to add a redirect. Alternatively, you may find pages from your own site in the results, which could indicate broken internal links or missing media resources.

This query is also a good demonstration of the separation of concerns involved in Apache log analytics: monitoring vs. troubleshooting. You wouldn’t want to save this query as a live panel, because it’s much too specific to be of use as a monitoring metric.

Identifying Unusual Behavior with Outliers

A certain number of status code errors is expected for your traffic volume. Keep this in mind when searching for atypical behavior, because it means we need to replace questions like “Have there been more than a hundred 404 errors?” with “Have 404 errors fallen outside the expected range?”

One way to represent that “expected range” is as a multiple of the standard deviation around a rolling average. This is precisely what the outlier operator was designed to do:

_sourceCategory=Apache/Access | parse "HTTP/1.1\" * " as status_code | where num(status_code)=404 | timeslice 5m | count _timeslice | outlier _count window=6, threshold=2.5

This query calculates the moving average of 404 errors over a window of 6 data points. It then flags any time slice in which the number of 404 errors is more than 2.5 standard deviations away from that average. Graphing the results as a line chart shows both the expected range and any outliers that were detected:

Identifying Apache status code error outliers.
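The minimal Python sketch below shows the arithmetic behind the outlier check. It flags any count that falls outside the mean plus or minus threshold times the standard deviation of the preceding window of counts. This is an approximation built on two assumptions: the window excludes the current point, and it uses the population standard deviation. Sumo Logic’s own implementation may differ in these details. The counts are hypothetical.

```python
from statistics import mean, pstdev


def outliers(counts, window=6, threshold=2.5):
    """Return (index, count) pairs for counts outside the expected range.

    Assumption (not necessarily Sumo Logic's exact method): the expected range
    for point i is mean +/- threshold * population stdev of the `window`
    points before it. The first `window` points are never flagged.
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        center = mean(history)
        spread = threshold * pstdev(history)
        if not center - spread <= counts[i] <= center + spread:
            flagged.append((i, counts[i]))
    return flagged


# Hypothetical 404 counts per 5-minute time slice.
counts_404 = [10, 12, 11, 9, 10, 12, 40, 11]
print(outliers(counts_404))  # [(6, 40)]
```

After the spike, the window that contains 40 has a large standard deviation, so the following value of 11 is not flagged. This is one reason a rolling window adapts better than a fixed threshold.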

The outlier operator is useful in both panels and troubleshooting queries, but it really shines in real-time alerts (requires Sumo Logic Professional). It lets you avoid static alert thresholds, which often produce false positives when your traffic is volatile or cyclical.

Identify Apache Errors with Sumo Logic

While most status code errors are relatively straightforward to fix, identifying them with real-time visualizations is much more reliable and convenient than manually clicking through every link in your site or inspecting the raw text of your Apache log files.

You’re not just watching 500 errors occur; you’re figuring out why they’re occurring with troubleshooting queries, getting your developers to implement a solution, and verifying that it worked back in your live dashboards.

On their own, Apache logs are a fairly simple data structure. But as you add more servers for load balancing, high availability, or new development environments, making sense of your log files becomes increasingly difficult. When you have a hundred servers generating millions of log messages, getting to the root cause of an issue is time-consuming and error-prone.

What is needed is a dedicated Apache log analyzer tool to centralize your logs, monitor errors and provide the ability to troubleshoot issues as they occur in real time.

Apache Error Log Analysis and Sumo Logic

Sumo Logic has built an integration to specifically analyze and visualize errors logged by Apache servers. With the Integration for Apache, you can:

  • Monitor 404 errors
  • Identify 404 URLs and referrers
  • Set dynamic thresholds to alert on “abnormal” levels of 500-level errors
  • Optimize web resources
  • Identify misbehaving bots
  • Speed up Apache response times

Apache log analytics doesn’t exist in isolation. A tool like Sumo Logic is meant to integrate tightly with the rest of your web development workflow.

