What is a Soft 404?

A Soft 404 is an error in which a web request returns a successful response code (200), but the response does not contain the expected content.

Regular 404 File Not Found Errors

404 is the HTTP (the communications protocol of the web) status code that means Not Found: the server couldn’t find the requested file or page. Error 404 is so well-known that it’s wormed its way into our pop culture. Error 404 occurs on a static website (a site that just serves files from a folder on a web server) when a file is requested and does not exist. A 404 can also occur on a site running a content management system like WordPress or Drupal when a user requests a page that does not exist in the site’s database.

To differentiate between soft 404 errors and the type we just described, we call a standard 404 a Hard 404. A Hard 404 occurs when the server returns the 404 response code.

How a Soft 404 is Different from a Hard 404

A soft 404 occurs when the server returns a successful response code (200), but Google, Bing, or some other service determines that the response doesn’t contain the expected content.

How does that happen?

If you have a WordPress site, it’s very easy to replicate. Create an empty category, tag, or term in another taxonomy, and make sure no posts are assigned to it. Then view the new category’s page on your website. WordPress will return a successful response code, but in most cases the page will display a message saying the content is missing.

A successful response code, but a response that doesn’t contain the expected content. That’s a soft 404.
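To make the definition concrete, here’s a rough Python sketch of how a tool might label a single response. It’s a naive heuristic and an assumption on my part: Google and Bing don’t publish how they detect soft 404s, and the phrase list and the `classify_response` function are hypothetical names for illustration only.

```python
import re

# Hypothetical phrases that often appear on "empty" pages. Google and Bing do not
# publish their soft 404 signals; this list is only an illustration.
MISSING_CONTENT_PATTERNS = [
    r"\bnot found\b",
    r"\bnothing found\b",
    r"\bno posts\b",
    r"\bpage (?:does not|doesn't) exist\b",
]


def classify_response(status: int, body: str) -> str:
    """Label a response as "hard 404", "soft 404", "ok", or "other" (naive heuristic)."""
    if status == 404:
        return "hard 404"  # the server admitted the page is missing
    if status != 200:
        return "other"  # redirects, server errors, etc. are out of scope here
    text = body.lower()
    if any(re.search(pattern, text) for pattern in MISSING_CONTENT_PATTERNS):
        return "soft 404"  # success code, but the body says the content is missing
    return "ok"
```

For example, a 200 response whose body is `<h1>Nothing Found</h1>` comes back as `"soft 404"`, the same shape of response as the empty WordPress category described above. Keyword matching is crude: a genuine post about “404 Not Found” errors would be flagged too, which is one reason real crawlers weigh many more signals.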

Why does that happen?

Think about what’s happening. If you visit the URL for a blog category and the category exists, then the content was found, right? But if the blog category contains no content, the right thing for the CMS to do is to inform you that there are no blog posts under that category.  WordPress is doing the right thing.  Google is responding by telling you about the issue. Now it’s up to you to fix your content in a way that’s most helpful to your visitors.

Where Can I See My Soft 404 Errors?

Log in to Google Search Console and select your site. Click the Crawl tab and then choose the Crawl Errors report. You’ll see all of your crawl errors, including soft 404 errors.

How do I Avoid Creating Soft 404s?

Generally speaking: don’t create scenarios where your web server sends a 200 (OK) response code along with a response body that indicates missing content.

Helpful, right?

I most often see soft 404 errors when I restructure the tags and categories on a WordPress site. You can avoid this by planning your category and tag structure before you start creating content, so you don’t have to restructure later and accidentally cause a variety of crawl errors, including both hard and soft 404s.

How do I fix a Soft 404?

There’s a good chance you’ll eventually run into soft 404 errors no matter how hard you try to avoid them.

Fixing them depends on the cause and your website’s platform. Generally speaking, you repair a soft 404 by making sure the URL returns the right response code for the content it actually serves. Here are some ideas on how to fix a Soft 404:

  • Make it a Hard 404. If the content doesn’t exist, make sure your website returns a 404 response code. Of course, this is just another error that will show up in your Crawl Errors report. So this is the Viagra solution: you’re making your 404 hard, but the only thing getting screwed is your website.
  • 301 Redirect it to a valid URL. Point the URL at existing, relevant content. A 301 resolves the error while helping visitors stay on your site.
  • Make it valid. Figure out why the page is being treated as a soft 404. Did you accidentally create a category or tag with no associated posts? Add some content to that taxonomy term so there is something there when the page is indexed.
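Here’s a minimal Python sketch of how those three fixes could look for a category archive. This is not how WordPress handles category pages internally; `category_response`, `posts_by_category`, `redirects`, and the `Response` class are hypothetical names, and the in-memory dictionaries stand in for your CMS’s database.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


def category_response(slug: str, posts_by_category: dict, redirects: dict) -> Response:
    """Choose the response for a category archive URL such as /category/<slug>/."""
    # 301 redirect: a renamed or retired category points visitors at relevant content.
    if slug in redirects:
        return Response(301, {"Location": redirects[slug]})
    posts = posts_by_category.get(slug)
    # Hard 404: the category is unknown or empty, so don't pretend with a 200.
    if not posts:
        return Response(404, body="Category not found")
    # Valid: the category has posts, so a 200 is honest.
    return Response(200, body="\n".join(posts))
```

Checking the redirect map first means a retired slug never falls through to a 404. Whether an empty category should return a hard 404 or stay live while you add posts is a judgment call; this sketch picks the hard 404.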

Do Soft 404 Errors Affect My SEO?

Soft 404 errors, like hard 404 errors, can harm your SEO ranking. Search engines don’t like discovering that they’ve indexed and linked to content that no longer exists. These errors send a signal about your content quality, and Google eventually removes URLs that result in 404 errors (hard or soft) from its index.

Another reason to care about soft 404 errors: a search engine will only spend so much effort crawling your website. This is called your crawl budget. If Google only plans to spend so much time on your domain, you want it focused on your working content, not broken links.

Six Ways to Find 404 Errors On Your Website

404 Not Found: even if you’re not a web designer or a programmer, you’ve probably seen this error before. But in case you’ve been living underground in the disconnected world of the mole people for a few decades, 404 Not Found is the error code on the web that means you’ve tried to access a resource that doesn’t exist.

404 errors need to be dealt with. When a customer hits a 404, it’s a missed opportunity for you and a bad user experience for them. When a search engine hits a 404 error, the missing resource could be removed from its index, and the error could be read as a signal that your website is unreliable.

But before you can fix 404 Not Found errors, you need to know they’re happening. This article explores six easy ways to discover 404 errors on your own website.

1. Find 404 Errors Using Server Logs

One of the easiest ways to discover 404 errors is through your hosting environment’s access logs and error logs. Every hosting environment is different, so unfortunately I can’t tell you where to find yours, but a Google search should prove fruitful. Searching for “cpanel raw access logs”, for example, turns up plenty of helpful pages for the cPanel hosting environment.

Your log files may need some massaging to be useful. Most are text files that can be easily opened in Excel and then filtered by HTTP response code.
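If your host writes access logs in the common Apache/Nginx “combined” format (an assumption; check yours), a few lines of Python can do the filtering you’d otherwise do in Excel. This sketch counts 404 responses per requested path; the regular expression only looks at the quoted request line and the status code that follows it, and `count_404s` is a hypothetical name.

```python
import re
import sys
from collections import Counter

# Matches the quoted request line and the status code after it, e.g.
# ... [10/Oct/2016:13:55:36 -0700] "GET /old-page/ HTTP/1.1" 404 512 "-" "Mozilla/5.0"
LOG_LINE = re.compile(r'\[[^\]]+\] "(?P<request>[^"]*)" (?P<status>\d{3}) ')


def count_404s(lines):
    """Count how many times each path returned 404 in combined-format log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group("status") != "404":
            continue  # not a log line we understand, or not a 404
        parts = match.group("request").split()
        # A well-formed request line looks like: GET /path HTTP/1.1
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for path, hits in count_404s(log).most_common(20):
            print(f"{hits:6d}  {path}")
```

Saved as a script, something like `python count_404s.py access.log` (both file names are placeholders) prints the 20 most frequently requested missing paths, which is usually the list worth fixing first.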

The Pros: This method should show you all 404 errors that occurred on your site in the time frame covered by the log.

The Cons: Your hosting environment’s log files can be difficult to read and utilize unless you know your way around a spreadsheet.

The raw access logs from this website. These files contain lots of data and need some help to be useful.

2. Find 404 Errors Using a Spider or Link Scanner

This method doesn’t actually find 404 errors. It discovers broken links on your website so they can be fixed before they generate 404 errors. An ounce of prevention is worth a pound of cure.

An easy way to find potential 404 errors is to scan your website with a spider or link scanner. A spider crawls your site the same way a search engine like Google does: it starts with a URL, scans the page’s code for links, and then works through that list recursively. Lots of programs and online services can scan your site for free and give you a list of the broken links on your site. My two favorites are Xenu Link Sleuth and Screaming Frog SEO Spider.
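For the curious, here’s roughly what a spider does under the hood, as a minimal Python sketch using only the standard library. It is not how Xenu or Screaming Frog are implemented: it only reports 404s, ignores robots.txt, doesn’t throttle its requests, and stops after `max_pages` fetches. `find_broken_links` and `http_fetch` are hypothetical names, and the fetch function can be swapped out for testing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Return (status, html) for a URL; network failures are reported as status 0."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code, ""
    except OSError:  # DNS failures, refused connections, timeouts, etc.
        return 0, ""


def find_broken_links(start_url, fetch=http_fetch, max_pages=500):
    """Breadth-first crawl of one site; return {broken_url: page_linking_to_it}."""
    host = urlparse(start_url).netloc
    queue = deque([(start_url, None)])
    seen = {start_url}
    broken = {}
    fetched = 0
    while queue and fetched < max_pages:
        url, referrer = queue.popleft()
        status, html = fetch(url)
        fetched += 1
        if status == 404:
            broken[url] = referrer
            continue
        # Only parse successful pages on our own host; external links are checked, not crawled.
        if status != 200 or urlparse(url).netloc != host:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append((link, url))
    return broken
```

Calling `find_broken_links("https://example.com/")` (substitute your own domain) returns a dictionary mapping each broken URL to a page that links to it, which tells you exactly where to go fix the link.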

The Pros:

Using a spider to locate broken links on your website doesn’t actually find 404 errors: it helps prevent them. By scanning your site and fixing the broken links you discover, you keep visitors away from URLs that don’t exist and reduce the number of 404 errors that occur on your website in the future.

The Cons:

Anyone can link to your website, and you don’t have control over the URLs that they link to. Just because you fix all of the broken links on your website doesn’t mean that other websites, or even search engines, don’t have active links to broken URLs on your website. You won’t be able to discover or fix those using a spider.

A report from Xenu Link Sleuth. Xenu requests every URL of your site and returns the status code, among other things.

3. Find 404 Errors Using Google Analytics (and Yoast SEO)

This one is a little WordPress-specific, but you can do a similar trick with other content management systems.

If you use the Google Analytics by Yoast plugin, it automatically tags 404 errors so you can find them in Google Analytics with the Content Drilldown report. Just go to Behavior > Site Content > Content Drilldown and search for 404.html.

The Pros: 

It’s right in Google Analytics, where you would expect to find this sort of data. Because it’s in Google Analytics, you can export your list of 404 errors and do something useful with it, like build a list of URLs to redirect.

The Cons:

It’s WordPress-specific. It also requires you to install yet another plugin on your website, basically just to embed a few lines of JavaScript for Google Analytics.

Yoast SEO automatically registers your 404 errors in Google Analytics.

4. Find 404 Errors Using Google Search Console

Google Search Console (formerly Google Webmaster Tools) provides a window into how Google sees your website. Its Crawl Errors report lists the errors that occurred while Google attempted to crawl and index your site.

The Pros:

Google regularly spiders your site and attempts to index any URLs it finds, as well as any URLs already in its database. If any URL stops working, whether it’s new or historical, Google Search Console will let you know. These reports can be exported to CSV, so you can do something useful with them, like create a redirect list. This is also helpful because Google Search Console often discovers broken links before actual humans do, so check it regularly and act on what you find.
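As an example of the redirect-list idea, here’s a Python sketch that turns an exported list of 404 URLs into Apache `Redirect 301` lines. Several things are assumptions: the export’s URL column is called `URL` (check the header in your file), you’re on Apache with mod_alias, and your restructuring can be described as old-prefix-to-new-prefix rules. `build_redirects` is a hypothetical helper.

```python
import csv
import io
from urllib.parse import urlparse


def build_redirects(csv_text, prefix_rules, url_column="URL"):
    """Turn an exported list of 404 URLs into Apache "Redirect 301" lines.

    prefix_rules maps an old path prefix to its replacement, e.g. {"/blog/": "/articles/"}.
    Rules are tried in order and the first matching prefix wins; paths that match
    no rule are returned separately for manual review.
    """
    redirects, unmatched = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        path = urlparse(row[url_column]).path  # keep only the path, e.g. /blog/hello/
        for old, new in prefix_rules.items():
            if path.startswith(old):
                redirects.append(f"Redirect 301 {path} {new}{path[len(old):]}")
                break
        else:  # no rule matched this path
            unmatched.append(path)
    return redirects, unmatched
```

Anything that doesn’t match a rule lands in the second list for manual review, since redirecting everything to the home page can itself be treated as a soft 404. Keep in mind that Apache’s `Redirect` matches by URL prefix, so a rule for `/blog/hello/` also covers paths beneath it.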

The Cons:

If you’ve intentionally blocked pages through robots.txt, Google won’t crawl them and, therefore, won’t check whether they still work. (A noindex tag is different: it keeps a page out of Google’s index, but Google still has to crawl the page to see the tag.) Also, Google Search Console only reports 404 errors encountered by Google’s crawler, not by actual users.

Using Google Search Console to Find 404 Errors

5. Find 404 Errors Using SEO Redirection (or another 404 Management Plugin)

This tip is WordPress-specific, but most content management systems have a similar feature or plugin.

There are lots of plugins that can help you discover and fix 404 errors. I use SEO Redirection as well as its premium sibling, SEO Redirection Premium. These plugins track 404 errors that occur on your website and make them easy to resolve by redirecting the broken URL to an existing page. The premium version of Yoast SEO also has this feature built in: Yoast SEO Premium connects directly to your Google Search Console account and lets you redirect 404 errors discovered by Google from within the plugin.
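This isn’t how those plugins are implemented (they’re WordPress plugins written in PHP), but conceptually they do two things: record requests that end in a 404, and answer known-broken URLs with a 301. Here’s that idea as a small Python WSGI middleware, purely as an illustration; `NotFoundTracker` and its `redirects` map are hypothetical.

```python
from collections import Counter


class NotFoundTracker:
    """WSGI middleware: apply known 301 redirects and count paths that still 404."""

    def __init__(self, app, redirects=None):
        self.app = app
        self.redirects = dict(redirects or {})  # old path -> new path
        self.not_found = Counter()  # path -> number of 404 responses

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "/")
        if path in self.redirects:
            # Known broken URL: send visitors to the replacement page.
            start_response("301 Moved Permanently", [("Location", self.redirects[path])])
            return [b""]

        def tracking_start_response(status, headers, exc_info=None):
            # Record the path whenever the wrapped app answers with a 404.
            if status.startswith("404"):
                self.not_found[path] += 1
            return start_response(status, headers, exc_info)

        return self.app(environ, tracking_start_response)
```

Wrapping an app as `app = NotFoundTracker(app, {"/old-post/": "/new-post/"})` and reviewing `app.not_found.most_common()` now and then shows which missing URLs deserve a redirect. A real plugin would persist these counts in the database rather than in memory.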

The Pros:

WordPress can become your one-stop shop for discovering and dealing with 404 errors and broken links. And if you’re willing to pay for a premium plugin, you can really automate the process.

The Cons:

It’s WordPress-specific, with all of the problems that come with installing plugins. To make the most of this method you should purchase the premium version of one of these plugins. But trust me: they’re worth it.

404 Errors Caught by SEO Redirection Premium

6. Find 404 Errors with Other Services

There are a host of other services that can help you discover 404 errors, including OnPage.org and Moz. I only bring these up for the sake of thoroughness. There’s absolutely no reason you need to sign up for a paid service just to discover broken URLs and 404 errors, but these services do a lot more than that and are worth investigating.

The Pros:

Another set of eyes scanning your website for errors can’t be a bad thing. Plus these services offer far more than just 404 error reports.

The Cons:

Services like Moz don’t come cheap. So don’t use them unless you’re looking for more than just a 404 error detector.

Moz 4xx Error Report

Summary

So what do I do? I use a combination of all of the methods above. A link scanner or spider like Xenu helps you proactively discover and fix broken links within your website, but can’t help with links from other sources like other websites or Google. Google Analytics helps you discover URLs that are actually breaking as people attempt to view them. Google Search Console helps you discover broken links that Google either has in its index or is trying to index. WordPress plugins like SEO Redirection and Yoast SEO Premium help me easily deal with 404 errors as I discover them. And of course, I already use Moz for other reasons, so I take its 404 report into consideration as well.

In another post, I’ll be talking about how to handle 404 errors on your WordPress website once you find them. Stay tuned!