Yet Another <httpRequestBegin> Pipeline Processor to Handle ‘Page Not Found’ (404 Status Code) in Sitecore

Last week I pair programmed with fellow Sitecore MVP Akshay Sura on a class that acts as an <httpRequestBegin> pipeline processor: it serves up ‘Page Not Found’ content along with a 404 status code when a user requests a page that does not exist as an Item in Sitecore XP.

In this solution, the request is not redirected to the ‘Not Found’ page, since a redirect results in a 302 status code, which isn’t ideal for SEO. Instead, the ‘Page Not Found’ content is rendered at the originally requested URL along with the 404 status code.

We decided not to have our <httpRequestBegin> pipeline processor class inherit from Sitecore.Pipelines.HttpRequest.ExecuteRequest (which lives in Sitecore.Kernel.dll), as is done in the following blog posts:

Why? The solutions in the above posts are a bit fragile because they subclass Sitecore.Pipelines.HttpRequest.ExecuteRequest, which is an example of tight coupling: code changes in Sitecore.Pipelines.HttpRequest.ExecuteRequest could potentially break code within the subclasses.

Further, the implementations of the RedirectOnItemNotFound() method in the above blog posts don’t redirect unless an Exception is encountered which is a bit awkward given the name of the method.

I’m not going to share the exact solution that Akshay and I built in this blog post. Instead, I’m going to share one that is quite similar; the solution below is actually an enhancement of what we came up with. I added some caching and a few other things (basically, I moved more settings into Sitecore configuration so that the solution is more extendable/changeable):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;
using Sitecore.Links;
using Sitecore.Pipelines.HttpRequest;
using Sitecore.Web;

using Sitecore.Sandbox.Caching;

namespace Sitecore.Sandbox.Pipelines.HttpRequest
{
    public class HandleItemNotFound : HttpRequestProcessor
    {
        private string TargetWebsite { get; set; }

        private string StatusDescription { get; set; }

        private List<string> RelativeUrlPrefixesToIgnore { get; set; }

        protected ICacheProvider CacheProvider { get; private set; }

        protected string CacheKey { get; private set; }

        public HandleItemNotFound()
        {
            RelativeUrlPrefixesToIgnore = new List<string>();
        }

        public override void Process(HttpRequestArgs args)
        {
            Assert.ArgumentNotNull(args, "args");
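            // Bail out when an Item was resolved, when there is no context site,
            // when the site is not the configured target, or when the url is ignored.
            bool shouldExit = Sitecore.Context.Item != null
                                || Sitecore.Context.Site == null
                                || !string.Equals(Sitecore.Context.Site.Name, TargetWebsite, StringComparison.OrdinalIgnoreCase)
                                || StartsWithPrefixToIgnore(args.Url.FilePath);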
            if (shouldExit)
            {
                return;
            }

            string notFoundPageItemPath = Sitecore.Context.Site.Properties["notFoundPageItemPath"];
            if (string.IsNullOrWhiteSpace(notFoundPageItemPath))
            {
                return;
            }

            Database database = GetDatabase();
            if (database == null)
            {
                return;
            }

            Item notFoundItem = database.GetItem(notFoundPageItemPath);
            if (notFoundItem == null)
            {
                return;
            }

            string notFoundContent = GetNotFoundPageContent(args, database, notFoundPageItemPath);
            if(!string.IsNullOrWhiteSpace(notFoundContent))
            {
                args.Context.Response.TrySkipIisCustomErrors = true;
                args.Context.Response.StatusCode = 404;
                if (!string.IsNullOrWhiteSpace(StatusDescription))
                {
                    args.Context.Response.StatusDescription = StatusDescription;
                }

                args.Context.Response.Write(notFoundContent);
                args.Context.Response.End();
                return;
            }

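            // Log.Warn(string, object) does not format its message, so format it explicitly.
            Log.Warn(string.Format("The 'Not Found' page {0} shows no content when rendered!", notFoundItem.Paths.FullPath), this);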
        }

        protected virtual bool StartsWithPrefixToIgnore(string url)
        {
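            // Ordinal comparison avoids culture-sensitive matching of url prefixes.
            return !string.IsNullOrWhiteSpace(url) && RelativeUrlPrefixesToIgnore.Any(prefix => url.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));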
        }

        protected virtual Database GetDatabase()
        {
            return Context.ContentDatabase ?? Context.Database;
        }

        protected virtual string GetNotFoundPageContent(HttpRequestArgs args, Database database, string notFoundPageItemPath)
        {
            Assert.ArgumentNotNull(args, "args");
            Assert.ArgumentNotNull(database, "database");
            Assert.ArgumentNotNullOrEmpty(notFoundPageItemPath, "notFoundPageItemPath");
            string content = GetNotFoundPageContentFromCache();
            if(!string.IsNullOrWhiteSpace(content))
            {
                return content;
            }

            Item notFoundItem = database.GetItem(notFoundPageItemPath);
            if (notFoundItem == null)
            {
                return string.Empty;
            }

            string domain = GetDomain(args);
            string url = LinkManager.GetItemUrl(notFoundItem);
            try
            {
                content = WebUtil.ExecuteWebPage(string.Concat(domain, url));
                AddNotFoundPageContentFromCache(content);
                return content;
            }
            catch (Exception ex)
            {
                Log.Error(string.Format("{0} Error - domain: {1}, url: {2}", ToString(), domain, url), ex, this);
            }

            return string.Empty;
        }

        protected virtual string GetNotFoundPageContentFromCache()
        {
            Assert.IsNotNull(CacheProvider, "CacheProvider must be set in configuration!");
            return CacheProvider[GetCacheKey()] as string;
        }

        protected virtual void AddNotFoundPageContentFromCache(string content)
        {
            Assert.IsNotNull(CacheProvider, "CacheProvider must be set in configuration!");
            if(string.IsNullOrWhiteSpace(content))
            {
                return;
            }

            CacheProvider.Add(GetCacheKey(), content);
        }

        protected virtual string GetCacheKey()
        {
            Assert.IsNotNullOrEmpty(CacheKey, "CacheKey must be set in configuration!");
            return CacheKey;
        }

        protected virtual string GetDomain(HttpRequestArgs args)
        {
            Assert.ArgumentNotNull(args, "args");
            return args.Context.Request.Url.GetComponents(UriComponents.Scheme | UriComponents.Host, UriFormat.Unescaped);
        }
    }
}

The code in the Process() method above first determines whether it should execute. It should only execute when three conditions hold. First, Sitecore.Context.Item must be null, which means that previous <httpRequestBegin> pipeline processors could not ascertain which Sitecore Item should be served up for the request. Second, the context site must be the one named in TargetWebsite. Third, the relative url must not start with one of the prefixes to ignore; for example, we don’t want this code to run for media library Item requests, which all start with /~/ in a stock Sitecore instance.
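
To make the prefix check easier to reason about outside of Sitecore, here is a minimal standalone sketch of it. The PrefixFilter class and its StartsWithAny() method are hypothetical names I made up for illustration; they are not part of the processor above, and the ordinal, case-insensitive comparison is an assumption about how you would want urls matched:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of the processor above.
public static class PrefixFilter
{
    // Returns true when the relative url begins with any of the given prefixes.
    // Empty prefixes are skipped so that they cannot match every url.
    public static bool StartsWithAny(string relativeUrl, IEnumerable<string> prefixes)
    {
        if (string.IsNullOrWhiteSpace(relativeUrl) || prefixes == null)
        {
            return false;
        }

        return prefixes
            .Where(prefix => !string.IsNullOrEmpty(prefix))
            .Any(prefix => relativeUrl.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
    }
}
```

With the single /~/ prefix from the configuration further down, PrefixFilter.StartsWithAny("/~/media/logo.ashx", new[] { "/~/" }) returns true, while the same call with "/nope" returns false, so only the latter would be handled as a 404 candidate.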

Further, the path to the ‘Page Not Found’ Item must be set on the site node within Sitecore configuration. If this is not set, then the code will not execute.

If the code should execute, it tries to grab the ‘Page Not Found’ content from cache — the class above reuses the CacheProvider class which I wrote for my post on storing data outside of the Sitecore XP but using the Sitecore API.

If the content does not exist in cache, we make a request to the ‘Page Not Found’ Item using Sitecore.Web.WebUtil.ExecuteWebPage, put the rendered content in cache, and then return it to the Process() method.
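
The cache lookup followed by a render-and-store step can be sketched in isolation as below. This is only an illustration under assumptions: ISimpleCacheProvider mirrors just the two members the processor calls (an indexer and Add()), InMemoryCacheProvider is a dictionary-backed stand-in rather than the CacheProvider class from my other post, and the render delegate stands in for the call to WebUtil.ExecuteWebPage:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical interface: only the two members the processor above relies on.
public interface ISimpleCacheProvider
{
    object this[string key] { get; }
    void Add(string key, object value);
}

// In-memory stand-in for illustration; not the Sitecore-backed CacheProvider.
public class InMemoryCacheProvider : ISimpleCacheProvider
{
    private readonly ConcurrentDictionary<string, object> _entries =
        new ConcurrentDictionary<string, object>(StringComparer.Ordinal);

    public object this[string key]
    {
        get
        {
            object value;
            return _entries.TryGetValue(key, out value) ? value : null;
        }
    }

    public void Add(string key, object value)
    {
        _entries[key] = value;
    }
}

public static class NotFoundContentCache
{
    // Returns cached markup when present; otherwise calls 'render' once
    // and caches the result only if it is non-empty.
    public static string GetOrAdd(ISimpleCacheProvider cache, string key, Func<string> render)
    {
        string content = cache[key] as string;
        if (!string.IsNullOrWhiteSpace(content))
        {
            return content;
        }

        content = render();
        if (!string.IsNullOrWhiteSpace(content))
        {
            cache.Add(key, content);
        }

        return content ?? string.Empty;
    }
}
```

Not caching empty content matters here: if the 404 page fails to render once, the next request gets another chance instead of serving a cached blank page.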

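If there is content to display, we set the 404 status code (with TrySkipIisCustomErrors set so that IIS does not substitute its own error page), write the content to the response stream, and end the response. Otherwise, a warning is written to the Sitecore log.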

I then glued everything together using the following patch configuration file:

<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <httpRequestBegin>
        <processor patch:before="processor[@type='Sitecore.Pipelines.HttpRequest.ExecuteRequest, Sitecore.Kernel']"
                   type="Sitecore.Sandbox.Pipelines.HttpRequest.HandleItemNotFound, Sitecore.Sandbox">
          <TargetWebsite>website</TargetWebsite>
          <StatusDescription>Page Not Found</StatusDescription>
          <RelativeUrlPrefixesToIgnore hint="list">
            <Prefix>/~/</Prefix>
          </RelativeUrlPrefixesToIgnore>
          <CacheProvider type="Sitecore.Sandbox.Caching.CacheProvider, Sitecore.Sandbox">
            <param desc="cacheName">[404]</param>
            <param desc="cacheSize">500KB</param>
          </CacheProvider>
          <CacheKey>404Content</CacheKey>
        </processor>
      </httpRequestBegin>
    </pipelines>
    <sites>
      <site name="website">
        <patch:attribute name="notFoundPageItemPath">/sitecore/content/Home/404</patch:attribute>
      </site>
    </sites>
  </sitecore>
</configuration>

In the above configuration file, I am injecting this processor into the <httpRequestBegin> pipeline so that it executes before the Sitecore.Pipelines.HttpRequest.ExecuteRequest processor.

Let’s see this in action.

I set up an Item in Sitecore to serve as my ‘Page Not Found’ page Item:

[Screenshot: 404-item]

After publishing and navigating to a page url that does not exist in my instance, I get the following:

[Screenshot: nope]

As you can see, we get the rendered page content for the 404 Item yet stay on the original requested nonexistent page (/nope).
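
A browser screenshot does not show the status code itself, so one way to double-check it is to request the page with automatic redirects turned off. The following is a rough sketch, not something from my solution: NotFoundCheck, Classify() and CheckAsync() are hypothetical names, and the hostname in the usage note is a placeholder for your own instance:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical verification helper for illustration only.
public static class NotFoundCheck
{
    // Classifies a status code: 404 means the content was served in place,
    // any 3xx code means the site redirected instead.
    public static string Classify(HttpStatusCode status)
    {
        int code = (int)status;
        if (code == 404)
        {
            return "404 served in place";
        }

        if (code >= 300 && code < 400)
        {
            return "redirected (" + code + ")";
        }

        return "unexpected status (" + code + ")";
    }

    // Requests the url with redirects disabled so that a 302 is reported
    // rather than silently followed.
    public static async Task<string> CheckAsync(string url)
    {
        using (var handler = new HttpClientHandler { AllowAutoRedirect = false })
        using (var client = new HttpClient(handler))
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            return Classify(response.StatusCode);
        }
    }
}
```

Calling NotFoundCheck.CheckAsync("http://your-sitecore-host/nope") from a console application should report "404 served in place" if the processor is wired up, whereas a redirect-based approach would show up as "redirected (302)".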

If you have any comments or thoughts on this, please share in a comment.


6 Comments

  1. As always, great article.

    As you write yourself, there is a lot of value in creating custom 404 handlers. Back when I actually built web sites, I usually bundled 404 code with 301 (moved permanently) as they are often closely related (e.g. when you rename or move an item, or if you convert from other technology to Sitecore and you look up a URL redirect table).

    I have also built the 404 handler to do a search for “similar content” and propose the top hit with high accuracy. This way, 1 typo in a URL might result in a “the page did not exist. Did you mean xxx?”

    • Thanks Lars!

      I have also incorporated “Did you mean xxx” functionality into 404 pages. Doing so offers a good user experience on helping users find what they are looking for.

      • “Did you mean” is a great extension for 404 pages and pretty easy to implement using ContentSearch. Would need to remove the caching in this implementation in that case though, right?

      • yeah, caching would have to be removed in that scenario.
