Introduction

EPiServer CMS 6 supports rewriting of incoming and outgoing URLs. This is the mechanism behind friendly URLs, where a visible link is rewritten to something more intuitive and user friendly than the internal URL.

The friendly URL functionality is just one small (but powerful) example of what can be done by hooking into the rewriting pipe. What if we, for example, wanted a mechanism that checks all HTML for links, tests these links to see that they are valid and then formats broken links so that they are immediately evident on the page?

The following guide provides such an example. Because testing every link on every page would quickly kill performance, we are going to create a right-click menu option to turn the feature on and off. Clicking the menu option calls a JavaScript function that is registered in the static PageSetup event handler.

This sample uses static methods and objects which means that if activated, it will be activated for all visitors on the site. To use this in a live scenario it is advisable to rewrite the code to only be activated for the current user.
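
As a rough illustration of what "activated for the current user" could mean, the sketch below keeps one flag per user name instead of one static flag for the whole site. The class PerUserLinkCheckState and its Resolve method are hypothetical names introduced here, not part of the sample or of EPiServer. On a real site, session state would be a more natural store than the dictionary used here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the sample): remembers the link-checking
// flag per user instead of in a single static field shared by all visitors.
public class PerUserLinkCheckState
{
    private readonly Dictionary<string, bool> _enabledByUser =
        new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase);
    private readonly object _sync = new object();

    // queryValue is the raw "CheckLinks" query string value, or null if absent.
    // A valid "true"/"false" value updates the stored flag for this user only;
    // anything else leaves the previous value untouched.
    public bool Resolve(string userName, string queryValue)
    {
        lock (_sync)
        {
            bool parsed;
            if (queryValue != null && bool.TryParse(queryValue, out parsed))
                _enabledByUser[userName] = parsed;

            bool enabled;
            return _enabledByUser.TryGetValue(userName, out enabled) && enabled;
        }
    }
}
```

With this shape, one editor switching the feature on does not switch it on for anyone else, and a request without the query string value simply returns whatever that user chose last.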

Sample Markup

To start things off we create a basic .aspx page and add some text and two links, one of which will be broken.

<%@ Page language="c#" Inherits="EPiServerDemo.Default" Codebehind="Default.aspx.cs" AutoEventWireup="true"%>

<html xmlns="http://www.w3.org/1999/xhtml">
<head id="head1" runat="server">
    <title>Sample</title>
</head>
<body>
    <form runat="server" id="form1">
    <div>
    <p>
        A body with text containing numerous links. Some of these links are valid and some are broken.
    </p>
    <p>
        All broken links will be printed with red font color. Hovering the mouse over a broken link
        will produce a tooltip containing the error message received by the HttpRequest used to test 
        the link.<br />
        For example: this link to <a href="http://www.aftonbladet.se">Aftonbladet</a> 
        is correct (http://www.aftonbladet.se), while this link to a 
        <a href="http://www.ttttddfhdfh.se/">non existing site</a> is not (http://www.ttttddfhdfh.se/).
    </p>
    <p>
        You can enable or disable this feature by setting enableCheckLinks to true/false
        or by using the right-click menu.
    </p>
    <p>
        To tweak performance and accuracy you can use the <b>checkLinkTimeout</b> setting in 
        web.config. The timeout value will default to 1000ms (one second).
    </p>

    <p>
        You need to be logged in to EPiServer for this functionality to take effect.
    </p>    
    </div>
    </form>
</body>
</html>

In this sample the page is Default.aspx and the code-behind, of course, is Default.aspx.cs. The Inherits attribute must name your own code-behind class; the markup above uses EPiServerDemo.Default.

Web.config settings

We want the link-checking functionality to be optional, so we add an appSettings entry to web.config.

<appSettings>
    <add key="enableCheckLinks" value="true" />
</appSettings>

The timeout value used when testing links defaults to 1000 milliseconds (one second). You can change it with the checkLinkTimeout setting in web.config: a lower value improves performance at the risk of slow sites being reported as broken, while a higher value, as in the example below, improves accuracy at the cost of speed.

<appSettings>
    <add key="checkLinkTimeout" value="5000" />
</appSettings>

The Code

To create the rewriting functionality we add a new class to our solution. Right-click your solution and choose Add New Item... On the Visual C# tab, choose Class and name it LinkChecker.cs.

Your using section needs to include the following:

using System;
using System.Data;
using System.Configuration;
using System.Collections.Generic;
using System.Text;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using System.Xml;

using EPiServer.Core;
using EPiServer.PlugIn;
using EPiServer.Web;
using System.Net;

To make sure our class is initialized (and only initialized once), add the PagePlugIn attribute to the LinkChecker class:

[PagePlugIn()]
public class LinkChecker
{

We will need four members on this class.

  1. private string _errorMessage = string.Empty; A string that is used to store the error message alongside the invalid URL when one is found. This is then added to the tooltip shown when hovering over the broken link.
  2. private AddState _addState = 0; The member _addState is used to determine where in the HTML structure we are at the moment. When a link is found we need to know the following:
    • Are we inside the body tag? If not, no action is to be taken.
    • Are we at the start of the link element? If so we add the opening elements of our error formatting.
    • Are we at the end of the link element? If so we add the closing elements of our error formatting.
  3. private static bool _checkLinks = false; This boolean is used to see if we should be checking links or not. The member is accessed through the property CheckLinks. See "CheckLinks property".
  4. private static int _linkTimeout = 1000; The value used to determine how long to wait for an HttpRequest to respond before classing the link as broken.

CheckLinks property:

// Returns true if we have clicked "Enable CheckLinks" on the right-click menu
public static bool CheckLinks
{
    get
    {
        // If the query string does not contain a CheckLinks value, simply return
        // the previous value. This prevents our bool from being overwritten when
        // several requests are sent for the same page
        if (HttpContext.Current.Request.QueryString["CheckLinks"] == null)
            return _checkLinks;
        // Check the value of our QueryString and assign it to _checkLinks
        bool.TryParse(HttpContext.Current.Request.QueryString["CheckLinks"], out _checkLinks);
        return _checkLinks;
    }
}

Your variable declaration should look as follows (also included here is the enum declaration):

private string _errorMessage = string.Empty;
private static bool _checkLinks = false;
private static int _linkTimeout = 1000;
private AddState _addState = 0;

private enum AddState { Inactive, StartDiv, EndDiv, InBody };

Initialize

After declaring our variables (and the enum) we start off by creating the Initialize method. It reads web.config to see whether link checking is enabled and which timeout value, if any, has been set. It then adds our HtmlRewriteInit handler to the event chain.

// Called by the EPiServer framework during startup, due to the PagePlugIn attribute. This is not
// a PagePlugIn, we're just piggybacking the mechanism to ensure we get called once (and only once).
public static void Initialize(int optionFlag)
{
    string linkTimeout = System.Web.Configuration.WebConfigurationManager.AppSettings["checkLinkTimeout"];

    string enableLinkChecker = System.Web.Configuration.WebConfigurationManager.AppSettings["enableCheckLinks"];

    // Set the timeout value according to web.config settings, or default it
    // to one second if the setting is missing or is not a valid integer
    if (!int.TryParse(linkTimeout, out _linkTimeout))
        _linkTimeout = 1000;

    if (!String.IsNullOrEmpty(enableLinkChecker))
    {
        bool isEnableLinkChecker;
        if (Boolean.TryParse(enableLinkChecker, out isEnableLinkChecker))
        {
            if (!isEnableLinkChecker)
            {
                return;
            }
        }
    }
    HtmlRewriteToExternal.HtmlRewriteInit += HtmlRewriteToExternal_HtmlRewriteInit;
    // Added to enable the script parts of our solution to be initialized
    EPiServer.PageBase.PageSetup += new EPiServer.PageSetupEventHandler(PageBase_PageSetup);
}

Initializing the Rewrite Pipe

The HtmlRewriteInit event is raised every time an HtmlRewritePipe object is instantiated by the UrlRewriteModule. Our handler looks as follows:

// Init the HtmlRewrite-engines event handlers. This is called every time an HtmlRewritePipe object
// is instantiated by the UrlRewriteModule.
// The EPiServer.Web.HtmlRewriteEventArgs parameter contains the event data.
static private void HtmlRewriteToExternal_HtmlRewriteInit(object sender, HtmlRewriteEventArgs e)
{
    // If we haven't clicked "Enable CheckLinks", don't do anything
    if (!CheckLinks)
        return;

    // Called on every request
    // We need an instance of ourselves, to keep track of our state
    LinkChecker linkChecker = new LinkChecker();

    // There are two major events from the HtmlRewrite-engine, which allow us to rewrite
    // names and values of the content. The exact definition depends on the XmlNodeType
    // that is being processed. In this sample we only use the HtmlRewriteValue event
    e.RewritePipe.HtmlRewriteValue += linkChecker.HtmlRewriteValueEventHandler;
}

Page Setup - Registering Our JavaScript

The PageSetup event is raised before the page is rendered. This is where we register the JavaScript that interacts with the right-click menu and reloads the page with a QueryString matching the requested CheckLinks state (enabled/disabled). We also add our own PreRender handler to the event chain.

static void PageBase_PageSetup(EPiServer.PageBase sender, EPiServer.PageSetupEventArgs e)
{
    // This is the place to register .js files but in our case we avoid external .js files by
    // adding our script directly to the markup with a call to RegisterClientScriptBlock
    if (sender is EPiServer.TemplatePage)
    {
        // The javascript responsible for reloading the page and adding the correct QueryString
        // The markup version will look as follows:
        // function SetLinkCheckingState(enable)
        // {
        //     window.location.search="";
        //     if(enable)
        //     {
        //         window.location.search = '?CheckLinks=true';
        //     }
        //     else
        //     {
        //         window.location.search = '?CheckLinks=false';
        //     }
        //     window.location.reload();
        // }
        string script = "function SetLinkCheckingState(enable)\n\r{\n\r\twindow.location.search=\"" +
                        "\";\n\r\tif(enable)\n\r\t{\n\r\t\twindow.location.search = '?CheckLinks=true'" + 
                        ";\n\r\t}\n\r\telse\n\r\t{\n\r\t\twindow.location.search = '?CheckLinks=false';" + 
                        "\n\r\t}\r\n\twindow.location.reload();\n\r}";
        sender.ClientScript.RegisterClientScriptBlock(sender.GetType(), "lcscript", script, true);
        sender.PreRender += new EventHandler(sender_PreRender);
    }
}

Rewriting the HTML - HtmlRewriteValueEventHandler

In the HtmlRewriteValueEventHandler method we check which part of the HTML structure we are currently parsing, as well as which element fragment we are at. Depending on where we are, we change _addState to reflect our position in the HTML.

If the state is found to be AddState.StartDiv we insert our start tags for formatting a bad link. Getting to this point means three things:

  1. We are inside the <body> tag
  2. We have found a broken link (or at least a link that timed out)
  3. We are at the opening fragment of the broken <a href> element

When we find the end of the a href element we append our formatting to the HTML and reset the state to AddState.InBody again so that the parsing continues.

// Handle rewrite value.
// The value event is raised after an associated name event. Check the e.NodeType and other 
// properties to determine course of action.
private void HtmlRewriteValueEventHandler(object sender, HtmlRewriteEventArgs e)
{
    // Check if we are at the start or end of the "insert div phase"
    // and change the state accordingly
    switch (_addState)
    {
        case AddState.Inactive:
            if (e.ElementType == EPiServer.Web.HtmlRewritePipe.ElementTypes.BODY && 
                e.NodeType == XmlNodeType.Element)
                _addState = AddState.InBody;
            return;
        case AddState.InBody:
            if (e.ElementType == EPiServer.Web.HtmlRewritePipe.ElementTypes.A && 
                e.NodeType == XmlNodeType.Attribute && 
                string.Compare(e.Name, "href", StringComparison.OrdinalIgnoreCase) == 0)
            {
                // If the link target cannot be reached, start marking it as broken
                if (!LinkExists(e.Value))
                    _addState = AddState.StartDiv;
            }
            break;
        case AddState.StartDiv:
            if (e.NodeType != XmlNodeType.Element)
                break;
            e.ValueBuilder.Append("<span class=\"brokenLink\">");

            _addState = AddState.EndDiv;
            break;
        case AddState.EndDiv:
            if (e.ElementType != EPiServer.Web.HtmlRewritePipe.ElementTypes.A || 
                e.NodeType != XmlNodeType.EndElement)
                break;
            e.ValueBuilder.Insert(0, "<span>" + _errorMessage + "</span></span>");
            _addState = AddState.InBody;
            break;
    }
}
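
The state transitions in the handler can be traced in isolation. The sketch below is not EPiServer code: NodeKind, Node and StateMachineSketch are hypothetical stand-ins for the node information that the rewrite pipe reports, and the sketch only mirrors the switch statement above without writing any HTML.

```csharp
using System;

// Hypothetical, simplified node kinds; the real handler inspects
// e.ElementType, e.NodeType and e.Name instead.
public enum NodeKind { BodyStart, LinkHref, ElementStart, LinkEnd, Other }

public class Node
{
    public readonly NodeKind Kind;
    public readonly bool LinkIsBroken; // only meaningful for LinkHref

    public Node(NodeKind kind, bool linkIsBroken)
    {
        Kind = kind;
        LinkIsBroken = linkIsBroken;
    }
}

public enum AddState { Inactive, StartDiv, EndDiv, InBody }

public static class StateMachineSketch
{
    // Returns the state that follows 'state' once 'node' has been processed.
    public static AddState Next(AddState state, Node node)
    {
        switch (state)
        {
            case AddState.Inactive:
                // Nothing happens until the body element is reached
                return node.Kind == NodeKind.BodyStart ? AddState.InBody : AddState.Inactive;
            case AddState.InBody:
                // A broken href arms the formatting
                return node.Kind == NodeKind.LinkHref && node.LinkIsBroken
                    ? AddState.StartDiv : AddState.InBody;
            case AddState.StartDiv:
                // The opening span is written at the next element node
                return node.Kind == NodeKind.ElementStart ? AddState.EndDiv : AddState.StartDiv;
            case AddState.EndDiv:
                // The closing span is written at the end of the link
                return node.Kind == NodeKind.LinkEnd ? AddState.InBody : AddState.EndDiv;
            default:
                return state;
        }
    }
}
```

Feeding it a body start, a broken href, an element node and a link end walks through InBody, StartDiv, EndDiv and back to InBody, which is the cycle described above. A working href leaves the state at InBody.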

Adding Our Right-click Menu Option - sender_PreRender

In the prerender event we call our CheckLinks property and add the correct menu option to the right-click menu.

If we are currently told to check for bad links, we add a "Disable Check Links" option. If we are not checking links, we add an "Enable Check Links" option.

static void sender_PreRender(object sender, EventArgs e)
{
    if (CheckLinks)
        (sender as EPiServer.PageBase).ContextMenu.Menu.Add("Disable Check Links", 
            EPiServer.Security.AccessLevel.Edit, 
            new EPiServer.RightClickMenuItem("Disable Check Links", "SetLinkCheckingState(false)"));
    else
        (sender as EPiServer.PageBase).ContextMenu.Menu.Add("Enable Check Links", 
            EPiServer.Security.AccessLevel.Edit, 
            new EPiServer.RightClickMenuItem("Enable Check Links", "SetLinkCheckingState(true)"));
}

Testing a Link

Finally we have our LinkExists function that tests each URL it receives against the preset timeout value (or the default 1000 ms).

// Checks the link by sending a web request. The timeout value can be tweaked to 
// enhance performance, but lowering this setting might make slow sites be reported 
// as bad URLs. 
// 
// Returns true if the link works (or uses a protocol that we do not check). The 
// HTML parser then leaves the link alone and moves on to the next.
// Returns false if the link is broken. _errorMessage is then set to the URL being 
// tried together with the error message received, and the rewriting functionality 
// uses that string as a tooltip for the broken link.
private bool LinkExists(string url)
{
    // Only check certain protocols. Anything else (relative links, mailto: and 
    // so on) is not tested and must therefore not be reported as broken.
    if (!url.StartsWith("http://") && !url.StartsWith("https://") && !url.StartsWith("ftp://"))
    {
        return true;
    }

    try
    {
        // WebRequest.Create returns an FtpWebRequest for ftp:// URLs, so we use 
        // the base types rather than casting to HttpWebRequest. It can also throw 
        // UriFormatException, which is why it sits inside the try block.
        WebRequest request = WebRequest.Create(url);
        request.Timeout = _linkTimeout;

        // Dispose the response so that the connection is released
        using (WebResponse response = request.GetResponse())
        {
            return true;
        }
    }
    catch (Exception ex)
    {
        // UriFormatException, WebException (including timeouts) and any other 
        // failure all mean that the link could not be verified.
        // Build the error message -> badurl,errormessage
        _errorMessage = url + "," + ex.Message;
        return false;
    }
}
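
The protocol test in LinkExists relies on StartsWith, which is case sensitive, so a link written as HTTP://... would not be tested. One possible alternative, shown here only as a sketch, is to let System.Uri parse the URL and compare the normalized scheme. LinkFilterSketch and IsCheckable are hypothetical names introduced for this illustration.

```csharp
using System;

public static class LinkFilterSketch
{
    // Hypothetical helper: returns true when the URL is absolute and uses
    // one of the schemes that the sample wants to test (http, https, ftp).
    public static bool IsCheckable(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return false; // relative or malformed links are not tested

        // Uri.Scheme is always reported in lower case
        return uri.Scheme == Uri.UriSchemeHttp
            || uri.Scheme == Uri.UriSchemeHttps
            || uri.Scheme == Uri.UriSchemeFtp;
    }
}
```

A caller would test a link only when IsCheckable returns true and treat every other link as not broken.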