Periodically Rebuild Link Databases using an Agent in Sitecore

Last week a colleague asked me whether rebuilding the Link Database would solve an issue she was seeing. That conversation got me thinking: wouldn’t it be nice if we could automate the rebuilding of the Link Database for each Sitecore database at a scheduled time?

I am certain others have already created solutions to do this (if you know of any, please share in a comment), but I didn’t search for them. I normally advocate not reinventing the wheel for code solutions, but I wanted to have some fun building a new one.

In the spirit of my post on putting Sitecore to work for you, I built the following Sitecore agent (check out John West’s blog post on Sitecore agents to learn more):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Xml;

using Sitecore.Configuration;
using Sitecore.Data;
using Sitecore.Diagnostics;
using Sitecore.Jobs;

namespace Sitecore.Sandbox.Tasks
{
    public class RebuildLinkDatabasesAgent
    {
        private static readonly IList<Database> Databases = new List<Database>();

        public void Run()
        {
            JobManager.Start(CreateNewJobOptions());
        }

        protected virtual JobOptions CreateNewJobOptions()
        {
            return new JobOptions("RebuildLinkDatabasesAgent", "index", Context.Site.Name, this, "RebuildLinkDatabases");
        }

        protected virtual void RebuildLinkDatabases()
        {
            Job job = Context.Job;
            try
            {
                RebuildLinkDatabases(Databases);
            }
            catch (Exception ex)
            {
                job.Status.Failed = true;
                job.Status.Messages.Add(ex.ToString());
            }

            job.Status.State = JobState.Finished;
        }

        private void RebuildLinkDatabases(IEnumerable<Database> databases)
        {
            Assert.ArgumentNotNull(databases, "databases");
            foreach (Database database in databases)
            {
                // Time each database on its own with a fresh stopwatch.
                Stopwatch stopwatch = Stopwatch.StartNew();
                RebuildLinkDatabase(database);
                stopwatch.Stop();

                // ElapsedMilliseconds is the total elapsed time; Elapsed.Milliseconds
                // is only the 0-999 component of the TimeSpan.
                LogEntry(database, stopwatch.ElapsedMilliseconds);
            }
        }

        protected virtual void RebuildLinkDatabase(Database database)
        {
            Assert.ArgumentNotNull(database, "database");
            Globals.LinkDatabase.Rebuild(database);
        }

        protected virtual void LogEntry(Database database, long elapsedMilliseconds)
        {
            Assert.ArgumentNotNull(database, "database");
            if (string.IsNullOrWhiteSpace(LogEntryFormat))
            {
                return;
            }

            Log.Info(string.Format(LogEntryFormat, database.Name, elapsedMilliseconds), this);
        }

        private static void AddDatabase(XmlNode configNode)
        {
            if (configNode == null || string.IsNullOrWhiteSpace(configNode.InnerText))
            {
                return;
            }

            Database database = TryGetDatabase(configNode.InnerText);
            if (database != null)
            {
                Databases.Add(database);
            }
        }

        private static Database TryGetDatabase(string databaseName)
        {
            Assert.ArgumentNotNullOrEmpty(databaseName, "databaseName");
            try
            {
                return Factory.GetDatabase(databaseName);
            }
            catch (Exception ex)
            {
                Type agentType = typeof(RebuildLinkDatabasesAgent);
                Log.Error(agentType.ToString(), ex, agentType);
            }

            return null;
        }

        private string LogEntryFormat { get; set; }
    }
}

The class above reads a list of databases from a configuration file, adds each one that exists to a list for processing, and rebuilds the Link Database for each via a Sitecore job.

I added some timing logic to see how long it takes to rebuild each database, and to capture this information in the Sitecore log.
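One detail worth keeping in mind when logging durations in milliseconds: in .NET, TimeSpan.Milliseconds returns only the milliseconds component of a duration (0 to 999), while TimeSpan.TotalMilliseconds (and Stopwatch.ElapsedMilliseconds) return the whole duration. The small sketch below illustrates the difference; the class name ElapsedTimeDemo and the 9.5-second duration are made up for illustration and are not part of the agent.

```csharp
using System;

// Hypothetical helper class, used only to illustrate the two properties.
public static class ElapsedTimeDemo
{
    // Returns only the milliseconds component (0-999) of the duration.
    public static int ComponentMilliseconds(TimeSpan elapsed)
    {
        return elapsed.Milliseconds;
    }

    // Returns the whole duration expressed in milliseconds.
    public static double WholeMilliseconds(TimeSpan elapsed)
    {
        return elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        // Made-up duration: an operation that took 9.5 seconds.
        TimeSpan elapsed = TimeSpan.FromMilliseconds(9500);

        Console.WriteLine(ComponentMilliseconds(elapsed)); // 500
        Console.WriteLine(WholeMilliseconds(elapsed));     // 9500
    }
}
```

A logged value taken from the component property can never reach 1000, however long the operation actually ran.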

I then wired up the above class in Sitecore using the following patch include configuration file:

<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <scheduling>
      <agent type="Sitecore.Sandbox.Tasks.RebuildLinkDatabasesAgent" method="Run" interval="00:01:00">
        <databases hint="raw:AddDatabase">
          <database>core</database>
          <database>master</database>
          <database>web</database>
        </databases>
        <LogEntryFormat>Rebuilt link database: {0} in {1} milliseconds.</LogEntryFormat>
      </agent>
    </scheduling>
  </sitecore>
</configuration>

I’ve set this agent to run every minute for testing, but it would probably be wise to have this run no more than once or twice a day.
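As a rough illustration of a less aggressive schedule, the fragment below changes only the interval attribute of the agent definition above to twelve hours, which works out to about twice a day. The 12:00:00 value is just an example, and it assumes the attribute keeps the same hours:minutes:seconds format used in the patch file above. An interval does not pin the agent to a particular time of day.

```xml
<agent type="Sitecore.Sandbox.Tasks.RebuildLinkDatabasesAgent" method="Run" interval="12:00:00">
  <databases hint="raw:AddDatabase">
    <database>core</database>
    <database>master</database>
    <database>web</database>
  </databases>
  <LogEntryFormat>Rebuilt link database: {0} in {1} milliseconds.</LogEntryFormat>
</agent>
```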

After waiting a bit, I saw the following in my Sitecore log:

[Image: rebuilt-link-database]

I do question the rebuild times. These seem quite small, especially when it takes a while to rebuild the Link Databases via the Sitecore Control Panel. If you have any ideas/thoughts on why there is an incongruence between the times in my log and how long it takes to rebuild these via the Sitecore Control Panel, please share in a comment.

Further, if you have any recommendations on making this code better, or have other ideas on automating the rebuilding of Link Databases in Sitecore, please drop a comment.

Until next time, have a Sitecoretastic day!

10 Comments

  1. Owen Martin says:

    I believe the time shown is the time taken to start the internal Sitecore job to rebuild the link database, as opposed to the time to rebuild the database proper. I think what you have is a job within a job, so your manual job creation is unnecessary. Unfortunately, I don’t think there is a nice way to hook into the end of the rebuild event.

    • Thanks for the reply Owen!

      I modeled my code after code in Sitecore.Shell.Applications.Dialogs.RebuildLinkDatabase.RebuildLinkDatabaseForm in Sitecore.Client.dll, and that code also creates a manual job.

      Rebuild code in Sitecore.Links.SqlLinkDatabase executes SQL directly in a synchronous manner, so I’m a little puzzled.

      • Owen Martin says:

        That is really weird. I took your code and ran it on a fairly empty solution and had the same results – from the log entries it took about 9 seconds to rebuild using the form, and about half a second to rebuild using the agent…
        I’m going to try and duplicate the form and use some timings that way just to be sure, but it doesn’t really make sense!

  2. Michael West says:

    Will this rebuild the links on the content delivery servers?

    • It should, since the rebuilt link relationships are saved directly in the SQL database.

      I dug further into the code for retrieval, and it looks like it pulls directly from the SQL database as well.

    • Owen Martin says:

      Assuming that the CD servers are using a shared db for the link database (i.e. core), then yes. We had a situation where CD didn’t have access to core, and so we had to rebuild the link db in the web database manually (in theory, the link db should be kept up to date from the CM environment).
      Check out the following entry in the web.config on your CD environment:

  3. Ivan Huang says:

    This is cool. The only thing that worries me is the time slip when using the scheduling agent, as it’s definitely not good to run it during business hours.

  4. Mike LeVasseur says:

    I would recommend giving your blog more context on why such a scenario would be necessary. This will help people new to Sitecore understand why they need to implement your code suggestion. It will also help people find your answer in Google searches they do. Otherwise you’re just stating an answer to no apparent question. Make sense?

  5. Scott says:

    We’re in need of a solution like this, as we’ve got no synchronization between CM and CD and we rely on the link DB for a custom search index.

    This is a very interesting idea, but I’m not convinced that it’s actually rebuilding anything. The agent finishes within 900ms. There’s no way a rebuild of our massive link DB would be done that quickly. Also, the “units processed” value is empty:

    ManagedPoolThread #4 17:29:37 INFO Job started: BECU.Sitecore.Agents.RebuildLinkDatabasesAgent
    ManagedPoolThread #4 17:29:37 INFO Job ended: BECU.Sitecore.Agents.RebuildLinkDatabasesAgent (units processed: )

    Does anyone have any thoughts on how to get this to actually work?
