Search engines

The more content you have, the more important it becomes to guide users to the relevant content. A big part of this is the search function.

There are plenty of search engines on the market, and in order to provide a flexible solution Kaliko CMS uses a provider-based model for search engine implementations. Out of the box the system comes with two search providers: the null provider and KalikoSearch.

The null provider is a dummy to use if you don't want any search functionality on your website. KalikoSearch is built on Lucene.net, a powerful open source search engine. It is included both to give you a capable search engine out of the box and to serve as a sample of how to integrate with a third-party search engine.

If you wish to use another search engine, such as Google Search Appliance, Solr or any other product or service, you can integrate it by creating a custom search provider. Hopefully you will also share it with the rest of the community, broadening the range of available search providers.

Indexing works a bit differently in this system compared to how search engines normally crawl websites. Instead of the search engine crawling the pages in order to index them, the system notifies the search provider every time a page is saved. The page type itself decides whether its content should be indexed (by implementing the IIndexable interface) and, if so, what content to store.

This makes it easy to narrow down the indexed content on a page to its unique content, such as headings and main text, instead of indexing everything including menus and lists.

If you implement a provider for a search engine that needs to crawl the pages, you can either omit the interface from your page types or ignore the save event in your provider implementation.

How to add indexing to your page type

For this example we'll take a simple article page type with a page heading, a preamble and body text. The page type will look something like this:

namespace CmsDemo.PageTypes {
    using KalikoCMS.Attributes;
    using KalikoCMS.Core;
    using KalikoCMS.PropertyType;

    [PageType("Article", "Article", "~/Templates/Pages/ArticlePage.aspx", PageTypeDescription = "Simple article")]
    public class ArticlePageType : CmsPage {
        [Property("Heading")]
        public virtual StringProperty Heading { get; set; }

        [Property("Preamble")]
        public virtual TextProperty Preamble { get; set; }

        [Property("Main body")]
        public virtual HtmlProperty MainBody { get; set; }
    }
}

To index the page contents of this type, let's start by adding the IIndexable interface and implementing its member MakeIndexItem(CmsPage page):

namespace CmsDemo.PageTypes {
    using KalikoCMS.Attributes;
    using KalikoCMS.Core;
    using KalikoCMS.PropertyType;
    using KalikoCMS.Search;

    [PageType("Article", "Article", "~/Templates/Pages/ArticlePage.aspx", PageTypeDescription = "Simple article")]
    public class ArticlePageType : CmsPage, IIndexable {
        [Property("Heading")]
        public virtual StringProperty Heading { get; set; }

        [Property("Preamble")]
        public virtual TextProperty Preamble { get; set; }

        [Property("Main body")]
        public virtual HtmlProperty MainBody { get; set; }

        public IndexItem MakeIndexItem(CmsPage page) {
            throw new System.NotImplementedException();
        }
    }
}

The MakeIndexItem function takes one parameter, which is a generic CMS page. It is called every time a page is saved in order to create a custom IndexItem containing all the data that should be indexed. Let's convert the parameter to a typed version of our page type and populate a new IndexItem. Make sure to create the IndexItem by calling GetBaseIndexItem on your typed page in order to get the correct metadata for the page:

    public IndexItem MakeIndexItem(CmsPage page) {
        // Get a strongly typed version
        var typedPage = page.ConvertToTypedPage<ArticlePageType>();

        // Create a base index item
        var indexItem = typedPage.GetBaseIndexItem();

        indexItem.Title = typedPage.Heading.Value;
        indexItem.Summary = typedPage.Preamble.Value;
        indexItem.Content = typedPage.MainBody.Value;
        indexItem.Category = "Article";

        return indexItem;
    }

This will index our pages in the Article category, which can be used to separate content when searching.

If you are adding the indexing feature to a website that already has pages, you might want to have them indexed without re-saving them one by one. This can be done from the following page: /Admin/Search/.

Index additional properties

It's possible to add additional properties to the index. This is done by using the MetaData collection:

indexItem.MetaData.Add("myproperty", typedPage.SomeProperty);

These additional properties are, however, not used for searches.

To include a custom property in the search result, add its name to the `MetaData` property of the search query:

var searchQuery = new SearchQuery(query) {
    MetaData = new[] { "category", "summary", "myproperty" }
};

// Perform the search
var result = SearchManager.Instance.Search(searchQuery);

When looping through the result set, you can then access the property through searchHit.MetaData["myproperty"].

Implement the search function

[To be written]
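
Until this section is written, here is a minimal sketch that reuses only the calls shown elsewhere on this page (SearchQuery, SearchManager.Instance.Search, and the Hits, Path and Title members of the result). The class and method names are hypothetical, the KalikoCMS.Search namespace for SearchManager and SearchQuery is an assumption, and HTML-encoding the title is an added precaution rather than something the CMS is known to require:

```csharp
using System.Text;
using System.Web;
using KalikoCMS.Search; // assumed namespace for SearchManager and SearchQuery

// Hypothetical helper class, not part of Kaliko CMS
public static class SearchResultRenderer {
    // Runs the query and renders the hits as an HTML list of links
    public static string RenderSearchResult(string query) {
        // Nothing to search for, so render nothing
        if (string.IsNullOrWhiteSpace(query)) {
            return string.Empty;
        }

        // Perform the search using the default query settings
        var searchQuery = new SearchQuery(query);
        var result = SearchManager.Instance.Search(searchQuery);

        // Build a list of the result
        var stringBuilder = new StringBuilder();
        stringBuilder.Append("<ul class=\"search-result\">");
        foreach (var searchHit in result.Hits) {
            // Encode the title since it comes from editor-entered content
            stringBuilder.AppendFormat("<li><a href=\"{0}\">{1}</a></li>",
                searchHit.Path, HttpUtility.HtmlEncode(searchHit.Title));
        }
        stringBuilder.Append("</ul>");

        return stringBuilder.ToString();
    }
}
```

The returned markup could, for example, be written to a literal control on a search result page. Paging and "no hits" messages are left out of this sketch.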

Query syntax

The queries sent to the Kaliko Search provider can be made more explicit by using boolean operators and specific field matching. By default, a sentence entered as a query is split into single-word terms, and a page matches if it contains one or more of these words (equivalent to an OR search). By using quotation marks you can search for phrases instead.

Query                                  Matching rules
cms demo                               Must match either 'cms' or 'demo'
"cms demo"                             Must match 'cms demo'
+cms                                   Must match 'cms'
-cms                                   Must not match 'cms'
cms AND demo                           Both 'cms' and 'demo' must match
cms OR demo                            Either 'cms' or 'demo' must match
cms NOT demo                           Must match 'cms' but not 'demo'
(csharp OR javascript) AND tutorial    Must match 'tutorial' and either 'csharp' or 'javascript'
title:news                             Must have a title that matches 'news'
title:"latest news"                    Title must match the phrase "latest news"

The fields available for field matching are title, summary, content, tags, category and any custom meta field added by the developer when indexing.
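
The rules above can be combined when building queries in code. The following is a small sketch of a hypothetical helper (QueryBuilder and InCategory are not part of Kaliko CMS) that wraps free-text input so that only hits in a given category match, using field matching, AND and parentheses. It assumes the category value is matched as it was set when indexing, and it does not escape special characters in the input:

```csharp
using System;

// Hypothetical helper, not part of Kaliko CMS
public static class QueryBuilder {
    // Combines free-text terms with a category restriction,
    // e.g. category:"Article" AND (cms demo)
    public static string InCategory(string terms, string category) {
        if (string.IsNullOrWhiteSpace(terms)) {
            throw new ArgumentException("Terms must not be empty.", "terms");
        }
        if (string.IsNullOrWhiteSpace(category)) {
            throw new ArgumentException("Category must not be empty.", "category");
        }

        // Quote the category as a phrase and group the terms so that the
        // AND applies to the whole free-text part of the query
        return string.Format("category:\"{0}\" AND ({1})", category.Trim(), terms.Trim());
    }
}
```

Calling QueryBuilder.InCategory("csharp OR javascript", "Article") produces the query string category:"Article" AND (csharp OR javascript), which can then be passed to the SearchQuery constructor.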

Showing more like this

One neat feature of having a powerful search engine is the ability to search for content that resembles an indexed item. This can be really useful for blog posts or news pages, as you can list similar pages, determined by the page content, and link your visitors to related pages.

The related hits are retrieved by calling SearchManager.Instance.FindSimular with the page you want to find similar posts for. This function has a few optional parameters: the search offset, the number of hits to return and whether or not the result should only contain pages from the same category (as set when indexing the pages). By default it returns the first 10 hits in the same category.

The following example gets the five best matches in the same category and builds an HTML list of links to them.

private string RenderRelatedPosts() {
    // Get the 5 most similar pages based on the current page
    var searchResult = SearchManager.Instance.FindSimular(CurrentPage, 0, 5);

    // Build a list of the result
    var stringBuilder = new StringBuilder();
    stringBuilder.Append("<ul class=\"list-unstyled related\">");
    foreach (var searchHit in searchResult.Hits) {
        stringBuilder.AppendFormat("<li><a href=\"{0}\">{1}</a></li>", searchHit.Path, searchHit.Title);
    }
    stringBuilder.Append("</ul>");

    return stringBuilder.ToString();
}

If you need to access them as pages you can use PageFactory.GetPage(searchHit.PageId).
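
As a rough sketch of that approach, the hits can be turned into typed pages by combining PageFactory.GetPage with the ConvertToTypedPage call used earlier. The RelatedArticles class and its Load method are hypothetical, the KalikoCMS namespace for PageFactory is an assumption, and the code assumes that every hit in the category belongs to the ArticlePageType page type from the indexing example:

```csharp
using System.Collections.Generic;
using CmsDemo.PageTypes;
using KalikoCMS;        // assumed namespace for PageFactory
using KalikoCMS.Core;
using KalikoCMS.Search; // assumed namespace for SearchManager

// Hypothetical helper class, not part of Kaliko CMS
public static class RelatedArticles {
    // Loads the pages behind the 5 most similar hits as typed article pages
    public static List<ArticlePageType> Load(CmsPage currentPage) {
        var articles = new List<ArticlePageType>();
        var searchResult = SearchManager.Instance.FindSimular(currentPage, 0, 5);

        foreach (var searchHit in searchResult.Hits) {
            var page = PageFactory.GetPage(searchHit.PageId);

            // Defensive check, assuming a stale index entry may point to a missing page
            if (page == null) {
                continue;
            }

            articles.Add(page.ConvertToTypedPage<ArticlePageType>());
        }

        return articles;
    }
}
```

With typed pages you can reach properties such as Heading and Preamble directly, at the cost of one page lookup per hit.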

By default FindSimular only matches content in the same category. To get matches spanning different categories, explicitly set category matching to false: SearchManager.Instance.FindSimular(CurrentPage, 0, 5, false)

For the Kaliko Search provider, the fields that are compared for similarities are title, summary, content and tags.