Search Crawling in SP2013

128

Search Crawling in SP2013

In SP2010, we have two types of crawls; Full or Incremental Crawl. In a nutshell, your search index can be made on-average fresher, but it is not real-time. Right now we do Full Crawls weekly and incremental crawls hourly.

One of the limitations of Full and Incremental Crawls in SP2010 is that they cannot run in parallel, i.e. if a full or incremental crawl is in progress, the admin cannot kick-off another crawl on that content source. This forces a first-in-first-out approach to how items are indexed.

Moreover, some types of changes result in extended run times; such as script based permission changes, or moving a folder, or changing fields in a content type. Incremental crawls don’t remove “deletes”, so ghost documents are still returned as hits, after deletion, until the next full crawl.

SharePoint 2013 will introduce the concept of “Continuous Crawl”. It doesn’t need scheduling. The underlying architecture is designed to ensure consistent freshness by running in parallel. Right now, if a Full or Incremental crawl is slow, everything else awaits its completion. It’s a sequential crawl. Behind the scenes, continuous crawl selection results in the kick-off of a crawl every 15minutes (this wait can be configured) regardless of whether the prior session has completed or not. This means a change that is made immediately after a deep and wide-ranging change doesn’t need to ‘wait’ behind it. New changes will continue to be processed in parallel as a deep policy change is being worked on by another continuous crawl session.

Note that Continuous crawl will increase load incrementally on the SharePoint server since it inherently can run parallel multiple sessions simultaneously. If needed, we can tune this through ‘Crawl Impact Rule’ settings (which exist today in SP2010) which controls the maximum number of simultaneous requests that can be made to a host (default is 12 threads, but is configurable).

Share this entry

Leave a Reply

Your email address will not be published. Required fields are marked *

Table of Contents

Categories

Categories