Tuning your Crawl

Want to tune your Search crawling? There’s plenty of benefit in refining how Search crawls SharePoint: you can eliminate useless page hits and skip documents that will fail crawl processing.  It’s also another way to exclude sensitive documents, if you can find a suitable crawl exclusion rule.  I found out the hard way that SharePoint URLs defined in a Content Source MUST be a Web Application.  If you only want to crawl a subsite, your recourse is to pare out all other sites using Crawl Rules.  Crawl Rules come in two basic flavors: simple wildcards, which are quite intuitive, and Regular Expressions.  You can find the Crawl Rules in Central Admin under General Application Settings, Search, (your Content SSA if in FAST), Crawl Rules (visible on the left).

Surprisingly, there is scant documentation on the Regular Expression implementation in SharePoint.  Through a bit of digging and trial and error I’ve summarized the Regular Expression operators supported in SharePoint:

?  Conditional matching; the preceding item is optional.  Example: “http://sharepoint/List_[a-z]?.aspx” (the a-z character is optional)
*  Matches zero or more.  Example: “http://sharepoint/List_M*” (no M, or M, or MM… at the end)
+  Matches one or more.  Example: “http://sharepoint/List_M+” (one or more Ms at the end)
.  Matches exactly one character.  Example: “http://sharepoint/List_.” (one character expected after _)
[abc]  Any one of the listed characters (abc is just an example); ranges such as a-c work too.  Example: “http://sharepoint/List_[a-z]” (matches any List_ followed by a letter a-z)
|  Exclusive OR.  If both sides are true, this evaluates to false.
()  Parentheses group characters for an operation
{x,y}  Range of counts
{x}  Exact count
{x,}  X or more counts
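
If you want to sanity-check what an operator does before committing a rule, PowerShell’s -match operator is a quick scratchpad.  This is only a sketch: -match uses the .NET regex engine, not SharePoint’s crawl-rule matcher, so treat it as a guide to operator meaning rather than proof of how a crawl will behave.  The URLs are made-up samples, the ^ and $ anchors are my addition to force a whole-URL match, and | is left out because its crawl-rule behavior described above may not line up with .NET.

```powershell
# Sample patterns and URLs (hypothetical); Expected is what .NET regex should return.
$cases = @(
    @{ Pattern = '^http://sharepoint/List_[a-z]?\.aspx$'; Url = 'http://sharepoint/List_.aspx';   Expected = $true  }  # ? : letter optional
    @{ Pattern = '^http://sharepoint/List_[a-z]?\.aspx$'; Url = 'http://sharepoint/List_ab.aspx'; Expected = $false } # ? : at most one letter
    @{ Pattern = '^http://sharepoint/List_M*$';           Url = 'http://sharepoint/List_';        Expected = $true  }  # * : zero Ms is fine
    @{ Pattern = '^http://sharepoint/List_M*$';           Url = 'http://sharepoint/List_MM';      Expected = $true  }  # * : many Ms is fine
    @{ Pattern = '^http://sharepoint/List_M+$';           Url = 'http://sharepoint/List_';        Expected = $false } # + : needs at least one M
    @{ Pattern = '^http://sharepoint/List_M+$';           Url = 'http://sharepoint/List_M';       Expected = $true  }
    @{ Pattern = '^http://sharepoint/List_.$';            Url = 'http://sharepoint/List_X';       Expected = $true  }  # . : exactly one character
    @{ Pattern = '^http://sharepoint/List_.$';            Url = 'http://sharepoint/List_';        Expected = $false }
    @{ Pattern = '^http://sharepoint/List_[a-z]$';        Url = 'http://sharepoint/List_q';       Expected = $true  }  # [a-z] : one letter
    @{ Pattern = '^http://sharepoint/List_M{2,3}$';       Url = 'http://sharepoint/List_M';       Expected = $false } # {2,3} : two or three Ms
    @{ Pattern = '^http://sharepoint/List_M{2,3}$';       Url = 'http://sharepoint/List_MMM';     Expected = $true  }
)

# Run every case and record what -match actually returned.
$results = foreach ($c in $cases) {
    New-Object PSObject -Property @{
        Pattern  = $c.Pattern
        Url      = $c.Url
        Expected = $c.Expected
        Actual   = ($c.Url -match $c.Pattern)
    }
}

$results | Format-Table Pattern, Url, Expected, Actual -AutoSize
```

Any row where Expected and Actual differ is worth a closer look before the pattern goes into a crawl rule.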

For FAST, note the Crawl Rules are under your Content SSA, not the Query SSA.

To create an Exclusion Rule with PowerShell (Type 0 = include, 1 = exclude):

New-SPEnterpriseSearchCrawlRule -SearchApplication FASTSearchApp -Path "http://SharePoint/Sites/Secret/*" -Type 1

To output all your Crawl Rules, use this line of PowerShell:

Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchCrawlRule | ft

The cmdlet “Get-SPEnterpriseSearchCrawlRule” requires a Service Application object, so we simply pipe one in using the “Get-SPEnterpriseSearchServiceApplication” cmdlet.  You can then pipe the result to whatever you want.  “ft” is an alias for Format-Table, which is the default output, but you can just as easily pipe it to a file for automatic documentation.  This is especially useful when playing with your crawl rules.
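
Piping to a file could look something like the sketch below.  The output folder and file name are hypothetical, and I’m assuming Path, Type and Priority are the properties you care about; swap in whichever columns are useful to you.

```powershell
# Load the SharePoint cmdlets when running outside the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Hypothetical output location; a timestamp in the name keeps earlier snapshots.
$reportPath = "C:\Temp\CrawlRules_$(Get-Date -Format 'yyyyMMdd_HHmmss').csv"

# Same pipeline as above, written to CSV instead of the console.
Get-SPEnterpriseSearchServiceApplication |
    Get-SPEnterpriseSearchCrawlRule |
    Select-Object Path, Type, Priority |
    Export-Csv -Path $reportPath -NoTypeInformation
```

Taking a snapshot before and after each round of changes gives you something to diff when a rule behaves unexpectedly.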

Direct filtered access to SharePoint Timer Job History

Have you ever needed to scroll through the Timer Job History in Central Administration?  Wow, do a lot of jobs run!  It’s nice that you can view 2,000 at a time, but even that’s not enough to scroll back to a previous day’s Timer Job History.  Instead, you can jump into SQL Studio and use this query to extract the timeframe you want.  I needed a two-minute window from almost three days earlier; here’s the simple query.  Enjoy!

-- SELECT * returns every column; narrow the list as needed
SELECT *
FROM [SharePoint_Config].[dbo].[TimerJobHistory]
WHERE StartTime > '1/1/12 4:59:00' AND EndTime < '1/1/12 5:01:00'
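
If you’d rather not query the config database directly, the same window can probably be pulled through the object model.  The sketch below is an assumption-laden alternative, not a replacement for the query above: Select-TimeWindow is a helper name I made up, the farm call only runs where Get-SPTimerJob is available, and walking every job’s HistoryEntries can be slow on a busy farm.  The timestamps are likely UTC, so check them against a job run you know.

```powershell
# Hypothetical helper: pass through only entries that start and end inside the window.
function Select-TimeWindow {
    param(
        [Parameter(ValueFromPipeline = $true)] $Entry,
        [datetime] $Start,
        [datetime] $End
    )
    process {
        if ($Entry.StartTime -gt $Start -and $Entry.EndTime -lt $End) { $Entry }
    }
}

# Same two-minute window as the SQL query.
$windowStart = [datetime]'2012-01-01 04:59:00'
$windowEnd   = [datetime]'2012-01-01 05:01:00'

# Only touch the farm when the SharePoint cmdlets are loaded.
if (Get-Command Get-SPTimerJob -ErrorAction SilentlyContinue) {
    Get-SPTimerJob |
        ForEach-Object { $_.HistoryEntries } |
        Select-TimeWindow -Start $windowStart -End $windowEnd |
        Sort-Object StartTime |
        Format-Table JobDefinitionTitle, ServerName, Status, StartTime, EndTime -AutoSize
}
```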

Migrating a full copy of MS-Project Server content

There’s a frequent need to refresh MS-Project Server test environments from Production.  The conventional wisdom holds that you need to delete and recreate the SharePoint Web Application (PWA).  However, this requires you to:

  • Recreate Alternate Access Mappings (AAM)
  • Set quotas
  • Refine Blocked File Types
  • Set User Policy
  • Set Service Application connections…

The faster, smoother way is instead to drop the old PWA content DB and reconnect the new PWA content database.  Regardless, the very first step should be dropping the Project Server Web App.  This is done through the Project Server Service Application in Central Administration, Service Applications.  Now let’s switch over to the replacement PWA Content DB.  This houses the top-level Site Collection, the PWA application Site Collection, and all webs under it (mostly project sites, one per project):

Dismount-SPContentDatabase [the old content database we are replacing]
Mount-SPContentDatabase -Name [the new content database] -DatabaseServer [your DB server] -WebApplication "http://pwa"  # change the URL as needed

However, there is one big wrinkle.  Doing this seems to leave an orphaned explicit managed path definition in the Web Application, which prevents creation of the PWA Site Collection.  This appears to be what leads people to simply delete and recreate the Web Application.  The solution is quite simple, though.  Remove the orphaned site collection:

Remove-SPSite -Identity "http://pwa/"  # change the URL as needed

When recreating the Project Web Application note:

  • When removing the Project Server Web App, you may wish to uncheck the “Remove Content DB” checkbox
  • Halting the Timer jobs may be required for the steps above.  One advantage of using PowerShell is that it does not depend on the Web App Application Pool being active
  • “PWA” must be the name of the Project Web App
  • Get the database names right, to ensure you connect to the target databases migrated to this environment
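
A quick way to double-check the database names after the swap is to list what the Web Application is actually attached to.  This is a minimal sketch, assuming the Web Application URL is "http://pwa" as in the commands above; $webAppUrl is just an illustrative variable name.

```powershell
# Load the SharePoint cmdlets when running outside the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumed Web Application URL; change as needed.
$webAppUrl = "http://pwa"

# Content databases currently mounted on the Web Application.
Get-SPContentDatabase -WebApplication $webAppUrl |
    Select-Object Name, Server, CurrentSiteCount

# Site collections in the Web Application and the database each one lives in.
Get-SPSite -WebApplication $webAppUrl -Limit All |
    Select-Object Url, @{ Name = 'ContentDatabase'; Expression = { $_.ContentDatabase.Name } }
```

If the old database still shows up, or the new one reports zero site collections, stop and sort that out before recreating the Project Web App.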

If your replacement PWA DBs come from an environment with different security, you’ll need to adjust security manually at the database level.  I prefer to take screenshots in SQL Studio for comparison before starting.

OLAP Configuration

The OLAP cube inevitably needs to be reconfigured.  When refreshing the databases, the OLAP configuration will now mirror the source environment.  Make sure you know the name of the OLAP server and database to reset the OLAP configuration.

Delete the Data Connections OLAP folder, as well as the 13 assorted cubes and the folder in which they reside.  When rebuilding the cube, these get recreated with reference to the OLAP Server and database.

Lastly, check the cube rebuild frequency and rebuild the cube.  You should see a successful OLAP cube build log and the new cubes recreated.  The Excel PivotTables stored in the OLAP folder should show cube data that is as current as the source PWA database.


Check main navigation links.  Any hard-coded navigation links in the source may not get repointed automatically in the newly refreshed environment, and could require hand-tuning.

Check that the new Excel OLAP pivots are in a SharePoint Excel Services Trusted Location.

Other areas to test

  1. Add-ons
  2. Links
  3. Data (projects, resource pool and associated metadata)
  4. Sites
  6. Configurable fields and Lookup tables
  7. Enterprise Calendar
  8. Security
  9. Ability to access via MSPS client
  10. Scheduled backups
  11. Quick Launch settings in Server Settings
  12. Time/task management settings
  13. Project Detail pages
  14. Project site templates
  15. Ability to add a Risk, Issue to project sites
  16. OLAP
  17. Reports

When you’ve done a refresh once, you are golden!  I like to say “only in technology can you do something once, then be considered an ‘expert’”!