SharePoint Document IDs

The SharePoint Document ID Service is a new feature of SharePoint 2010 that offers a number of useful capabilities, but carries some limitations.  Let’s dig a bit deeper and see what it does and how it works.

One challenge for SharePoint users is that links break easily.  Rename a file or folder, or move the document, and a previously saved or shared link stops working.  By tagging a document with an ID, SharePoint can reference the document by that ID, even when the underlying structure has changed.  SharePoint accepts a link containing this ID through a dedicated page on each site that takes care of finding the document.  This page is named DocIdRedir.aspx.  Here’s what a URL might look like:

“http://<sitecollection>/<web>/_layouts/DocIdRedir.aspx?ID=XXXX”
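
To make the shape of that link concrete, here is a minimal Python sketch that builds a redirect URL from a web address and a Document ID.  The function name, the example host and the sample ID "FIN-12-345" are all hypothetical; only the _layouts/DocIdRedir.aspx?ID= pattern comes from the URL above.

```python
from urllib.parse import quote, urlsplit


def doc_id_redirect_url(web_url: str, doc_id: str) -> str:
    """Build a DocIdRedir.aspx link for a Document ID under the given web."""
    parts = urlsplit(web_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {web_url!r}")
    base = web_url.rstrip("/")
    # Percent-encode the ID so spaces or reserved characters cannot break the query string.
    return f"{base}/_layouts/DocIdRedir.aspx?ID={quote(doc_id, safe='')}"


# Hypothetical site and ID, for illustration only.
print(doc_id_redirect_url("http://intranet.example.com/sites/finance/", "FIN-12-345"))
```

Because the link depends only on the web address and the ID, it keeps working when the file itself is renamed or moved within the site collection.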

There’s also a Document ID web part that’s available for users to enter a Document ID.  This is used most prominently when creating a Records Center site, which is based on an out-of-box website template.

The Document ID Service is enabled at the Site Collection level, and assigns Document IDs that are unique only within the site collection.  There is a prefix available for configuration that is most useful when assigned uniquely for each Site Collection to ensure uniqueness across your web application and even farm.  If you have more than one farm, it makes sense to provide an embedded prefix to indicate the farm, to ensure uniqueness globally.
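
One way to plan prefixes before configuring anything is to generate them from a farm code plus a short site collection code and check that no two collide.  The sketch below is plain Python, not a SharePoint API; the function name, the codes and the max_len default are assumptions, so check the limit your farm actually enforces.

```python
def build_prefixes(farm_code: str, site_codes: dict[str, str], max_len: int = 12) -> dict[str, str]:
    """Map each site collection URL to a farm-qualified Document ID prefix.

    Raises ValueError if a prefix is longer than max_len (an assumed limit)
    or if two site collections would end up with the same prefix.
    """
    prefixes: dict[str, str] = {}
    owner_of: dict[str, str] = {}
    for site_url, site_code in site_codes.items():
        prefix = f"{farm_code}{site_code}".upper()
        if len(prefix) > max_len:
            raise ValueError(f"prefix {prefix!r} exceeds {max_len} characters")
        if prefix in owner_of:
            raise ValueError(f"prefix {prefix!r} used by both {owner_of[prefix]} and {site_url}")
        owner_of[prefix] = site_url
        prefixes[site_url] = prefix
    return prefixes


# Hypothetical farm code "NA" and two site collections.
print(build_prefixes("NA", {"/sites/finance": "fin", "/sites/hr": "hr"}))
```

Running the same check with a different farm code per farm gives prefixes that stay distinct across farms as well.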

Setting Document ID

Once the Document ID Service is enabled, every new or edited document instantly gets a Document ID assigned.  However, historical documents do not get an immediate Document ID assignment.  Documents that were uploaded before the service was enabled are assigned Document IDs by a Timer Job called the “Document ID assignment job” that exists at the Web Application level.  By default this job runs nightly.  This is one of two jobs associated with the Document ID Service, the other being the “Document ID enable/disable job”.

When the Document ID Service is enabled for a Site Collection, Event Receivers are automatically installed in each Document Library.  More precisely, a set of Event Receivers is installed for each and every Content Type configured within that document library.  The Event Receiver is called “Document ID Generator” and is configured to be fired synchronously.  There is a separate Event Receiver for each of the following events:

  • ItemAdded
  • ItemUpdated
  • ItemCheckedIn
  • ItemUncheckedOut

Once a Document ID is assigned, it can be changed through the Object Model, although you do so at your own risk.  Before the Document ID Service is enabled, the Document ID field does not exist to be assigned.  If you are migrating from a legacy system that has existing Document IDs, you can first migrate the documents, then enable the Document ID Service.  This adds the internal Document ID field.  Then, before the daily Document ID Assignment job runs (better yet, disable it during this process), you can programmatically take the legacy Document IDs and assign their values to the SharePoint Document IDs.  With the Document ID field populated, the Document ID Service will not overwrite the already set Document IDs.

Note that part of the Document ID Service is to redirect URLs that reference the Document ID.  It turns out that if you manually assign duplicate Document IDs (something that in theory should never occur), the daily Document ID Assignment Job detects this situation, and DocIdRedir.aspx redirects to a site-based search page, passing in the Document ID.
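
If you assign IDs by hand, it is worth checking for duplicates yourself rather than discovering them through the search page.  The sketch below assumes you have already exported (URL, Document ID) pairs from your libraries by whatever means you prefer; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def find_duplicate_doc_ids(items):
    """Return {doc_id: [urls]} for every Document ID used by more than one item.

    items is an iterable of (url, doc_id) pairs; blank IDs are skipped,
    since those documents simply have not been assigned an ID yet.
    """
    urls_by_id = defaultdict(list)
    for url, doc_id in items:
        if doc_id:
            urls_by_id[doc_id].append(url)
    return {doc_id: urls for doc_id, urls in urls_by_id.items() if len(urls) > 1}


# Hypothetical export: two documents accidentally share "LEG-0042".
exported = [
    ("/sites/legal/Docs/contract-a.docx", "LEG-0042"),
    ("/sites/legal/Docs/contract-b.docx", "LEG-0042"),
    ("/sites/legal/Docs/memo.docx", "LEG-0043"),
    ("/sites/legal/Docs/draft.docx", ""),
]
print(find_duplicate_doc_ids(exported))
```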

Under the covers there are three internal components to a Document ID:

  • _dlc_DocIdUrl: The fully qualified URL for the document, referencing DocIdRedir.aspx along with the lookup parameter
  • _dlc_DocId: The Document ID.  This is the internal property you can directly address and assign as $item["_dlc_DocId"]
  • _dlc_DocIdItemGuid: The GUID related to the Document ID
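
To show how these fields fit together during a migration, here is a small Python model of the assignment step.  It is a sketch under stated assumptions, not SharePoint code: each list item is represented as a plain dict, "FileRef" is used as the key for the document's path, and the lookup of legacy IDs is a hypothetical mapping.  The exact format SharePoint stores in _dlc_DocIdUrl may differ from the bare URL used here, and _dlc_DocIdItemGuid is left untouched.

```python
from urllib.parse import quote


def apply_legacy_ids(items, legacy_ids, web_url):
    """Copy legacy IDs into items that have no Document ID yet.

    items: list of dicts modelling list items (an assumption, not an API).
    legacy_ids: dict mapping a document path to its legacy ID.
    Returns the paths of the items that were updated.
    """
    base = web_url.rstrip("/")
    updated = []
    for item in items:
        legacy_id = legacy_ids.get(item["FileRef"])
        # Leave items alone if we have no legacy ID or an ID is already set.
        if legacy_id is None or item.get("_dlc_DocId"):
            continue
        item["_dlc_DocId"] = legacy_id
        item["_dlc_DocIdUrl"] = f"{base}/_layouts/DocIdRedir.aspx?ID={quote(legacy_id, safe='')}"
        updated.append(item["FileRef"])
    return updated


# Hypothetical migrated documents and legacy IDs.
docs = [
    {"FileRef": "/sites/legal/Docs/a.docx"},
    {"FileRef": "/sites/legal/Docs/b.docx", "_dlc_DocId": "LEG-7-1"},
]
legacy = {"/sites/legal/Docs/a.docx": "OLD-1001", "/sites/legal/Docs/b.docx": "OLD-1002"}
print(apply_legacy_ids(docs, legacy, "http://intranet.example.com/sites/legal"))
```

Skipping items that already carry an ID mirrors the behavior described above, where a populated Document ID is not overwritten.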

That completes our tour of the Document ID Service.  I look forward to hearing of others’ experience with it.

Records Management with SharePoint – Information Architecture: part 2

There is a good deal of groundwork required to fully implement Records Management in SharePoint.  The foundation is the overall Information Architecture.  SharePoint 2010 provides a range of capabilities and is very flexible.  With this flexibility come choices.  Some of these decisions affect the manageability, extensibility and usability of SharePoint, so we want to plan carefully.  Below are the primary facets of a SharePoint Information Architecture:

  • Hierarchy
    This includes Web Applications, the breakdown of Site Collections, the Site Hierarchy, and associated Document Libraries.  Separate Site Collections that ride along managed paths allow a logical and granular division between content databases, allowing near endless scalability.
  • Navigation
    A good portion of navigation flows out of the decisions on Hierarchy combined with selection and standardization of navigation elements including tables of contents, left hand navigation, horizontal top level global navigation, breadcrumbs, and optionally additional techniques such as MegaMenus.  Best practice dictates security trimmed navigation, so users are only presented with navigation elements to which they have some level of access.
  • Security
    Best practice is to use permissions inheritance wherever possible, which keeps administration as easy as possible.  If access is granted broadly at the top and becomes more restrictive as one descends the hierarchy, the user will have the best possible experience.  This is because subsites will be reachable naturally via navigation, reducing the incidence of pockets and islands that can only be reached via manual bookmarks and links.  Leveraging AD and/or SharePoint groups further minimizes security overhead.
  • Metadata
    This is the heart of the Information Architecture, and the primary focus of this article.

Metadata can be assigned to individual documents and allocated within individual document libraries; however, for a true enterprise-class Information Architecture, it needs to be viewed holistically from the top down.  To achieve this, the following should be viewed as best practices:

  • Leverage Content Types
    Content Types are the glue that connects data across the enterprise.  They encapsulate the metadata, the document template, the workflow and the policies that apply to documents.  A single centrally managed content type can control documents in libraries within countless sites.
  • Content Syndication Hub
    Before SharePoint 2010, Content Types were bounded by the Site Collection in which they lived.  This was a significant obstacle to scalability and consistency across the enterprise.  The Content Syndication Hub changes all that.  From a single location, all Content Types can be defined and published across the farm.  That includes the information policies, metadata and document template.
  • Content Type inheritance
    All Content Types must inherit from built-in SharePoint Content Types.  However, by structuring your content types to inherit in a logical and hierarchical fashion, management and evolution of your Information Architecture can be an elegant and simple affair.  An example could be a Corporation Content Type, with sub-companies inheriting from it, then divisions, departments, and finally use-oriented content types.  Imagine needing to add a new field (or site column) across an entire company.  Adding it high in your hierarchy will propagate it to all subordinate content types.
  • Build out enterprise taxonomies
    For the Information Architecture to be relevant and useful, it needs to map to the organization from a functional perspective.  The vast majority of the names used for data in an organization, as well as their hierarchy and relationships, need definition so that users can tag, search and utilize the documents and information in the farm.  The larger the organization, the harder this is to achieve.
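
The inheritance idea is easy to see in a toy model.  The Python sketch below is only an illustration of why a column added high in the hierarchy reaches every descendant; the class, the content type names and the column names are all hypothetical, and it does not model how SharePoint itself pushes changes down to child content types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentType:
    name: str
    parent: Optional["ContentType"] = None
    own_columns: list[str] = field(default_factory=list)

    def columns(self) -> list[str]:
        """Columns inherited from ancestors first, then this type's own columns."""
        inherited = self.parent.columns() if self.parent else []
        return inherited + [c for c in self.own_columns if c not in inherited]


# Hypothetical hierarchy: built-in Document -> corporation -> division -> use-oriented type.
document = ContentType("Document", own_columns=["Title"])
corporate = ContentType("Corporate Document", document, ["Retention Category"])
division = ContentType("Division Document", corporate, ["Division"])
invoice = ContentType("Invoice", division, ["Invoice Number"])

# Adding one column at the corporate level makes it visible on every descendant.
corporate.own_columns.append("Legal Hold")
print(invoice.columns())
```

The deeper the hierarchy, the more a single well-placed change saves, which is the argument for planning the inheritance tree before creating content types.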

One challenge is managing all the Content Types and Site Columns.  This is because on publishing, Site Collections actually identify these by name instead of by GUID (Globally Unique Identifier).  If you have an existing Site Column or Content Type locally defined in a Site Collection with the same name, the name collision will prevent the conflicting items from propagating into this Site Collection.  The challenge is magnified by the Content Syndication Hub publishing the Content Types and Site Columns to all subscribing Site Collections.  So even if your Site Collection only needs a few, it’s an all or nothing affair.

Since planning is the only way to avoid naming conflicts, my recommendation is to add identifying information to the trailing end of Site Column and Content Type names, especially when defining a generic Content Type such as “Reference Document” or a Site Column such as “Completion Date”.  Add the extra text in a consistent manner, such as “Reference Document (AR)” (for Accounts Receivable) or “Completion Date (PMO Task)”.  The reason to add the text at the end is that in many situations the end of the text is cut off in the user interface, so the recognizable part of the name stays visible.  While hovering over the text (such as in a grid column) often shows the full name, it is best to make the title easily identifiable from a user perspective.
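
A quick pre-publication check can catch collisions on paper before they block propagation.  The sketch below assumes you have gathered two lists of names, those the hub will publish and those already defined locally in a Site Collection; how you export them is up to you.  The function names are hypothetical, and comparing names case-insensitively is an assumption made to err on the side of caution.

```python
def qualified_name(base_name: str, qualifier: str) -> str:
    """Append an identifying qualifier to the end of a name, e.g. 'Completion Date (PMO Task)'."""
    return f"{base_name} ({qualifier})"


def find_name_collisions(published, local):
    """Return the published names that already exist locally (case-insensitive)."""
    local_keys = {name.casefold() for name in local}
    return sorted(name for name in published if name.casefold() in local_keys)


# Hypothetical names: the generic "Completion Date" collides, the qualified one does not.
local_columns = ["Completion Date", "Project Code"]
print(find_name_collisions(["Completion Date", "Reference Number"], local_columns))
print(find_name_collisions([qualified_name("Completion Date", "PMO Task")], local_columns))
```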

The real challenge in setting up the Information Architecture is not the technical configuration.  That’s a walk in the park.  The hardest part, especially in large organizations, is gathering the experts to define the taxonomies and make the appropriate decisions.  If you have an existing farm that has grown organically and has not taken advantage of content types and the syndication hub, it is actually possible to wrestle it from chaos to order, but it’s not a cakewalk.  I have created a range of scripts and techniques for publishing the components of the new Information Architecture and reassigning documents and metadata to it, resulting in a structured farm that works within the defined Information Architecture framework.

Records Management with SharePoint – part 1

SharePoint 2010 has some great capabilities for implementing a true Records Management policy.  Let’s explore both the capabilities and the limitations, and how to extend SharePoint to create a true enterprise-class Records Management system.

The overarching goal in Records Management is to ensure Records are handled in a manner that is consistent with the organization’s Records policy.  So first a policy must be defined across the enterprise, and then all systems including SharePoint must manage documents in a policy-compliant, automated, user-friendly and auditable fashion.  Strategically we want to:

  • Limit demands on the user
    Simplify metadata tagging, and hide records jargon from end-users
  • Policy based disposition
    Automate disposition to eliminate the dependency on end-users to take action on each document.
  • Enhanced reporting
    Enable users to self-satisfy, to explore document expiration and disposition.

First let’s clarify what Records are.  Not all documents are Records.  SharePoint offers a range of capabilities in support of defining and managing records:

  • Records can be managed centrally or in-place
    Central management through sending documents to a “Records Center” offers the ultimate in centralized control, search, storage, security and administration.  However, there is a real impact on end-users when Records are moved from their usual home.  SharePoint offers “In-Place Records Management”, which is the direction the industry seems to be heading.
  • Records can be blocked from deletion
    End users can be prevented from deleting a Record, which makes sense, as Records Management by definition provides policy for treating Records.
  • Records can be stripped of versions
    This reduces how often multiple versions of a Record are stored.
  • Records can be made read-only
    This can be used to lock down a record so it does not change.
  • Records can and likely do have their own expiration and disposition rules
    SharePoint allows a different policy to be applied to a document if it is a record.
  • Records are quickly identified
    Searching, sorting, filtering are available to identify records.  Documents that are records are also easily identified by a special record mark on the document icon.
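
The point about records having their own expiration rules can be illustrated with a little date arithmetic.  The Python sketch below is not how SharePoint evaluates its policies; the function names and the three-year and seven-year retention periods are hypothetical, chosen only to show one rule for ordinary documents and a different rule once a document is declared a record.

```python
from datetime import date
from typing import Optional


def add_years(start: date, years: int) -> date:
    """Add whole years to a date, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def expiration_date(created: date, declared_record: Optional[date],
                    document_years: int = 3, record_years: int = 7) -> date:
    """Records expire relative to their declaration date, other documents relative to creation."""
    if declared_record is not None:
        return add_years(declared_record, record_years)
    return add_years(created, document_years)


# Hypothetical document created in 2010, shown before and after being declared a record.
print(expiration_date(date(2010, 5, 1), None))
print(expiration_date(date(2010, 5, 1), date(2012, 2, 29)))
```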

Below are the pieces of the puzzle, each of which I will devote an article to addressing:

  1. Define your information architecture
  2. Creating a centrally managed set of Content Types with Information Policies
  3. Wrestling an unstructured set of sites, libraries and documents into a centrally managed information architecture
  4. Document Disposition in SharePoint
  5. Customizing the Expiration Date calculation
  6. Reporting on pending document disposition in SharePoint
  7. Review and approval of document disposition
  8. Control the timing of document disposition

I’m going to delve into how to accomplish the above to define and put in place a Records Management policy and system across an enterprise.