Customized Taxonomy Creation from a Feed File

Creating a customized Taxonomy from a feed file

SharePoint 2013 has a native capability to import termsets from a CSV. However, it’s limited in structure and produces no logs. If it encounters a problem, it fails without indicating where it failed, and it won’t continue the import after the problem entry.

For industrial-strength Taxonomy loads, rolling your own is the only way.

Here’s a script that is easily adapted and extended. It takes the first letter of each Level 1 term and files the terms under that letter, then uses two more levels to create a three-level hierarchy: letter, Level 1 term, Level 2 term.
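To make the target shape concrete, here is a minimal sketch that needs no SharePoint at all. It parses a few made-up feed rows and prints the path each one would get in the hierarchy. The column names match the ones the script reads ("Level 1 Term", "Level 2 Term"); the sample rows and the helper name Get-TermPath are assumptions for illustration only.

```powershell
# Hypothetical sample rows; a real feed would be a tab-delimited file on disk
$feedLines = @(
    "Level 1 Term`tLevel 2 Term",
    "Acme Corp`tAcme Labs",
    "Acme Corp`tAcme Retail",
    "Bolt Inc`tBolt Europe"
)
$rows = @($feedLines | ConvertFrom-Csv -Delimiter "`t")

# Get-TermPath is an illustrative helper, not part of the load script
function Get-TermPath {
    param([string]$Level1, [string]$Level2)
    $letter = $Level1.Substring(0, 1)      # first tier: first letter of the Level 1 term
    if ($Level2.Length -eq 0) {
        return "$letter > $Level1"         # parent-only row, no Level 2 term
    }
    return "$letter > $Level1 > $Level2"
}

$paths = @(foreach ($row in $rows) {
    Get-TermPath -Level1 $row.'Level 1 Term' -Level2 $row.'Level 2 Term'
})
$paths | ForEach-Object { Write-Host $_ }
```

Each printed line reads letter, then Level 1 term, then Level 2 term, which is the nesting the load creates in the termset.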

When loading terms, changes are committed in batches. The larger the batch, the faster the load. However, for most one-off loads I recommend using a batch size of 1, so any error is immediately addressable and localized to one term.
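The commit cadence can be sketched without a term store. The helper name Test-CommitDue and the row counts are hypothetical; the modulus test is the same one the load script applies to its loop index.

```powershell
# Test-CommitDue is an illustrative helper: returns $true when the row at
# zero-based $Index is the last row of a batch of $BatchSize rows
function Test-CommitDue {
    param([int]$Index, [int]$BatchSize)
    return (($Index % $BatchSize) -eq ($BatchSize - 1))
}

$rowCount = 7   # made-up feed size

# Row indexes after which a commit would be issued, for two batch sizes
$commitsBatch3 = @(0..($rowCount - 1) | Where-Object { Test-CommitDue -Index $_ -BatchSize 3 })
$commitsBatch1 = @(0..($rowCount - 1) | Where-Object { Test-CommitDue -Index $_ -BatchSize 1 })

# Rows left uncommitted when the loop ends; a final commit must pick these up
$leftover = $rowCount % 3

Write-Host "Batch size 3 commits after rows: $($commitsBatch3 -join ', ')"
Write-Host "Batch size 1 commits after rows: $($commitsBatch1 -join ', ')"
Write-Host "Rows left for the final commit with batch size 3: $leftover"
```

With a batch size of 3 and 7 rows, commits fall after rows 2 and 5 (zero-based), leaving one row for a final commit. With a batch size of 1, every row is committed on its own, so a failure points at exactly one term.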

Taxonomy basics

First, grab a taxonomy session:

    $taxonomySession = Get-SPTaxonomySession -Site $TaxSite

Now let’s grab the termstore for our target Service Application:

    $termStore = $taxonomySession.TermStores[$ServiceName]

Next, we can grab our target group:

    $group = $termStore.Groups[$GroupName]

Lastly, we can grab our target termset if it exists:

    $termSet = $group.TermSets | Where-Object { $_.Name -eq $termSetName }

Or create a new termset:

    $termSet = $group.CreateTermSet($termSetName)
    $termStore.CommitAll()

Let’s grab one or more matching terms. The second argument to GetTerms controls whether terms that are unavailable for tagging are trimmed from the results. Setting it to $true excludes untaggable terms, like a parent company at a higher tier, but we want to find unavailable terms here, so we pass $false:

    [Microsoft.SharePoint.Taxonomy.TermCollection] $TC = $termSet.GetTerms($CurrentLetter, $false)

Let’s see what matching terms we have:

    if ($TC.Count -eq 0)
    {
        Write-Host "No Matching Terms Found!"
    }
    else
    {
        Write-Host "$($TC.Count) Matching Terms Found!"
    }

Let’s create a Term2 beneath an existing Term1, then set a description value:

    $Lev2TermObj = $Lev1TermObj.CreateTerm($Lev2Term, 1033)   # 1033 = LCID for English (United States)
    $Lev2TermObj.SetDescription($Description, 1033)
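Terms can also carry extra labels (synonyms). A minimal sketch, assuming the server object model's Term.CreateLabel(labelName, lcid, isDefault) method and that $Lev2TermObj and $termStore are the objects from the snippets above. The helper name Add-TermSynonym and the sample synonym are hypothetical.

```powershell
# Add-TermSynonym is an illustrative wrapper, not part of the load script
function Add-TermSynonym {
    param(
        [Parameter(Mandatory = $true)] $Term,
        [Parameter(Mandatory = $true)] [string] $Synonym,
        [int] $Lcid = 1033                 # English (United States)
    )
    # $false = add as an additional label, leaving the term's default label unchanged
    return $Term.CreateLabel($Synonym.Trim(), $Lcid, $false)
}

# Usage against a live term store (requires the SharePoint snap-in):
# Add-TermSynonym -Term $Lev2TermObj -Synonym "Acme Laboratories" | Out-Null
# $termStore.CommitAll()
```

As with CreateTerm, the new label is only persisted once the term store is committed.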

That covers some of the basics. Let’s put it together into a useful script:

# CREATES AND POPULATES A FULL HIERARCHICAL TERMSET
# Later we can add details to terms, e.g. term.SetDescription() and term.CreateLabel()
# KNOWN PROBLEM: batch will fail if there are duplicate names in the batch, preventing clean restart unless batch size = 1

$snapin = Get-PSSnapin | Where-Object {$_.Name -eq 'Microsoft.SharePoint.Powershell'}
if ($snapin -eq $null)
{
    Add-PSSnapin Microsoft.SharePoint.PowerShell
}

$env="Prod"
$termSetName = "YourTermset"

$SourceCSV="L:PowerShellTaxTabDelimitedTermsetFeed.txt"

#set the batch size; 100+ for speed, reduce to 1 to catch errors
$batchSize=1;
$BatchNum=0;

if ($env -eq "Dev")
{
	$TaxSite = "http://SharePointDev"
	$ServiceName="Managed Metadata Service"
	$GroupName="TermGroupName"
}
elseif ($env -eq "Prod")
{
	$ServiceName="Managed Metadata Services"
	$TaxSite = "http://SharePoint"
	$GroupName="TermGroupName"
}

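try
{
    $StartTime = Get-Date
    Write-Host -ForegroundColor DarkGreen "Reading CSV...$($StartTime)"
    # -ErrorAction Stop turns a read failure into a terminating error so the catch block runs;
    # @() keeps $Terms an array even when the feed holds a single row
    $Terms = @(Import-Csv $SourceCSV -Delimiter "`t" -ErrorAction Stop)
    $ReadSourceTime = Get-Date
    $Duration = $ReadSourceTime.Subtract($StartTime)
    Write-Host -ForegroundColor DarkGreen "Read in $($Terms.Count) items from $($SourceCSV) in $($Duration.TotalSeconds) Seconds"
}
catch
{
    Write-Host -ForegroundColor DarkRed "Could not read in $($SourceCSV)"
    throw   # nothing to load, so stop rather than continue with no terms
}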
# first let's grab a taxonomy session
$taxonomySession = Get-SPTaxonomySession -Site $TaxSite
# now let's grab the termstore for our target Service Application
$termStore = $taxonomySession.TermStores[$ServiceName]
# next, we can grab our target group
$group = $termStore.Groups[$GroupName]

$termSet = $group.TermSets | Where-Object { $_.Name -eq $termSetName }
if ($termSet -eq $null)  # will have to create a new termset
{
    try
    {
        $termSet = $group.CreateTermSet($termSetName)
        $termStore.CommitAll()
        Write-Host "Created Successfully $($termSetName) TermSet"
    }
    catch
    {
        Write-Host "Whoops, could not create $($termSetName) TermSet"
    }
}
else # termset already exists
{
    Write-Host "Nice, termset $($TermSetName) already exists"
}

$CurrentLetter=$LastParentTerm=$null; # track previous parent, to determine whether to create a parent

for ($i=0; $i -lt $Terms.count; $i++)
{
$Lev1Term=$Terms[$i]."Level 1 Term"
$Lev2Term=$Terms[$i]."Level 2 Term"

if ($LastParentTerm -ne $Lev1Term)
{
	$LastParentTerm=$Lev1Term;
	if ($LastParentTerm[0] -ne $CurrentLetter)  #create a new letter!
	{
		$CurrentLetter=$LastParentTerm[0];
		#setting to $true avoids untaggable terms, like parent company at higher tier, but we want to find unavailable tags here
		[Microsoft.SharePoint.Taxonomy.TermCollection] $TC = $termSet.GetTerms($CurrentLetter,$false)
		if ($TC.count -eq 0)
		{
			$CurrentLetterTerm=$termSet.createterm($CurrentLetter,1033);
			$CurrentLetterTerm.set_IsAvailableForTagging($false);
		}
		else
		{
			$CurrentLetterTerm=$TC[0]
		}

	}

	#first try to find existing level1 term before trying to create the term.  This is needed for incremental loads
	[Microsoft.SharePoint.Taxonomy.TermCollection] $TC = $termSet.GetTerms($Lev1Term,$false)
	if ($TC.count -ge 1)  #Term found.  So use it
	{   #assume only one hit possible, if more than one found, just use first, as precise parent is less important in this case
		$Lev1TermObj=$TC[0];
	}
	else # no term found, so create it
	{   #in this case, all parent terms are not available, this logic is for extensibility only
		$Lev1TermObj=$CurrentLetterTerm.createterm($Lev1Term,1033);
		if ($Terms[$i]."available" -eq "FALSE")  #careful, if term2 has a new term1, the term1 will be created as available for tagging
		{
			$Lev1TermObj.set_IsAvailableForTagging($false);
		}
		else
		{  #we choose not to tag this level as available, so force level1 to always unavailable.
		$Lev1TermObj.set_IsAvailableForTagging($false);
		}
	}
} # end of Level 1 handling (finding or creating the parent term); below is Level 2 handling only. Note a gap: a Level 2 term that already exists is not checked for before CreateTerm is called
	try
	{
	if ($Lev2Term.get_length() -ne 0)  #bypasses my habit of new parent terms with empty level 2, can be zero length and not null
		{
		$Lev2TermObj=$Lev1TermObj.createterm($Lev2Term,1033);

		$Description=$Terms[$i]."Description"
		if ($Description.get_Length() -ne 0)
		{
			try
			{
				$Lev2TermObj.SetDescription($Description,1033)
			}
			catch
			{
			Write-Host -ForegroundColor DarkRed "Failed to set description on $($i)"
			}
		}

		}
	}
	catch
	{
	Write-Host -ForegroundColor DarkRed "Could not create $($terms[$i])"
	}
	if (($i % $batchSize) -eq ($batchSize-1))   #some quick modulus math
	{
	$BatchNum++;
		try
		{
		$termStore.CommitAll();
		Write-Host -ForegroundColor darkgreen "Committed terms in batch: $($BatchNum)"
		}
		catch
		{
		Write-Host -ForegroundColor darkred "FAILED committing terms in batch: $($BatchNum), Index: $($i)"
		}
	}
}

$termStore.CommitAll();  # final commit picks up any terms left over from a partial last batch

Observations

1. CSV loads fast, and is cached, so subsequent loads are extremely fast
2. Batching speeds things up, but not as much as one might imagine
3. Once a batch fails due to a duplicate name, the whole process is messed up and the script needs re-running

Tips and tricks

1. Sort the source CSV by Term1, then by Available
2. Eliminate leading blanks for terms using Trim()
3. Ensure there are no duplicates in advance
4. Sort so all term levels are grouped together, otherwise an attempt to create the second set of Term1 will fail
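These checks can be run against the feed before touching the term store. A minimal sketch, assuming the same column names the script reads ("Level 1 Term", "Level 2 Term", "available"); the sample rows and variable names are made up for illustration.

```powershell
# Hypothetical feed rows: out of order, one with stray blanks, one duplicate pair
$feedLines = @(
    "Level 1 Term`tLevel 2 Term`tavailable",
    "Bolt Inc`tBolt Europe`tTRUE",
    " Acme Corp`tAcme Labs `tTRUE",
    "Acme Corp`tAcme Labs`tTRUE",
    "Acme Corp`tAcme Retail`tFALSE"
)
$rows = @($feedLines | ConvertFrom-Csv -Delimiter "`t")

# Tip 2: trim stray blanks from both term columns
foreach ($row in $rows) {
    $row.'Level 1 Term' = $row.'Level 1 Term'.Trim()
    $row.'Level 2 Term' = $row.'Level 2 Term'.Trim()
}

# Tip 3: report Level 1 / Level 2 pairs that occur more than once
$groups = @($rows | Group-Object -Property 'Level 1 Term', 'Level 2 Term')
$duplicates = @($groups | Where-Object { $_.Count -gt 1 })
foreach ($dup in $duplicates) {
    Write-Host "Duplicate term: $($dup.Name) ($($dup.Count) rows)"
}

# Tips 1 and 4: keep one row per pair, then sort so each Level 1 term's rows are adjacent
$clean = @($groups | ForEach-Object { $_.Group[0] } |
    Sort-Object -Property 'Level 1 Term', 'available')

$clean | ForEach-Object { Write-Host "$($_.'Level 1 Term') / $($_.'Level 2 Term') / $($_.available)" }
```

Writing $clean back out with Export-Csv (tab-delimited) would give a feed that is trimmed, de-duplicated and grouped before the load starts.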

Enjoy!
