Wednesday, April 15, 2015

Enabling OCR of TIFF images for SharePoint 2013 Search

SharePoint 2013 Enterprise Search has the built-in ability to OCR and index the content of your scanned TIFF images during a crawl (whether or not they are stored in SharePoint). This is a very powerful feature, yet a bit mysterious to configure, as the steps have changed since the 2010 version. I’ll outline the steps below:

1.      Using Server Manager, ensure the Windows TIFF IFilter feature is enabled on each crawl server.

2.      Open the Local Group Policy Editor and locate the OCR folder beneath Computer Configuration > Administrative Templates.

3.      Edit the policy setting for “Select OCR languages from a code page”.  Choose Enabled and select the appropriate languages.

4.      Open the SharePoint Management Shell (using Run as Administrator) and run the following commands to configure content parsing for TIFF images.

        $ssa = Get-SPEnterpriseSearchServiceApplication
        New-SPEnterpriseSearchFileFormat -SearchApplication $ssa -FormatId tif -FormatName "TIFF Image File" -MimeType "image/tiff"
        New-SPEnterpriseSearchFileFormat -SearchApplication $ssa -FormatId tiff -FormatName "TIFF Image File" -MimeType "image/tiff"

5.      Restart the SharePoint Search Host Controller service.

6.      Open the Search Service Application administration.  Under the Crawling navigation item, navigate to File Types.  Add two new File Types for tif and tiff.

7.      Perform a Full Crawl of your content.
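
If you would rather script the UI-driven parts of the list above (the IFilter feature, the service restart and the File Types entries), something like the following should work. Treat it as a rough sketch rather than a tested procedure: it assumes an elevated PowerShell session, that the file format commands have already been run, and that the feature name Windows-TIFF-IFilter and the service name SPSearchHostController match your servers.

```powershell
# Sketch only: run in an elevated PowerShell session.
Import-Module ServerManager
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# --- Per crawl server ---
# Enable the Windows TIFF IFilter feature if it is not already installed.
# Install-WindowsFeature is the Windows Server 2012 cmdlet name;
# on Windows Server 2008 R2 the equivalent is Add-WindowsFeature.
$feature = Get-WindowsFeature -Name Windows-TIFF-IFilter
if (-not $feature.Installed) {
    Install-WindowsFeature -Name Windows-TIFF-IFilter
}

# Restart the SharePoint Search Host Controller service
# (assumes the tif/tiff file formats were registered beforehand).
Restart-Service -Name SPSearchHostController

# --- Once per Search Service Application ---
# Add tif and tiff to the crawler's File Types list.
# New-SPEnterpriseSearchCrawlExtension reports an error if the extension already exists.
$ssa = Get-SPEnterpriseSearchServiceApplication
foreach ($ext in 'tif', 'tiff') {
    New-SPEnterpriseSearchCrawlExtension -SearchApplication $ssa -Name $ext
}
```

The Group Policy setting for OCR languages is not covered here, and the Full Crawl still has to be started afterwards.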

Depending on how many TIFF images are crawled, the Full Crawl may take considerably longer than your previous crawls.  Additional planning may be necessary, such as scoping a Content Source to only the content that should be OCR’d, or adjusting crawl schedules.
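
As an illustration of that kind of planning, here is a minimal sketch that creates a separate Content Source for scanned documents, gives it a weekly full crawl schedule, and starts the first Full Crawl. The content source name, the start address and the schedule are hypothetical placeholders, and the sketch assumes the start address is a site collection URL that is not already a start address of another Content Source.

```powershell
# Sketch only: run in the SharePoint Management Shell as Administrator.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$ssa = Get-SPEnterpriseSearchServiceApplication

# Hypothetical values; replace with your own.
$name         = "Scanned TIFF Documents"
$startAddress = "http://intranet.example.com/sites/scans"

# Create a SharePoint content source limited to the given site collection.
$cs = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Type SharePoint -SharePointCrawlBehavior CrawlSites `
    -Name $name -StartAddresses $startAddress

# Schedule full crawls weekly, outside business hours (assumed: Saturday 11 PM).
$cs | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full `
    -WeeklyCrawlSchedule -CrawlScheduleDaysOfWeek Saturday `
    -CrawlScheduleStartDateTime "11:00 PM" -CrawlScheduleRunEveryInterval 1

# Kick off the first Full Crawl of just this content source.
$cs.StartFullCrawl()
```

Keeping the OCR-heavy content in its own Content Source means its long-running crawls can be scheduled independently of the rest of your content.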