iFilter for PDF files SharePoint 2010 Crawl

In SharePoint 2010 we can use iFilters to extend the functionality of the search engine. In this post I will talk about iFilters but more specifically about how you can ensure that your PDF files are crawled by the SharePoint search. You can read more about iFilters at http://technet.microsoft.com/en-us/library/gg405170.aspx

The first step here is to add the icon for the PDF files. You do not need to do this step if you do not wish to add the icon to your SharePoint environment.

Installing the PDF icon

  • First you need to download the PDF icon. You can find this at http://www.adobe.com/misc/linking.html#pdficon
  • We then need to add this icon to SharePoint.
  • Find the file DOCICON.XML in your 14-hive folder (14\TEMPLATE\XML\).
  • Search for a line beginning with <Mapping Key="pdf"
  • If this line exists, you already have the icon and can move on to the next step. If it doesn't exist, add the following line inside the <ByExtension> tag:
    <Mapping Key="pdf" Value="pdficon_small.png" /> The value here is simply the file name of the PDF icon (the standard name is pdficon_small.png); you can change this if needed.
  • SharePoint will now look for the image pdficon_small.png whenever it finds a PDF document, so the last thing we need to do is put the image somewhere SharePoint can find it.
  • Open \14\TEMPLATE\IMAGES\ and simply add the pdficon_small.png to that folder.
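
If you have several servers to update, the DOCICON.XML check above can be scripted. The following is only a minimal sketch in Python: the function name ensure_pdf_mapping is made up for this example, and it assumes the file has a <ByExtension> element directly under the root. Back up the original file first, because re-serializing with ElementTree drops comments and the XML declaration.

```python
import xml.etree.ElementTree as ET


def ensure_pdf_mapping(xml_text, icon_name="pdficon_small.png"):
    """Return (xml, changed): add a pdf <Mapping> under <ByExtension> if missing.

    Hypothetical helper for illustration; not part of SharePoint.
    """
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")

    # If a pdf mapping is already present, leave the text untouched.
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            return xml_text, False

    # Otherwise append the same line the manual steps describe.
    ET.SubElement(by_ext, "Mapping", {"Key": "pdf", "Value": icon_name})
    return ET.tostring(root, encoding="unicode"), True


if __name__ == "__main__":
    # Tiny stand-in for DOCICON.XML, only for demonstration.
    sample = (
        "<DocIcons><ByExtension>"
        '<Mapping Key="doc" Value="icdoc.png" />'
        "</ByExtension></DocIcons>"
    )
    updated, changed = ensure_pdf_mapping(sample)
    print(changed)
    print(updated)
```

To use it on a real server you would read DOCICON.XML into a string, pass it to the function, and write the result back only when changed is True.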

Installing the iFilter

Now that we have the icon for PDF files set up, we need to install the actual iFilter that our crawl will use.

Once the iFilter is installed on the server, we still need to tell SharePoint to use it.

  • Open Central Administration and navigate to the Search Service Application
  • From the left-hand menu select “File Types”
  • Click on “New File Type”
  • Enter “pdf” as the extension and press OK

Now we need to perform an IIS reset for the changes to take effect. (Warn your users before you do this, since their sessions will be terminated.)

  • Start the command prompt: [Start] -> [All Programs] -> [Accessories] -> [Command Prompt]
  • Type iisreset then press enter
  • Type NET STOP OSearch14 then press enter
  • Type NET START OSearch14 then press enter
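
If you repeat this restart on more than one server, the three commands above can be wrapped in a small script. This is just a sketch in Python, assuming it is run from an elevated prompt on the SharePoint server; the names RESTART_COMMANDS and restart_search are invented for this example. It defaults to a dry run that only lists the commands, so nothing is restarted by accident.

```python
import subprocess

# The same three commands as in the manual steps, in the same order.
RESTART_COMMANDS = [
    ["iisreset"],
    ["net", "stop", "OSearch14"],
    ["net", "start", "OSearch14"],
]


def restart_search(dry_run=True, runner=subprocess.run):
    """Run (or, in a dry run, only list) the restart commands in order."""
    executed = []
    for cmd in RESTART_COMMANDS:
        if not dry_run:
            # check=True stops the sequence if a command fails.
            runner(cmd, check=True)
        executed.append(" ".join(cmd))
    return executed


if __name__ == "__main__":
    # Dry run: print what would be executed without touching any service.
    for line in restart_search(dry_run=True):
        print(line)
```

Call restart_search(dry_run=False) only after you have warned your users, since the IIS reset ends their sessions.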

You can now crawl your PDF files (start a full crawl).

 

Note:

It is worth mentioning that there are also commercial iFilters that will crawl your files much faster. The free iFilter from Adobe will only crawl one PDF at a time, so if crawling your farm takes too long because it contains a lot of PDF files, you might want to look into the PDF iFilters you can buy.

 
