
iFilter for PDF files SharePoint 2010 Crawl

In SharePoint 2010 we can use iFilters to extend the functionality of the search engine. This post covers iFilters in general, and more specifically how you can ensure that your PDF files are crawled by SharePoint search. You can read more about iFilters at http://technet.microsoft.com/en-us/library/gg405170.aspx

The first step here is to add the icon for the PDF files. You do not need to do this step if you do not wish to add the icon to your SharePoint environment.

Installing the PDF icon

  • First you need to download the PDF icon. You can find this at http://www.adobe.com/misc/linking.html#pdficon
  • We then need to add this icon to SharePoint.
  • Find the file DOCICON.XML in your 14-hive folder (14\TEMPLATE\XML\)
  • Search for the following line: <Mapping Key="pdf"
  • If this line exists you already have the icon and can move to the next step. If it doesn't exist you should add the following line inside the <ByExtension> tag:
    <Mapping Key="pdf" Value="pdficon_small.png" />
    The value here is simply the name of the PDF icon file (the standard name is pdficon_small.png); you can change this if needed.
  • Now we have told SharePoint to look for the image pdficon_small.png when it finds a PDF document. The last thing we need to do is to actually add the image somewhere SharePoint can find it.
  • Open \14\TEMPLATE\IMAGES\ and simply add the pdficon_small.png to that folder.
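The DOCICON.XML check above can also be scripted. The following is a minimal sketch, assuming the extension mappings live under a ByExtension element; the function name ensure_pdf_mapping and the sample XML are made up for illustration and are not part of SharePoint. It only works on a string, so nothing on the server is touched.

```python
import xml.etree.ElementTree as ET


def ensure_pdf_mapping(docicon_xml, icon_file="pdficon_small.png"):
    """Return the DOCICON.XML text with a pdf mapping, adding one if missing."""
    root = ET.fromstring(docicon_xml)
    by_ext = root.find("ByExtension")  # assumed parent of the <Mapping> entries
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            return docicon_xml  # mapping already present, leave the text as-is
    ET.SubElement(by_ext, "Mapping", {"Key": "pdf", "Value": icon_file})
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily shortened stand-in for the real file.
sample = (
    "<DocIcons><ByExtension>"
    '<Mapping Key="doc" Value="icdoc.png" />'
    "</ByExtension></DocIcons>"
)
updated = ensure_pdf_mapping(sample)
print(updated)
```

Take a backup copy of DOCICON.XML before writing any changed text back to the 14-hive.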

Installing the iFilter

Now that we have the icon for PDF files set up, we need to add the actual iFilter which our crawl will use.

The iFilter is now installed on the server but we still need to tell SharePoint to use it.

  • Open Central Administration and navigate to the Search Service Application
  • From the left-hand menu select “File Types”
  • Click on “New File Type”
  • Enter “pdf” as the extension and press OK

Now we need to perform an IIS reset in order for the changes to take effect. (Warn your users before you do this, since their sessions will be terminated.)

  • Start the command prompt: [Start] -> [All Programs] -> [Accessories] -> [Command Prompt]
  • Type iisreset then press enter
  • Type NET STOP OSearch14 then press enter
  • Type NET START OSearch14 then press enter
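If you restart often, the three commands above can be wrapped in a small script. This is only a sketch: the function name restart_search and the dry_run flag are made up here, and by default nothing is executed. With dry_run=False it needs an elevated prompt on the SharePoint server.

```python
import subprocess

# The same three commands as in the list above, in the same order.
RESTART_COMMANDS = [
    ["iisreset"],
    ["net", "stop", "OSearch14"],
    ["net", "start", "OSearch14"],
]


def restart_search(dry_run=True):
    """Return the command lines in order; run them only when dry_run is False."""
    executed = []
    for cmd in RESTART_COMMANDS:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence as soon as one command fails
            subprocess.run(cmd, check=True)
    return executed


for line in restart_search():
    print(line)
```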

You can now crawl your PDF files (start a full crawl).

Note:

It is worth mentioning that there are commercial iFilters as well that will crawl your files much faster. The free iFilter from Adobe will only crawl one PDF at a time, so if crawling your farm takes too long because it contains a lot of PDF files, you might want to look into the PDF iFilters you can buy.

Indexing Gantt Views in SharePoint

When SharePoint tries to index pages with Gantt views you will get an error similar to:

Microsoft.SharePoint.SPException: This view requires at least Microsoft Internet Explorer 7.0, Mozilla FireFox 3.0, or Apple Safari 3.0.

This is a bug in SharePoint which makes it impossible for the index server to index the Gantt view, because its user agent is not permitted to crawl the content.

In order to fix this we need to modify the registry on the index server as follows:

  • Start Regedit
  • Find the following key

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Global\Gathering Manager\

  • In this key locate the User Agent value

  • Change the value from: 
    Mozilla/4.0 (compatible; MSIE 4.01; Windows NT; MS Search 6.0 Robot)

    To:

    Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; MS-RTC LM 8; Tablet PC 2.0)
  • Restart the server.

You should now be able to index these pages without any problems.
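
The same registry change could be prepared as a .reg file instead of being typed into Regedit. The sketch below only builds the text of such a file. The helper build_reg_file is made up for illustration, and the value name "UserAgent" is an assumption about the exact spelling, so check it in Regedit before importing anything.

```python
# Key path and replacement user agent string are copied from the steps above.
KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server"
    r"\14.0\Search\Global\Gathering Manager"
)
NEW_USER_AGENT = (
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; "
    ".NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; "
    "Media Center PC 6.0; InfoPath.3; .NET4.0C; MS-RTC LM 8; Tablet PC 2.0)"
)


def build_reg_file(key_path, value_name, value):
    """Build .reg text that sets one string value under key_path."""

    def esc(text):
        # Inside quoted names and data, backslashes and quotes are escaped.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    return (
        "Windows Registry Editor Version 5.00\r\n\r\n"
        "[{}]\r\n".format(key_path)
        + '"{}"="{}"\r\n'.format(esc(value_name), esc(value))
    )


# "UserAgent" is an assumed spelling of the value name; verify it first.
reg_text = build_reg_file(KEY_PATH, "UserAgent", NEW_USER_AGENT)
print(reg_text)
```

Export the existing key from Regedit first so the original user agent can be restored if needed.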