Sunday, January 29, 2006

SharePoint Services, SQL 2005 and PDF searches

WSS (2003) depends on the SQL Full-text Search service (MSFTESQL) to perform document searches. SQL Express does not support Full-text search, so to search document contents you will need SQL Workgroup edition or above. Because this service cannot understand every document format (Word, Excel, PDF, etc.), it depends on filter components, often provided by third parties (e.g. Adobe provides the PDF filter), to extract the contents for indexing. These components are usually called IFilters because they all implement the well-defined IFilter interface (for more on IFilters see this).

To provide PDF search functionality within WSS, the PDF IFilter needs to be installed on the SQL Server box (to download click here). On a SQL 2000 box, installing the PDF IFilter automatically makes it available to the Full-text Search engine. On a SQL 2005 box, however, MSFTESQL does not, by default, load third-party filters already installed in the operating system. To enable this, the following extra steps are required after installing the PDF IFilter:

  • Run the following SQL statements against the WSS content database (be aware that this reduces the security of SQL Server, so the settings should be restored after the IFilter is loaded):

    • Exec sp_fulltext_service 'load_os_resources', 1 (This command tells the Microsoft Search service to load the word breakers, stemmers and filters registered with the operating system)

    • Exec sp_fulltext_service 'verify_signature', 0 (This allows the Microsoft Search service to load unsigned filters)

  • Restart SQL Server and the Full-text Search engine (MSFTESQL).

  • Check that PDFs have been added to the list of files available for Full-text indexing by running the following SQL statement against the WSS content database:

    • SELECT * FROM sys.fulltext_document_types

  • Restore the Full-text service settings to their defaults:

    • Exec sp_fulltext_service 'verify_signature', 1
    • Exec sp_fulltext_service 'load_os_resources', 0

  • To index existing PDFs in WSS, go to the WSS administrative page, then disable and re-enable the full-text search and index component. This rebuilds the Full-text index and uses the new IFilter to index the existing PDFs.
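
The SQL steps above can be collected into one script, run in stages around the service restart. This is only a sketch of the procedure described in this post: the FULLTEXTSERVICEPROPERTY checks and the '.pdf' filter on sys.fulltext_document_types are my additions for convenience, and the column list assumes the SQL 2005 catalog view layout.

```sql
-- Stage 1: run before restarting the services.
-- Record the current values so they can be restored later (1 = on, 0 = off).
SELECT FULLTEXTSERVICEPROPERTY('LoadOSResources') AS load_os_resources,
       FULLTEXTSERVICEPROPERTY('VerifySignature') AS verify_signature;

EXEC sp_fulltext_service 'load_os_resources', 1;  -- use components registered with the OS
EXEC sp_fulltext_service 'verify_signature', 0;   -- allow unsigned filter binaries

-- Stage 2: restart SQL Server and the Full-text Search engine (MSFTESQL), then run:
SELECT document_type, path, manufacturer
FROM sys.fulltext_document_types
WHERE document_type = '.pdf';
-- A row here suggests the PDF IFilter was picked up; no rows means it was not loaded.

-- Stage 3: restore the defaults, as described in the steps above.
EXEC sp_fulltext_service 'verify_signature', 1;
EXEC sp_fulltext_service 'load_os_resources', 0;
```

If Stage 2 returns no rows, it is probably worth confirming that the IFilter was installed on the SQL Server box itself and that both services were restarted before going further.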
