Configuring which binary resources get indexed for Content Delivery search

By default, the IQ Index Service indexes PDF documents, Microsoft Word documents, Microsoft Excel documents, Microsoft PowerPoint documents, and OpenDocument Text documents. You can change which binary files with textual content get indexed by editing deployer-conf.xml, the configuration file of your combined Content Deployer or Content Deployer worker. In this file, you can specify configuration values either as hardcoded strings or as parameters.

Procedure

  1. On your Content Delivery server environment, access the configuration location of your Content Deployer worker or, if your worker is combined with the endpoint, the configuration location of your combined Content Deployer.
  2. Depending on your preference, do one of the following:
    • If you prefer to configure using environment variables, open deployer-conf.xml for viewing.
    • If you prefer to configure by editing the configuration file itself, open deployer-conf.xml for editing.
  3. Find the Step element that handles content indexing. Its Id attribute is set to IshSearchIndexDeployStep.
  4. Within this element, find the child element called BinaryIndexing. Its extensions attribute specifies the default list of file extensions for binary files it can index:
    File extension    File type
    pdf               Adobe PDF (Portable Document Format) document
    doc               Microsoft Word document (before the 2007 version)
    docx              Microsoft Word document (from the 2007 version onward)
    xls               Microsoft Excel document (before the 2007 version)
    xlsx              Microsoft Excel document (from the 2007 version onward)
    ppt               Microsoft PowerPoint document (before the 2007 version)
    pptx              Microsoft PowerPoint document (from the 2007 version onward)
  5. To change the list of file extensions, decide which types of files you want to index, and identify the file extensions associated with those file types. This indexing functionality uses Apache Tika, a toolkit that can work with over a thousand different file types. For more information about supported file types, refer to the Apache Tika Supported Document Formats webpage.
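  6. Set the extensions attribute in deployer-conf.xml to your own comma-separated list of file extensions. If the attribute's value is a parameter, set the corresponding environment variable rather than editing the file. Otherwise, edit the value directly and save the file.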
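
As a rough illustration of the result of this procedure, the fragment below sketches what a hardcoded extensions list might look like. Only the Step element, its Id value, the BinaryIndexing child element, and the extensions attribute come from the steps above. The self-closing form of BinaryIndexing, the omission of all other attributes and child elements, and the choice of odt (the OpenDocument Text extension) as the added file type are assumptions made for the example, so compare it against your own deployer-conf.xml rather than copying it.

```xml
<!-- Illustrative fragment only: the real Step element in deployer-conf.xml
     carries further attributes and child elements that are omitted here. -->
<Step Id="IshSearchIndexDeployStep">
  <!-- Default extensions from step 4, with odt appended as an example. -->
  <BinaryIndexing extensions="pdf,doc,docx,xls,xlsx,ppt,pptx,odt"/>
</Step>
```

If the extensions attribute in your file holds a parameter instead of a literal list, leave the file unchanged and supply the same comma-separated value through the environment variable that the parameter refers to.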