Configuring which binary resources get indexed for Content Delivery search
By default, the IQ Index Service indexes PDF documents, Microsoft Word documents, Microsoft Excel documents, Microsoft PowerPoint documents, and OpenDocument Text documents. You can change which binary files with textual content are indexed by editing deployer-conf.xml, the configuration file of your combined Content Deployer or your Content Deployer worker. In this file, you can specify configuration strings either as hardcoded values or as parameters.
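Because a configuration string can be a hardcoded value or a parameter, the effective list of extensions is not always visible in deployer-conf.xml itself. The following Python sketch reads the extensions attribute of the BinaryIndexing element under each indexing Step element (TridionSearchIndexDeployStep and IshSearchIndexDeployStep, as named in the procedure) and, if the value is a parameter, resolves it against a set of environment variables. It assumes that parameters use a ${name:-default} placeholder form; the helper names read_extensions and effective_extensions are hypothetical and are not part of any Tridion API.

```python
import os
import re
import xml.etree.ElementTree as ET

# Step Ids named in this procedure; a given deployer-conf.xml may contain only one of them.
INDEX_STEP_IDS = ("TridionSearchIndexDeployStep", "IshSearchIndexDeployStep")

# ASSUMPTION: parameterized values look like ${name:-default} (default is optional).
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::-([^}]*))?\}")


def _local(tag):
    """Return an element's tag without any XML namespace (None for non-element nodes)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else None


def read_extensions(xml_text):
    """Map each indexing Step Id to the raw extensions attribute of its BinaryIndexing child."""
    root = ET.fromstring(xml_text)
    found = {}
    for step in root.iter():
        if _local(step.tag) == "Step" and step.get("Id") in INDEX_STEP_IDS:
            for child in step:
                if _local(child.tag) == "BinaryIndexing":
                    found[step.get("Id")] = child.get("extensions", "")
    return found


def effective_extensions(raw, env=None):
    """Resolve ${name:-default} placeholders against env, then split on commas."""
    env = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        # Use the environment value if set, otherwise the inline default (or nothing).
        return env.get(name, default if default is not None else "")

    resolved = _PLACEHOLDER.sub(substitute, raw)
    return [ext.strip() for ext in resolved.split(",") if ext.strip()]
```

For example, with a hypothetical parameter named binaryext, effective_extensions("${binaryext:-pdf,docx}", {}) returns ['pdf', 'docx'], while passing {"binaryext": "pdf, rtf"} as the environment returns ['pdf', 'rtf'].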
Procedure
- On your Content Delivery server environment, access the configuration location of your Content Deployer worker or, if your worker is combined with the endpoint, the configuration location of your combined Content Deployer.
- Depending on your preference, do one of the following:
  - If you prefer to configure using environment variables, open deployer-conf.xml for viewing.
  - If you prefer to configure by editing the configuration file itself, open deployer-conf.xml for editing.
- Find the Step element that handles indexing of SDL Tridion Sites content. Its Id attribute is set to TridionSearchIndexDeployStep.
- Within this element, find the child element called BinaryIndexing. Its extensions attribute specifies the default list of file extensions for the binary files that it can index:

  | File extension | File type |
  | --- | --- |
  | pdf | Adobe PDF (Portable Document Format) document |
  | doc | Microsoft Word document (before the 2007 version) |
  | docx | Microsoft Word document (from the 2007 version onward) |
  | xls | Microsoft Excel document (before the 2007 version) |
  | xlsx | Microsoft Excel document (from the 2007 version onward) |
  | ppt | Microsoft PowerPoint document (before the 2007 version) |
  | pptx | Microsoft PowerPoint document (from the 2007 version onward) |
  | odt | OpenDocument Text document (such as an OpenOffice Writer document) |

- To add to the list of file extensions, decide which types of files you want to index, and identify the file extensions associated with those file types.
This indexing functionality uses Apache Tika, a toolkit that detects and extracts metadata and text from over a thousand different file types. For more information about supported file types, refer to the Apache Tika Supported Document Formats webpage. Note that Tika identifies file types by MIME type rather than by file extension, so you need to look up the file extension or extensions that correspond to each MIME type you want to index. Also note that some file types may require additional plugins.
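- Ensure that the value of the extensions attribute in deployer-conf.xml is set to your own comma-separated list of file extensions, either by editing the attribute directly or by setting the environment variable that it references. Because your list replaces the default list, include any of the default extensions that you still want indexed.
- If this Content Deployer is also used to deploy Tridion Docs content, find the Step element that handles indexing of Tridion Docs content, which has its Id attribute set to IshSearchIndexDeployStep, and configure it in the same way.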
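If you edit the file itself rather than using environment variables, the last two steps can be scripted. The following Python sketch, using only the standard library, normalizes a list of extensions (lowercase, no leading dots, no blanks or duplicates) and writes it to the extensions attribute of the BinaryIndexing element under the TridionSearchIndexDeployStep and IshSearchIndexDeployStep Step elements. The function names normalize_extensions and set_extensions are hypothetical, and matching elements by local name (to ignore any XML namespace) is an assumption about the file's layout beyond what this procedure states.

```python
import xml.etree.ElementTree as ET

# Step Ids named in this procedure.
INDEX_STEP_IDS = ("TridionSearchIndexDeployStep", "IshSearchIndexDeployStep")


def _local(tag):
    """Return an element's tag without any XML namespace (None for comments)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else None


def normalize_extensions(extensions):
    """Lowercase, strip whitespace and leading dots, drop blanks and duplicates (order kept)."""
    seen = []
    for ext in extensions:
        ext = ext.strip().lstrip(".").lower()
        if ext and ext not in seen:
            seen.append(ext)
    return ",".join(seen)


def set_extensions(xml_text, extensions, step_ids=INDEX_STEP_IDS):
    """Return xml_text with BinaryIndexing/@extensions replaced under the given Step Ids."""
    # Keep XML comments in the output (insert_comments requires Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    value = normalize_extensions(extensions)
    updated = 0
    for step in root.iter():
        if _local(step.tag) == "Step" and step.get("Id") in step_ids:
            for child in step:
                if _local(child.tag) == "BinaryIndexing":
                    child.set("extensions", value)
                    updated += 1
    if updated == 0:
        # Fail loudly instead of silently writing back an unchanged file.
        raise ValueError("no BinaryIndexing element found under the requested Step elements")
    return ET.tostring(root, encoding="unicode")
```

A typical call is set_extensions(text, ["pdf", "doc", "docx", "rtf"]). ElementTree keeps comments here, but it drops the XML declaration and may rename namespace prefixes, so review the result (or make the change by hand) before replacing your real deployer-conf.xml.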