Creating a stop-words list

You can exclude English words from being indexed by Legacy Content Delivery. As a consequence, a search on those words returns no results.

About this task

Legacy Content Delivery comes with a default stop-words list: an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, with. Those words, as well as any one-character word, are always ignored by the indexing process and therefore cannot be searched. Here is how to update the stop-words list.

Procedure

  1. Open a command console in the \WEB-INF folder of your Legacy Content Delivery home.
  2. Run loaddb.bat client and log in.
  3. Navigate to /db/system/config/db/LiveContent/data/collection.xconf and open the file.
  4. Add stop words to the <analyzer> element. Replace <analyzer id="en" class="org.apache.lucene.analysis.standard.StandardAnalyzer"/> with an element that carries a stopwords parameter, for example:
    <analyzer id="en" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
        <param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet">
            <value>the</value>
            <value>this</value>
            <value>and</value>
            <value>that</value>
        </param>
    </analyzer>
    Add one <value> element for each stop word you want the system to ignore.
    You can also remove values that are already in the list.
  5. Save and close the file.
  6. Reindex the database.
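
The replacement in step 4 can be extended to a fuller list. The following sketch assumes that a stopwords parameter replaces the built-in list rather than adding to it. That is how the Lucene StandardAnalyzer constructor treats a supplied word set, but it has not been verified against Legacy Content Delivery. The sketch therefore repeats the default words listed under "About this task" and appends two hypothetical additions, however and therefore.

```xml
<analyzer id="en" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
    <param name="stopwords" type="org.apache.lucene.analysis.util.CharArraySet">
        <!-- Default stop words, repeated on the assumption that this
             parameter replaces the built-in list -->
        <value>an</value>
        <value>and</value>
        <value>are</value>
        <value>as</value>
        <value>at</value>
        <value>be</value>
        <value>but</value>
        <value>by</value>
        <value>for</value>
        <value>if</value>
        <value>in</value>
        <value>into</value>
        <value>is</value>
        <value>it</value>
        <value>no</value>
        <value>not</value>
        <value>of</value>
        <value>on</value>
        <value>or</value>
        <value>such</value>
        <value>that</value>
        <value>the</value>
        <value>their</value>
        <value>then</value>
        <value>there</value>
        <value>these</value>
        <value>they</value>
        <value>this</value>
        <value>to</value>
        <value>was</value>
        <value>will</value>
        <value>with</value>
        <!-- Hypothetical additions, for illustration only -->
        <value>however</value>
        <value>therefore</value>
    </param>
</analyzer>
```

To make a default word such as not searchable again, leave its <value> element out of the list. In either case, the change takes effect only after the database has been reindexed.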