
Concordance Search

The concordance search tool lets you search translation memory and, optionally, machine translation for a word or phrase you are unsure about, so that you can see examples of how it has been used elsewhere. For example, if you had to translate an industry-specific term like "photovoltaic adapter" or "hedge fund," you might search both translation memory and machine translation for matches.

The concordance search tool can be run either as a Search Type from the WorldServer Tools > Translation Memories > Translation Memory: <TM> search page, or from Tools > Concordance Search Tool in the Browser Workbench. In the Browser Workbench, if you select a word or phrase and then choose Tools > Concordance Search Tool, the Concordance Search dialog opens with all translation memory hits for the selected word or phrase. If you select a segment by checking its check box and then open the Concordance Search dialog, the dialog opens with the entire segment's contents in the search field and returns all TM entries that contain any of the words in that segment, sorted so that entries with the most matched words appear first. Typically, however, you would not search for an entire segment's contents, because that search was already performed when the asset was prepopulated from the TM.
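
As a rough illustration of the segment-based search described above, the following Python sketch returns every TM entry that shares at least one word with the selected segment, ordered by how many words matched. The function names, the tokenization rule, and the in-memory list of TM source strings are assumptions made for illustration only; they are not WorldServer APIs, and stemming is not modeled.

```python
import re

def tokens(text: str) -> set[str]:
    # Lower-cased words and numbers; punctuation is dropped.
    # Hypothetical tokenizer; stemming is not modeled.
    return {t.lower() for t in re.findall(r"[^\W_]+", text)}

def segment_search(segment: str, tm_sources: list[str]) -> list[tuple[str, int]]:
    """Hypothetical helper: return TM sources that share any word with
    the segment, with the most matched words first."""
    query = tokens(segment)
    hits = []
    for source in tm_sources:
        matched = len(query & tokens(source))
        if matched:
            hits.append((source, matched))
    # list.sort is stable even with reverse=True, so ties keep TM order.
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits
```

For the segment "Install the photovoltaic adapter", the TM entry "The photovoltaic adapter is installed." (3 matched words) would be listed before "Install the unit." (2 matched words); "installed" does not count as a match for "install" here because the sketch does no stemming.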

The standard and segment leverage translation memory search options focus on finding exact and fuzzy matches in order to provide a translation for a complete segment. Concordance search is a secondary type of search that finds TM entries in which the given words have been used.

In the following search for the string “latest eGate”, WorldServer found two 100% matches (matches of the entire string) and several 50% matches (where only one of the two words was found).

Figure 1. Concordance Search

You can select the translation memory and machine translation configuration to search. By default, the TM and MT configuration are those configured for the content being translated. Similarly, the source and target languages are inferred from the asset in context.

Each match is scored using the “concordance scoring” method: the percentage of the sought words found in the match, regardless of how many other words the segment contains. Normal fuzzy match scoring would have reported such a match much lower, because fuzzy match scoring measures the found words against the number of words in the segment.
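
The difference between the two scoring methods can be sketched in Python. `concordance_score` follows the definition above (sought words found, divided by the number of sought words). `naive_fuzzy_score` is a deliberately simplified, hypothetical stand-in that measures found words against the segment length; it does not reproduce WorldServer's actual fuzzy match algorithm, and neither function is a WorldServer API.

```python
import re

def words(text):
    # Lower-cased words and numbers; punctuation is ignored.
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]

def concordance_score(query, segment):
    # Percentage of the sought words found anywhere in the segment,
    # regardless of how many other words the segment contains.
    sought = set(words(query))
    if not sought:
        return 0.0
    found = sought & set(words(segment))
    return 100.0 * len(found) / len(sought)

def naive_fuzzy_score(query, segment):
    # Simplified stand-in: matched segment words as a share of all
    # words in the segment, so extra words lower the score.
    seg = words(segment)
    if not seg:
        return 0.0
    sought = set(words(query))
    return 100.0 * sum(1 for w in seg if w in sought) / len(seg)
```

For the query "latest eGate" and the segment "Download the latest eGate release notes today.", the concordance score is 100.0, while the simplified fuzzy score is about 28.6, because the segment contains five other words. Against "The latest release", the concordance score is 50.0, since only one of the two sought words is present.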

The following example shows a concordance search with the Include MT using configuration <MT configuration> option enabled.

Figure 2. Concordance Search Including Machine Translation

Machine translation matches are assigned a concordance score equal to the value set in the Fuzzy Score Equivalent field when the MT adapter was configured. In this example, all concordance scores returned by the BabelFish configuration are 80.0. This does not mean that you should use an MT match instead of any TM match scoring below 80.0. The Fuzzy Score Equivalent is only an advisory estimate, made by your WorldServer administrator, of how a BabelFish match compares on the translation memory scale. Because every MT match receives the same score, this value should not be used as the basis for choosing a match.
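
A minimal Python sketch of how MT suggestions might be merged with TM hits, assuming a hypothetical `Hit` record and `merge_results` helper (neither is a WorldServer API). Every MT suggestion receives the configured Fuzzy Score Equivalent, so the sorted position of an MT hit reflects only that configured estimate, not the quality of the individual translation.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    source: str
    target: str
    origin: str   # "TM" or "MT"
    score: float

def merge_results(tm_hits, mt_suggestions, fuzzy_score_equivalent):
    """Hypothetical helper: combine TM hits with MT suggestions.

    TM hits keep their computed concordance scores; every MT suggestion
    gets the same configured Fuzzy Score Equivalent, so the score cannot
    distinguish one MT suggestion from another.
    """
    mt_hits = [Hit(src, tgt, "MT", fuzzy_score_equivalent)
               for src, tgt in mt_suggestions]
    # Highest score first; the inputs are not modified.
    return sorted(tm_hits + mt_hits, key=lambda h: h.score, reverse=True)
```

With a Fuzzy Score Equivalent of 80.0, an MT suggestion sorts below a 100% TM hit and above a 50% TM hit; as noted above, that placement alone is not a reason to prefer it.
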
The following rules apply to the concordance search:
  • Concordance results are case insensitive. However, capitalization differences incur a penalty, assessed in the same way as for standard and segment leverage match results. See the WorldServer Translation Memory Administration Guide for details about the TM score capitalization penalty property.
  • Concordance scoring uses only words and numbers. Placeholders and punctuation are ignored for scoring purposes.
  • Results scoring below the Minimum Score % threshold are filtered out.
  • The Maximum # of Hits value limits the number of results displayed. All hits are displayed on a single page, so keep this value small.
  • Depending on your permissions, you can open a result in the TM entry editor by clicking its source text. This applies to TM-based results only.
  • The order and position of words in your search text are not significant. That is, searching for either “cars and trucks” or “trucks and cars” returns a 100% match against the segment “His favorite modes of travel are cars, planes, and trucks.”
  • Found search words are highlighted in the source text of the results. Target-based concordance searches are not supported.
  • Results are sorted primarily by concordance score.
  • Wild card searches are not supported.
  • Searches are based on whole words or stems of those words if stemming is enabled. There is no support for substring or subword lookups.
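
The rules above can be combined into one illustrative pipeline. This Python sketch is an approximation under stated assumptions, not WorldServer's implementation: the function names are hypothetical, placeholder removal and stemming are not modeled, and the capitalization penalty is modeled as a fixed number of points (the hypothetical `case_penalty` parameter) subtracted once when a matched word appears only with different capitalization. The real penalty is governed by the TM score capitalization penalty property.

```python
import re

def tokens(text):
    # Whole words and numbers only; punctuation is dropped.
    # Placeholder handling and stemming are not modeled here.
    return re.findall(r"[^\W_]+", text)

def concordance_score(query, segment, case_penalty):
    # Map each lower-cased sought word to its spelling in the query.
    sought = {t.lower(): t for t in tokens(query)}
    if not sought:
        return 0.0
    seg_tokens = tokens(segment)
    seg_lower = {t.lower() for t in seg_tokens}
    # Case-insensitive, order-insensitive, whole-word matching.
    found = [w for w in sought if w in seg_lower]
    score = 100.0 * len(found) / len(sought)
    # Hypothetical penalty: subtract case_penalty once if any found word
    # appears in the segment only with different capitalization.
    exact = set(seg_tokens)
    if any(sought[w] not in exact for w in found):
        score -= case_penalty
    return score

def concordance_search(query, tm_sources, min_score=50.0, max_hits=10,
                       case_penalty=1.0):
    results = []
    for source in tm_sources:
        score = concordance_score(query, source, case_penalty)
        if score >= min_score:          # Minimum Score % filter
            results.append((score, source))
    # Highest concordance score first; ties keep TM order (stable sort).
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:max_hits]           # Maximum # of Hits limit
```

Searching for "cars and trucks" against "His favorite modes of travel are cars, planes, and trucks." scores 100.0; "Cars and trucks share the road." also contains all three words but scores 99.0 in this sketch because "Cars" differs in case; "Carsharing is popular." scores 0.0 because substrings are not matched. Reversing the query to "trucks and cars" gives the same results.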