Custom Word Breaker Support
WorldServer now supports custom word breaker implementations through the WorldServer SDK. Customers can create language-specific word breakers to use in place of the core implementation. This provides a framework for using specialized word breaking algorithms within WorldServer.
Before you invest in a custom implementation, evaluate whether the core word breaker meets your needs. The simplest way to do this is to use the translation memory search tool in standard mode to perform some word-based lookups. You might start by running a query for "*" to return the first 1000 records (or whatever maximum you set), then search for words that you know are in your translation memory. If searches for specific words return the expected results, the core word breaker is most likely sufficient for that language.
You can evaluate any language you like; in general, however, you should focus on languages that will be used as source languages. As mentioned earlier, the core implementation does not work well for character-based languages such as Japanese and Chinese, which affects your ability to do word-based searches in these languages. If these languages are not being used as source languages, though, investing in custom word breaker implementations for them may offer limited return, because translation costs are generally driven by the scoping reports generated from source content.
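The difficulty with character-based languages can be illustrated with a minimal sketch in Python. It assumes, purely for illustration, a word breaker that relies on whitespace; the actual algorithm used by the core implementation is not described here, and the function names below are hypothetical and not part of the WorldServer SDK.

```python
# Illustrative only: these functions are hypothetical and are NOT
# part of the WorldServer SDK or its core word breaker.

def whitespace_break(text):
    """Split on whitespace, as a simple space-delimited word breaker might."""
    return text.split()


def character_break(text):
    """Naive alternative: treat every non-whitespace character as a token."""
    return [ch for ch in text if not ch.isspace()]


english = "translation memory search"
japanese = "翻訳メモリ検索"  # roughly "translation memory search", written without spaces

# English yields one token per word, so word-based lookups can match.
print(whitespace_break(english))

# Japanese has no spaces between words, so the whole phrase becomes one token
# and a search for an individual word inside it would not match.
print(whitespace_break(japanese))

# A per-character breaker produces searchable units, at the cost of precision.
print(character_break(japanese))
```

A custom word breaker for such a language would typically apply a language-aware segmentation strategy in place of the whitespace assumption shown here.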
See the WorldServer SDK documentation for details on creating and deploying custom word breaker implementations.