Word breakers

Word breakers attempt to identify word boundaries within text content and present the information to WorldServer for further processing. Word breaking is critical to translation memory and terminology database searches (excluding ICE match and exact match searches).

Word breakers define the word units used when creating shingles for match searches. Shingles are sequences of words (or word runs) taken from the reference text.
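To make the relationship between word units and shingles concrete, here is a minimal sketch. It is not WorldServer's implementation. It assumes the JDK's java.text.BreakIterator as a stand-in word breaker and treats a shingle as a fixed-size run of consecutive words. The class name ShingleSketch, the methods words and shingles, and the shingle size of 3 are hypothetical names and values chosen for illustration; none of them come from the WorldServer SDK.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: not part of com.idiominc.wssdk.component.linguistic.
public class ShingleSketch {

    // Stand-in word breaker: splits text at the boundaries reported by the
    // JDK's locale-aware BreakIterator and keeps only segments that contain
    // at least one letter or digit (drops whitespace and punctuation).
    static List<String> words(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    // Builds overlapping shingles: every run of `size` consecutive words.
    // Returns an empty list when there are fewer than `size` words.
    static List<String> shingles(List<String> words, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + size <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + size)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> units = words("The quick brown fox jumps", Locale.ENGLISH);
        // Prints: [The quick brown, quick brown fox, brown fox jumps]
        System.out.println(shingles(units, 3));
    }
}
```

Because the shingles are built from whatever units the word breaker returns, a different word breaker for the same text yields different shingles, which in turn changes which reference segments a match search can find.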

You can create new word breaker implementations to handle specific languages.

Java package: com.idiominc.wssdk.component.linguistic