Word Breakers

Word breaking attempts to identify word boundaries within text content and passes that information to WorldServer for further processing. Word breaking is critical to translation memory (TM) and terminology database (TD) searches, with the exception of ICE and pure exact-match TM searches. It defines the word units that are then used to create shingles¹ for match searches.
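
To make the relationship between word units and shingles concrete, here is a minimal sketch in plain Java. It does not use the WorldServer SDK: the class `ShingleSketch` and its methods are hypothetical names chosen for illustration, and the JDK's `java.text.BreakIterator` stands in for whatever word breaker WorldServer actually applies. The shingle size of 2 is likewise an assumption.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not part of the WorldServer SDK.
final class ShingleSketch {

    // Splits text into word units using the JDK's locale-aware BreakIterator.
    static List<String> breakWords(String text, Locale locale) {
        List<String> words = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Keep only segments containing a letter or digit; skip spaces and punctuation.
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(token);
            }
        }
        return words;
    }

    // Builds overlapping runs ("shingles") of `size` consecutive word units.
    static List<String> shingles(List<String> words, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + size <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + size)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = breakWords("The quick brown fox jumps.", Locale.ENGLISH);
        System.out.println(words);              // word units
        System.out.println(shingles(words, 2)); // two-word shingles
    }
}
```

Because the shingles are built directly from the word units, any change in where the breaker places boundaries changes the shingles that a match search compares.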

The word breaker component allows you to create new implementations for handling specific languages.
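
As a rough sketch of what a language-specific implementation might look like, the following plain-Java example selects a breaker by language code. Everything here is hypothetical: the `WordBreakerSketch` interface, the `PatternBreaker` and `BreakerRegistry` classes, and the regular expressions are assumptions made for illustration, not the SDK's actual contract (see the Java documentation for that). The French rule, which splits elided articles such as `l'` from the following word, is only an example of a language-specific decision.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical contract for illustration; the real SDK interface may differ.
interface WordBreakerSketch {
    List<String> breakWords(String text);
}

// Regex-driven breaker: each match of the pattern is one word unit.
final class PatternBreaker implements WordBreakerSketch {
    private final Pattern word;

    PatternBreaker(String regex) {
        this.word = Pattern.compile(regex);
    }

    @Override
    public List<String> breakWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = word.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }
}

// Picks a breaker by language code, falling back to a generic default.
final class BreakerRegistry {
    // Default: letters/digits, keeping word-internal apostrophes together.
    static final WordBreakerSketch DEFAULT =
            new PatternBreaker("[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*");

    // Assumed French rule: end the unit right after an elision apostrophe.
    static final Map<String, WordBreakerSketch> BY_LANGUAGE =
            Map.of("fr", new PatternBreaker("[\\p{L}\\p{N}]+'?"));

    static WordBreakerSketch forLanguage(String code) {
        return BY_LANGUAGE.getOrDefault(code, DEFAULT);
    }

    public static void main(String[] args) {
        String text = "l'homme d'affaires";
        System.out.println(forLanguage("fr").breakWords(text));
        System.out.println(forLanguage("en").breakWords(text));
    }
}
```

The same input yields different word units depending on the breaker chosen, which is why a breaker tuned to a language can affect match results for that language.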

Java documentation

com.idiominc.wssdk.component.linguistic

¹ Shingles are word runs, that is, series of consecutive words taken from a reference text.