Word Breakers
Word breaking attempts to identify word boundaries within text content and passes that information to WorldServer for further processing. It is critical to translation memory (TM) and terminology database (TD) searches, with the exception of ICE and pure exact-match TM searches, because it defines the word units that are used to create shingles for match searches.
The word breaker component allows you to create new implementations for handling specific languages.
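To make the relationship between word units and shingles concrete, the following is a minimal sketch that uses only the standard Java library. It does not use the WorldServer SDK and is not how WorldServer itself breaks words: the class name ShingleSketch, its methods, and the fixed shingle size are hypothetical and chosen only for illustration. Word boundaries come from java.text.BreakIterator, and each shingle is a run of consecutive words.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a hypothetical helper, not part of the WorldServer SDK.
final class ShingleSketch {

    // Splits text into word units using the JDK's locale-aware word boundaries.
    // Segments with no letter or digit (spaces, punctuation) are dropped.
    static List<String> breakWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(token);
            }
        }
        return words;
    }

    // Builds shingles: every run of `size` consecutive words, joined by a space.
    // Returns an empty list when there are fewer than `size` words.
    static List<String> shingles(List<String> words, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be at least 1");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + size <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + size)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = breakWords("The quick brown fox.", Locale.ENGLISH);
        System.out.println(shingles(words, 2));
    }
}
```

Because the shingles are built from whatever word units the breaker returns, a language that is not delimited by spaces would produce different shingles under a different breaker, which is the reason a language-specific implementation can matter for match searches.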
Java documentation
com.idiominc.wssdk.component.linguistic
1 A shingle is a word run, that is, a series of words taken from a reference text.