Segmentation rule - Example
The following rule is used as an exception to the segmentation rule that defines a segment demarcated by a period (full stop).
Because it is used as an exception, the TM treats text that matches this pattern as containing no segment break, even if the text also matches the more general pattern that defines a segment break.
The following rule matches any text that contains one or more periods (optionally followed by other closing punctuation), followed by a whitespace character and then a lowercase letter.
Before break
\.+[\p{Pe}\p{Pf}\p{Po}"]*
Close, final, and other punctuation are Unicode categories, specified by the following codes:
\p{Pe} specifies close punctuation.
\p{Pf} specifies final quote punctuation.
\p{Po} specifies other punctuation.
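To see which characters these codes cover, the short sketch below looks up the Unicode general category of a few sample characters. It uses Python's standard unicodedata module rather than the regular expression engine the TM uses, and the sample characters are chosen only for illustration.

```python
import unicodedata

# Sample characters chosen for illustration only.
samples = [")", "]", "\u201d", "\u00bb", "!", "?", '"', "."]

# Map each character to its two-letter Unicode general category code.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cat}")
```

Closing brackets report Pe, closing quotation marks such as U+201D report Pf, and characters such as the exclamation mark, question mark, and straight double quote report Po.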
After break
\s\p{Ll}
This regular expression matches a whitespace character followed by a lowercase letter (\p{Ll} is the Unicode category for lowercase letters).
For more information about Unicode categories, see http://msdn.microsoft.com/en-us/library/system.globalization.unicodecategory.aspx.
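The sketch below shows how the "Before break" and "After break" patterns combine. Python's built-in re module does not support \p{...} classes, so the two patterns are emulated with unicodedata. The function names is_closing and exception_positions are hypothetical, and str.isspace() only approximates \s, so treat this as an illustration of the rule, not as the TM's implementation.

```python
import unicodedata


def is_closing(ch):
    # Emulates the character class [\p{Pe}\p{Pf}\p{Po}"].
    return ch == '"' or unicodedata.category(ch) in {"Pe", "Pf", "Po"}


def exception_positions(text):
    """Return the indexes at which the exception rule applies.

    Each index points at the whitespace character between the
    "before break" match and the "after break" match.
    """
    positions = []
    i = 0
    while i < len(text):
        if text[i] != ".":
            i += 1
            continue
        j = i
        while j < len(text) and text[j] == ".":       # \.+
            j += 1
        while j < len(text) and is_closing(text[j]):  # [\p{Pe}\p{Pf}\p{Po}"]*
            j += 1
        # \s\p{Ll}: a whitespace character, then a lowercase letter.
        if (j + 1 < len(text) and text[j].isspace()
                and unicodedata.category(text[j + 1]) == "Ll"):
            positions.append(j)
        i = j
    return positions


text = "The file is approx. 2 MB, i.e. small. Next one."
for pos in exception_positions(text):
    print(repr(text[:pos]), "|", repr(text[pos:]))
```

In the sample sentence, only the period in "i.e." is followed by whitespace and a lowercase letter, so that is the only position where the exception would suppress a break. The periods followed by a digit, by an uppercase letter, or by the end of the text are left to the general rule.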