Example of a segmentation rule

The following rule is used as an exception to the segmentation rule that defines a segment demarcated by a period (full stop).

Because it is used as an exception, the TM does not insert a segment break in text that matches this pattern, even if the text also matches the more general pattern that defines a segment break.

The following rule matches any text that contains one or more periods (possibly followed by other closing punctuation), followed by a whitespace character, such as a space, and then a lowercase letter.

Before break

\.+[\p{Pe}\p{Pf}\p{Po}"]*

Close, final, and other punctuation are Unicode categories, identified by the following codes:

\p{Pe} specifies close punctuation.

\p{Pf} specifies final quote punctuation.

\p{Po} specifies other punctuation.
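To see which characters fall into these three categories, they can be looked up in a Unicode database. The sketch below uses Python's standard unicodedata module. The sample characters and the helper name category_of are chosen for illustration only and are not part of the rule.

```python
import unicodedata

# Sample characters chosen for illustration; they are not taken from the rule.
samples = [")", "]", "\u00bb", "\u201d", ".", "!", '"']


def category_of(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


for ch in samples:
    # Prints each character next to its category code (Pe, Pf or Po).
    print(repr(ch), category_of(ch))
```

In this database the period and the straight double quote are both classified as Po, so a character class that includes \p{Po} already covers them.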

After break

\s\p{Ll}

This regular expression matches a whitespace character, such as a space, followed by a lowercase letter. \p{Ll} specifies lowercase letters.
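The combined effect of the two patterns can be approximated as follows. This is a minimal sketch, not the TM's implementation: Python's built-in re module does not support \p{...} classes, so the categories are checked with unicodedata instead, and str.isspace() is assumed to be a close enough stand-in for \s. The function names and the pos argument (the index of the candidate break) are hypothetical.

```python
import unicodedata

# Categories allowed after the period(s) in the "before break" pattern.
TRAILING_CATEGORIES = {"Pe", "Pf", "Po"}


def matches_before_break(text: str, pos: int) -> bool:
    """Emulate \\.+[\\p{Pe}\\p{Pf}\\p{Po}"]* ending exactly at index pos."""
    i = pos
    while i > 0:
        ch = text[i - 1]
        if ch == ".":
            return True  # found a period; everything after it was allowed
        if ch != '"' and unicodedata.category(ch) not in TRAILING_CATEGORIES:
            return False  # a disallowed character came before any period
        i -= 1
    return False


def matches_after_break(text: str, pos: int) -> bool:
    """Emulate \\s\\p{Ll} starting exactly at index pos."""
    if pos + 1 >= len(text):
        return False
    return text[pos].isspace() and unicodedata.category(text[pos + 1]) == "Ll"


def is_exception(text: str, pos: int) -> bool:
    """True if the exception rule applies at pos, so no break is made there."""
    return matches_before_break(text, pos) and matches_after_break(text, pos)


abbreviation = "approx. five items"
sentence_end = "It ended. Then it began."
print(is_exception(abbreviation, abbreviation.index(" five")))  # lowercase follows
print(is_exception(sentence_end, sentence_end.index(" Then")))  # uppercase follows
```

With these inputs, the exception applies after "approx." because a lowercase letter follows the space, and it does not apply after "ended." because the next word starts with an uppercase letter.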

For more information about Unicode categories, see http://msdn.microsoft.com/en-us/library/system.globalization.unicodecategory.aspx.