Specifying regular expressions for the Advanced Display Filter
Regular expressions (regex) are powerful search formulas that can locate complex patterns of characters inside texts. In Trados Studio you can use regular expressions to filter on segments that match a certain pattern. Trados Studio uses the .NET syntax for regular expressions.
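As a rough mental model of what the Source and Target fields do, the following Python sketch keeps a segment pair only when the source regex matches somewhere in the source text and the target regex matches somewhere in the target text. It is an illustration under assumptions, not Trados Studio's implementation. The segment data and the filter_segments helper are hypothetical, an empty field is assumed to impose no condition, and Python's re module stands in for the .NET engine. The two syntaxes agree for the anchors, escapes and character classes used here, but they spell named groups differently.

```python
import re

# Hypothetical segment pairs (source, target) standing in for a project file.
SEGMENTS = [
    ("Open the file.", "Öffnen Sie die Datei."),
    ("Close the file.", "Schließen Sie die Datei"),
    ("Is it ready?", "Ist es fertig?"),
]


def filter_segments(pairs, source_regex="", target_regex="", case_sensitive=False):
    """Return the pairs whose source and target both match their patterns.

    Assumption: an empty pattern means "no condition" for that side.
    case_sensitive models the Case Sensitive option (off unless passed True).
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    source_re = re.compile(source_regex, flags) if source_regex else None
    target_re = re.compile(target_regex, flags) if target_regex else None
    kept = []
    for source, target in pairs:
        # re.search finds a match anywhere unless the pattern uses ^ or $.
        if source_re and not source_re.search(source):
            continue
        if target_re and not target_re.search(target):
            continue
        kept.append((source, target))
    return kept


# Segments that end with a period in the source but not in the target.
result = filter_segments(SEGMENTS, r"\.$", r"[^.]$")
```

Here result holds only the second pair, because its source ends with a period and its target does not.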
About this task
To specify a regular expression for the Source or Target field of the Advanced Display Filter:
Procedure
Example
- Examples of regular expressions
Result: Display all segments with different capitalization in source and target
Regular expression:
Source: ^[A-Z]
Target: ^[a-z]
Explanation: Make sure to enable the Case Sensitive option. Otherwise, the regular expression engine finds both lowercase and uppercase strings for both patterns.
The ^ caret symbol signals the beginning of a segment.
[A-Z] describes the range of all uppercase letters, while [a-z] describes the range of all lowercase letters.
Result: Display all segments with different end punctuation in the source and target
Regular expression:
Source: \.$
Target: [^.]$
Explanation: These expressions find all segment pairs that end with a period in the source text but not in the target. In the first expression, the $ signals the end of a string or segment, and the backslash (\) followed by the dot signals a literal period. In the second expression, the caret inside the set marks negation, so [^.] indicates any character that is not a period.
You can modify these expressions to search for other punctuation marks. For example, the following regexes find all segments ending with a question mark in the source but not in the target:
- Source regex: \?$ (the question mark must be escaped because ? is itself a metacharacter)
- Target regex: [^?]$
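The example rows above can be checked with a short Python sketch. PATTERNS and pair_matches are hypothetical names, and re.IGNORECASE is used to imitate clearing the Case Sensitive option, which is an assumption about how that option behaves. The last lines show why the question-mark pattern needs its backslash: without it, Python's re module rejects ?$ because ? is read as a quantifier with nothing to repeat.

```python
import re

# Source/target pattern pairs copied from the examples above.
PATTERNS = {
    "capitalization": (r"^[A-Z]", r"^[a-z]"),
    "period": (r"\.$", r"[^.]$"),
    "question_mark": (r"\?$", r"[^?]$"),
}


def pair_matches(source, target, name, case_sensitive=True):
    """True if the source matches the source pattern and the target matches the target pattern.

    case_sensitive=False is modelled with re.IGNORECASE (an assumption about
    the Case Sensitive option).
    """
    source_pattern, target_pattern = PATTERNS[name]
    flags = 0 if case_sensitive else re.IGNORECASE
    return (re.search(source_pattern, source, flags) is not None
            and re.search(target_pattern, target, flags) is not None)


# With case sensitivity, "Speichern" does not start with a lowercase letter.
strict = pair_matches("Save the file", "Speichern", "capitalization")
# Without it, [a-z] also accepts "S", so the pair is reported by mistake.
relaxed = pair_matches("Save the file", "Speichern", "capitalization", case_sensitive=False)

# Unescaped, "?" is a quantifier with nothing to repeat, so compiling fails.
try:
    re.compile(r"?$")
    unescaped_compiles = True
except re.error:
    unescaped_compiles = False
```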
- Backreference constructs for regular expressions
Backreferences reuse submatches identified earlier in the same regular expression (regex) or in the corresponding replacement regex. Backreferences are useful when you need to repeat a character sequence like (abc)(abc)(abc) in the same regex or across its corresponding replacement regex. Instead of copying the character group (abc) several times, you can reuse it by inserting backreferences to the original one.
You can use both named and numbered backreferences in the following format:
Syntax: \k<name> or \number (for example, \1)
Explanation: Used when referencing a previous component of the same regex.
Regular expression: (?<x>abc)=\k<x> matches abc=abc, and (abc)=\1 also matches abc=abc.
Syntax: ${name} or $number (for example, $1)
Explanation: Used when referencing a component of a corresponding regex.
Regular expression: (?<TheBoy>Jack) and (?<TheGirl>Jill) matches Jack and Jill in the source segment [EN]: Jack and Jill went up that hill again!
In the target regex, you can reuse the matches identified by the groups <TheBoy> and <TheGirl> with a named or numbered backreference:
${TheBoy} e ${TheGirl}
$1 e $2
Both of these match Jack e Jill in the target segment [IT]: Jack e Jill salirono di nuovo su quella collina!
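The same backreferences can be tried in Python, with one caveat: Python's re module spells named groups differently. A named group is written (?P<name>...) and referenced inside the pattern with (?P=name), and a replacement template uses \g<name> or \1 instead of ${name} or $1. The second half models the target-regex behavior described above by expanding the source match into a literal string and searching for it in the target. That model is an assumption made for illustration, not a description of Trados Studio internals.

```python
import re

# Backreferences inside one pattern (Python syntax: (?P<x>...) and (?P=x)).
named = re.fullmatch(r"(?P<x>abc)=(?P=x)", "abc=abc")
numbered = re.fullmatch(r"(abc)=\1", "abc=abc")
# The backreference must repeat the captured text exactly.
mismatch = re.fullmatch(r"(abc)=\1", "abc=abd")

source = "Jack and Jill went up that hill again!"
target = "Jack e Jill salirono di nuovo su quella collina!"

# Capture the two names in the source segment.
m = re.search(r"(?P<TheBoy>Jack) and (?P<TheGirl>Jill)", source)

# Python templates use \g<name> and \1 where .NET uses ${name} and $1.
by_name = m.expand(r"\g<TheBoy> e \g<TheGirl>")
by_number = m.expand(r"\1 e \2")

# Assumed model: look for the expanded text literally in the target segment.
target_match = re.search(re.escape(by_name), target)
```

Both by_name and by_number come out as "Jack e Jill", and target_match finds that text at the start of the Italian segment.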
- Special characters for regular expressions
Metacharacters are the building blocks of regular expressions (regex). Characters in regex are understood to be either metacharacters with a special meaning, or regular characters with a literal meaning.
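To see the difference between a metacharacter and a regular character in practice, this Python sketch runs the escape example from the table below against made-up text. Python's re module treats the dot, the backslash, $ and ? the same way as .NET here. The unescaped dot also accepts commas, while the escaped version matches only literal periods; re.escape, a standard Python function, builds a fully literal pattern from any string.

```python
import re

text = "www.rws.com and www,rws,com"

# Unescaped ".": a metacharacter that matches any character except a newline.
loose = re.findall(r"www.rws.com", text)
# Escaped "\.": the backslash cancels the special meaning, so only periods match.
strict = re.findall(r"www\.rws\.com", text)

# re.escape() backslash-escapes every metacharacter in a string.
literal_price = re.escape("$5?")
price_found = re.search(literal_price, "Total: $5?") is not None
```

Here loose finds both spellings, while strict finds only "www.rws.com".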
The following are some common regex metacharacters and examples of what strings they would match in a segment.
Metacharacter: \
Description: Escape character. Cancels the special meaning of any metacharacter in this list that immediately follows the backslash and instead matches the literal character.
Example: www\.rws\.com matches "www.rws.com" but not "www,rws,com".
Metacharacter: \b
Description: Defines a word boundary.
Example: \bstud matches "stud" and "studio" but not "tradosstudio".
Metacharacter: \w
Description: Matches any word character.
Example: \w matches "I", "D", "S", "1" and "3" in "ID S1.3".
Metacharacter: \W
Description: Matches any non-word character.
Example: \W matches " " and "." in "ID S1.3".
Metacharacter: \d
Description: Matches any digit character; for the digits 0 to 9, equivalent to [0-9]. (In .NET, \d also matches decimal digits from other scripts.)
Example: Studio\d\d matches "Studio21".
Metacharacter: \D
Description: Matches any non-digit character; the opposite of \d, and for the digits 0 to 9 equivalent to [^0-9].
Example: Studio\D matches "Studio-".
Metacharacter: \s
Description: Matches any white-space character (a space, a tab, a line break or a form feed).
Example: Trados\sStudio matches "Trados Studio" and "Trados(tab)Studio".
Metacharacter: \S
Description: Matches any non-white-space character.
Example: Studio\S matches "StudioT" and "Studio1".
Metacharacter: \r
Description: Matches a carriage return character.
Metacharacter: \t
Description: Matches a tab character.
Metacharacter: .
Description: Matches any single character except a newline character. Within square brackets, the dot is literal: a.c matches "abc", "a1c" and so on, but [a.c] matches only "a", "." or "c".
Example: t..... matches "to RWS" in "Welcome to RWS".
Metacharacter: [ ]
Description: Creates a character class, which allows you to match any one of a set of characters that you specify. You can use - to specify a range of characters. For example, [a-z] matches any single lowercase letter.
Example: [au] matches "a" in "Trados" and "u" in "Studio".
Metacharacter: ^
Description: Matches the starting position of a segment. You can also match any character that is not in a given character class by adding ^ to the beginning of the class. For example, [^0-9] matches any character that is not a digit.
Example: ^[^a-z0-9] matches any character at the beginning of the segment that is neither a lowercase letter nor a digit. Make sure to enable the Case Sensitive option. Otherwise, the regular expression engine looks for strings that do not start with a letter or a digit.
Metacharacter: *
Description: Matches the previous element zero or more times.
Example: \d*\.\d matches ".0", "19.9" and "219.9".
Metacharacter: ?
Description: Matches the previous element zero or one time. It is useful for finding optional characters.
Example: colou?r matches "color" and "colour".
Metacharacter: +
Description: Matches the previous element one or more times.
Example: be+ matches "bee" in "been" and "be" in "bent".
Metacharacter: |
Description: Matches any one of the elements separated by the | vertical bar.
Example: th(e|is|at) matches "the" and "this" in "this is the day".
Metacharacter: {n}
Description: Matches the previous element exactly n times.
Example: be{2} matches "bee" but not "be".
Metacharacter: ( )
Description: Creates a group and "remembers" the matching section of the string. Groups can be used for backreferences or to extract a substring.
Example: With the source regex (?<TheBoy>Jack) and (?<TheGirl>Jill) and the target regex ${TheBoy} e ${TheGirl} (or $1 e $2), the filter matches "Jack and Jill" in the source segment [EN] "Jack and Jill went up that hill again!" and "Jack e Jill" in the target segment [IT] "Jack e Jill salirono di nuovo su quella collina!".
Metacharacter: $
Description: In a replacement or target regex, substitutes the substring matched by a numbered or named group, as in $1 or ${name}. When used as the last character of a pattern, it anchors a match at the end of a string. To match a literal $, use \$ or enclose it inside a character class, as in [$].
Metacharacter: < >
Description: Captures the matched subexpression into a named group.
Example: (?<double>\w)\k<double> matches "ee" in "deep".
Metacharacter: -
Description: Character range: matches any single character in the range from the first to the last character.
Example: [A-Z] matches "A" and "B" in "AB123".
- Helpful resources
- RegEx buddy is an application that enables you to test, build, decode and debug regular expressions. It also includes a library of commonly used regular expressions.
- A competitive edge is a useful blog post about using regular expressions in Trados Studio.
- Regular expressions for beginners is a blog about getting started with regular expressions.
- Introducing regular expressions is an online book by Michael Fitzgerald explaining the fundamentals of regular expressions.