
Specifying regular expressions for the Advanced Display Filter

Regular expressions (regex) are powerful search formulas that can locate complex patterns of characters in text. In Trados Studio, you can use regular expressions to filter segments that match a certain pattern. Trados Studio uses the .NET syntax for regular expressions.

About this task

To specify a regular expression for the Source or Target field of the Advanced Display Filter:

Procedure

  1. In the Editor view, open the View tab and select Advanced Display Filter 2.0.
    To always display the Advanced Display Filter 2.0 window, select the Auto-hide button.
  2. Open the Content tab and enter your regular expressions in the Source field, the Target field, or both.
  3. From the drop-down menu on the right, select AND or OR to combine the Source and Target filters. AND is the stricter filter: it displays only the segments that match both the Source and the Target criteria. OR displays the segments that match either the Source or the Target criteria.
  4. Enable the Regular Expression option. Otherwise, Trados Studio interprets the characters in the Source and Target fields as literal characters.
  5. If your source or target regex contains backreferences, activate the Backreference option. Trados Studio supports named and numbered backreferences in the formats ${group name} and $1.
  6. In the DSI Information field, specify whether you want to restrict the results to specific segment types. For example, type H or heading to show only Title segments.
  7. To filter segments by tag content, enable either Search in text and tag content or Search only in tag content. The first option searches tag attribute content as well as translatable content inside tags, whereas the second option searches only tag attribute content, such as font=Courier New.
  8. Turn on the Case Sensitive option if you want the literal characters in your regular expression to match case exactly. By default, Trados Studio ignores case. For example, the regex ^T returns segments starting with either an uppercase T or a lowercase t.
  9. Apply any additional filters available on the Filter Attributes, Comments, Document Structure, Segment, Color, and Sampling tabs.
    A checkmark icon appears next to each tab whose filters are applied to the current search. You can see the applied filters and search results in the status bar at the bottom of the Advanced Display Filter.
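
Steps 3, 4, and 8 interact in a way that a short script can make concrete. The following Python sketch is a hypothetical model, not Trados Studio's implementation: display_filter is an invented name, each segment is assumed to be a plain (source, target) string pair, an empty Source or Target field is assumed to add no condition, and Python's re module stands in for the .NET engine (the patterns used here behave the same in both).

```python
import re


def display_filter(segments, source="", target="", mode="AND",
                   use_regex=True, case_sensitive=False):
    """Hypothetical model of the Content tab (not Trados Studio code).

    segments: list of (source_text, target_text) pairs.
    Returns the pairs that pass the Source/Target filters.
    """
    # Case Sensitive off (the default) is modeled with re.IGNORECASE.
    flags = 0 if case_sensitive else re.IGNORECASE

    def compile_field(text):
        if not text:
            return None  # assumption: an empty field adds no condition
        # Regular Expression off: every character is taken literally.
        return re.compile(text if use_regex else re.escape(text), flags)

    source_re = compile_field(source)
    target_re = compile_field(target)

    kept = []
    for src, tgt in segments:
        checks = []
        if source_re is not None:
            checks.append(source_re.search(src) is not None)
        if target_re is not None:
            checks.append(target_re.search(tgt) is not None)
        if not checks:
            passed = True
        elif mode == "AND":
            passed = all(checks)  # must meet both Source and Target criteria
        else:
            passed = any(checks)  # "OR": meeting either criterion is enough
        if passed:
            kept.append((src, tgt))
    return kept


segments = [
    ("The file is open.", "Il file risulta aperto."),
    ("the end", "la fine"),
    ("Total: 3", "totale: 3"),
]

# Default, case-insensitive: ^T also matches "the end".
print(display_filter(segments, source="^T"))
# Case Sensitive on: only sources that start with an uppercase T.
print(display_filter(segments, source="^T", case_sensitive=True))
# Regular Expression off: "." is a literal period, not "any character".
print(display_filter(segments, source=".", use_regex=False))
```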

Example

Examples of regular expressions
Each example lists the result, the regular expressions to enter in the Source and Target fields, and an explanation.

Display all segments with different capitalization in source and target

Source: ^[A-Z]

Target: ^[a-z]

Make sure to enable the Case Sensitive option. Otherwise, each pattern matches segments that start with either an uppercase or a lowercase letter, so the filter returns every segment pair whose source and target both start with a letter.

The ^ caret symbol signals the beginning of a segment.

[A-Z] describes the range of all uppercase letters, while [a-z] describes the range of all lowercase letters.

Display all segments with different end punctuation in the source and target

Source: \.$

Target: [^.]$

Combined with AND, these expressions find all segment pairs whose source ends with a period but whose target does not.

In the first expression, the $ signals the end of a string or segment, and the backslash (\) followed by the dot signals a literal period.

In the second expression, the caret inside the set marks negation, so [^.] indicates any character that is not a period.

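You can modify these expressions to search for other punctuation marks. For example, the following regexes find all segments ending with a question mark in the source but not in the target. Because ? is a metacharacter, it must be escaped with a backslash outside a character class; inside square brackets it is literal.

  • Source regex: \?$
  • Target regex: [^?]$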
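
To try these example patterns outside Trados Studio, you can run them with Python's re module, which handles these particular patterns the same way as .NET. The pairs_matching helper below is a hypothetical stand-in for the AND combination of the Source and Target fields, and the segment pairs are made up for illustration.

```python
import re


def pairs_matching(segments, source_regex, target_regex, case_sensitive=True):
    """Hypothetical helper: keep (source, target) pairs where the source regex
    matches the source AND the target regex matches the target."""
    flags = 0 if case_sensitive else re.IGNORECASE
    src_re = re.compile(source_regex, flags)
    tgt_re = re.compile(target_regex, flags)
    return [(s, t) for s, t in segments if src_re.search(s) and tgt_re.search(t)]


segments = [
    ("Welcome to RWS.", "benvenuti in RWS."),  # capitalization differs
    ("Welcome to RWS.", "Benvenuti in RWS"),   # period missing in target
    ("Is it ready?", "Pronto?"),               # consistent
    ("Is it ready?", "pronto"),                # capitalization and ? differ
]

# Different capitalization: needs Case Sensitive, otherwise [A-Z] and [a-z]
# both match any letter and every pair is returned.
print(pairs_matching(segments, r"^[A-Z]", r"^[a-z]"))
print(pairs_matching(segments, r"^[A-Z]", r"^[a-z]", case_sensitive=False))

# Period at the end of the source but not at the end of the target.
print(pairs_matching(segments, r"\.$", r"[^.]$"))

# Question mark: escaped outside the character class, literal inside it.
print(pairs_matching(segments, r"\?$", r"[^?]$"))

# An unescaped ? is a quantifier with nothing to repeat, so this fails.
try:
    re.compile(r"?$")
except re.error as exc:
    print("invalid pattern:", exc)
```
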
Backreference constructs for regular expressions

Backreferences reuse submatches captured earlier in the same regular expression (regex) or in a corresponding regex, such as the source regex that a target regex refers to. Backreferences are useful when you need to repeat a character sequence, like (abc)(abc)(abc), within the same regex or across corresponding regexes. Instead of copying the character group (abc) several times, you can capture it once and reuse it with backreferences, as in (abc)\1\1.
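
In Python's re module, a numbered backreference inside one regex uses the same \1 notation as .NET, so the (abc) example can be tried directly. The second pattern is an illustrative addition showing that a backreference repeats the text the group captured, not the group's pattern.

```python
import re

# (abc) is captured as group 1; each \1 must match that same text again.
triple = re.compile(r"(abc)\1\1")
print(bool(triple.fullmatch("abcabcabc")))  # True
print(bool(triple.fullmatch("abcabc")))     # False

# A backreference repeats the captured text, not the group's pattern:
# ([a-z]+) can match any lowercase word, but \1 must repeat that same word.
pair = re.compile(r"([a-z]+)-\1")
print(bool(pair.fullmatch("rws-rws")))  # True
print(bool(pair.fullmatch("rws-abc")))  # False
```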

You can use both named and numbered backreferences.

To reference a group captured earlier in the same regex, use \k<name> for a named group or \number for a numbered group:

  • (?<x>abc)=\k<x> matches "abc=abc".
  • (abc)=\1 matches "abc=abc".

To reference a group captured by a corresponding regex, use ${group name} or $1. For example, the source regex (?<TheBoy>Jack) and (?<TheGirl>Jill) matches "Jack and Jill" in the source segment [EN]: Jack and Jill went up that hill again!

In the target regex, you can reuse the matches captured by the groups TheBoy and TheGirl with a named or numbered backreference:

  • ${TheBoy} e ${TheGirl}
  • $1 e $2

Both of these match "Jack e Jill" in the target segment [IT]: Jack e Jill salirono di nuovo su quella collina!
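
The cross-regex case has no direct equivalent in Python's re module, but the idea can be sketched: take the groups captured by the source regex, escape their text, and substitute it into the target regex before searching the target segment. The helper name resolve_backreferences is hypothetical, and this illustrates the behavior described above rather than Trados Studio's implementation. Python also spells the .NET named group (?<name>...) as (?P<name>...).

```python
import re


def resolve_backreferences(target_regex, source_match):
    """Hypothetical helper: replace ${name} and $n in target_regex with the
    text that the corresponding source group captured, escaped so that it
    is matched literally (a captured "3.5" must not match "315")."""
    def substitute(ref):
        key = ref.group("name") or ref.group("num")
        group = int(key) if key.isdigit() else key
        return re.escape(source_match.group(group))

    return re.sub(r"\$\{(?P<name>\w+)\}|\$(?P<num>\d+)", substitute, target_regex)


# Python writes the .NET named group (?<TheBoy>Jack) as (?P<TheBoy>Jack).
source_regex = r"(?P<TheBoy>Jack) and (?P<TheGirl>Jill)"
source = "Jack and Jill went up that hill again!"
target = "Jack e Jill salirono di nuovo su quella collina!"

match = re.search(source_regex, source)
for target_regex in (r"${TheBoy} e ${TheGirl}", r"$1 e $2"):
    pattern = resolve_backreferences(target_regex, match)
    found = re.search(pattern, target)
    print(target_regex, "->", found.group() if found else None)
```

Escaping the captured text is a design choice in this sketch: without it, a captured period would act as the "any character" metacharacter in the target regex.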

Special characters for regular expressions

Metacharacters are the building blocks of regular expressions (regex). Characters in regex are understood to be either metacharacters with a special meaning, or regular characters with a literal meaning.
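
To see the difference in practice, you can compare a pattern with and without escape characters. This Python sketch uses the www.rws.com example; Python's re module treats . and \. the same way as .NET. The re.escape call escapes every metacharacter in a string, which has the same effect on this pattern as leaving the Regular Expression option turned off (an analogy, not a description of how Trados Studio works internally).

```python
import re

unescaped = re.compile(r"www.rws.com")          # each . is a metacharacter: any character
escaped = re.compile(r"www\.rws\.com")          # each \. matches a literal period only
literal = re.compile(re.escape("www.rws.com"))  # re.escape inserts the backslashes

for text in ("www.rws.com", "www,rws,com"):
    print(text,
          "unescaped:", bool(unescaped.search(text)),
          "escaped:", bool(escaped.search(text)),
          "re.escape:", bool(literal.search(text)))
```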

The following are some common regex metacharacters and examples of what strings they would match in a segment.

Each entry below gives the metacharacter, its description, a regex example, and the strings that the example matches.
  • \ (escape character): Cancels the special meaning of any metacharacter in this list that immediately follows the backslash, so that it matches the literal character. Example: www\.rws\.com matches "www.rws.com" but not "www,rws,com".
  • \b: Defines a word boundary. Example: \bstud matches "stud" on its own and at the start of "studio", but not inside "tradosstudio".
  • \w: Matches any word character. Example: \w matches "I", "D", "S", "1", and "3" in "ID S1.3".
  • \W: Matches any non-word character. Example: \W matches the space and the "." in "ID S1.3".
  • \d: Matches any digit character; equivalent to [0-9]. Example: Studio\d\d matches "Studio21".
  • \D: Matches any non-digit character; equivalent to [^0-9]. Example: Studio\D matches "Studio-".
  • \s: Matches any white-space character (a space, a tab, a line break, or a form feed). Example: Trados\sStudio matches "Trados Studio" and "Trados" followed by a tab and "Studio".
  • \S: Matches any non-white-space character. Example: Studio\S matches "StudioT" and "Studio1".
  • \r: Matches a carriage return character.
  • \t: Matches a tab character.
  • . (dot): Matches any single character except a newline character. Within square brackets, the dot is literal: a.c matches "abc", "a1c", and so on, but [a.c] matches only "a", ".", or "c". Example: t..... matches "to RWS" in "Welcome to RWS".
  • [ ] (character class): Matches any one of a set of characters that you specify. You can use - to specify a range of characters; for example, [a-z] matches any single lowercase letter. Example: [au] matches "a" in "Trados" and "u" in "Studio".
  • ^: Matches the starting position of a segment. Added to the beginning of a character class, it matches any character that is not in the class; for example, [^0-9] matches any character that is not a digit. Example: ^[^a-z0-9] matches any character at the beginning of the segment that is neither a lowercase letter nor a digit. Make sure to enable the Case Sensitive option. Otherwise, the regular expression engine looks for segments that do not start with a letter or a digit.
  • *: Matches the previous element zero or more times. Example: \d*\.\d matches ".0", "19.9", and "219.9".
  • ?: Matches the previous element zero or one time, which is useful for finding optional characters. Example: colou?r matches "color" and "colour".
  • +: Matches the previous element one or more times. Example: be+ matches "bee" in "been" and "be" in "bent".
  • |: Matches any one of the elements separated by the vertical bar. Example: th(e|is|at) matches "the" and "this" in "this is the day".
  • {n}: Matches the previous element exactly n times. Example: be{2} matches "bee" but not "be".
  • ( ) (group): Creates a group and "remembers" the matching section of the string. You can use groups for backreferences or to extract a substring. Example: the source regex (?<TheBoy>Jack) and (?<TheGirl>Jill) matches "Jack and Jill" in the source segment [EN]: Jack and Jill went up that hill again! In the target regex, you can reuse the matches captured by the groups TheBoy and TheGirl with a named backreference (${TheBoy} e ${TheGirl}) or a numbered one ($1 e $2). Both match "Jack e Jill" in the target segment [IT]: Jack e Jill salirono di nuovo su quella collina!
  • $: Followed by a group number or a group name, as in $1 or ${group name}, substitutes the substring matched by that group. When used as the last character of a pattern, it anchors the match at the end of the segment. To match a literal $, use \$ or enclose it in a character class, as in [$].
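  • < > (named group): Used in (?<name>...), captures the matched subexpression into a named group. Example: (?<double>\w)\k<double> matches "ee" in "deep".
  • - (range): Inside a character class, matches any single character in the range from the first character to the last. Example: [A-Z] matches "A" and "B" in "AB123".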
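
The examples above can be checked with a short Python script. This is a sketch under the assumption that Python's re module behaves like .NET for these patterns, which holds for the ones listed here; the main syntax difference is that Python writes the named group (?<double>\w) and its backreference \k<double> as (?P<double>\w) and (?P=double). The sample texts come from the examples, and a few entries, such as RWS$ and \$\d+, are added for illustration.

```python
import re

# (pattern, sample text, flags, whole matches the entries above describe)
# re.IGNORECASE stands in for leaving the Case Sensitive option off.
EXAMPLES = [
    (r"www\.rws\.com", "www.rws.com www,rws,com", 0, ["www.rws.com"]),
    (r"\bstud", "stud studio tradosstudio", re.IGNORECASE, ["stud", "stud"]),
    (r"\w", "ID S1.3", 0, ["I", "D", "S", "1", "3"]),
    (r"\W", "ID S1.3", 0, [" ", "."]),
    (r"Studio\d\d", "Studio21", 0, ["Studio21"]),
    (r"Studio\D", "Studio-", 0, ["Studio-"]),
    (r"Trados\sStudio", "Trados Studio / Trados\tStudio", 0,
     ["Trados Studio", "Trados\tStudio"]),
    (r"t.....", "Welcome to RWS", 0, ["to RWS"]),
    (r"[au]", "Trados Studio", 0, ["a", "u"]),
    (r"^[^a-z0-9]", "Trados", 0, ["T"]),           # Case Sensitive on
    (r"^[^a-z0-9]", "Trados", re.IGNORECASE, []),  # Case Sensitive off
    (r"\d*\.\d", ".0 19.9 219.9", 0, [".0", "19.9", "219.9"]),
    (r"colou?r", "color colour", 0, ["color", "colour"]),
    (r"be+", "been bent", 0, ["bee", "be"]),
    (r"th(e|is|at)", "this is the day", 0, ["this", "the"]),
    (r"be{2}", "be been", 0, ["bee"]),
    (r"RWS$", "Welcome to RWS", 0, ["RWS"]),
    (r"\$\d+", "costs $25", 0, ["$25"]),
    (r"[$]\d+", "costs $25", 0, ["$25"]),
    (r"(?P<double>\w)(?P=double)", "deep", 0, ["ee"]),  # .NET: (?<double>\w)\k<double>
    (r"[A-Z]", "AB123", 0, ["A", "B"]),
]

for pattern, text, flags, expected in EXAMPLES:
    # finditer + group(0) yields whole matches even when the pattern has groups
    found = [m.group(0) for m in re.finditer(pattern, text, flags)]
    status = "ok" if found == expected else "MISMATCH"
    print(f"{status:8} {pattern!r:34} {found}")
```
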
Helpful resources