
How Studio counts words

Word count - general rules

Words

Each individual word is counted as one.

Example: I like apples and oranges. (word count = 5)

Abbreviations

If an abbreviation is listed in the Abbreviation list, it will be counted as one.

Example: He had one obvious flaw, i.e. his laziness. (word count = 8)

Acronyms

Acronyms are counted as one if they are included in the Abbreviation list.

Example: I need the document ASAP. (word count = 5)

URLs

URLs are counted as 1 if they are present in the Abbreviation list.

Example: Read more on http://www.nydailynews.com/.

For more details, check the Abbreviations topic.

Dates

If a date matches a format defined for the language in the Dates list, it is counted as one.

Example: July 3, 2023 was the wedding date. (word count = 5)

Times

If a time matches a format defined for the language in the Times list, it is counted as 1.

Example: Discounts are available between 8 PM and 10 PM. (word count = 7)

For more details, check the Dates and times topic.

Variables

If a variable is included in the Variables list, it will be counted as 1.

Example: I want to visit New York. (word count = 5)

For more details, check the Variables topic.

Numbers

Numbers are counted as 1 if their format is configured in the Number Formats list. Special attention should be given to the decimal symbol.

Example: Revenue increased by 1.2 million. (word count = 5)

For more details, check the Numbers topic.

Measurements

If a measurement's format and symbols are configured in the Measurement Units, it will be counted as 1.

Example: I spent $20. (word count = 3)

Alphanumeric strings

Alphanumeric strings are counted as 1.

Example: Wedge bolt XF1Ç00iP.

Ideograms

Each ideogram is counted as one. Ideograms are characters that represent a concept without using the sounds that form the word. Examples include numerals and Chinese characters.

Example: 結論、分断した図が. (word count = 9)

Word count - specific rules

Apostrophes

The following apostrophes link tokens together as 1 word.

  • Apostrophe or single quotation mark (') U+0027
  • Modifier letter apostrophe (ʼ) U+02BC
  • Armenian apostrophe (՚) U+055A
  • Right single quotation mark (’) U+2019
  • Fullwidth apostrophe (＇) U+FF07

Example: It's a beautiful day. (word count = 4)

Hyphens and dashes

By default, hyphens and dashes link tokens together as 1 word.

Example: It is splash-proof! (word count = 3)

In the example above, the word count is 3 because splash-proof is counted as 1 word. When lookups are retrieved from a TM, however, splash and proof are treated as 2 separate words in order to increase the number of relevant fuzzy matches.
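The apostrophe and hyphen rules above can be illustrated with a small sketch. This is not Studio's tokenizer: the function name `count_words`, the regular expression, and the exact set of hyphen and dash characters are assumptions made for illustration, and the sketch ignores the Abbreviation, Dates, Times, Variables, and Number Formats lists.

```python
import re

# Apostrophes that link tokens, as listed above.
APOSTROPHES = "\u0027\u02bc\u055a\u2019\uff07"

# Assumption: "hyphens and dashes" is approximated by hyphen-minus, hyphen,
# non-breaking hyphen, en dash, and em dash. Studio's actual set may differ.
HYPHENS_AND_DASHES = "\u002d\u2010\u2011\u2013\u2014"

# A word is a run of word characters, optionally continued by
# "joiner + word characters" any number of times.
JOINER = "[" + re.escape(APOSTROPHES + HYPHENS_AND_DASHES) + "]"
WORD = re.compile(r"\w+(?:" + JOINER + r"\w+)*")


def count_words(text: str) -> int:
    """Count words, treating apostrophes and hyphens/dashes as joiners."""
    return len(WORD.findall(text))


print(count_words("It's a beautiful day."))  # 4
print(count_words("It is splash-proof!"))    # 3
```

The sketch only models the reported count. It does not model the TM lookup behavior, where the two halves of a hyphenated word are treated separately.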

Character count - general rules

  1. Whitespace does not increase the count (including the standard non-breaking space character).
  2. Tags do not increase the count (though text enclosed in tag pairs will).
  3. All other characters count as 1.
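As a rough illustration of the three rules above, the following sketch strips tags, skips whitespace, and counts everything else. The XML-style tag pattern and the name `count_characters` are assumptions for the example; in Studio, what counts as a tag depends on the file type and is not a simple pattern match.

```python
import re

# Assumption: tags look like XML-style inline markup such as <b> or </b>.
TAG = re.compile(r"<[^<>]+>")


def count_characters(text: str) -> int:
    """Count characters, ignoring tags and whitespace."""
    without_tags = TAG.sub("", text)  # rule 2: tags do not count
    # Rule 1: whitespace does not count. str.isspace() is also true for
    # the non-breaking space U+00A0.
    # Rule 3: every remaining character counts as 1.
    return sum(1 for ch in without_tags if not ch.isspace())


print(count_characters("It is <b>splash-proof</b>!"))  # 17
print(count_characters("a\u00a0b"))                    # 2
```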

Character count - special rules

The behavior of a character can vary based on how it is represented in a particular file format. For example, consider a document that contains the text "the‑cat", with a non-breaking hyphen between "the" and "cat". If the document is saved as a text file using the Unicode non-breaking hyphen (U+2011), the hyphen is treated as a single character in the resulting .sdlxliff file. As a result, the entire text is counted as a single word, assuming the default word count settings, under which hyphenated words are counted as one word.

When a non-breaking hyphen is inserted in Word using the 'Insert Symbol' dialog, it is typically represented as a tag in the resulting .sdlxliff. As a result, it is not counted as a character and the word count is increased from one to two.

To determine how other specific characters behave in other file types (e.g. an optional hyphen in Word), create a small test file for each and check the results in the Studio reports.

Tags are not counted as characters; they appear in a separate column in Studio reports.

The following characters count as a single character:

  • Em dash
  • En dash
  • Two consecutive apostrophes. Characters considered to be apostrophes are the Unicode characters U+0027, U+02BC, U+055A, U+2019, and U+FF07 (namely ' ʼ ՚ ’ ＇).
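The apostrophe-pair rule in the list above can be sketched as follows. The helper name `count_with_apostrophe_pairs` is hypothetical, and the sketch assumes that a pair may mix different apostrophe characters, which the rule does not state explicitly. Tags are left out to keep the example short.

```python
import re

# Apostrophe code points named in the list above.
APOSTROPHES = "\u0027\u02bc\u055a\u2019\uff07"

# Two consecutive apostrophes (assumption: of any kind, possibly mixed).
APOSTROPHE_PAIR = re.compile("[" + APOSTROPHES + "]{2}")


def count_with_apostrophe_pairs(text: str) -> int:
    """Count non-whitespace characters, with each apostrophe pair as 1."""
    collapsed = APOSTROPHE_PAIR.sub("'", text)  # each pair becomes one character
    return sum(1 for ch in collapsed if not ch.isspace())


print(count_with_apostrophe_pairs("a''b"))      # 3
print(count_with_apostrophe_pairs("a\u2014b"))  # 3 (em dash counts as 1)
```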