Recognizing and capturing text in different scripts

When you use the mouse action, you can capture text from the screen even if that text is part of an image, such as a photograph. To do this, MultiTerm Widget performs optical character recognition (OCR) on the characters shown on the screen.

About this task

You can improve the recognition of characters in languages and scripts other than English by downloading and installing language-specific training data.

The widget's factory settings are optimized to recognize English characters. Because English does not usually include certain characters (for example, á, à, ä, â), the software may not recognize them correctly without additional training data.

Training data provides the widget software with a different set of reference characters. Google's tesseract-ocr page has suitable training data for many languages and scripts.

Procedure

  1. In your browser, go to the Tesseract documentation.
  2. Download the appropriate training package as described on that page. For example, for German, download deu.traineddata.
  3. Rename the file to generic.traineddata.
  4. Ensure that MultiTerm Widget is stopped.
  5. In Windows Explorer, navigate to the MultiTerm Widget installation folder. Usually this is %programfiles%\Trados\MultiTerm\MultiTerm18.
  6. Rename the existing generic.traineddata file (for example, to generic.traineddata.save), and copy the downloaded generic.traineddata file to this folder.
  7. Restart MultiTerm Widget.
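
The file-handling part of this procedure (steps 3 to 6) can also be scripted. The following Python sketch is an illustration only and is not part of MultiTerm Widget: the function name install_training_data and its parameters are hypothetical, and it assumes that the widget is already stopped and that you have write access to the installation folder, which may require administrator rights under %programfiles%.

```python
import shutil
from pathlib import Path


def install_training_data(downloaded: Path, install_dir: Path,
                          target_name: str = "generic.traineddata") -> Path:
    """Back up the existing training file, then copy the downloaded file
    into install_dir under target_name.

    Hypothetical helper: mirrors steps 3 to 6 of the procedure above.
    """
    if not downloaded.is_file():
        raise FileNotFoundError(downloaded)

    target = install_dir / target_name
    backup = install_dir / (target_name + ".save")

    if target.exists():
        # Refuse to overwrite an earlier backup, so the factory file is kept.
        if backup.exists():
            raise FileExistsError(backup)
        target.rename(backup)

    # Copying under the new name covers the rename in step 3; the
    # downloaded file itself is left untouched.
    shutil.copyfile(downloaded, target)
    return target


# Example call (paths are assumptions; adjust them to your system):
# install_training_data(
#     Path.home() / "Downloads" / "deu.traineddata",
#     Path(r"C:\Program Files\Trados\MultiTerm\MultiTerm18"),
# )
```

The sketch stops if a generic.traineddata.save backup already exists, so running it twice cannot replace the original English training data with a previously installed language file.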