
RichEdit

This topic describes how SDL Passolo processes rich text.

The following notation is used:

LF = ASCII char 10 (noted as \n in C)

CR = ASCII char 13 (noted as \r in C)

A text file can have one of the following as end-of-line (EOL) markers:
  • CR LF
  • LF
  • CR

For example, suppose a text file with EOL = CR LF contains:

<item>a
b\</item>

The item data will contain 5 chars:

a, CR, LF, b, \
	 

In SDL Passolo this is displayed in RichEdit as:

a\r
b\\

because the text from RichEdit contains 8 chars:

a, \, r, CR, LF, b, \, \

This is caused by a combination of 2 issues:

  • An ambiguous protocol for EOL representation when data is exchanged between the parser and RichEdit
  • Improper display of the \ char and of non-printable chars

The steps are:

  1. The parser sends to Passolo this data:

    a, CR, LF, b, \

  2. The parser assumes that the EOL is CR LF.
  3. Passolo, however, assumes that the EOL is LF.
  4. To display non-printable chars, Passolo escapes them (CR -> \, r and \ -> \, \),

    obtaining:

    a, \, r, LF, b, \, \

  5. Then LF is converted to RichEdit's EOL, resulting in: a, \, r, CR, LF, b, \, \
  6. With CR LF as the line break, the text is displayed as:

    a, \, r

    b, \, \
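The steps above can be reproduced with a minimal Python sketch. The function names `escape_for_display` and `to_richedit_eol` are hypothetical; they only model the behaviour described in this topic and are not Passolo APIs.

```python
def escape_for_display(text: str) -> str:
    """Model step 4: escape backslash and CR; LF is treated as the EOL."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")   # \  -> \, \
        elif ch == "\r":
            out.append("\\r")    # CR -> \, r
        else:
            out.append(ch)
    return "".join(out)


def to_richedit_eol(text: str) -> str:
    """Model step 5: convert each LF to RichEdit's CR LF line break."""
    return text.replace("\n", "\r\n")


parser_data = "a\r\nb\\"                    # a, CR, LF, b, \  (5 chars)
escaped = escape_for_display(parser_data)   # a, \, r, LF, b, \, \
richedit_text = to_richedit_eol(escaped)    # a, \, r, CR, LF, b, \, \

print(len(parser_data), len(richedit_text))  # prints: 5 8
```

Splitting `richedit_text` on CR LF gives the two displayed lines: a, \, r and b, \, \.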

Because most chars in the [0-31] range have no visual representation (they are not defined in Windows fonts), Passolo escapes them as follows:

7 \a

8 \b

9 \t

10 \n (or sometimes a physical line break)

11 \v

12 \f

13 \r

The rest are escaped as \nnn, where nnn is the octal representation of the character's ASCII code:

0 -> '\000'

14 -> '\016'

31 -> '\037'

Because of this, the \ char itself must also be escaped:

\ -> '\\'
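The escaping rules above can be sketched in Python as follows. This is an illustration only: `escape_control_chars` is a hypothetical name, and the sketch always writes code 10 as \n, ignoring the case where a physical line break is used instead.

```python
# Named escapes listed above, keyed by character code.
NAMED_ESCAPES = {
    7: "\\a", 8: "\\b", 9: "\\t", 10: "\\n",
    11: "\\v", 12: "\\f", 13: "\\r",
}


def escape_control_chars(text: str) -> str:
    """Escape chars in [0-31] and the backslash, as described above."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")              # the escape char itself
        elif code in NAMED_ESCAPES:
            out.append(NAMED_ESCAPES[code])
        elif code < 32:
            out.append("\\%03o" % code)     # three-digit octal code
        else:
            out.append(ch)
    return "".join(out)


print(escape_control_chars("\x00"))   # prints: \000
print(escape_control_chars("\x1f"))   # prints: \037
```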

The current escaping method should be removed and replaced with the following alternative:

  • Char codes in the [0-4] range must be forbidden and filtered out by parsers
  • Char code 0 is the C marker for end of string
  • Char code 1 is used in some places as a separator for multiple strings
  • Chars with codes 2, 3 and 4 are used in RichEdit as inline tag markers

The remaining chars in the [5-31] range should be displayed in a way similar to how inline tags are displayed.
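A parser-side filter for the forbidden range could look like the sketch below. The name `filter_forbidden` is hypothetical, and silently dropping the chars (rather than reporting an error) is an assumption; this topic only states that they must be filtered.

```python
from typing import Tuple

# Codes 0-4 are reserved: 0 ends C strings, 1 separates multiple strings,
# and 2, 3, 4 mark inline tags in RichEdit.
FORBIDDEN_CODES = range(0, 5)

# Translation table that deletes every forbidden code point.
_DELETE_FORBIDDEN = {code: None for code in FORBIDDEN_CODES}


def filter_forbidden(text: str) -> Tuple[str, int]:
    """Return the text without chars 0-4, plus how many chars were dropped."""
    cleaned = text.translate(_DELETE_FORBIDDEN)
    return cleaned, len(text) - len(cleaned)


cleaned, dropped = filter_forbidden("ab\x00c\x02d\x05")
print(repr(cleaned), dropped)   # prints: 'abcd\x05' 2
```

Returning the number of dropped chars lets a caller decide whether to log a warning; char code 5 and above pass through unchanged.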

Special chars (columns: char code, Unicode char, tag):

0-4 illegal

5 ENQ - can be entered as alt+num0+num5 (numpad digits)

6 ACK

7 BEL

8 BS

9 (→) U+2192 tab

10 (↵) U+21b5 enter - can be entered as ctrl+enter

11 VT

12 FF

13 not used

14 SO

15 SI

16 DLE

17 DC1

18 DC2

19 DC3

20 DC4

21 NAK

22 SYN

23 ETB

24 CAN

25 EM

26 SUB

27 ESC

28 FS

29 GS

30 RS

31 US

32 ' ' space -> (·) U+00B7 when show whitespaces is on

U+00B7 - · (MIDDOT) Middle Dot

U+00A0 - (NBSP) No-Break Space - can be entered as shift+ctrl+space or alt+num2,num5,num5

U+200B - (ZWS) Zero Width Space - can be entered as ctrl+alt+space

U+200E - (LRM) Left-To-Right Mark

U+200F - (RLM) Right-To-Left Mark

U+2039 - (LSAQUO) instead of ‹ Single Left-Pointing Angle Quotation Mark

U+203A - (RSAQUO) instead of › Single Right-Pointing Angle Quotation Mark

U+2192 - → (RARR) Rightwards Arrow

U+21B5 - ↵ (CRARR) Downwards Arrow with Corner Leftwards

The remaining chars can be displayed using special inline tags containing their ISO naming.
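The proposed display scheme can be sketched in Python as shown below. Everything here is illustrative: `render_for_display` is a hypothetical name, rendering a tag as ‹NAME› is an assumption based on the angle quotation marks listed above, and code 13 (listed as not used) is simply passed through unchanged.

```python
# Tag names for control chars that have no dedicated symbol.
CONTROL_NAMES = {
    5: "ENQ", 6: "ACK", 7: "BEL", 8: "BS", 11: "VT", 12: "FF",
    14: "SO", 15: "SI", 16: "DLE", 17: "DC1", 18: "DC2", 19: "DC3",
    20: "DC4", 21: "NAK", 22: "SYN", 23: "ETB", 24: "CAN", 25: "EM",
    26: "SUB", 27: "ESC", 28: "FS", 29: "GS", 30: "RS", 31: "US",
}

# Chars shown as a single Unicode symbol: tab and enter.
SYMBOLS = {9: "\u2192", 10: "\u21b5"}


def render_for_display(text: str, show_whitespace: bool = False) -> str:
    """Map each char to its display form; codes 0-4 are rejected."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 5:
            raise ValueError("char code %d is illegal" % code)
        if code in SYMBOLS:
            out.append(SYMBOLS[code])
        elif code in CONTROL_NAMES:
            # Assumed tag form: name between U+2039 and U+203A.
            out.append("\u2039" + CONTROL_NAMES[code] + "\u203a")
        elif ch == " " and show_whitespace:
            out.append("\u00b7")            # middle dot for a space
        else:
            out.append(ch)                  # includes code 13 (not used)
    return "".join(out)


print(render_for_display("a\tb\n"))         # prints: a→b↵
print(render_for_display("x\x1by"))         # prints: x‹ESC›y
```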