RichEdit
This topic describes how SDL Passolo processes rich text.
The following notation is used:
LF = ASCII char 10 (noted as \n in C)
CR = ASCII char 13 (noted as \r in C)
EOL = end of line, which can be represented as one of:
- CR LF
- LF
- CR
For example, we have this in a text file with EOL = CR LF:
<item>a
b\</item>
The item data will contain 5 chars:
a, CR, LF, b, \
In SDL Passolo this is displayed in RichEdit as:
a\r
b\\
because the text in RichEdit contains 8 chars:
a, \, r, CR, LF, b, \, \
This is caused by a combination of 2 issues:
- An ambiguous protocol for EOL representation when data is exchanged between the parser and RichEdit
- Improper display of the \ char and of non-printable chars
The steps are:
- The parser sends this data to Passolo:
a, CR, LF, b, \
- The parser assumes that the EOL is CR LF.
- But Passolo assumes that the EOL is LF.
- To display non-printable chars, Passolo escapes them (CR -> \, r and \ -> \, \), obtaining:
a, \, r, LF, b, \, \
- Then LF is converted to RichEdit's EOL, resulting in:
a, \, r, CR, LF, b, \, \
- With CR LF as the line break, this is displayed as:
a, \, r
b, \, \
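The steps above can be modeled in a few lines of Python. This is only a sketch of the behaviour described in this topic, not Passolo's actual code; the function names escape_for_display and to_richedit_eol are hypothetical.

```python
def escape_for_display(data: str) -> str:
    """Escape \\ and CR the way this topic describes (hypothetical helper)."""
    # Escape the backslash first, so the backslash introduced for CR
    # is not doubled afterwards.
    return data.replace("\\", "\\\\").replace("\r", "\\r")


def to_richedit_eol(data: str) -> str:
    """Convert LF to CR LF, the EOL used by RichEdit (hypothetical helper)."""
    # Assumes every raw CR was already escaped, so only bare LFs remain.
    return data.replace("\n", "\r\n")


item = "a\r\nb\\"                       # a, CR, LF, b, \
shown = to_richedit_eol(escape_for_display(item))
print(len(item), len(shown))            # 5 8
```

Splitting shown on CR LF gives the two displayed lines, a\r and b\\, which is why the CR of the original CR LF pair becomes visible as text.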
Because most chars in the [0-31] range have no visual representation (they are not defined in Windows fonts), Passolo escapes them as follows:
7 \a
8 \b
9 \t
10 \n or sometimes a physical line break
11 \v
12 \f
13 \r
The rest are escaped as \nnn, where nnn is the octal representation of the character's ASCII code:
0 -> '\000'
14 -> '\016'
31 -> '\037'
Because of this, the \ char must also be escaped:
\ -> '\\'
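As an illustration, the escaping rules listed above could be written as follows. This is a sketch under the assumption that code 10 is always escaped as \n (the "physical" line break case is ignored); escape_char and escape_text are hypothetical names, not Passolo functions.

```python
# Named escapes for codes 7-13, as listed in the table above.
NAMED_ESCAPES = {
    7: "\\a", 8: "\\b", 9: "\\t", 10: "\\n",
    11: "\\v", 12: "\\f", 13: "\\r",
}


def escape_char(ch: str) -> str:
    """Return the escaped form of one char (hypothetical helper)."""
    code = ord(ch)
    if ch == "\\":
        return "\\\\"                # the escape char itself is doubled
    if code in NAMED_ESCAPES:
        return NAMED_ESCAPES[code]
    if code < 32:
        return "\\%03o" % code       # three-digit octal: 0 -> \000, 31 -> \037
    return ch


def escape_text(text: str) -> str:
    """Escape every char of a string."""
    return "".join(escape_char(ch) for ch in text)
```

Because the backslash doubles as the escape introducer, text that already contains backslashes grows on every pass through such a function.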
The current escaping method should be removed and replaced with the following alternative:
- Char codes in the [0-4] range must be forbidden and filtered out by parsers, for these reasons:
- Char code 0 is the C end-of-string marker
- Char code 1 is used in some places as a separator for multiple strings
- Chars with codes 2, 3, 4 are used in RichEdit as inline tag markers
The remaining chars in the [5-31] range should be displayed in a similar way to inline tags.
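A parser-side filter for the forbidden range might look like the sketch below. Whether a parser should silently drop these chars or report them is not specified here, so both variants are shown; strip_forbidden and find_forbidden are hypothetical names.

```python
# Codes 0-4: C string terminator, string separator, inline tag markers.
FORBIDDEN = frozenset(chr(code) for code in range(0, 5))


def strip_forbidden(text: str) -> str:
    """Drop chars with codes 0-4 and keep everything else unchanged."""
    return "".join(ch for ch in text if ch not in FORBIDDEN)


def find_forbidden(text: str) -> list:
    """Return (index, code) pairs for every forbidden char, for reporting."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ch in FORBIDDEN]
```

Chars with codes 5 and above pass through untouched, so they are still available for tag-like display.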
Special chars (char code, followed by the unicode-char or tag used to display it):
0-4 illegal
5 ENQ - can be entered as alt+num0+num5 (numpad digits)
6 ACK
7 BEL
8 BS
9 (→) U+2192 tab
10 (↵) U+21b5 enter - can be entered as ctrl+enter
11 VT
12 FF
13 not used
14 SO
15 SI
16 DLE
17 DC1
18 DC2
19 DC3
20 DC4
21 NAK
22 SYN
23 ETB
24 CAN
25 EM
26 SUB
27 ESC
28 FS
29 GS
30 RS
31 US
32 ' ' space -> (·) U+00B7 when show whitespaces is on
U+00B7 - · (MIDDOT) Middle Dot
U+00A0 - (NBSP) No-Break Space - can be entered as shift+ctrl+space or alt+num2,num5,num5
U+200B - (ZWS) Zero Width Space - can be entered as ctrl+alt+space
U+200E - (LRM) Left-To-Right Mark
U+200F - (RLM) Right-To-Left Mark
U+2039 - (LSAQUO) instead of ‹ Single Left-Pointing Angle Quotation Mark
U+203a - (RSAQUO) instead of › Single Right-Pointing Angle Quotation Mark
U+2192 - → (RARR) Rightwards Arrow
U+21b5 - ↵ (CRARR) Downwards Arrow with Corner Leftwards
The remaining chars can be displayed using special inline tags containing their ISO naming.
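The proposed display scheme can be sketched as a function that turns a string into display tokens. The token format, the name display_tokens, and the choice to drop code 13 (listed as "not used") are assumptions made for illustration only; how RichEdit would actually render a tag is not covered.

```python
# Tag names for control chars, taken from the table above.
TAG_NAMES = {
    5: "ENQ", 6: "ACK", 7: "BEL", 8: "BS", 11: "VT", 12: "FF",
    14: "SO", 15: "SI", 16: "DLE", 17: "DC1", 18: "DC2", 19: "DC3",
    20: "DC4", 21: "NAK", 22: "SYN", 23: "ETB", 24: "CAN", 25: "EM",
    26: "SUB", 27: "ESC", 28: "FS", 29: "GS", 30: "RS", 31: "US",
}
# Chars shown as a Unicode symbol instead of a named tag.
SYMBOLS = {9: "\u2192", 10: "\u21b5"}   # tab -> rightwards arrow, enter -> corner arrow


def display_tokens(text: str, show_whitespace: bool = False) -> list:
    """Split text into ("text" | "symbol" | "tag", value) tokens."""
    tokens = []
    for ch in text:
        code = ord(ch)
        if code <= 4:
            raise ValueError("char code %d is illegal" % code)
        if code == 13:
            continue                     # assumption: CR is not used, so drop it
        if code in SYMBOLS:
            tokens.append(("symbol", SYMBOLS[code]))
        elif code in TAG_NAMES:
            tokens.append(("tag", TAG_NAMES[code]))
        elif ch == " " and show_whitespace:
            tokens.append(("symbol", "\u00b7"))   # middle dot for space
        else:
            tokens.append(("text", ch))
    return tokens
```

With this model no backslash escaping is needed, so a \ char in the data is passed through as ordinary text.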