Property Sheet data versus content
When content is stored in SDL Contenta, it is generally not changed.
One exception is that SDL Contenta may add or change an ID attribute of a tag so that it contains a unique identifier that can be used by downstream processing. When SDL Contenta does this, it always sets these values using characters in the 00 to 7F byte range (in other words, characters that are represented by a single byte in both ISO and UTF-8), thereby avoiding any encoding mismatches.
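The following Python sketch illustrates why values restricted to the 00 to 7F range avoid encoding mismatches. It is not SDL Contenta code: the function name and the sample identifier are hypothetical, and it assumes that "ISO" refers to ISO-8859-1.

```python
def encodes_identically(text: str) -> bool:
    """Return True if text yields the same bytes in ISO-8859-1 and UTF-8.

    Assumption: "ISO" in the surrounding text means ISO-8859-1.
    """
    try:
        return text.encode("iso-8859-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character with no ISO-8859-1 representation.
        return False


# Hypothetical identifier made only of characters in the 00 to 7F range.
print(encodes_identically("ID-00A7F3"))

# "é" is one byte (E9) in ISO-8859-1 but two bytes (C3 A9) in UTF-8.
print(encodes_identically("café"))
```

Only the first value is safe to read under either encoding; the second would be misinterpreted by a reader that assumes the wrong one.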
Data that is extracted from content and stored as metadata/Property Sheet values must be handled differently. SDL Contenta Property Sheet fields are stored in database columns that are defined to store Unicode, and all delivered applications (portal, API, etc.) rely on the fact that data in properties is encoded as single-byte values.
This means that all the tools that manage the import of data follow a common process for converting characters (if needed) so they can be added to Property Sheet fields as Unicode.
In practice, UTF-8 encoded data can and does exist in tag names, attribute names, attribute values, and content. However, there are restrictions on UTF-8 when using the Upload tools, including Dynamic Import (DI) (see the Import and the API). At a high level, these restrictions include the following:
- Tag names must contain only single-byte UTF-8 character encodings (7-bit ASCII characters).
- Attribute names must follow the same rules as tag names.
- Attribute values can be anything and are converted per the algorithm described in Character Conversion Process for Property Sheets.
- There are no restrictions on content.
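
As a rough illustration of the naming restrictions above, the following Python sketch checks tag and attribute names before upload. It is not part of the Upload tools or the SDL Contenta API: the function names and sample data are hypothetical, and attribute values are deliberately left unchecked because they are converted rather than rejected.

```python
from typing import Dict, List


def is_seven_bit_ascii(name: str) -> bool:
    """Return True if name is non-empty and uses only 7-bit ASCII characters."""
    return bool(name) and name.isascii()


def find_invalid_names(tag: str, attributes: Dict[str, str]) -> List[str]:
    """Return the tag and attribute names that break the 7-bit ASCII rule.

    Attribute values are not inspected: they may contain any characters.
    """
    names = [tag, *attributes]  # iterating a dict yields its keys
    return [name for name in names if not is_seven_bit_ascii(name)]


# Valid names; the non-ASCII attribute value is allowed.
print(find_invalid_names("para", {"id": "Ünïcode value"}))

# Both the tag name and one attribute name contain non-ASCII characters.
print(find_invalid_names("päragraph", {"tïtle": "x", "id": "y"}))
```

A pre-flight check like this could report offending names before an import is attempted, instead of relying on the import to fail.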