Property Sheet data versus content

When content is stored in SDL Contenta, it is not changed.

Whether you are importing a UTF-8 encoded file, an ISO-8859-1 file, or a binary graphic, the data is stored in the system exactly as it is received (see exception in the note below). SDL Contenta never removes or alters content in any way. However, in some cases SDL Contenta may add to the content.

For example, SDL Contenta may add or change an ID attribute of a tag so that it contains a unique identifier that can be used by downstream processing. When SDL Contenta does this, it always sets these values using characters in the 00 to 7F byte range. Each of these characters is represented by the same single byte in both ISO-8859-1 and UTF-8, which avoids any encoding mismatches.
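The following Python sketch illustrates why the 00 to 7F range is safe. It is not SDL Contenta code: the function name encodes_identically and the sample identifier are assumptions made up for this illustration, and the real ID format may differ.

```python
def encodes_identically(text: str) -> bool:
    """Return True if text yields the same bytes in ISO-8859-1 and UTF-8."""
    try:
        return text.encode("iso-8859-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character that ISO-8859-1 cannot represent at all.
        return False


# Hypothetical identifier, used only for illustration.
generated_id = "ID-000123"

# Every character is in the 00 to 7F range, so both encodings agree byte for byte.
same_for_ascii = encodes_identically(generated_id)

# A character outside that range is one byte in ISO-8859-1 but two bytes in UTF-8.
latin1_bytes = "é".encode("iso-8859-1")
utf8_bytes = "é".encode("utf-8")
same_for_accented = encodes_identically("é")
```

Because the two encodings only agree on this range, a value restricted to it can be read correctly regardless of which encoding the surrounding file uses.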

Data that is extracted from content and stored as metadata/Property Sheet values must be handled differently. SDL Contenta Property Sheet fields are stored in database columns that are defined to store Unicode, and all delivered applications (portal, API, etc.) rely on the fact that data in properties is encoded as single-byte values.

This means that all the tools that manage the import of data follow a common process for converting characters (if needed) so they can be added to Property Sheet fields as Unicode.

In practice, UTF-8 encoded data can and does exist in tag names, attribute names, attribute values, and content. There are restrictions regarding UTF-8 and the use of the Upload tools, including Dynamic Import (DI) (see the Import and the API). At a high level, these restrictions include the following:

  • Tag names must contain only single-byte UTF-8 character encodings (7-bit ASCII characters).
  • Attribute names must follow the same rules as tag names.
  • Attribute values can be anything and are converted per the algorithm described in Character Conversion Process for Property Sheets.
  • There are no restrictions on content.
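
The name restrictions above could be checked before upload with a small script. This is a minimal Python sketch, not part of the SDL Contenta tools: the function names is_seven_bit_ascii and find_name_violations are hypothetical, and the check covers only the 7-bit ASCII rule, not general XML or SGML name validity.

```python
from typing import Dict, List


def is_seven_bit_ascii(name: str) -> bool:
    """Return True if name is non-empty and every character is in the 00 to 7F range."""
    return bool(name) and name.isascii()


def find_name_violations(tag_name: str, attributes: Dict[str, str]) -> List[str]:
    """List tag and attribute names that break the single-byte rule.

    Attribute values are deliberately not checked, because they may contain anything.
    """
    violations = []
    if not is_seven_bit_ascii(tag_name):
        violations.append(f"tag name: {tag_name!r}")
    for attr_name in attributes:
        if not is_seven_bit_ascii(attr_name):
            violations.append(f"attribute name: {attr_name!r}")
    return violations
```

For example, find_name_violations("para", {"id": "café"}) returns an empty list because only the attribute value contains a non-ASCII character, whereas a tag named "résumé" would be reported.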