S1000D Upload
S1000D Upload consists of Java-based applications that let users drag and drop content into a browser window.
These applications must process the data, identify links and references, and resolve ambiguity before the content can be imported.
During the drag-and-drop process, users can place any content type for upload, typically S1000D modules and binary graphics, into the S1000D Upload browser interface.
S1000D Upload accepts files with either of the following XML declarations:
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml version="1.0" encoding="UTF-8"?>
If there is no declaration, the file is assumed to be UTF-8 (the default per the XML standard) and is accepted if the byte order mark is present. Any other declaration is rejected, for example:
<?xml version="1.0" encoding="UTF-16"?>
<?xml version="1.0" encoding="ISO-8859-2"?>
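
The acceptance rule for declarations can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the names `declared_encoding` and `declaration_is_accepted` are hypothetical, and comparing the encoding name without regard to case is an assumption. The sketch looks only at the declaration; it does not check the byte order mark or the file contents.

```python
import re

# Matches an XML declaration at the start of the data and captures the
# optional encoding name.
_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'(?:\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\'])?'
)

# The two declarations the text lists as accepted.
ACCEPTED = {"UTF-8", "ISO-8859-1"}


def declared_encoding(data: bytes):
    """Return the declared encoding in upper case, or None if none is declared."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        data = data[3:]
    match = _DECL.match(data)
    if match is None or match.group(1) is None:
        return None
    # Assumption: encoding names are compared case-insensitively.
    return match.group(1).decode("ascii").upper()


def declaration_is_accepted(data: bytes) -> bool:
    """True when the declaration is absent (UTF-8 assumed) or names an accepted encoding."""
    encoding = declared_encoding(data)
    return encoding is None or encoding in ACCEPTED
```

For example, `declaration_is_accepted(b'<?xml version="1.0" encoding="ISO-8859-2"?><a/>')` returns `False`, while the same call with `encoding="UTF-8"` returns `True`.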
File names have the same character restrictions as previously noted for the Name field.
If there is an encoding mismatch, where the XML declaration indicates one encoding but the actual data is encoded differently, errors such as the following are displayed during upload.
| If the data is encoded as: | But the declaration is: | The error message displayed is: |
|---|---|---|
| ISO-8859-1 | <?xml version="1.0" encoding="UTF-8"?> | An invalid character was found in text content. |
| UTF-8 | <?xml version="1.0" encoding="ISO-8859-1"?> | Switch from current encoding to specified encoding not supported. |
The following checks describe how the upload tool handles encodings and encoding mismatches. In general, the S1000D Upload tool detects encoding mismatches where they exist and reports them to the user.
The following are examples of files that pass the encoding mismatch testing.
| File | Results |
|---|---|
| ISO encoded data with an ISO declaration.xml | Uploads without error |
| UTF-8 encoded data with NO declaration but with a UTF-8 BOM.xml | Because there is no declaration, Upload treats the file as UTF-8, which the byte order mark (BOM) confirms, and uploads it without error. |
| UTF-8 encoded data with a UTF-8 declaration and with a UTF-8 BOM.xml | Uploads without error |
| UTF-8 encoded data with an ISO declaration and no BOM.xml | Without a byte order mark, the tool cannot detect that the data is UTF-8, so it treats each byte as a single valid ISO character. All subsequent tools display each byte as its own character. A file of this type uploads without error because there is no way to tell that the user intended UTF-8. Without a byte order mark, it is up to the user to ensure that the declaration matches the actual encoding. |
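
The role of the byte order mark in these results can be illustrated with a short Python sketch. The helper name `detect_bom` and the label strings are hypothetical; the BOM constants come from Python's standard `codecs` module. The last two lines show why UTF-8 data read as ISO-8859-1 still "works": every byte maps to some ISO character, so a two-byte UTF-8 character appears as two characters.

```python
import codecs

# Check the longest marks first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_LE, "UTF-32-LE"),
    (codecs.BOM_UTF32_BE, "UTF-32-BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16-LE"),
    (codecs.BOM_UTF16_BE, "UTF-16-BE"),
)


def detect_bom(data: bytes):
    """Return the encoding implied by a leading byte order mark, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Without a BOM, UTF-8 bytes decode as ISO-8859-1 without any error,
# but each byte becomes its own character.
utf8_bytes = "\u00e9".encode("utf-8")        # e-acute, two bytes in UTF-8
as_iso = utf8_bytes.decode("iso-8859-1")     # two characters, no exception
```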
The following are examples of files that fail upload because of encoding mismatches, with the error message noted for each.
| File | Results/Error Message |
|---|---|
| ISO encoded data with no declaration.xml | This file fails because the default encoding for XML, when a declaration is not present, is UTF-8. If the file contains at least one single-byte ISO character that would be encoded with two bytes in UTF-8 (above 0x7F), it fails with the message: An invalid character was found in text content. If, however, all bytes are 0x7F or below, the file is uploaded because all bytes are valid UTF-8. |
| ISO encoded data with a UTF-8 declaration.xml | This file fails because UTF-8 is the indicated encoding but the actual encoding is ISO-8859-1. If the file contains at least one single-byte ISO character that would be encoded with two bytes in UTF-8 (above 0x7F), it fails with the message: An invalid character was found in text content. If, however, all bytes are 0x7F or below, the file is uploaded because all bytes are valid UTF-8. |
| UTF-8 encoded data with an ISO declaration and with a UTF-8 BOM and all characters are inside the ISO range.xml | This file fails because it is a UTF-8 encoded file with an ISO declaration. Although all of its characters are inside the ISO range, it has a byte order mark that indicates it is UTF-8. It fails with the message: Switch from current encoding to specified encoding not supported. |
| UTF-8 encoded data with an ISO declaration and with a UTF-8 BOM and some characters are outside the ISO range.xml | This file fails because it is a UTF-8 encoded file with an ISO declaration and it contains characters that are outside the ISO range. It also has a byte order mark that indicates it is UTF-8. It fails with the message: Switch from current encoding to specified encoding not supported. |
| UCS-2 BE encoded data with a UTF-8 declaration and a UCS-2 BOM.xml; UCS-2 LE encoded data with a UCS-2 declaration and a UCS-2 BOM.xml; UCS-2 LE encoded data with a UTF-8 declaration and a UCS-2 BOM.xml | These three types of files fail because they are encoded as something other than UTF-8 or ISO-8859-1. This example shows UCS-2 encodings, but the same rules apply to all other encodings, including UTF-16 and UTF-32. In all cases these files should have a byte order mark, which identifies the encoding regardless of the declaration. They fail with the message: Switch from current encoding to specified encoding not supported. |
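
The pass and fail cases in the two tables above can be summarized as one decision procedure. The following Python sketch is a reading of those tables under stated assumptions, not the upload tool's actual logic: `predict_upload` is a hypothetical name, the two error strings are copied from the mismatch table, and `REJECTED_DECL` is a placeholder because the source does not give the message for other declarations.

```python
import codecs
import re

INVALID_CHAR = "An invalid character was found in text content."
SWITCH_UNSUPPORTED = "Switch from current encoding to specified encoding not supported."
REJECTED_DECL = "Declaration rejected"  # placeholder text, not from the source

# Captures the encoding name from an XML declaration at the start of the data.
_ENC = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

# Byte order marks for encodings other than UTF-8.
_OTHER_BOMS = (
    codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE,
    codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE,
)


def predict_upload(data: bytes):
    """Return None if the tables predict a clean upload, else the error text."""
    # A non-UTF-8 BOM identifies the encoding regardless of the declaration.
    if data.startswith(_OTHER_BOMS):
        return SWITCH_UNSUPPORTED
    has_utf8_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_utf8_bom else data
    match = _ENC.match(body)
    # No declared encoding means UTF-8, the XML default.
    declared = match.group(1).decode("ascii").upper() if match else "UTF-8"
    if declared == "ISO-8859-1":
        # A UTF-8 BOM contradicts an ISO declaration; without a BOM every
        # byte is a valid ISO character, so nothing can be detected.
        return SWITCH_UNSUPPORTED if has_utf8_bom else None
    if declared != "UTF-8":
        return REJECTED_DECL
    try:
        body.decode("utf-8")  # strict: bytes above 0x7F must form valid sequences
    except UnicodeDecodeError:
        return INVALID_CHAR
    return None
```

For instance, `predict_upload(b"<a>caf\xe9</a>")` returns the invalid-character message, while `predict_upload(b"<a>cafe</a>")` returns `None` because every byte is 0x7F or below.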
If the file being uploaded has no XML declaration, the file itself is also assumed to be encoded as UTF-8. Therefore, during the S1000D Upload “well-formed” check, the following behaviors occur.
| File with no XML Declaration | Results/Error Message |
|---|---|
| If the file contents are encoded as UTF-8 | Upload reports Well Formed since the file encoding matches the expected encoding (UTF-8). |
| If the file contents are encoded as ISO-8859-1 and contain any non-ASCII characters | Upload reports NOT Well Formed - An invalid character was found in text content. This is expected since the file has no XML declaration but the file is not UTF-8 (the default encoding). |
| If the file contents are ASCII only | Upload reports Well Formed. |
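
The same three outcomes can be reproduced with any conforming XML parser, because the UTF-8 default comes from the XML standard rather than from Upload itself. The sketch below uses Python's standard `xml.etree.ElementTree` parser; `is_well_formed` is a hypothetical helper, and the parser's own error text differs from the messages Upload displays.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Parse raw bytes; with no declaration the parser assumes UTF-8."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


utf8_file = "<a>caf\u00e9</a>".encode("utf-8")       # non-ASCII, UTF-8 bytes
iso_file = "<a>caf\u00e9</a>".encode("iso-8859-1")   # non-ASCII, ISO-8859-1 bytes
ascii_file = b"<a>cafe</a>"                          # ASCII only
```

Here `is_well_formed(utf8_file)` and `is_well_formed(ascii_file)` return `True`, and `is_well_formed(iso_file)` returns `False`, matching the three rows of the table.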
Only UTF-8 and ISO-8859-1 encodings are supported. For example, UTF-16 encoded files are handled as follows.
| File | Results/Error Message |
|---|---|
| If the file contents are encoded as UTF-16 | Upload reports Encoding not supported. UTF-16 is not supported for the current version. This is expected since UTF-16 is not supported. |
In Contenta S1000D 4.0 and later, the encoding declaration is preserved in the data. In AppData (accessed from Contenta Explorer), each S1000D document type should have a value named XMLDECL, set to either <?xml version="1.0" encoding="utf-8"?> or <?xml version="1.0" encoding="iso-8859-1"?>. If this value is not set, S1000D Upload assumes that the expected encoding in AppData is UTF-8. Similarly, if a file being uploaded has no encoding attribute in its XML declaration, S1000D Upload assumes that the file encoding is UTF-8. In all cases, during the well-formed check, S1000D Upload compares the file encoding to the expected encoding in AppData. If it finds a mismatch between the two, the following warning message is displayed:
Well-formed
Warning: XML file encoding does not match expected encoding in Contenta. See Contenta S1000D documentation.
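
The comparison against the expected encoding can be sketched as follows. This is a minimal illustration only: `encoding_of` and `check_expected` are hypothetical names, the way the XMLDECL value is read from AppData is not shown, and comparing encoding names without regard to case is an assumption. The UTF-8 defaults and the warning text are taken from the description above.

```python
import re

WARNING = (
    "Warning: XML file encoding does not match expected encoding in Contenta. "
    "See Contenta S1000D documentation."
)

# Finds the encoding attribute inside an XML declaration string.
_ENC = re.compile(r'encoding\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def encoding_of(xml_decl):
    """Encoding named in a declaration; UTF-8 if the value is unset or has no encoding."""
    match = _ENC.search(xml_decl or "")
    # Assumption: names are normalized to upper case before comparison.
    return match.group(1).upper() if match else "UTF-8"


def check_expected(file_decl, appdata_xmldecl):
    """Return the warning text on a mismatch, otherwise None."""
    if encoding_of(file_decl) != encoding_of(appdata_xmldecl):
        return WARNING
    return None
```

With XMLDECL unset (`None`), a file declared as ISO-8859-1 produces the warning, while a file with no encoding attribute does not.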