Split

When a user selects Split from the Activity menu, Dynamic Import uses a map file to split an SGML or XML document into fragments.

Use the command line for the Split activity if you want to:

  • execute the Split activity from the command line using an existing map file
  • create a script to split a batch of documents, for example as a scheduled system event

The syntax is:

perl directory_path/di.pl arguments

The required syntax and arguments for the di.pl split activity are:

Syntax/Argument Description
perl Perl executable. If Perl is not included in your PATH variable, specify the complete path to the Perl program.
directory_path/di.pl Specify the directory path (directory_path) for the file (di.pl) to execute.
-c mapfile.map The full path and name of the map file.
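
As an illustration of the batch use case, the following Python sketch assembles one di.pl command line per source document from the required pieces listed above. It relies on the -s override described among the optional arguments. The function names, the placeholder paths, and the use of Python's subprocess module to launch Perl are assumptions made for this example; they are not part of Dynamic Import.

```python
import subprocess


def build_split_command(di_script, map_file, source=None, perl="perl"):
    """Return the argument list for one di.pl split run.

    perl      -- Perl executable (give the complete path if Perl is not in PATH)
    di_script -- path to di.pl
    map_file  -- full path and name of the map file (required, passed with -c)
    source    -- optional source document that overrides the Source field
                 in the map file (passed with -s)
    """
    command = [perl, str(di_script), "-c", str(map_file)]
    if source is not None:
        command += ["-s", str(source)]
    return command


def split_batch(di_script, map_file, sources, run=subprocess.run):
    """Run the split activity once per source document; return the exit codes."""
    codes = []
    for source in sources:
        command = build_split_command(di_script, map_file, source)
        # check=False: interpreting the exit status is left to the caller.
        result = run(command, check=False)
        codes.append(result.returncode)
    return codes


if __name__ == "__main__":
    # Placeholder paths, shown only to illustrate the call.
    print(build_split_command("directory_path/di.pl",
                              "directory_path/mapfile.map",
                              source="directory_path/source.doc"))
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces. A scheduler entry could then call split_batch with the list of documents to process.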

The optional arguments for the di.pl split activity are listed and described below. If you add these arguments to the digui.cfg file, add them to the split_args string.

Argument Description
-cddocdir path Required for Contenta Web DI. Used by Contenta Web clients to instruct DI to change to the directory where the SGML document is located, as specified by path. DI returns to the inherited path after locating the SGML document.
-config file.cfg Optional. Use to specify an alternate or custom Dynamic Import configuration file, where file.cfg is the name of the file (file name only, no path).

The custom configuration file must be located in the Contenta_home/encaps/di directory.

The custom configuration file must satisfy the required contents of the DI configuration file. It is recommended that you start your custom modifications with a copy of the delivered configuration file.

If the -config switch is not used, the default is digui.cfg.

-custom package Use to specify an alternate custom callback module, where package is the case-sensitive name of the module that replaces CustomCallbacks.pm (for example, DITACustomCallbacks.pm or FrameBookCustomCallbacks.pm).

The custom callback module must be located in the Contenta_home/encaps/di/custom directory.

If not specified, the default is CustomCallbacks.pm.

-d Debug. Run DI in debug mode. If the switch is not present, debug mode is off.
-doctype name Use to override the document type (doctype), where name is the new doctype.
-e ErrFile Full path and name of the file for logging error messages.

The default is ../encaps/di/errs.

-g GUI. Use to indicate to di.pl that it is being executed from digui.pl and that it needs to display messages formatted for that Tk display.

This switch should be specified in the digui.cfg file only. Add it to the switches for the di.pl program. Use it only when DI is executed with a GUI.

-exceptfile path/file Exception File. Use with the -pentexcept argument to specify an entity exception file when you want DI to use a file other than the default (di_except.ent).

Enter the full path and filename of the exception file.

-ig Ignore Global. Use with Web DI only to ignore missing parameters in the global fields of the map file (for example, a missing password parameter).
-ascii ascii Use this option if you want Dynamic Import to ignore non-ASCII characters in object names and Property Sheet data.
-ne Use to remove SGML entities (&ent;) from property sheet fields and attributes.
-nodirexts No directory extensions. Use to indicate that a period in the folder name should not be processed as a filename extension. The period is then used as a character in the filename. This switch is for File Import only.

For example, a folder named flower.photo is imported into an object that is named flower.photo.

-nope When used, this flag turns off entity protection. Do not use it with the -protent or -pentexcept argument; that is, do not use it when the di_except.ent file is used.
-nu Suppress Contenta unique naming. di.pl always creates unique object names. This switch removes the unique-naming suffix as the drive file is being created.
-o dir_path The full path of the XML DriveFile generated by the split activity. Use to override the default DriveFile path.
-p PCM_Path The full import path in SDL Contenta. This is the Contenta path to the container where the document is to reside after import. Use to override PCMLOADLOC in the map file. (This is the hierarchical path in Contenta Explorer.)
-pentexcept Protect Entities Except. Use to indicate entity protection with exception handling.

When used alone, the di_except.ent file is read to determine which entities are to be expanded. When used with the -exceptfile argument, the file indicated with -exceptfile is read instead, and di_except.ent is ignored.

-protent Protect Entities. Universal entity protection: entities are always expanded normally, as provided for by the production DTD, and each expansion is then always collapsed back to &entity notation, with the expanded text deleted before import.

Does not apply to entities in attribute values.

Do not use with the -nope switch or when the di_except.ent file is used.

-replace If a container object in the DriveFile already exists in SDL Contenta, DI uses the existing object instead of creating it again. If the switch is not specified, the container object is created even if one already exists. The switch also causes existing leaf objects to be replaced: a leaf object that does not exist is created, and one that already exists is replaced.

Used for File Import only.

-rootdir dir_path Use to specify an alternate File Import source file directory.
-s source.doc The full path to the SGML or XML source document.

Use to override the Source field in the map file.

-schema Indicates that the XML document has a schema (not a DTD).
-splitdir dir_path Use to specify an alternate Dynamic Import split directory.
-t [0,1,2,3,4] Use this flag to specify how to handle object and/or folder names as follows:
  • 0 = use all lowercase.
  • 1 = use all uppercase.
  • 2 = do not change case. Spaces in names are converted to underscores.
  • 3 = object names are case sensitive as received from the data. Embedded spaces are not stripped.
  • 4 = periods are allowed in object names. When not specified, the periods in the file names are changed to underscores upon import. This switch is for File Import only.
-validate Validates that all slices (fragments) in the split directory are in the DriveFile. Validates that no data is lost during the split process. Validation is made at the end of the split process, before the di.pl program exits.

When validation fails, di.pl exits with a -58 error and a Slice Validation Failed message that lists the missing fragments. The error messages can be viewed in the DI interface, in the error log file, or both.

If this argument is not specified and fragments are missing, the data is silently lost; no errors are detected or reported.

-welf When used, this flag indicates that the present version of Omnimark is 7.0 or higher and that the XML source file is well-formed and lacks a DTD or schema.
-sgmlencoding encoding Optional.

Use to specify the character encoding of the source SGML file.

If this value is set to ASCII, the encoding defaults to UTF-8. If this argument is omitted or left empty (the default), the program attempts to determine the encoding of the source content. If the encoding cannot be determined, the encoding defaults to UTF-8.

This argument is ignored for XML data.
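
Several of the optional arguments above constrain one another. The following Python sketch shows one way a wrapper script might check a set of options before calling di.pl. It encodes only the rules stated explicitly in the table (-nope versus -protent and -pentexcept, -exceptfile together with -pentexcept, and the allowed -t values). The function name and the dictionary representation of the options are assumptions made for this example; di.pl itself may apply different or additional checks.

```python
def check_split_options(options):
    """Return a list of problems found in a di.pl option mapping.

    options maps a switch name (for example "-nope") to its value,
    or to None for switches that take no value.
    """
    problems = []

    # -nope turns off entity protection, so it conflicts with the
    # switches that turn entity protection on.
    if "-nope" in options:
        for other in ("-protent", "-pentexcept"):
            if other in options:
                problems.append(f"-nope must not be combined with {other}")

    # -exceptfile names the exception file that -pentexcept reads.
    if "-exceptfile" in options and "-pentexcept" not in options:
        problems.append("-exceptfile is meant to be used with -pentexcept")

    # -t accepts only the documented case-handling modes.
    if "-t" in options and str(options["-t"]) not in {"0", "1", "2", "3", "4"}:
        problems.append("-t expects one of 0, 1, 2, 3, 4")

    return problems
```

An empty list means that none of these particular rules is violated; it does not guarantee that di.pl accepts the combination.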

On Windows systems you can display Help for di.pl by typing the following at a command prompt:

perl di.pl

Sample output:
Usage: perl di.pl
-c MapFile (Required)
-e ErrFile (Defaults to ./encaps/di/errs)
-d (Debug)
-replace (ReplaceMode)
-doctype (doctype override)
-config <configfilepath> (alternate digui.cfg file)
-p PCMLoadLoc
-o DriveFile
-s SgmlSource
-nu (No Uniquification)
-ne (Strip Entities in Obj Names)
-g (DIGui.pl Only)
-m Generate-Levels
-caterrs (Combine Omnilog and ErrorFile)
-splitdir OverrideSplitDir
-rootdir OverrideRootDir
-t CaseModeValue (0,1,2)
-welf (Omni 7.x XML WellFormed)
-protent (General Entity Expansion/DeExpansion)
-pentexcept (Entity Protect with Exceptions)
-exceptfile Override Entity Exception FileSpec
-cddocdir (chdir into Document Dir)
-ig (WebClient Only)
-nope (no entity protection - legacy)
-validate (check that all slices are in drivefile)
-ascii (ignore non-ASCII characters from name and propertysheet data)
-custom CustomPackageName
-lib OmniLib-Override