CSV Conventions

There is no formal, internationally recognized standard for the CSV format. SDL therefore uses the de facto standard developed by Microsoft Corporation.

A description of this format follows and can also be found on the Web at http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm.

CSV records have the following conventions:

  • Each record is one line, with the exception noted below

    A record separator may consist of a line feed (ASCII/LF=0x0A), or a carriage return and line feed pair (ASCII/CRLF=0x0D 0x0A).

    However, fields may contain embedded line breaks (see below), so a record may span more than one line.

  • Fields are separated with commas

    Example: John,Doe,120 Main St.,"Anytown, WW",08123

  • Leading and trailing space-characters adjacent to comma field separators are ignored

    So, John , Doe ,... resolves to "John" and "Doe", etc. Space characters can be spaces or tabs.

  • Fields with embedded commas must be delimited with double-quote characters

    In the above example, "Anytown, WW" had to be delimited in double quotes because it contained an embedded comma.

  • If a field contains a double-quote character, surround the entire field with double quotes and represent the original double quote with two consecutive double quotes.

    So, John "Lefty" Doe would convert to "John ""Lefty"" Doe", 120 Main St.,...

  • A field that contains embedded line-breaks must be surrounded by double-quotes

    So:

    Field 1: Conference room 1
    Field 2:
      John,
      Please bring the M. Mathers file for review.
      -J.L.
    Field 3: 10/18/2002
    ...

    would convert to:

    Conference room 1, "John,
    Please bring the M. Mathers file for review.
    -J.L.
    ",10/18/2002,...

    Note that this is a single CSV record, even though it takes up more than one line in the CSV file. This works because the line breaks are embedded inside the double quotes of the field.

  • Fields with leading or trailing spaces must be delimited with double-quote characters

    So to preserve the leading and trailing spaces around the last name above:

    John ," Doe ",...

  • Fields may always be delimited with double quotes

    The delimiters will always be discarded.

  • The first record in a CSV file may be a header record containing column (field) names

    There is no mechanism for automatically discerning whether the first record is a header row, so in the general case, this information has to be provided by an outside process (such as prompting the user). The header row is encoded just like any other CSV record in accordance with the rules above. A header row for the multi-line example above might be:

    Location, Notes, "Start Date", ...
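The conventions above can be illustrated with a short Python sketch. It is not SDL's implementation; the names format_field and parse_csv are hypothetical and exist only for this example. The sketch assumes that any characters between a closing double quote and the next separator are dropped, and it does not guess whether the first record is a header.

```python
def format_field(value):
    """Quote a field only when the conventions above require it."""
    needs_quotes = (
        any(c in value for c in ',"\r\n')      # comma, quote, or line break
        or value != value.strip(" \t")         # leading or trailing space
    )
    if needs_quotes:
        return '"' + value.replace('"', '""') + '"'
    return value


def parse_csv(text):
    """Split CSV text into records, each a list of field strings."""
    records, record = [], []
    i, n = 0, len(text)
    while i < n:
        # Spaces and tabs next to a separator are ignored outside quotes.
        while i < n and text[i] in " \t":
            i += 1
        if i < n and text[i] == '"':
            # Quoted field: may hold commas, line breaks, and "" pairs.
            i += 1
            chars = []
            while i < n:
                if text[i] == '"':
                    if text[i + 1:i + 2] == '"':   # "" is a literal quote
                        chars.append('"')
                        i += 2
                        continue
                    i += 1                          # closing delimiter
                    break
                chars.append(text[i])
                i += 1
            field = "".join(chars)
            # Assumption: text between the closing quote and the next
            # separator is dropped.
            while i < n and text[i] not in ",\r\n":
                i += 1
        else:
            start = i
            while i < n and text[i] not in ",\r\n":
                i += 1
            field = text[start:i].rstrip(" \t")
        record.append(field)
        if i < n and text[i] == ",":
            i += 1
            if i == n:              # text ends right after a comma
                record.append("")
                records.append(record)
            continue
        # End of record: accept LF or CRLF.
        if i < n and text[i] == "\r":
            i += 1
        if i < n and text[i] == "\n":
            i += 1
        records.append(record)
        record = []
    return records


sample = (
    'Location, Notes, "Start Date"\r\n'
    'Conference room 1, "John,\nPlease bring the M. Mathers file.\n-J.L.\n",10/18/2002\n'
    'John ," Doe ","Anytown, WW",' + format_field('John "Lefty" Doe') + '\n'
)
rows = parse_csv(sample)
header, records = rows[0], rows[1:]   # the caller decides row 0 is a header
```

Here the second record spans four physical lines but is returned as a single record, and " Doe " keeps its surrounding spaces because it was quoted, while the unquoted John loses its trailing space. Note that general-purpose CSV libraries may not trim unquoted whitespace in the same way, which is why the sketch parses by hand.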