XML for Adobe InDesign

In our previous post we introduced a sample of an XML file. The text of the file served its purpose in illustrating a CALS table. But there remains the need to give a tip-of-the-iceberg explanation of what XML markup is all about.

XML MARKUP

XML (Extensible Markup Language) is one of the ways that plain text can be marked up to identify its elements. The whole idea of a markup language is to surround text (or other elements) with an established code that identifies it. Similar to HTML, the code identifies the various text elements of a document using terminology that describes what the element IS. Unlike HTML, it separates document (data) from style.

If you are familiar with HTML, which is probably the best known markup language, you know that the tags that define a document element begin with a word enclosed in angle brackets (<>). For example: for text that will be styled the largest, the beginning tag is written <h1> with the ending tag written </h1>.

   <h1>This is a heading</h1>

With HTML all of the available tags are predefined and recognized by web browsers. How the text is styled depends on the browser (unless styling is defined by CSS).

XML is used for many aspects of document and web development. Because it identifies what the element is but not how it is to be styled, the same XML data can be used in many different types of presentations.

The XML tags that define the document elements are similar to HTML tags in that the tag names are placed inside angle brackets. The difference is that there is no predefined list of tags that need to be used. The tags can be defined as needed by the user (the eXtensible part of eXtensible Markup Language). To establish a standard set of tags, publishers create a document known as a document type definition (DTD). One of the most used standards is XMLNews, which is used for exchanging news and other information. Its root element is <nitf> and its tags include <head>, <body>, <headline>, <byline>, and <dateline>.

Although users can define their own tags for XML, there are a number of rules that define how XML tags can be written:

  • All XML elements must have a closing tag:

      <bodyText>Text to be used as body text.</bodyText>

  • Tags are case sensitive. The tag <head> is different from the tag <Head>. Opening and closing tags must be written with the same case.
  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml in any combination of case (XML, Xml, etc.)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces
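
If you want to check tag names before building a file, the naming rules above can be expressed as a short test. The following is only a sketch in Python (not part of the InDesign script). The function name is_valid_element_name is our own, and the pattern is limited to plain ASCII letters, which is an assumption made for simplicity; the XML specification allows a wider range of characters.

```python
import re

# Letters, digits, hyphens, underscores, and periods are allowed,
# but the first character must be a letter or an underscore.
# Assumption: ASCII names only (real XML permits more characters).
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_element_name(name: str) -> bool:
    """Return True if name follows the naming rules listed above."""
    if not _NAME_PATTERN.fullmatch(name):
        return False  # bad first character, a space, or another disallowed character
    # Names may not begin with "xml" in any mix of upper and lower case.
    return not name.lower().startswith("xml")


for candidate in ["bodyText", "Table_Head", "_note", "2ndHead", "body text", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```

The first three names pass; the last three fail because of a leading digit, a space, and a leading "Xml" respectively.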

A common practice is to use the same naming rules as used in the source database. With InDesign documents, paragraph styles are often created using a naming convention that conforms to the XML tags. We will be using this convention in the following example.

XML STRUCTURE

An XML document must have a single root element that opens and closes the document. Its markup tag is often named <Root>. Within that element all other document elements must be properly nested to indicate child/parent relationships.

<Root>
   <h1>This is a headline</h1>
   <bodyText><bold><italic>This text is bold and italic</italic></bold></bodyText>
</Root>

Notice in the example above that both the beginning and ending tags for italic are inside (nested within) the beginning and ending tags for bold, and both are nested within the bodyText tags.
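
A quick way to confirm that a file obeys the nesting and case rules is to run it through any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module rather than InDesign. The helper name is_well_formed and the three test strings are our own illustrations, modeled on the example above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested: italic closes before bold, bold before bodyText.
nested = "<Root><h1>This is a headline</h1><bodyText><bold><italic>Bold and italic</italic></bold></bodyText></Root>"
# Crossed tags: bold closes while italic is still open.
crossed = "<Root><bodyText><bold><italic>Bold and italic</bold></italic></bodyText></Root>"
# Opening and closing tags differ in case.
wrong_case = "<Root><h1>This is a headline</h1></Root>".replace("</h1>", "</H1>")

print(is_well_formed(nested))      # True
print(is_well_formed(crossed))     # False
print(is_well_formed(wrong_case))  # False
```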

The XML file we will use for our demonstration is shown below. You may want to enter it into a plain text file to follow along. If you do, watch the beginning and ending tags to make sure they match in spelling as well as capitalization.

MAPPING STYLE TAGS

The interesting thing about working with Adobe InDesign is that you can import an XML file and associate its tags with the names of paragraph styles in the document. Paragraph styles can also be associated with HTML tags used for exporting documents in HTML format. We will demonstrate importing XML with the following sample script. We will leave exporting as HTML for later.

INDESIGN TEMPLATE

Our sample script will create a document from a template that has paragraph styles named the same as the XML tags used in the XML file. It also has a predefined table style named “GrayHead”. The template will be an InDesign template (.indt) stored in a folder named Templates inside the InDesign application folder. (If you do not have administration privileges, you will need to use another location.) If you are following along, make sure your template has the following styles:

Paragraph Styles:
Head, Text_10, Label, Table_Head, Table_Body.

Table Style: GrayHead.

Cell Styles: Head (for the first row), Body (for all cells except the left region and right region), LeftCol (for the left region), RightCol (for the right region).

How you define these styles is up to you.

SAMPLE SCRIPT

We will take the script one step at a time. In case you have problems we will post a download of the files on the AppleScript page of this web site.

Define Variables

Define variables that can be changed by the user at the top of a script. Later, when the script is working as designed, these can be replaced by a user interface using InDesign’s custom dialog.

set elementName to "table"
set tableStyleName to "GrayHead" 
set doLink to false --should XML file be linked

Have User Select Template

This part of the script will be taken care of (handled) by a handler getTemplate. The call to the handler places the result passed back by the handler into the variable templateRef. The handler uses a choose file command.

   --the call to the handler
   set templateRef to getTemplate()
   (*Template file selected by user is returned from handler; otherwise an error is thrown*)
    on getTemplate()
	tell application "Adobe InDesign CC 2015"
		set appPath to file path as string
	end tell
	set templatePath to (appPath & "Templates:") as alias
	try
		set templateRef to (choose file with prompt "Choose template for document" default location templatePath without invisibles)
	on error
		--choose file raises an error when the user clicks Cancel
		error "User Cancelled"
	end try
	return templateRef
    end getTemplate 

Establish Reference to XML File Chosen by User

The script will use a choose file command to have the user identify the XML file for import. Notice how both choose file commands incorporate a default location parameter. This requires an alias reference to the folder where the file will be found. For simplicity we used the path to the user’s desktop, but you may wish to use another path.

    --establish folder path for XML file
    set homePath to path to desktop from user domain
    --define XML file to import
    set fileRef to choose file with prompt "Select XML file for import" default location homePath without invisibles 

Create Document and Import XML

Once the script has a reference to the template and the XML file to place, the document is created. A reference to the document is placed in the variable docRef. The script then tells the document (docRef) to import the XML file using the reference to the XML file (fileRef).

    tell application "Adobe InDesign CC 2015"
        tell XML import preferences
            set import style to merge import
            set import CALS tables to false
            set repeat text elements to false
            set create link to XML to doLink
        end tell
        set docRef to open templateRef
        tell docRef
            import XML from fileRef
        end tell
    end tell

If you run the script at this point and open the document created, you will see the structure of the XML file in InDesign’s Structure pane (View > Structure > Show Structure). You will need to click on the small arrows next to the Root element and the table element to disclose the elements (and their contents) in the structure.

Root
   Head
   Text_10
   table  
        td
        td
        ...
   Text_10

Place XML on the Page

To place an XML structure on a page, there are a number of conventions that can be used in InDesign. Our example will use the XML root element as part of a place XML command. The statement places the imported XML on the first page of the document. As a consequence of the place XML statement, the XML will be placed in a text frame positioned one-half inch from the left and top of the page (the place point). A reference to the text frame created is placed in the variable frameRef.

    --after the import statement inside the tell docRef code block
       set rootElement to XML element 1
    end tell --ends tell docRef block
    tell page 1 of docRef
       set frameRef to place XML using rootElement place point {".5 in", ".5 in"}
    end tell 

Style Text

To style the document text, the script will associate paragraph style names with the text contained by the XML elements using the name of their tags. Another handler will accomplish this task. It takes advantage of the fact that the names of the paragraph styles in the document are the same as the XML tags.

Notice that the call to the handler begins with the reserved word my. This is required as the call is inside a tell statement to the application. The handler associates the paragraph styles to the XML element contents by creating an XML import map. Once the map is created, the script executes the map XML tags to styles method.

    --call mapTags handler passing a reference to the document
    my mapTags(docRef)
    (*Associates tags in document's list of XML tags with corresponding Paragraph styles
    Uses XML tag to style to map styles.*)
    on mapTags(docRef)
        tell application "Adobe InDesign CC 2015"
            tell docRef
                set tagList to XML tags
                repeat with i from 1 to length of tagList
                    set tagName to name of item i of tagList
                    if (exists paragraph style tagName) then
                        set styleRef to paragraph style tagName
                        make XML import map with properties {mapped style:styleRef, markup tag:item i of tagList}
                    end if
                end repeat
                map XML tags to styles
            end tell
        end tell
    end mapTags 
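
To see which tags end up mapped, it can help to look at the matching logic on its own, outside InDesign. The following Python sketch only mimics the name comparison made in the handler; it does not call InDesign. The function name build_tag_style_map is our own, the tag list is assumed from the structure shown earlier, and the comparison here is an exact, case-sensitive match, which is an assumption about how the names line up.

```python
def build_tag_style_map(tag_names, style_names):
    """Pair each XML tag with the paragraph style of the same name, if one exists."""
    available = set(style_names)
    # Tags without a same-named style are simply skipped, as in the handler.
    return {tag: tag for tag in tag_names if tag in available}


# Tag names assumed from the structure listing; style names from the template section.
tags = ["Root", "Head", "Text_10", "table", "td"]
styles = ["Head", "Text_10", "Label", "Table_Head", "Table_Body"]

print(build_tag_style_map(tags, styles))
```

With these sample lists only Head and Text_10 are paired. Root, table, and td have no paragraph style of the same name, so no import map is made for them.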

Create and Style the Table

All that is left is to create and style the table. To do this, the script first needs a reference to the XML element whose markup tag is named “table”. We will use a handler to identify the element. It is important to note here that XML elements do not have a name property. XML elements are referenced in order as child elements in the XML structure. This is similar to items within a list (or list of lists). The following handler accomplishes this task by looping through the elements found in the first level of XML elements (child elements of Root). When the required element is found, the loop exits. If the table element were nested at another level in the structure, the handler would need to be written much differently. Remember that the name to associate with our table XML element (elementName) and the name of the table style (tableStyleName) were defined at the top of the script.

    --place this code before the last end tell statement
    tell docRef
       set tableElement to my getXMLElement(rootElement, elementName)
    end tell
    set tableStyleRef to table style tableStyleName of docRef
    --get width of table's container to set width of the table
    copy geometric bounds of frameRef to {fy0, fx0, fy1, fx1}
    set frameWidth to fx1 - fx0
    --convert the contents of the XML "table" element to a table
    tell tableElement to convert element to table row tag "tr" cell tag "td"
    tell tables of frameRef
        set row type of row 1 to header row
        set applied table style to tableStyleRef
        set width to frameWidth
        tell cells to clear cell style overrides
    end tell
    (*Returns reference to XML element tagged with value of variable elementName*)
    on getXMLElement(rootElement, elementName)
   	set foundElement to missing value
	tell application "Adobe InDesign CC 2015"
	    repeat with i from 1 to count of XML elements of rootElement
		if name of markup tag of XML element i of rootElement is elementName then
		    set foundElement to XML element i of rootElement
		    exit repeat
		end if
	    end repeat
	    if foundElement = missing value then
		error "Element " & elementName & " was not found"
	    else
		return foundElement
	    end if
	end tell
    end getXMLElement 
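
The same first-level search can be tried outside InDesign to get a feel for the logic. The sketch below uses Python's standard xml.etree.ElementTree module. The SAMPLE text is hypothetical (it only follows the structure listed earlier and is not the demonstration file), and get_xml_element is our own name for a function that mirrors the handler.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that follows the Root/Head/Text_10/table structure.
SAMPLE = """<Root>
  <Head>Sample heading</Head>
  <Text_10>Introductory text.</Text_10>
  <table>
    <tr><td>Item</td><td>Price</td></tr>
    <tr><td>Widget</td><td>4.00</td></tr>
  </table>
  <Text_10>Closing text.</Text_10>
</Root>"""


def get_xml_element(root, element_name):
    """Return the first direct child of root tagged element_name."""
    for child in root:  # iterates over direct children only
        if child.tag == element_name:
            return child
    raise LookupError(f"Element {element_name} was not found")


root = ET.fromstring(SAMPLE)
table = get_xml_element(root, "table")
print(table.tag, len(table))  # tag name and number of rows

# For an element nested at a deeper level, a recursive search is needed instead.
first_cell = next(root.iter("td"), None)
print(first_cell.text)
```

Asking get_xml_element for "td" raises the error, because td is not a direct child of Root. That is the situation the text describes when it says a nested element would need a differently written handler.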

Final document

ONWARD AND UPWARD

Now that you have the pieces to the puzzle, see if you can put them together to create a real working script. Be sure to add a try/on error statement block to catch errors that will occur if the user clicks Cancel in response to the choose file commands. You will also need to trap the error generated in the getXMLElement handler (if the XML element is not found).

Yes, scripts such as this can get a little involved, but if you let handlers take care of commonly used functionality, your efforts will be amply rewarded as you are able to use these handlers for any number of scripts. When working with databases, nothing beats working with XML (except maybe JSON but that works with JavaScript and is not supported by InDesign). If you do have data stored as JSON, you can use JavaScript (and a number of other scripting languages) to convert it to XML.
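
As an illustration of that last point, here is a minimal sketch of a JSON-to-XML conversion in Python, one of the other scripting languages that can do the job. It assumes the JSON is a simple array of rows, each an array of cell values, and it writes the Root, table, tr, and td tags used in this post. The function name json_rows_to_xml and the sample data are our own; real data will likely need more handling.

```python
import json
import xml.etree.ElementTree as ET


def json_rows_to_xml(json_text: str) -> str:
    """Convert a JSON array of arrays into <Root><table><tr><td> markup."""
    rows = json.loads(json_text)
    root = ET.Element("Root")
    table = ET.SubElement(root, "table")
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for value in row:
            td = ET.SubElement(tr, "td")
            td.text = str(value)  # element text must be a string
    return ET.tostring(root, encoding="unicode")


print(json_rows_to_xml('[["Item", "Price"], ["Widget", 4]]'))
```

The resulting text can be saved to a file and imported in the same way as the XML file used in the script.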