WORKING WITH GOOGLE DOCS

I have always been an avid supporter of InCopy. I love its workflow options with InDesign. But over the past year or so, I have been presented with a number of workflow situations that involved Google Docs. Google Docs is cool and it’s free. What’s more, people are using it. So I decided I had better ramp up on this side of the publishing equation. And, believe me, I’m still working on it and don’t pretend to have all the answers.

Recently I had the opportunity to work on a book that was written in Google Docs. In the process of working on the book, I experimented with a few ideas to automate processes with scripts. Some ideas I felt were worth sharing.

UNDER THE COVERS

It looks like Google Docs is HTML (or XML) pure and simple; and it’s live. That is the fun part: You never “save” your document.

If you have the opportunity, export a document (File > Download as > Web Page (.html, zipped)). Unzip the download and open the HTML file in a text editor. This is not for the “Learn HTML in 24 Hours” student. But with patience, and a mental filter that blocks out the “spans” before you go blind, you can get an idea of how Google Docs does it, at least for HTML.

The neat thing about the HTML download is that all of the inline images get downloaded into their own separate folder. So, no matter what you do next with the document, you have the images. But, of course, the user most likely did not optimize the images (resample to the correct dpi, crop, or otherwise process them).
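If you would rather take inventory of the images before unzipping anything, a few lines of Python can list them. This is only a sketch that runs outside InDesign; it assumes nothing about the folder name inside the archive and simply filters by file extension, and the archive built in the demo (MyBook.html, images/image1.png, and so on) is a made-up stand-in for a real download.

```python
import io
import zipfile

# Extensions treated as images; adjust to taste (an assumption, not a Google Docs rule).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif")


def list_zip_images(zip_source):
    """Return the sorted image paths stored in a zipped Web Page download."""
    with zipfile.ZipFile(zip_source) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )


# Build a small stand-in archive in memory so the sketch runs on its own.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("MyBook.html", "<html></html>")
    archive.writestr("images/image1.png", b"")
    archive.writestr("images/image2.jpg", b"")
demo.seek(0)

print(list_zip_images(demo))
```

With a real download you would pass the path to the zip file instead of the in-memory demo.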

It would be nice if you could just change the file type designation for the downloaded file (change HTML to XML) and simply import it into InDesign. According to many articles you find on the web, it should just “work.”

Problem is, File > Import XML… will probably raise an error. Getting a message like “Expected end of tag ‘meta’ Line:3, Column 4574” makes this process look a little unfavorable. Even among those who know HTML, I don’t think hunting down character 4574 in the third line of text is something the most astute HTML jockey would want to tackle. (Maybe this person also has the tools that will spot and fix the errors.)
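The likely cause of an error like this is that HTML allows tags such as meta to stay unclosed, while XML does not. The Python sketch below (not part of my scripts, and run outside InDesign) feeds two tiny made-up documents to a strict XML parser to show the difference. The sample markup is invented for illustration and is not what Google Docs actually exports.

```python
import xml.etree.ElementTree as ET


def first_xml_error(markup):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None


# Legal HTML, but the meta tag is never closed, so it is not well-formed XML.
html_like = (
    '<html><head><meta charset="utf-8"><title>Doc</title></head>'
    "<body><p>Hi</p></body></html>"
)
# The same markup with a self-closed meta tag parses as XML.
xml_like = (
    '<html><head><meta charset="utf-8"/><title>Doc</title></head>'
    "<body><p>Hi</p></body></html>"
)

print(first_xml_error(html_like))  # a (line, column) pair
print(first_xml_error(xml_like))   # None
```

A real export has far more going on than one unclosed tag, so simply renaming the file is unlikely to be enough.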

Short of this, it is possible for a small shop to put together a fairly nice workflow using a script or two.

A SCRIPTED SOLUTION

This is what I did for my book project.

  1. The Google Doc was checked for correct formatting:
    • Paragraph styles are used exclusively for styling
    • Fonts designated for the styles are installed on my computer
  2. Below each image in the Doc, the name of the image file was entered between square brackets, as in: [image01.jpg]
  3. The document was downloaded as Web Page (.html, zipped)
  4. The document was also downloaded as Microsoft Word (.docx)

Book Template

A template was created in InDesign for the final document.

  • Paragraph styles named to conform to style names in the Google document
  • Object styles created for images.
    • Images will be anchored to text, so anchoring was part of object style.
    • Several styles were created to automate image placement (alignment, top of page or inside text, etc.).
  • Template and styles were tested using some dummy text and images.

READY TO ROCK

A document was created from the template (I used my “docFromTemplate” script).

The document downloaded as Microsoft Word was imported into InDesign. For this there are two options:

Manual Import

Make sure Show Import Options is checked in the Place dialog (File > Place in InDesign).

Set import options as shown below, or use your own (or maybe a Preset).

Setting Microsoft Word import options

If the styles used in the Google Doc don’t match the styles used in your template, make sure to check Custom Style: Import Style Mapping… so you can map the styles.

 Mapping styles in InDesign
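Before mapping, it can help to know which paragraph style names the Word file actually defines. A .docx file is a zip archive, and the style definitions live in word/styles.xml, so a short Python sketch (run outside InDesign) can list them and compare them with your template. The style names and the in-memory demo file below are hypothetical; your own template's style list would go in their place.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used throughout word/styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_style_names(docx_source):
    """Return the names of the paragraph styles defined in a .docx file."""
    with zipfile.ZipFile(docx_source) as archive:
        root = ET.fromstring(archive.read("word/styles.xml"))
    names = []
    for style in root.iter(f"{{{W_NS}}}style"):
        if style.get(f"{{{W_NS}}}type") != "paragraph":
            continue  # skip character, table, and numbering styles
        name = style.find(f"{{{W_NS}}}name")
        if name is not None:
            names.append(name.get(f"{{{W_NS}}}val"))
    return names


def unmapped_styles(docx_names, template_names):
    """Styles in the Word file that have no same-named style in the template."""
    return sorted(set(docx_names) - set(template_names))


# Minimal stand-in for a real download so the sketch runs on its own.
styles_xml = (
    f'<w:styles xmlns:w="{W_NS}">'
    '<w:style w:type="paragraph" w:styleId="Title"><w:name w:val="Title"/></w:style>'
    '<w:style w:type="paragraph" w:styleId="Subtitle"><w:name w:val="Subtitle"/></w:style>'
    '<w:style w:type="character" w:styleId="Strong"><w:name w:val="Strong"/></w:style>'
    "</w:styles>"
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("word/styles.xml", styles_xml)
demo.seek(0)

template_styles = ["Title", "Body"]  # hypothetical template style names
print(unmapped_styles(paragraph_style_names(demo), template_styles))
```

Any name this reports is a candidate for the Import Style Mapping dialog. Note that the list covers styles defined in the file, which may include some the author never used.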

Import Using a Script

I decided to use a script. The handler from my “PlaceMicrosoftWord_Style” AppleScript that takes care of the import preferences is shown below:

on setMWordPrefs()
  tell application "Adobe InDesign CC 2015"
    tell word RTF import preferences
      set convert bullets and numbers to text to false
      set remove formatting to false
      --set convert tables to Unformatted Table
      set import endnotes to true
      set import footnotes to true
      set import index to true
      set import TOC to true
      set import unused styles to false
      set preserve graphics to false
      set preserve local overrides to false
      set resolve character style clash to resolve clash use existing
      set resolve paragraph style clash to resolve clash use existing
      set use typographers quotes to true
    end tell
  end tell
end setMWordPrefs

The idea here is that you don’t want the images to import with the document. It is better to make sure the images are optimized before they get placed. If you have set up the styles correctly in InDesign and mapped the style names to those in Google Docs, the text for the document should end up looking close to perfect.

The dread Missing Fonts dialog

Even with all your preparation, chances are you will get the Missing Fonts dialog when you import. You could have your script disable this dialog, but it is a good idea to take note of what is reported missing. Close the dialog if the missing fonts are not part of the original Google Doc styles. You will probably need to do some tweaking in the document, as users hardly ever get it one hundred percent right. Besides, things like tables and tabbed columns will definitely require some additional work.

Create Chapters

Since the document I had to work with was one long text flow, I decided to import it into InDesign as one long document and let a script cut the result into chapters after import. This was a fairly easy script to write since the first line of every chapter was set in a unique paragraph style (Title).

The script first searched for each occurrence of the Title paragraph style in the document. This produced a list of text index references ordered from the back of the document forward. Then, within a repeat loop, the list was processed:

  1. The text between the chapter markers was cut
  2. A new InDesign document was created from the book template
  3. The cut text was placed in the new document
  4. The new document was saved to the same folder as the original document using the chapter title as its name
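The cutting logic in the steps above can be sketched in Python with plain lists standing in for the InDesign story. This is not my AppleScript, just an illustration of why the loop runs from the back of the document forward: cutting the tail never shifts the index of an earlier chapter opener. The (style, text) pairs and the sample content are invented for the demo.

```python
def split_chapters(paragraphs, title_style="Title"):
    """Split (style, text) pairs into chapters, cutting from the back forward.

    Returns (chapters, remaining), where chapters is a list of
    (chapter_title, paragraphs) in document order and remaining is
    whatever came before the first chapter title.
    """
    remaining = list(paragraphs)
    # Indexes of every paragraph that opens a chapter.
    starts = [i for i, (style, _) in enumerate(remaining) if style == title_style]
    chapters = []
    for start in reversed(starts):
        # Cut everything from this opener to the (current) end of the text.
        chapter = remaining[start:]
        del remaining[start:]
        chapters.append((chapter[0][1], chapter))
    chapters.reverse()  # put the chapters back into document order
    return chapters, remaining


book = [
    ("Body", "Preface text"),
    ("Title", "Chapter One"),
    ("Body", "First chapter text"),
    ("Title", "Chapter Two"),
    ("Body", "Second chapter text"),
    ("Body", "More second chapter text"),
]
chapters, front_matter = split_chapters(book)
for title, content in chapters:
    print(title, len(content))
```

In the real script each cut chunk goes into a new document made from the template and is saved under the chapter title, rather than being collected in a list.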

NOW FOR THE IMAGES

The next challenge was to place the images.

The one step that is absolutely necessary for this workflow is to resample and/or resize images to fit the document’s criteria. Because I wanted to optimize the image quality as part of this step, I left this as pretty much a manual method. The images would be:

  • No wider or taller than the page margins would allow
  • Set to the final resolution required for the InDesign document
  • Named to correspond to the name of the original image
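The size limit in the list above is simple arithmetic: the area inside the margins, measured in points, divided by 72 points per inch and multiplied by the target resolution. A small Python sketch makes the numbers concrete. The page size, margins, and 300 ppi target are example values only, not the ones from my book project.

```python
def max_image_pixels(page_width, page_height, top, bottom, left, right, ppi):
    """Largest image (width, height) in pixels that fits inside the margins.

    All page measurements are in points (72 points = 1 inch).
    """
    live_width = page_width - left - right
    live_height = page_height - top - bottom
    return (round(live_width / 72 * ppi), round(live_height / 72 * ppi))


# US Letter page (612 x 792 pt) with half-inch margins, images at 300 ppi.
print(max_image_pixels(612, 792, 36, 36, 36, 36, 300))  # (2250, 3000)
```

Anything larger than this in either direction needs to be resampled or cropped before it is placed.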

You can make minor changes to image files once placed in the InDesign document, but for the most part, images should be page-ready when imported.
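Since placement depends on the file names matching the tags in the text, a quick check before running the placement script can save some head scratching. The Python sketch below (run outside InDesign) assumes the tags follow the [image01.jpg] convention described earlier; the function name and sample data are made up for illustration.

```python
import re

# Matches tags such as [image01.jpg] and captures the file name inside the brackets.
TAG_PATTERN = re.compile(r"\[(image[0-9]*\.[a-z]+)\]")


def missing_images(story_text, available_files):
    """Return the tagged file names that have no matching file, in text order."""
    wanted = TAG_PATTERN.findall(story_text)
    available = set(available_files)
    return [name for name in wanted if name not in available]


text = "Some text\n[image01.jpg]\nA caption\n[image02.png]\nMore text\n"
files_in_folder = ["image01.jpg", "image03.jpg"]

print(missing_images(text, files_in_folder))  # ['image02.png']
```

In practice the text could come from the plain-text export of the document and the file list from the image folder, for example via os.listdir.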

Now for the main reason for this blog: the script for placing the images.

IMAGE PLACEMENT SCRIPT

The major modules for this script were as follows:

Basic script setup – establish values for script. Variable values should be obtained from user with a custom dialog. Here they are just hard-coded.

set foundSet to {}
--minimum characters allowed to consider text a story
set minText to 1 
--arbitrary number to add to top margin for testing if image is at top of page
set topTest to 20 
--character style for image tags
set noPrintName to "NoPrint"
--objectStyles
set objStyleName to "ImageStyle"
set topObjName to "TopObjStyle"
--use paragraph style to align images horizontally
set paraAlignName to "AlignParagraph"
--will return list of stories from getStoryRef handler if set to true
set allowMultiple to false 

Choose Folder – Have the user choose the folder that contains the image files:

try
   set dPath to choose folder with prompt "Select folder for images to place"
   set folderPath to dPath as string
on error errStr
   --tell the user what went wrong, then exit the script
   display alert "No image folder selected" message errStr
   return
end try

Document setup – Set application preferences and get document margins (right hand page)

tell application "Adobe InDesign CC 2015"
   set measurement unit of script preferences to points
   set preserve bounds of image preferences to true
   set docRef to document 1
   tell margin preferences of docRef
      copy {top, right, bottom, left} to {py0, px0, py1, px1}
   end tell

Story reference – Get a reference to your story
Your Google document should be one story. For some reason, when a Word document is placed in InDesign you may end up with more than one story. Only one of these stories will have any amount of text, so our handler should be able to take care of this.

Call to the handler:

--"my" is needed because the call is made inside the InDesign tell block
set storyRef to my getStoryRef(docRef, minText, allowMultiple)

(*Collects the stories that are not on a master spread
and whose length is greater than minText.
If allowMultiple, a list is returned; otherwise item 1 of the list is returned*)
on getStoryRef(docRef, minText, allowMultiple)
   tell application "Adobe InDesign CC 2015"
      set testList to {}
      set storyList to stories of docRef
      repeat with i from 1 to length of storyList
         try
            set storyRef to item i of storyList
            set testIt to text containers of storyRef
            --a document page belongs to a spread, a master page to a master spread
            set theTest to class of parent of parent page of item 1 of testIt
            if theTest is spread then
               set end of testList to storyRef
            end if
         end try
      end repeat
      set lastTest to {}
      repeat with i from 1 to length of testList
         set storyRef to item i of testList
         set storyLength to length of storyRef
         if storyLength > minText then
            set end of lastTest to storyRef
         end if
      end repeat
      if allowMultiple then
         return lastTest
      else
         return item 1 of lastTest
      end if
   end tell
end getStoryRef

For simplicity, the script assumes that allowMultiple is false, so only one story is returned from the handler. Otherwise you would need to repeat through the stories when a list of more than one is returned.

Image tag References – Get a list of references for the image tags in the document
You can use find grep to get this list. For this I used the following:

set find grep preferences to nothing
set change grep preferences to nothing
set find what of find grep preferences to "\\[image[0-9]*(\\.[a-z]+)\\]"
tell storyRef
   set foundSet to find grep with reverse order
end tell

Notice that the list is returned in reverse order. This is important. Any time a script relies on object indexes, always work the document in reverse order. Otherwise, indexes will change and you will most likely end up with a mess.
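To see the index problem in isolation, here is a Python sketch (run outside InDesign) that finds the tags with an equivalent regular expression and inserts a marker in front of each one. The marker stands in for the placed image. The sample sentence and the "<IMG>" marker are invented for the demo; the pattern is the same one used in the find grep above, minus the extra backslashes AppleScript strings require.

```python
import re

# Same pattern as the find grep above, written as a Python raw string.
TAG_PATTERN = re.compile(r"\[image[0-9]*(\.[a-z]+)\]")


def mark_tags(text, marker="<IMG>"):
    """Insert marker before every tag, working from the last tag to the first."""
    matches = list(TAG_PATTERN.finditer(text))
    for match in reversed(matches):
        # Text before match.start() has not been touched yet, so the index is still valid.
        text = text[:match.start()] + marker + text[match.start():]
    return text


def mark_tags_forward(text, marker="<IMG>"):
    """The same edit in document order, reusing the original indexes (the bug)."""
    matches = list(TAG_PATTERN.finditer(text))
    for match in matches:
        # After the first insert, every later index is off by len(marker).
        text = text[:match.start()] + marker + text[match.start():]
    return text


sample = "Intro [image01.jpg] middle [image02.png] end"
print(mark_tags(sample))
print(mark_tags_forward(sample))
```

The reverse version puts each marker directly in front of its tag. The forward version lands the second marker in the middle of the word "middle", because the first insertion pushed everything after it along by five characters.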

PLACE IMAGES

If foundSet contains at least one item, repeat through the set and place the images. At first I was going to delete the image tags after placing the images, but after some trial and error I decided to just change the color of the text to None (via the NoPrint character style). This way you can use the style attributes of the image placement tags to tweak the positioning of the image in relation to the caption or text that follows.

Insertion Point – Establish Insertion Point for Placement

Use the index of each image tag to get a reference to the insertion point that will be used as the object for placing:

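repeat with j from 1 to length of foundSet
   try
      set testIt to item j of foundSet
      --charStyleRef is a reference to the character style named in noPrintName
      set applied character style of testIt to charStyleRef
      set insIdx to index of testIt
      --storyRef is the story returned by getStoryRef
      set insertRef to insertion point (insIdx) of storyRef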

Get the name of the file to place from the image tag

      set imageText to contents of testIt
      set textLength to length of imageText
      set fileName to text 2 thru -2 of imageText
      set filePath to folderPath & fileName

Place the image – Have the insertion point defined above place the image

   tell insertRef
      set placedList to place file filePath
      set imageRef to item 1 of placedList
   end tell

Apply object styles – (you may have other tests you will want to add)

   set containerRef to parent of imageRef
   set applied object style of containerRef to objStyleRef
   if item 1 of geometric bounds of containerRef ≤ (py0 + topTest) then
      set applied object style of containerRef to topStyleRef
   else
      set applied object style of containerRef to objStyleRef
   end if
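The top-of-page decision is easy to reason about as a small function on its own. The Python sketch below mirrors the test above using the same names (topTest becomes top_test, and the style names come from the setup section); the sample coordinates are made up, and measurements are assumed to be in points.

```python
def choose_object_style(frame_top, top_margin, top_test=20,
                        top_style="TopObjStyle", default_style="ImageStyle"):
    """Pick the object style from the frame's top edge.

    frame_top is the first value of the frame's geometric bounds (its y0).
    A frame counts as "top of page" when it starts within top_test points
    of the top margin.
    """
    if frame_top <= top_margin + top_test:
        return top_style
    return default_style


print(choose_object_style(frame_top=40, top_margin=36))   # TopObjStyle
print(choose_object_style(frame_top=300, top_margin=36))  # ImageStyle
```

Additional tests, such as one for images that land near the bottom margin, would slot in as extra branches here.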

End the repeat

Setting image placement tags to no print was key

This is quite the script. Putting it together should not be too hard for you to do as the majority of the code is supplied. The next time your client says that the text will come from Google Docs, you will have a workflow that should be pretty much automated. And, since you know how to write scripts, you are in control should you need to add some more functionality.

Disclaimer: Code for scripts is included to inspire users to create their own, using sample code as appropriate. No claim is made to the effect that any script is complete or without error. Use at your own risk.