Reading Files As Lists, Deeper Dive
In previous blog posts we spent a fair amount of time reading and writing files, and for good reason. A plain text file, a tab-delimited (.tsv) file, or a comma-delimited (.csv) file can serve as the core source of information for an automation workflow.
READING A FILE AS CLASS
A parameter you may want to include in your read file handlers is one that defines how the file will be read. The class given in the as parameter of a read command can decide whether or not you are able to use a file provided by a client. If you don’t specify the as (class) parameter, the file is read as text. But you may need to read the file as UTF-8 or UTF-16 to interpret characters outside the basic ASCII range correctly.
One handler you might want to add to your library is the following, which reads a tab/return delimited file as a list of lists using UTF-8.
(*Reads a tab/return delimited file in as a list of lists*)
on readAsListsofList(myFile)
	set processList to {}
	set fileRef to open for access (myFile)
	set myEOF to get eof fileRef
	set oldDelim to AppleScript's text item delimiters
	if myEOF > 0 then
		-- split each row on tabs
		set AppleScript's text item delimiters to {tab}
		-- one list item per return-terminated row
		set textList to read fileRef as «class utf8» using delimiter {return}
		close access fileRef
		repeat with i from 1 to length of textList
			set myData to item i of textList
			set end of processList to text items of myData
		end repeat
		set AppleScript's text item delimiters to oldDelim
	else
		-- release the file before reporting the error
		close access fileRef
		error "Text file is empty"
	end if
	return processList
end readAsListsofList
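Since a client may just as easily send a comma-delimited file, the handler above can be generalized. The following is only a sketch: readDelimitedFile and its fieldDelim parameter are hypothetical names, not part of the original library, and it assumes UTF-8 text with return-terminated rows and no quoted fields (a comma inside a quoted .csv field would be split incorrectly). It also wraps the read in a try block so the file is closed and the delimiters are restored if anything fails.

```applescript
-- Hypothetical variant of readAsListsofList that takes the field delimiter as a parameter.
-- Assumes UTF-8 text, return-terminated rows, and no quoted fields.
on readDelimitedFile(myFile, fieldDelim)
	set processList to {}
	set oldDelim to AppleScript's text item delimiters
	set fileRef to open for access myFile
	try
		if (get eof fileRef) is 0 then error "Text file is empty"
		-- one list item per row
		set textList to read fileRef as «class utf8» using delimiter {return}
		set AppleScript's text item delimiters to {fieldDelim}
		repeat with i from 1 to length of textList
			set end of processList to text items of (item i of textList)
		end repeat
	on error errMsg number errNum
		-- restore state and release the file before passing the error on
		set AppleScript's text item delimiters to oldDelim
		close access fileRef
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to oldDelim
	close access fileRef
	return processList
end readDelimitedFile

-- Usage: pick a comma-delimited file and look at its first row
set rows to readDelimitedFile(choose file with prompt "Select a .csv file:", ",")
item 1 of rows -- the fields of the first row, as a list of text
```

Passing tab instead of "," gives the same behavior as readAsListsofList.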
READ AND WRITE AS LIST
You can also use date or list for the class, but this is only useful if the data was written using a write statement that specified the same class in its as parameter. The following is an example of how to write and then read a file using the as list parameter. This readAsList handler is a simplified version and should only be used when you know the file in question is not empty and was written using the as list parameter.
set myList to {"1234", "2222", "3333", "4567"}
set userPath to path to desktop from user domain as string
set filePath to userPath & "testFile.txt"
set fileRef to open for access file filePath with write permission
set eof of fileRef to 0 -- clear any existing contents first
write myList to fileRef as list -- write returns no result, so nothing is assigned
close access fileRef
set myList to readAsList(filePath)

on readAsList(filePath)
	set fileList to read file filePath as list
	return fileList
end readAsList
LIST CONSIDERATIONS
When you pass a list to a handler, AppleScript passes a reference to the list rather than a copy of it. Passing a reference is much more efficient but can cause problems if you are not aware of it. For instance, if you make a change to the list in the handler, the original list is changed as well.
This is how the AppleScript Language Guide describes the difference:
Passing by Reference Versus Passing by Value
Within a handler, each parameter is like a variable, providing access to passed information. AppleScript passes all parameters by reference, which means that a passed variable is shared between the handler and the caller, as if the handler had created a variable using the set command. However, it is important to remember a point raised in Using the copy and set Commands: only mutable objects (those whose class is date, list, record, or script) can actually be changed.
As a result, a parameter’s class type determines whether information is effectively passed by value or by reference:
For mutable objects, information is passed by reference: If a handler changes the value of a parameter of this type, the original object is changed.
For all other class types, information is effectively passed by value: Although AppleScript passes a reference to the original object, that object cannot be changed. If the handler assigns a new value to a parameter of this type, the original object is unchanged.
If you want to pass by reference with a class type other than date, list, record, or script, you can pass a reference object that refers to the object in question. Although the handler will have access only to a copy of the reference object, the specified object will be the same. Changes to the specified object in the handler will change the original object, although changes to the reference object itself will not.
Some Examples
A few examples can demonstrate the difference:
set myList to {1234, 2222, 3333, 4567}
set newList to incrementMyList(myList)
{newList, myList}

on incrementMyList(aList)
	set newList to {}
	repeat with i from 1 to length of aList
		set item i of aList to ((item i of aList) + 1)
		set end of newList to item i of aList
	end repeat
	return newList
end incrementMyList
The result here is that the items in the original list are also incremented.
This can be corrected in a number of ways. Just be aware that the value of the receiving variable in the handler is not a new variable but acts as a reference to the original.
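One of those ways is to have the handler work on its own copy of the list. The sketch below uses the copy command for that; incrementCopy and workList are names made up for this illustration. Because copy duplicates the list, the caller's list is left alone.

```applescript
set myList to {1234, 2222, 3333, 4567}
set newList to incrementCopy(myList)
{newList, myList}
--> {{1235, 2223, 3334, 4568}, {1234, 2222, 3333, 4567}}

-- Hypothetical handler: increments a private copy instead of the caller's list
on incrementCopy(aList)
	copy aList to workList -- copy duplicates the list; set would share it
	repeat with i from 1 to length of workList
		set item i of workList to (item i of workList) + 1
	end repeat
	return workList
end incrementCopy
```

The trade-off is the cost of duplicating the list, which is worth keeping in mind for very large lists.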
Notice that values such as strings and numbers do not show the same pass-by-reference behavior. Although they are also passed to the handler by reference, objects of immutable classes (including strings and numbers) cannot be changed, so the caller's values remain as originally defined.
set fName to "John"
set lName to "Jones"
set fullName to getFullName(lName, fName)
{fullName, fName}

on getFullName(a, b)
	set b to "Mr. " & b
	set fullName to b & " " & a
	return fullName
end getFullName
Try the same thing, only this time using a list. You will see that the value for the second item in the original list has also become “Mr. John”.
set nameParts to {"Jones", "John"}
set fullName to getFullName(nameParts)
{fullName, item 2 of nameParts}

on getFullName(a)
	set item 2 of a to "Mr. " & (item 2 of a)
	set fullName to item 2 of a & " " & item 1 of a
	return fullName
end getFullName
When working with large lists (as when reading in a file), you will find it is more efficient to access the list through the "a reference to" operator. For instance, the following example provided by Apple uses time of (current date) to demonstrate how long it takes to append 10000 items to a list.
set bigList to {}
set numItems to 10000
set t to (time of (current date)) --Start timing operations
repeat with n from 1 to numItems
	copy n to the end of bigList
	-- DON'T DO THE FOLLOWING--it's even slower!
	-- set bigList to bigList & n
end repeat
set total to (time of (current date)) - t --End timing
On a fast machine, the 2 seconds it takes for the process may not be limiting, but on a slower machine you may not want to try it.
On the other hand, using a reference to makes working with a big list bearable even on a slower computer.
set bigList to {}
set bigListRef to a reference to bigList
set numItems to 10000
set t to (time of (current date)) --Start timing operations
repeat with n from 1 to numItems
	copy n to the end of bigListRef
end repeat
set myNow to (time of (current date))
set total to myNow - t --End timing
{t, myNow}
On a fast machine, the same process using a reference to is so fast that myNow and t usually hold the same value. Keep in mind that time of (current date) only resolves whole seconds, not milliseconds, so a total of 0 simply means the loop finished in under a second.
You may also consider using a reference to when accessing items within a huge list. Here again, there is an appreciable performance difference.
set bigList to {}
set bigListRef to a reference to bigList
set numItems to 10000
repeat with n from 1 to numItems
	copy n to the end of bigListRef
end repeat
set numItems to 5000
set t to (time of (current date))
repeat with n from 1 to numItems
	item n of bigList
end repeat
set myNow to time of (current date)
set total to myNow - t
total
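Change the timing portion of the script above to use a reference to and see the difference. The lines below replace everything from the second set numItems statement onward, so bigList is still the list built by the first loop.
set numItems to 5000
set bigListRef to a reference to bigList
set t to (time of (current date))
repeat with n from 1 to numItems
	item n of bigListRef
end repeat
set myNow to time of (current date)
set total to myNow - t
{myNow, t}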
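To tie this back to files read in as a list of lists, the same idea applies when each item is itself a row of fields. The following is a minimal sketch rather than a benchmark: the rows built here are made-up stand-ins for data read from a file, and rowsRef, numRows, and runningTotal are names chosen for the illustration. It is written as a top-level script, like the examples above, because a reference to is generally understood to work with top-level variables and properties rather than with variables local to a handler.

```applescript
-- Build a stand-in for file data: each row is {id, name}
set rows to {}
set rowsRef to a reference to rows
set numRows to 5000
repeat with n from 1 to numRows
	copy {n, "name " & n} to the end of rowsRef
end repeat

-- Walk the rows through the reference, adding up the first field of each row
set runningTotal to 0
repeat with i from 1 to numRows
	set runningTotal to runningTotal + (item 1 of item i of rowsRef)
end repeat
runningTotal
```

Swapping rowsRef for rows in the second loop gives a direct comparison on your own machine.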
ONWARD AND UPWARD
The next time a user sends you a “tsv” or a “csv” file, consider creating a script for automating the information using a list or a list of lists. You may find yourself getting a “hero” badge for your efforts.
Disclaimer:
Scripts provided are for demonstration and educational purposes. No representation is made as to their accuracy or completeness. Readers are advised to use the code at their own risk.