How to speed up MC, Part 2

Mon Apr 7 12:08:01 EDT 2003

Thank you for your positive reactions. Some of you correctly observed
that ‘repeat 10000’ in script B is a bit faster then ‘repeat with i = 1
to 10000’. However, I want to make scripts as comparable as possible, to
focus on the differences that I’m going to discuss, and to prevent
people from thinking that the speed enhancement might be due to other
differences in the scripts. But please continue adding your comments!
They are highly appreciated
Some warnings. I don’t have the slightest idea how MC is programmed and
don’t have any inside information (I can only tell that the MC guys did
a clever job). My knowledge is based on many years programming
experience with languages like Pascal. This means that I can be wrong in
explaining underlying mechanisms of how MC works! So don’t take my word
for the truth and nothing but the truth. Don’t hesitate to correct me if
you think I’m wrong. Don’t adjust your scripts without testing if it
really increases the speed on your type of data.
Okay, let’s continue with part 2.

Suppose you have a list of numbers, for example a list that is read from
a datafile. Common ways to separate the different numbers are CR’s,
comma’s or spaces.
To get a particular number, in case of a comma-delimited list, you may
use something like:

script A

  get item 900 of myList

In case of a space-delimited list, you may use:

script B

  get word 900 of myList

Which script is faster? The answer is that script A may be 50 to 100%
faster than script B (depending on the size, that is, the number of
characters of the numbers in your list). The same logic as discussed in
part 1 of “How to speed up MC” applies. To get a particular item from an
item-delimited list, the computer has to count comma’s. That means that
for each character in the list, the computer has to decide whether it is
a comma or not, until (in script A) 899 comma’s are counted. In script B
the computer has to decide whether a character is a space, a tab or a
CR: all three characters are word delimiters. If your list is
space-delimited, change script B to (of course your list should not
contain spaces):

script C

  set itemDelimiter to space
  get word 900 of myList

and it will be (nearly) as fast as script A. The statement ‘set
itemDelimiter to space’ hardly takes time (and of course you will put it
outside any repeat loop).

It is easy to understand now that getting word 100 from myList will take
much less time then getting word 900. Getting the last word takes the
most amount of time. If you know your list has 1000 numbers, it may be
good to know that:
  get word 1000 of myList
will be faster than
  get last word of myList

The reason is that the computer first has to figure out how many words
myList contains. Hence,
  get last word of myList
is equivalent to something like:
  put the number of words of myList into n
  get word n of myList

Now suppose you have some text, simply consisting of characters, let’s
say 5000 characters. Which script will be faster?

script D

  get character 5000 of myText

script E

  get last character of myText

Contrary to what you may expect now, script E is as fast, or even faster
than script D. A variable like myList is stored somewhere in memory. The
computer (or MC) has to know (a) the position of the start of the
variable in memory and (b) the length (the number of characters) of that
variable to prevent that other variables are written over existing ones.
To get a particular character, say character 345, the computer just has
to add 345 to the starting position of the variable, to know exactly
where character 345 resides. Consequently, unlike items, words or lines,
all characters in a text are accessed equally fast, and, even more
important, very fast.

How can we use this property to speed up our scripts?
Let’s go back to the example of getting the last word, item or line of a
list. Suppose our list is CR delimited. To get the last line, we could use:

script F

  get last line of myList

However, script G will usually be much, much faster!

script G

  put the number of chars of myList into nChars
  repeat with i = nChars down to 1
    if char i of myList = cr then exit repeat
  end repeat
  get char i + 1 to nChars of myList

Similarly, to get the one but last line, we can extend our script a bit,
as a fast alternative to ‘get line -2 of myList’. We can use this
principle to write a function handler that is similar to the lineOffset
function, but searches backwards, to find the last instance of number x
in myList. The simple, but slow function, would look something like this:

script H

function scanBackSlow aLine, aList
  put the number of lines of aList into nLines
  repeat with i = nLines down to 1
    if line i of aList = aLine then return i
  end repeat
  return 0
end scanBackSlow

Script I is much faster however:

function scanBackFast aLine, aList
  put the number of chars of aList into nChars
  put 0 into count
  put nChars into k
  repeat with i = nChars down to 1
    if char i of aList = cr then
      add 1 to count
      if char i + 1 to k of aList = aLine then
        return the number of lines of aList + 1 - count
      end if
      put i - 1 into k
    end if
  end repeat
  return 0
end scanBackFast

Generally you write function handlers that are repeatedly used in your
scripts. Unfortunately, calling a handler, passing parameters and
returning the result, takes time. Consider script J:

script J

  function myFunction n
  -- do something with n
    return n
  end myFunction

In this example, the handler performs some transformation on n and
returns the result.
To call the handler from another script, you may write: ‘put myFunction
(m) into m’. In executing this handler, the variable m is copied into
variable n, some transformation is performed on n, and the result is
send back and copied into m.
Alternatively, you can write a message handler like:

script K

on myHandler @n
-- do something with n
end myHandler

To call the handler from another script, just write: ‘myHandler m’ to
obtain exactly the same result. The difference is that no copying takes
place. The transformation takes place on the variable that resides on
the place of variable m in memory: no new variable n is created. And
hence the handler will execute faster (about 50% if ‘do something with
n’ wouldn’t take time, but the relative gain depends of course on the
time the transformation itself takes).
The variable n in ‘myHandler n’ should not be something like ‘myHandler
item 3 of myItems’ because ‘item 3 of myItems’ is not a variable (but
part of a variable): MC has no idea where ‘item 3 of myItems’ starts, or
what its length is.
If you don’t want to put the result of the transformation of m into the
same variable m, there is nothing wrong with script L:

script L

on myHandler @n, @r
-- do something with n and put the result into r
end myHandler

and you can call the handler with:
  myHandler m, p
The result of the transformation of m will be put into p, which will be
faster then
  put myFunction (m) into p

In the same way, you can make scripts like script I above, a bit faster
if you just write:
  function scanBackFast aLine, @aList
You save the time it takes copying myList into aList.

Happy programming,

Wil Dijkstra