How to speed up MC, Part 2
Wil Dijkstra
w.dijkstra at scw.vu.nl
Mon Apr 7 12:08:01 EDT 2003
Thank you for your positive reactions. Some of you correctly observed
that repeat 10000 in script B is a bit faster then repeat with i = 1
to 10000. However, I want to make scripts as comparable as possible, to
focus on the differences that Im going to discuss, and to prevent
people from thinking that the speed enhancement might be due to other
differences in the scripts. But please continue adding your comments!
They are highly appreciated
Some warnings. I dont have the slightest idea how MC is programmed and
dont have any inside information (I can only tell that the MC guys did
a clever job). My knowledge is based on many years programming
experience with languages like Pascal. This means that I can be wrong in
explaining underlying mechanisms of how MC works! So dont take my word
for the truth and nothing but the truth. Dont hesitate to correct me if
you think Im wrong. Dont adjust your scripts without testing if it
really increases the speed on your type of data.
Okay, lets continue with part 2.
Suppose you have a list of numbers, for example a list that is read from
a datafile. Common ways to separate the different numbers are CRs,
commas or spaces.
To get a particular number, in case of a comma-delimited list, you may
use something like:
script A
get item 900 of myList
In case of a space-delimited list, you may use:
script B
get word 900 of myList
Which script is faster? The answer is that script A may be 50 to 100%
faster than script B (depending on the size, that is, the number of
characters of the numbers in your list). The same logic as discussed in
part 1 of How to speed up MC applies. To get a particular item from an
item-delimited list, the computer has to count commas. That means that
for each character in the list, the computer has to decide whether it is
a comma or not, until (in script A) 899 commas are counted. In script B
the computer has to decide whether a character is a space, a tab or a
CR: all three characters are word delimiters. If your list is
space-delimited, change script B to (of course your list should not
contain spaces):
script C
set itemDelimiter to space
get word 900 of myList
and it will be (nearly) as fast as script A. The statement set
itemDelimiter to space hardly takes time (and of course you will put it
outside any repeat loop).
It is easy to understand now that getting word 100 from myList will take
much less time then getting word 900. Getting the last word takes the
most amount of time. If you know your list has 1000 numbers, it may be
good to know that:
get word 1000 of myList
will be faster than
get last word of myList
The reason is that the computer first has to figure out how many words
myList contains. Hence,
get last word of myList
is equivalent to something like:
put the number of words of myList into n
get word n of myList
Now suppose you have some text, simply consisting of characters, lets
say 5000 characters. Which script will be faster?
script D
get character 5000 of myText
script E
get last character of myText
Contrary to what you may expect now, script E is as fast, or even faster
than script D. A variable like myList is stored somewhere in memory. The
computer (or MC) has to know (a) the position of the start of the
variable in memory and (b) the length (the number of characters) of that
variable to prevent that other variables are written over existing ones.
To get a particular character, say character 345, the computer just has
to add 345 to the starting position of the variable, to know exactly
where character 345 resides. Consequently, unlike items, words or lines,
all characters in a text are accessed equally fast, and, even more
important, very fast.
How can we use this property to speed up our scripts?
Lets go back to the example of getting the last word, item or line of a
list. Suppose our list is CR delimited. To get the last line, we could use:
script F
get last line of myList
However, script G will usually be much, much faster!
script G
put the number of chars of myList into nChars
repeat with i = nChars down to 1
if char i of myList = cr then exit repeat
end repeat
get char i + 1 to nChars of myList
Similarly, to get the one but last line, we can extend our script a bit,
as a fast alternative to get line -2 of myList. We can use this
principle to write a function handler that is similar to the lineOffset
function, but searches backwards, to find the last instance of number x
in myList. The simple, but slow function, would look something like this:
script H
function scanBackSlow aLine, aList
put the number of lines of aList into nLines
repeat with i = nLines down to 1
if line i of aList = aLine then return i
end repeat
return 0
end scanBackSlow
Script I is much faster however:
function scanBackFast aLine, aList
put the number of chars of aList into nChars
put 0 into count
put nChars into k
repeat with i = nChars down to 1
if char i of aList = cr then
add 1 to count
if char i + 1 to k of aList = aLine then
return the number of lines of aList + 1 - count
end if
put i - 1 into k
end if
end repeat
return 0
end scanBackFast
Generally you write function handlers that are repeatedly used in your
scripts. Unfortunately, calling a handler, passing parameters and
returning the result, takes time. Consider script J:
script J
function myFunction n
-- do something with n
return n
end myFunction
In this example, the handler performs some transformation on n and
returns the result.
To call the handler from another script, you may write: put myFunction
(m) into m. In executing this handler, the variable m is copied into
variable n, some transformation is performed on n, and the result is
send back and copied into m.
Alternatively, you can write a message handler like:
script K
on myHandler @n
-- do something with n
end myHandler
To call the handler from another script, just write: myHandler m to
obtain exactly the same result. The difference is that no copying takes
place. The transformation takes place on the variable that resides on
the place of variable m in memory: no new variable n is created. And
hence the handler will execute faster (about 50% if do something with
n wouldnt take time, but the relative gain depends of course on the
time the transformation itself takes).
The variable n in myHandler n should not be something like myHandler
item 3 of myItems because item 3 of myItems is not a variable (but
part of a variable): MC has no idea where item 3 of myItems starts, or
what its length is.
If you dont want to put the result of the transformation of m into the
same variable m, there is nothing wrong with script L:
script L
on myHandler @n, @r
-- do something with n and put the result into r
end myHandler
and you can call the handler with:
myHandler m, p
The result of the transformation of m will be put into p, which will be
faster then
put myFunction (m) into p
In the same way, you can make scripts like script I above, a bit faster
if you just write:
function scanBackFast aLine, @aList
You save the time it takes copying myList into aList.
Happy programming,
Wil Dijkstra
More information about the metacard
mailing list