How to speed up MC, Part 3
Wil Dijkstra
w.dijkstra at scw.vu.nl
Mon Apr 14 05:57:01 EDT 2003
How to speed up MC, Part 3
It is quite interesting to compare the speed of MetaCard with HyperCard
on the one hand and lower level languages like Pascal or C on the other
hand. Consider a simple statement like (assuming that x is appropriately initialized):
put x + 1 into x
How do the different languages compare? Because you cannot use MetaBench
for this comparison, here are the (rough) results.
Pascal: x := x+1 --> 1
C++: x = x+1 --> 1
MetaCard: add 1 to x --> 80
MetaCard: put x + 1 into x --> 130
HyperCard: add 1 to x --> 2800
HyperCard: put x + 1 into x --> 3600
Empty extension: --> 3500
The relative speeds might be different for different computers or
operating systems of course. Nevertheless, it can easily be seen that
HyperCard is much slower than MetaCard. MetaCard however is much slower
than Pascal or C. This difference may suggest that writing a simple
extension that adds up numbers might considerably enhance the speed of
calculations. Unfortunately this is not true. The overhead of calling an
extension and sending the result back to MC outweighs the gain. In the
table above you can see that calling an "empty" extension (an extension
that just does nothing) would take 3500 time units. Extensions can
enhance the speed, but only if the total amount of calculations
performed by the extension outweighs calling the extension and sending
the result back. It is easy to see that if your extension performs an
adding operation 100 times, you have a significant speed gain (about
3600 versus 8000 units). More about extensions in a forthcoming part.
Hence we need different approaches to enhance the speed of calculations.
Don't expect to learn how to add up numbers faster. But you can expect
how to prevent that scripts manipulating numbers may run 10 or even 50
times (!) or so slower than necessary.
First some fairly simple things to get the most out of calculations.
You most likely know that
add 1 to x
executes faster then:
put x + 1 into x
The reason is that in 'add 1 to x', the variable x is accessed only
once, whereas in 'put x + 1 into x' it is accessed two times. The
computer doesn't know that the result of 'x + 1' should be put in the
same place x and has to figure out twice where in memory x resides;
execution of the statement is similar to the execution of 'put x + 1
into y'.
Probably you already preferred to write 'add 1 to x' instead of 'put x +
1 into x': you have less characters to type. Be honest, do you also
always write 'subtract 1 from x' instead of 'put x - 1 into x'? The
speed gain is the same, for the very same reason. And this also holds of
course for the multiply and divide operators; or for x^2 versus x*x.
Accessing variables as least as possible is a general principle to
enhance the speed. Suppose you have a complex calculation:
script A
put (Ncases*covXY-sumX*sumY)/ sqrt((Ncases*sum2X-\
(sumX)^2)*(Ncases*sum2Y-(sumY)^2)) into r
The statement (the formula for the pearson correlation, if you really
want to know) is hardly readable and probably you run into trouble in
putting the parentheses at the right places. So the next script looks
much better:
script B
put Ncases * covXY - sumX * sumY into c
put Ncases * sum2X - (sumX)^2 into r1
put Ncases * sum2Y - (sumY)^2 into r2
put c / sqrt(r1 * r2) into r
Much nicer, but the script is 30% slower because of the overhead of
putting and getting the variables c, r1 and r2. The general rule is:
minimize accessing variables, either to get them, or to put something in them.
The div and mod operators are often used for particular transformations
of numbers. For example, to transform number 0 to 7 into 7, 8 to 15 into
15, 16 to 23 into 23, etc. you may use:
get (i div x + 1) * x - 1
(i is the number to transform, x signifies the range, 8 in this case)
To transform 0 to 7 into 0 to 7, 8 to 15 into 0 to 7, 16 to 23 into 0
to 7, etc. you may use:
(i mod x + 1) - 1
For particular numbers (i.e. 2, 4, 8, 16, etc.) it is faster to use the
bitwise operators:
i bitor (x-1)
respectively
i bitand (x-1)
to obtain the same result, but more then 50 % faster.
To determine whether a number is odd or even, you can use:
(x bitand 2) = 0
(true if even, false if odd)
If you wonder what bitwise operators are (the documentation in MC is
minimal), see the text at the end of this contribution.
----------------------------------
Now for something completely different.
Suppose you have read a real number x, e.g. "0.345678" from a field.
Compare these two scripts:
script C
get x/(1-x)
script D
add 0 to x
get x/(1-x)
Believe it or not, but script D will run considerably faster.
If the content of a field (or a file) is read, MC hasn't the faintest
idea that it's meant to be a number, and stores it as a string of
characters. Thus it has to convert the string into a number, including a
check whether or not x is a number before it actually starts
calculating. In executing script C, this conversion takes place two
times. In script D however, the conversion takes place only once. If
'add 0 to x' is executed, the string x is converted to a number, and the
number is stored into x: now x no longer contains the string "0.345678"
but (a binary representation of) the number 0.345678. Thus it saves a
lot of time in performing calculations on it. The more x's in the
equation, e.g. x*(1+x)/(1-x), the more you gain.
It's a bit tricky to perform a speed test on script D, to compare it
with C. Consider script E:
script E
put fld "inputFld" into x
repeat 100000
add 0 to x
get x/(1-x)
end repeat
After the first loop x is converted to a number. Hence the conversion
takes place only once, instead of 100000 times. You should be very well
aware of such peculiarities in performing speed tests. But you can
safely try:
script F
put fld "inputFld" into x
add 0 to x -- convert it to a real number
repeat 100000
get x/(1-x)
end repeat
It will run nearly 10 times as fast compared to a script without 'add 0
to x', excluding the overhead of the repeat loop.
Generally, if you store a variable, and MC has become aware that this
variable is a number, it will be stored as a binary number: no
conversion is needed if it is used for further calculations.
You may note that if your are doing speed tests on numbers, and test
numbers are read from a field, results will be completely unreliable, if
you assume that the numbers read, are binary numbers.
If a variable is stored as a binary number, and you use statements like
'get length (x)' or 'get char 3 of x', it has to be converted to a
string, before the statement can be executed. For some reason,
converting a number to a string is much slower then converting a string
to a number. Converting an integer to a string takes about 3 times as
much as the other way around, whereas converting a real number is even
slower, because of the more complex formatting (by the way, MC does not
distinguish internally between integers and real numbers; all numbers
are stored as reals).
Suppose variable x is read from a field, containing "0.345678". Compare:
script G
add 0 to x -- convert it to a real number
repeat 100000
get last char of x
end repeat
script H
repeat 100000
get last char of x
end repeat
Script G may take about 50 times as long as script H...
If variable x is binary stored, and you are going to perform a number of
string operations on it (like if char 1 of x = "2"; get offSet (".", x);
put length (x) into s), you better copy it into another variable and
convert the copy to a string first, for example:
put x into s
put empty after s
You should leave the original variable x untouched. Compare script I and J:
script I
put 1/3 into x
put last char of x into last char of x
put 3*x
script J
put 1/3 into x
put 3*x
Script I yields 0.999999 (if the local numberFormat has the default
value), script J yields 1. Because you converted the number to a string,
you lost precision.
To illustrate things, here is a nice example from Xavier, in a reaction
to Part 1 of How to speed up MC. He wrote:
> Another 2 things to consider in your loops:
> - set cursor to busy
> will slow down scripts considerably.
>
> - If you have a "progress" bar, dont change it at every loop,
> do so for each percent change or at every 100th...
> this will also save major time...
>
> for example
> repeat with x = 1 to 100000
> if char -3 of x is not oldx then
> set cursor to busy
> showstatus
> put char -3 of x into oldx
> end if
Although his observations are correct, his solution is not (sorry
Xavier, but I love your example because it illustrates that we really
can gain a lot).
The variable x is a binary one, not a string one (it's produced by MC
itself). So each time 'if char -3 of x is not oldx then' is executed, x
is converted to a string, to be able to get character -3: it takes an
awful lot of time. To be precise, x itself remains intact, the converted
copy is put in some internal temporary variable.
The script takes 370 ticks on my computer, neglecting 'showStatus' of
course. Alternatively, take a look at script J:
put the ticks into myTestTime
repeat with x = 1 to 1000000
if the ticks - myTestTime > 5 then
put the ticks into myTestTime
set cursor to busy
end if
-- do something
end repeat
It takes 160 ticks. Not that much difference? Please note the difference
in the number of repeats: hunderd and thousand versus one million! This
script has some of other advantages too:
- The rotation of the busy cursor is independent of the time 'do
something' takes (unless it takes more then 5 ticks, in which case that
the script would take a day or more).
- You can set a flag, becoming true the first time the ticks -
myTestTime exceeds 5 ticks. Based on the value of x at that moment and
the total number of repeats, you can decide whether or not a progress
bar is shown (assuming that the 'do something's' during the first 5
ticks are representative for the remaining ones). For example, if the
total time is expected to exceed 60 ticks, show a progress bar, else
stick to the busy ball.
For your convenience, here is a more complete scheme.
put 60 into myDelayPref
-- but better let user decide on whether progress bar will be
-- shown if analysis is expected to take more then 1 second,
-- 2 seconds, 5 seconds or whatever
put 0 into myProgresStatus
put the number of lines of myData into nLines
put the ticks into myTestTime
repeat with x = 1 to nLines
if the ticks - myTestTime > 5 then
switch myProgresStatus
case 2 -- analysis takes more then one second
myAdjustProgressBar -- adjusts the progress bar
break
case 1 -- analysis will take less then 1 second
set cursor to busy
break
case 0 -- not checked yet
if (5*nLines)/x > myDelayPref then -- take 5 ticks as criterion
for updating
myShowProgressBar -- show a progress bar
put 2 into myProgresStatus
else
put 1 into myProgresStatus
end if
break
end switch
put the ticks into myTestTime
end if
-- do something
end repeat
If analyses take more then 10 ticks, case 0 will occur the least, hence
this case is the last one; if the analysis takes less then 10 ticks --
well, why bother?
The five ticks delay before a progress bar appears will not be observed
by the user.
Take the switch part (not the ticks check!) out of the script to create
a function handler, returning myProgresStatus; this will hardly add
time, because it is only called each five ticks.
Happy programming,
Next part about arrays, one of the greatest things in MC (maybe within 2
or 3 weeks)
Wil Dijkstra
----------------------------------
About bitwise operators
Numbers (actually, we are only talking about non-negative integer
values) are eventually stored as ones and zeros in memory. To find out
how a particular number looks like, you can use baseConvert. For example
to get the binary representation (base 2) of the number 9 (base 10):
get baseConvert (9, 10, 2)
yields 1001.
The binary representation of 12 is obtained with baseConvert (12, 10,
2), yielding 1100.
What does something like '9 bitAnd 12' means?
The first bit of both binary numbers are both 1, hence the result for
the first bit is 1 (like the result of 'true and true' is true). The
second bits are unequal, hence a '0' (true and false is false). The
third one yields '0' (false and false is false) and the final bits yield
'0'. The result is the binary number 1000. Applying baseConvert (1000,
2, 10) yields 8. And this is exactly what the statement:
get 9 bitAnd 12
produces.
Schematically:
1001 --> 9
1100 --> 12
------
1000 --> 8
Similarly, 9 bitOr 12 yields the binary number 1101, which is 13.
Applying bitXor (exclusive or: either the first, or the second bit
should be 1, but not both) yields 0101, which is 5.
More information about the metacard
mailing list