How to speed up MC, Part 2
Raymond E. Griffith
rgriffit at ctc.net
Mon Apr 7 18:22:00 EDT 2003
Very good stuff here, Wil. Now for a question.
How would this script
get line(-1) of k
compare to
get the last line of k
and to your script below?
Constructs like "word(-1) of k" and "item -3 to -1 of k" are things I use
all the time.
I'd like your analysis, please.
Thanks!
Raymond
on 4/7/03 11:03 AM, Wil Dijkstra at w.dijkstra at scw.vu.nl wrote:
> Thank you for your positive reactions. Some of you correctly observed
> that repeat 10000 in script B is a bit faster than repeat with i = 1
> to 10000. However, I want to make the scripts as comparable as possible, to
> focus on the differences that I'm going to discuss, and to prevent
> people from thinking that the speed enhancement might be due to other
> differences in the scripts. But please continue adding your comments!
> They are highly appreciated.
> Some warnings. I don't have the slightest idea how MC is programmed and
> don't have any inside information (I can only tell that the MC guys did
> a clever job). My knowledge is based on many years of programming
> experience with languages like Pascal. This means that I can be wrong in
> explaining the underlying mechanisms of how MC works! So don't take my word
> for the truth and nothing but the truth. Don't hesitate to correct me if
> you think I'm wrong. Don't adjust your scripts without testing whether it
> really increases the speed on your type of data.
> Okay, let's continue with part 2.
>
>
> Suppose you have a list of numbers, for example a list that is read from
> a datafile. Common ways to separate the different numbers are CRs,
> commas or spaces.
> To get a particular number, in case of a comma-delimited list, you may
> use something like:
>
> script A
>
> get item 900 of myList
>
> In case of a space-delimited list, you may use:
>
> script B
>
> get word 900 of myList
>
> Which script is faster? The answer is that script A may be 50 to 100%
> faster than script B (depending on the size, that is, the number of
> characters of the numbers in your list). The same logic as discussed in
> part 1 of How to speed up MC applies. To get a particular item from an
> item-delimited list, the computer has to count commas. That means that
> for each character in the list, the computer has to decide whether it is
> a comma or not, until (in script A) 899 commas are counted. In script B
> the computer has to decide whether a character is a space, a tab or a
> CR: all three characters are word delimiters. If your list is
> space-delimited and the numbers themselves contain no spaces, change
> script B to:
>
> script C
>
> set itemDelimiter to space
> get item 900 of myList
>
> and it will be (nearly) as fast as script A. The statement set
> itemDelimiter to space hardly takes time (and of course you will put it
> outside any repeat loop).
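
A minimal sketch for checking these claims on your own machine, assuming it sits in a button script. The test lists and the names commaList, spaceList, tStart, timeA, timeB and timeC are made up for illustration and are not from Wil's post. It times item access on a comma-delimited list, word access on a space-delimited list, and item access on the space-delimited list after setting the itemDelimiter to space:

```metatalk
on mouseUp
  -- build two hypothetical test lists holding the numbers 1 to 1000
  put empty into commaList
  put empty into spaceList
  repeat with i = 1 to 1000
    put i & comma after commaList
    put i & space after spaceList
  end repeat

  -- comma-delimited list, accessed by item (default itemDelimiter)
  put the milliseconds into tStart
  repeat 10000
    get item 900 of commaList
  end repeat
  put the milliseconds - tStart into timeA

  -- space-delimited list, accessed by word
  put the milliseconds into tStart
  repeat 10000
    get word 900 of spaceList
  end repeat
  put the milliseconds - tStart into timeB

  -- same space-delimited list, accessed by item
  set the itemDelimiter to space
  put the milliseconds into tStart
  repeat 10000
    get item 900 of spaceList
  end repeat
  put the milliseconds - tStart into timeC

  -- show the three timings (in milliseconds) in the message box
  put "A:" && timeA & cr & "B:" && timeB & cr & "C:" && timeC
end mouseUp
```

The absolute numbers will depend on the machine and on the length of the numbers in the list, so only the ratios between the three timings are of interest.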
>
> It is easy to understand now that getting word 100 from myList will take
> much less time than getting word 900. Getting the last word takes the
> most time. If you know your list has 1000 numbers, it may be
> good to know that:
> get word 1000 of myList
> will be faster than
> get last word of myList
>
> The reason is that the computer first has to figure out how many words
> myList contains. Hence,
> get last word of myList
> is equivalent to something like:
> put the number of words of myList into n
> get word n of myList
>
> Now suppose you have some text, simply consisting of characters, let's
> say 5000 characters. Which script will be faster?
>
> script D
>
> get character 5000 of myText
>
> script E
>
> get last character of myText
>
> Contrary to what you may expect now, script E is as fast as, or even
> faster than, script D. A variable like myText is stored somewhere in memory. The
> computer (or MC) has to know (a) the position of the start of the
> variable in memory and (b) the length (the number of characters) of that
> variable, so that other variables are not written over it.
> To get a particular character, say character 345, the computer just has
> to add 345 to the starting position of the variable, to know exactly
> where character 345 resides. Consequently, unlike items, words or lines,
> all characters in a text are accessed equally fast, and, even more
> important, very fast.
>
> How can we use this property to speed up our scripts?
> Let's go back to the example of getting the last word, item or line of a
> list. Suppose our list is CR delimited. To get the last line, we could use:
>
> script F
>
> get last line of myList
>
> However, script G will usually be much, much faster!
>
> script G
>
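> -- scan backwards from the end for the last CR
> put the number of chars of myList into nChars
> repeat with i = nChars down to 1
> if char i of myList = cr then exit repeat
> end repeat
> -- everything after that CR is the last line
> -- (if myList ends with a CR, this yields empty)
> get char i + 1 to nChars of myList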
>
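
One way the backward scan of script G might be carried one line further, as a sketch only. The function name secondLastLine and the variables nFound and lineEnd are made up for illustration, and the code assumes aList does not end with a CR:

```metatalk
function secondLastLine aList
  -- hypothetical helper, not from the original post
  put the number of chars of aList into nChars
  put 0 into nFound        -- number of CRs seen so far
  put nChars into lineEnd  -- last char of the line currently being scanned
  repeat with i = nChars down to 1
    if char i of aList = cr then
      add 1 to nFound
      -- the second CR from the end sits just before the wanted line
      if nFound = 2 then return char (i + 1) to lineEnd of aList
      put i - 1 into lineEnd
    end if
  end repeat
  -- exactly one CR: the list has two lines, so the first one is wanted
  if nFound = 1 then return char 1 to lineEnd of aList
  -- no CR at all: there is no second-to-last line
  return empty
end secondLastLine
```

As with script G, the work done depends only on the length of the last two lines, not on the length of the whole list.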
> Similarly, to get the second-to-last line, we can extend our script a bit,
> as a fast alternative to get line -2 of myList. We can use this
> principle to write a function handler that is similar to the lineOffset
> function, but searches backwards, to find the last instance of number x
> in myList. The simple but slow function would look something like this:
>
> script H
>
> function scanBackSlow aLine, aList
> put the number of lines of aList into nLines
> repeat with i = nLines down to 1
> if line i of aList = aLine then return i
> end repeat
> return 0
> end scanBackSlow
>
> Script I is much faster however:
>
> function scanBackFast aLine, aList
> put the number of chars of aList into nChars
> put 0 into count
> put nChars into k
> repeat with i = nChars down to 1
> if char i of aList = cr then
> add 1 to count
> if char i + 1 to k of aList = aLine then
> return the number of lines of aList + 1 - count
> end if
> put i - 1 into k
> end if
> end repeat
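> -- no CR precedes the first line, so compare it separately
> if char 1 to k of aList = aLine then return 1
> return 0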
> end scanBackFast
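
A small usage sketch for comparing the two functions, assuming scanBackSlow and scanBackFast are both in the same script as this handler. The test data is made up: the value 3 sits on lines 1 and 3, so line 3 is the one a backward search is meant to report.

```metatalk
on mouseUp
  -- hypothetical four-line test list: 3, 7, 3, 9
  put 3 & cr & 7 & cr & 3 & cr & 9 into myList
  -- show both results side by side; the two numbers should agree
  put scanBackSlow(3, myList) && scanBackFast(3, myList)
end mouseUp
```

Running both functions on the same data is a cheap way to check that the fast version still gives the same answer before relying on it.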
(snip some good stuff here)