How to speed up MC, Part 2

Mon Apr 7 18:22:00 EDT 2003

Very good stuff here, Wil. Now for a question.

How would this script

  get line(-1) of k

compare to 

  get the last line of k

and to your script below?

Constructs like "word(-1) of k" and "item -3 to -1 of k" are things I use
all the time.

I'd like your analysis, please.

Thanks!

Raymond

on 4/7/03 11:03 AM, Wil Dijkstra at w.dijkstra at scw.vu.nl wrote:

> Thank you for your positive reactions. Some of you correctly observed
> that ‘repeat 10000’ in script B is a bit faster then ‘repeat with i = 1
> to 10000’. However, I want to make scripts as comparable as possible, to
> focus on the differences that I’m going to discuss, and to prevent
> people from thinking that the speed enhancement might be due to other
> differences in the scripts. But please continue adding your comments!
> They are highly appreciated
> Some warnings. I don’t have the slightest idea how MC is programmed and
> don’t have any inside information (I can only tell that the MC guys did
> a clever job). My knowledge is based on many years programming
> experience with languages like Pascal. This means that I can be wrong in
> explaining underlying mechanisms of how MC works! So don’t take my word
> for the truth and nothing but the truth. Don’t hesitate to correct me if
> you think I’m wrong. Don’t adjust your scripts without testing if it
> really increases the speed on your type of data.
> Okay, let’s continue with part 2.
> 
> 
> Suppose you have a list of numbers, for example a list that is read from
> a datafile. Common ways to separate the different numbers are CR’s,
> comma’s or spaces.
> To get a particular number, in case of a comma-delimited list, you may
> use something like:
> 
> script A
> 
> get item 900 of myList
> 
> In case of a space-delimited list, you may use:
> 
> script B
> 
> get word 900 of myList
> 
> Which script is faster? The answer is that script A may be 50 to 100%
> faster than script B (depending on the size, that is, the number of
> characters of the numbers in your list). The same logic as discussed in
> part 1 of “How to speed up MC” applies. To get a particular item from an
> item-delimited list, the computer has to count comma’s. That means that
> for each character in the list, the computer has to decide whether it is
> a comma or not, until (in script A) 899 comma’s are counted. In script B
> the computer has to decide whether a character is a space, a tab or a
> CR: all three characters are word delimiters. If your list is
> space-delimited, change script B to (of course your list should not
> contain spaces):
> 
> script C
> 
> set itemDelimiter to space
> get word 900 of myList
> 
> and it will be (nearly) as fast as script A. The statement ‘set
> itemDelimiter to space’ hardly takes time (and of course you will put it
> outside any repeat loop).
> 
> It is easy to understand now that getting word 100 from myList will take
> much less time then getting word 900. Getting the last word takes the
> most amount of time. If you know your list has 1000 numbers, it may be
> good to know that:
> get word 1000 of myList
> will be faster than
> get last word of myList
> 
> The reason is that the computer first has to figure out how many words
> myList contains. Hence,
> get last word of myList
> is equivalent to something like:
> put the number of words of myList into n
> get word n of myList
> 
> Now suppose you have some text, simply consisting of characters, let’s
> say 5000 characters. Which script will be faster?
> 
> script D
> 
> get character 5000 of myText
> 
> script E
> 
> get last character of myText
> 
> Contrary to what you may expect now, script E is as fast, or even faster
> than script D. A variable like myList is stored somewhere in memory. The
> computer (or MC) has to know (a) the position of the start of the
> variable in memory and (b) the length (the number of characters) of that
> variable to prevent that other variables are written over existing ones.
> To get a particular character, say character 345, the computer just has
> to add 345 to the starting position of the variable, to know exactly
> where character 345 resides. Consequently, unlike items, words or lines,
> all characters in a text are accessed equally fast, and, even more
> important, very fast.
> 
> How can we use this property to speed up our scripts?
> Let’s go back to the example of getting the last word, item or line of a
> list. Suppose our list is CR delimited. To get the last line, we could use:
> 
> script F
> 
> get last line of myList
> 
> However, script G will usually be much, much faster!
> 
> script G
> 
> put the number of chars of myList into nChars
> repeat with i = nChars down to 1
> if char i of myList = cr then exit repeat
> end repeat
> get char i + 1 to nChars of myList
> 
> Similarly, to get the one but last line, we can extend our script a bit,
> as a fast alternative to ‘get line -2 of myList’. We can use this
> principle to write a function handler that is similar to the lineOffset
> function, but searches backwards, to find the last instance of number x
> in myList. The simple, but slow function, would look something like this:
> 
> script H
> 
> function scanBackSlow aLine, aList
> put the number of lines of aList into nLines
> repeat with i = nLines down to 1
> if line i of aList = aLine then return i
> end repeat
> return 0
> end scanBackSlow
> 
> Script I is much faster however:
> 
> function scanBackFast aLine, aList
> put the number of chars of aList into nChars
> put 0 into count
> put nChars into k
> repeat with i = nChars down to 1
> if char i of aList = cr then
> add 1 to count
> if char i + 1 to k of aList = aLine then
> return the number of lines of aList + 1 - count
> end if
> put i - 1 into k
> end if
> end repeat
> return 0
> end scanBackFast

(snip some good stuff here)