itemDelimiter
Jim MacConnell
jmac at consensustech.com
Thu Feb 12 16:02:38 EST 2004
Tom,
--- I'm sure you have multiple answers by now, but I'm only just now getting a minute.
--- Also, no sleep last night, so this is long-winded. I'll refrain in future, but
--- I've already done this, so here goes... By the way: untested.
I hesitate to give scripting advice, since I'm just relearning and prone
to really dumb mistakes, but I'm working through a similar thing. Brian's
(corrected) routine is basically the way I do it. There are a couple of
things to consider when you are looking for repeating pieces of information,
such as all the paragraphs (<p> xxxx </p>) in a page.
= = = = = = =
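The first is to keep careful track of where you are looking, by resetting
startOffset after you have found an instance of what you are looking for,
so you don't "find" the same one again and again. Keep in mind that when
you give offset() a third parameter (the number of characters to skip), the
position it returns is counted from the end of the skipped characters, not
from the start of the text, so you have to add the skip count back in to
get an absolute character position.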
-- Make sure we start at the beginning when looking for a particular tag
put 0 into startOffset
put empty into theAnswer
-- Substitute whatever tags you are looking for for <title> and </title>,
-- for example "<p>" and "</p>"
put "<title>" into startTag
put "</title>" into endTag
-- Keep going until no further start/end tag pair turns up
repeat forever
   -- NOTE: startOffset is passed as offset()'s third parameter (chars to
   -- skip), so the result is counted from that point, not from char 1
   put offset(startTag, theHTML, startOffset) into foundAt
   if foundAt is 0 then exit repeat
   -- Absolute position of the first char after the start tag
   put startOffset + foundAt + length(startTag) into textStart
   put offset(endTag, theHTML, textStart - 1) into endOffset
   if endOffset is 0 then exit repeat
   -- The end tag begins at char (textStart - 1 + endOffset)
   put char textStart to (textStart + endOffset - 2) of theHTML into theText
   -- Then reset the starting point to the last char of the end tag,
   -- so the next search starts just past it
   put textStart + endOffset - 2 + length(endTag) into startOffset
   -- Here do what you want with theText since you
   -- are putting this in a loop. For example:
   put "*****" & return & theText & return after theAnswer
end repeat
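The relative result from offset() is the easiest part of this to get wrong: offset("c", "abcabc", 3) returns 3, not 6, because the count starts after the three skipped characters. If you'd rather not redo that arithmetic inline each time, a small helper can hide it. This is a minimal sketch; absoluteOffset is a hypothetical name, not a built-in LiveCode function.

```livecode
-- Hypothetical helper (not built in): like offset(), but always returns
-- a position counted from char 1 of haystack, or 0 if needle isn't found.
-- skipChars is how many leading characters to ignore, as with offset().
function absoluteOffset needle, haystack, skipChars
   local relPos
   put offset(needle, haystack, skipChars) into relPos
   if relPos is 0 then return 0
   return skipChars + relPos
end absoluteOffset
```

With it, both searches in the loop can work purely in absolute positions; for example, absoluteOffset("c", "abcabc", 3) returns 6.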
= = = = = = =
Another approach is to be less careful about startOffset and just throw
away the text you've already been through. This means your offset() lines
can be a little simpler (every search starts at char 1), but you need a
separate place to keep the original text if you still want it afterwards.
Not sure if this is an advantage, but it makes the HTML text easier to look
at, since there's less of it as you go.
-- Keep a copy of the original HTML, because this loop eats theHTML
put theHTML into aSafeCopyoftheHTML
put empty into theAnswer
-- Substitute whatever tags you are looking for for <title> and </title>,
-- for example "<p>" and "</p>"
put "<title>" into startTag
put "</title>" into endTag
-- Keep going until there are no more start/end tag pairs
repeat forever
   -- Every search starts at char 1, so no third parameter is needed
   put offset(startTag, theHTML) into startOffset
   if startOffset is 0 then exit repeat
   -- Get rid of everything up to and including the start tag
   delete char 1 to (startOffset + length(startTag) - 1) of theHTML
   -- The text we want now starts at char 1
   put offset(endTag, theHTML) into endOffset
   if endOffset is 0 then exit repeat
   put char 1 to (endOffset - 1) of theHTML into theText
   -- Now clean up theHTML by getting rid of what you've used,
   -- including the end tag
   delete char 1 to (endOffset + length(endTag) - 1) of theHTML
   -- Here do what you want with theText since you
   -- are putting this in a loop. For example:
   put "*****" & return & theText & return after theAnswer
end repeat
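The same "delete as you go" loop can also be packaged as a function. One handy side effect: LiveCode passes parameters by value unless they are declared with @, so the deletions only touch the function's own copy of theHTML and the caller doesn't need a separate safe copy. This is a sketch; grabTextByDeleting is a hypothetical name, and it returns one match per line, so a match that itself contains a return will span several lines.

```livecode
-- Hypothetical function version of the "delete as you go" approach.
-- theHTML is passed by value, so deleting from it here leaves the
-- caller's variable untouched.
-- Returns each piece of text between startTag and endTag, each followed by a return.
function grabTextByDeleting theHTML, startTag, endTag
   local startOffset, endOffset, theFoundText
   repeat forever
      put offset(startTag, theHTML) into startOffset
      if startOffset is 0 then exit repeat
      -- Drop everything up to and including the start tag
      delete char 1 to (startOffset + length(startTag) - 1) of theHTML
      put offset(endTag, theHTML) into endOffset
      if endOffset is 0 then exit repeat
      put (char 1 to (endOffset - 1) of theHTML) & return after theFoundText
      -- Drop the found text and the end tag
      delete char 1 to (endOffset + length(endTag) - 1) of theHTML
   end repeat
   return theFoundText
end grabTextByDeleting
```

For example, grabTextByDeleting("<p>a</p><p>b</p>", "<p>", "</p>") returns "a" and "b" on separate lines.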
= = = = =
Finally... it seems like this could (should?) be broken out into a separate function:
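usage: put grabText(theHTML, "<p>", "</p>", "All") into theParagraphs
-- Returns the text between each startTag/endTag pair, one match per item.
-- Pass "One" as oneOrAll to stop after the first match.
-- NOTE: results are separated by the itemDelimiter (comma by default),
-- so a match that itself contains a comma will read back as two items.
function grabText theHTML, startTag, endTag, oneOrAll
   put 0 into startOffset
   put 0 into numFound
   put empty into theFoundText
   repeat forever
      -- offset() counts from the skipped chars, so add startOffset back in
      put offset(startTag, theHTML, startOffset) into foundAt
      if foundAt is 0 then exit repeat
      put startOffset + foundAt + length(startTag) into textStart
      put offset(endTag, theHTML, textStart - 1) into endOffset
      if endOffset is 0 then exit repeat
      put char textStart to (textStart + endOffset - 2) of theHTML into theText
      -- Here is the stuff added for the return info,
      -- in this case using items to store the results
      add 1 to numFound
      put theText into item numFound of theFoundText
      if oneOrAll is "One" then exit repeat
      -- Resume searching just past the end tag
      put textStart + endOffset - 2 + length(endTag) into startOffset
   end repeat
   return theFoundText
end grabText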
More information about the use-livecode mailing list