Re: Parsing HTML

  • Subject: Re: Parsing HTML
  • From: Gary Lists <email@hidden>
  • Date: Sat, 04 Jan 2003 15:37:16 -0500

On or about 1/4/03 2:42 PM, Randal L. Schwartz wrote:

> <tag1
> attribute1="fo'o>b'ar"
> attribute2='lef"t>r"ight'
> attribute3=unquoted
>>
> some text
> </tag1>
>
> so you can't just scan to ">": you need to know if you're inside a
> quoted attribute value or not. And notice that each kind of quotes
> can contain the other kind of quotes. Yeah, messy problem, eh?

Only messy in theory...mostly.

Because...
attribute3 is not valid HTML; attribute2 is not valid HTML; and attribute1
should use the greater-than entity (&gt;) to be valid.

If anyone really wrote HTML like the sample you offer above, now _that'd_ be
messy. ;)

(Your sample won't even work in all browsers, so it's unlikely anyone would
ever write it that way.) But throw in some JavaScript escaping or some XML
<br /> tags and WHAM! ... the Parse HTML routine will probably break.

And you are right, of course, about the general principle, but switching a
boolean on and off to track whether you are inside a tag isn't all that
difficult. (This is what I do in BBEdit to fix my empty ALT= attributes. But
there is grep there, so avoiding such mis-quoted quotes is easier.)

The sub-routine that Sal referenced is a pretty good one, especially for
those whose needs are simpler. Anyone using _only_ this sub-routine in a
real production workflow would be foolhardy, but if you want to quickly get
to the next <IMG...>, then this will do it for you.

Thanks for re-reminding everyone of the useful sub-routines at Apple, Sal.
Keep 'em coming...and don't forget us OS 9ers. ;)


--
Gary

Incoming replies are auto-deleted.
Please post directly to the list or newsgroup.

Really need direct? Rot me at:
email@hidden
Lbhe fhowrpg zhfg ortva "abgwhax:" (ab dhbgrf)
Avpr gb zrrg lbh! Qba'g fcnz zr.
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.
