Re: Invisible Char. - TextCommand
- Subject: Re: Invisible Char. - TextCommand
- From: has <email@hidden>
- Date: Fri, 8 Jul 2005 16:33:26 +0100
Michael Ghilissen wrote:
>I have a script that pulls a xml page, which I am parsing using regex in TextCommands to extract the text part of it:
>
>set theResult to find text "<title>([^<]*)</title>([^<]*)<link>([^<]*)</link>([^<]*)<description>([^<]*)</description>" in theText using {"\\n\\1\\n\\5\\n\\n"} regexpflag {"EXTENDED"} with regexp, all occurrences and string result
That's Satimage's 'find text' command. Here's the equivalent in TextCommands:
tell application "TextCommands"
set theResult to search theText for "<title>([^<]*)</title>([^<]*)<link>([^<]*)</link>([^<]*)<description>([^<]*)</description>" expanding to "\\n\\1\\n\\5\\n\\n" with regex
end tell
>The returned text is 'interlaced' with non-visible characters between each letter, [...] In TextWrangler, this character appears to be \x00
Sounds like you've got UTF16-encoded data there, e.g. maybe you read a UTF16-encoded file as string instead of as Unicode text, so every two-byte character comes out as the letter plus a null byte. You've got to watch out with text encodings; you can end up in all sorts of knots if you're not careful.
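To make the symptom concrete, here's a small illustration. It's in Python rather than AppleScript, purely as a sketch of what happens at the byte level; the sample string and variable names are made up for the example.

```python
# A UTF-16 (big-endian) encoded snippet, as the bytes would sit in the file.
raw = "<title>Hi</title>".encode("utf-16-be")

# Wrong: treat every byte as one character, which is roughly what a
# single-byte 'string' read does. Each letter gains a NUL neighbour.
misread = raw.decode("latin-1")

# Right: decode the same bytes as UTF-16.
decoded = raw.decode("utf-16-be")

print(repr(misread))
print(repr(decoded))
```

The misread version is twice as long as the real text, with a \x00 sitting next to every letter, which matches what TextWrangler is showing you.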
Reading XML files yourself is a bit tricky since the encoding info is buried in the XML itself. If you know for sure it'll always be UTF16-encoded, you can cheat and just use 'read f as Unicode text'. If you can't be certain, you'll have to read 'em in as raw data (typically as string, since that's easiest to work with), dig out the encoding type (e.g. with a regex), then convert the rest of the document's data from that encoding (e.g. using TextCommands' 'convert to unicode' command).
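The read-raw, sniff, convert sequence could look something like the following. This is a rough Python sketch of the idea, not TextCommands code: sniff_xml_encoding and decode_xml are hypothetical helper names, the UTF-8 fallback is an assumption, and it doesn't try to handle UTF-32 or other exotic cases.

```python
import codecs
import re

# Byte-order marks to check before anything else.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches the encoding attribute of an XML declaration in single-byte data.
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def sniff_xml_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of raw XML bytes (hypothetical helper)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM: UTF-16 still gives itself away by a NUL next to the first '<'.
    if raw.startswith(b"\x00<"):
        return "utf-16-be"
    if raw.startswith(b"<\x00"):
        return "utf-16-le"
    # Otherwise dig the name out of the XML declaration with a regex.
    match = _DECL.match(raw)
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_xml(raw: bytes) -> str:
    """Decode raw XML bytes to text, dropping any leading BOM character."""
    text = raw.decode(sniff_xml_encoding(raw))
    return text.lstrip("\ufeff")
```

Once the data has been decoded properly, running a regex over the result won't turn up any stray null characters.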
Alternatively, you could use a proper XML parser to handle everything, e.g. Satimage's XMLLib. That'll do all the heavy-lifting and dump everything into a big ol' object model which you can then search for the values you want. It'll take a bit more coding to do that, but will be much more robust.
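For comparison, here's roughly what the parser route looks like. This sketch uses Python's standard xml.etree.ElementTree rather than XMLLib (whose commands I'm not reproducing here), and it assumes an RSS-style document where each title/link/description group lives inside an item element; extract_items is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def extract_items(raw: bytes) -> list[tuple[str, str]]:
    """Return (title, description) pairs from RSS-style XML bytes."""
    # Handing the parser raw bytes lets it work out the encoding itself
    # from the BOM or the XML declaration.
    root = ET.fromstring(raw)
    pairs = []
    for item in root.iter("item"):  # assumes <item> elements, as in RSS
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        pairs.append((title, description))
    return pairs
```

The point is that the parser deals with the encoding and the document structure, so the script only has to ask for the elements it wants.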
HTH
has
--
http://freespace.virgin.net/hamish.sanderson/