Re: Unicode Bad Characters
- Subject: Re: Unicode Bad Characters
- From: Emile Schwarz <email@hidden>
- Date: Sun, 10 Aug 2008 15:16:40 +0200
Hi all,
REMEMBER: ASCII range is 0 - 127.
When I said "Unicode characters in the range [131 - 159]", I mean
characters whose ascii number is in the range [131 - 159].
No ASCII number exists outside of the 0 - 127 area.
FWIW,
Emile
PS: I understand what you meant, but there are too many people who
believe that ASCII defines characters outside its bounds (0 - 127). Some
people also talk about "high ASCII"; go figure!
email@hidden wrote:
Date: Fri, 8 Aug 2008 13:08:23 -0500
From: Luther Fuller <email@hidden>
Subject: Re: Unicode Bad Characters
To: Applescript Users <email@hidden>
Message-ID: <email@hidden>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
On Aug 8, 2008, at 8:13 AM, Mark J. Reed wrote:
On Wed, Aug 6, 2008 at 5:18 PM, Luther Fuller
<email@hidden> wrote:
Late last year, I identified Unicode characters in the range [131 - 159]
Hm. But not 128, 129, or 130?
repeat until (offset of (space & space) in subjectText) = 0
    set AppleScript's text item delimiters to {space & space}
    set wordList to (text items of subjectText) as list
    set AppleScript's text item delimiters to {space}
    set subjectText to wordList as text
end repeat -- multiple spaces replaced with single space
The offset command would find 'pairs of spaces' that text item delimiters
could not see, resulting in an unending loop.
An odd bug, indeed. I assume you worked around it by adding code to
exit the loop if the length of wordList is 1...
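The guard suggested above could look something like the following. This is only a sketch, not code that either poster actually used: the handler name collapseSpaces is hypothetical, and the early exit simply gives up when text item delimiters fail to split the text even though offset reported a double space.

```applescript
-- Hypothetical handler: collapse runs of spaces, with a guard against
-- the endless loop described above.
on collapseSpaces(subjectText)
    set savedDelims to AppleScript's text item delimiters
    repeat until (offset of (space & space) in subjectText) = 0
        set AppleScript's text item delimiters to {space & space}
        set wordList to text items of subjectText
        -- If nothing was split, another pass cannot make progress.
        if (count of wordList) = 1 then exit repeat
        set AppleScript's text item delimiters to {space}
        set subjectText to wordList as text
    end repeat
    set AppleScript's text item delimiters to savedDelims
    return subjectText
end collapseSpaces
```

Saving and restoring the delimiters is an addition of this sketch; the original snippet leaves them set to {space}.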
I worked around by removing the offending characters with ...
on removeBadCharacters(theText)
    set charList to (characters of theText) as list
    repeat with i from 1 to (count items of charList)
        ASCII number ((item i of charList) as text)
        if (130 < the result) and (the result < 160) then
            set item i of charList to "" as Unicode text
        end if
    end repeat
    set AppleScript's text item delimiters to {} -- very necessary
    return charList as Unicode text
end removeBadCharacters --------------------------------------
I'm wondering if anyone knows if this fix is documented anywhere?
I haven't looked, but I wouldn't be surprised to find no specific fix
for this problem; it might be something that fell out of the general
improvements to Unicode support in AS 2.0, which I gather involved
rather a lot of rewriting.
When I said "Unicode characters in the range [131 - 159]", I mean
characters whose ascii number is in the range [131 - 159]. The
original problem arose in OS X 10.4.10 and there was no 'character id
n' command.
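For comparison, on AppleScript 2.0 (OS X 10.5) or later the same filter can be written against code points rather than 'ASCII number'. A minimal sketch, assuming the 'id of' property and 'string id' specifier introduced in AppleScript 2.0; the handler name removeC1Characters is hypothetical and is not part of the original script.

```applescript
-- Hypothetical handler; requires AppleScript 2.0 or later.
on removeC1Characters(theText)
    if theText is "" then return ""
    set codePoints to id of theText
    -- `id of` gives a bare integer for one-character text, so wrap it.
    if class of codePoints is not list then set codePoints to {codePoints}
    set kept to {}
    repeat with cp in codePoints
        set n to contents of cp
        -- Drop code points 131 through 159, the range discussed in this thread.
        if not (n > 130 and n < 160) then set end of kept to n
    end repeat
    if kept is {} then return ""
    return string id kept
end removeC1Characters
```

Working on the integer list avoids the text item delimiters entirely, so nothing needs to be reset before joining the result.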
I've got lots of notes in the form of email messages on this from last
year, and I've used them to try to recreate the problem in 10.5.4 and
10.4.11. It's just not there! Unicode characters with 'id' in the range
[131 - 159] are well behaved. They are displayed as spaces or as a
"box around A", depending on where the text is displayed. They don't
cause any misbehavior in my scripts. The code above works perfectly,
correctly removing double-spaces.
I have to conclude, documented or not, that a bug in 10.4.10 was
corrected in 10.4.11.
I simply removed the 'removeBadCharacters' handler. Everything works!
End of problem.