Re: Unicode Bad Characters

  • Subject: Re: Unicode Bad Characters
  • From: Luther Fuller <email@hidden>
  • Date: Fri, 8 Aug 2008 13:08:23 -0500

On Aug 8, 2008, at 8:13 AM, Mark J. Reed wrote:

> On Wed, Aug 6, 2008 at 5:18 PM, Luther Fuller <email@hidden> wrote:
> > Late last year, I identified Unicode characters in the range [131 - 159]
>
> Hm. But not 128, 129, or 130?
>
> > repeat until (offset of (space & space) in subjectText) = 0
> > 	set AppleScript's text item delimiters to {space & space}
> > 	set wordList to (text items of subjectText) as list
> > 	set AppleScript's text item delimiters to {space}
> > 	set subjectText to wordList as text
> > end repeat -- multiple spaces replaced with single space
> >
> > The offset command would find 'pairs of spaces' that text item delimiters
> > could not see, resulting in an unending loop.
>
> An odd bug, indeed. I assume you worked around it by adding code to exit the loop if the length of wordList is 1...
>
> > I worked around by removing the offending characters with ...
> >
> > on removeBadCharacters(theText)
> > 	set charList to (characters of theText) as list
> > 	repeat with i from 1 to (count items of charList)
> > 		ASCII number ((item i of charList) as text)
> > 		if (130 < the result) and (the result < 160) then
> > 			set item i of charList to "" as Unicode text
> > 		end if
> > 	end repeat
> > 	set AppleScript's text item delimiters to {} -- very necessary
> > 	return charList as Unicode text
> > end removeBadCharacters --------------------------------------
> >
> > I'm wondering if anyone knows if this fix is documented anywhere?
>
> I haven't looked, but I wouldn't be surprised to find no specific fix for this problem; it might be something that fell out of the general improvements to Unicode support in AS 2.0, which I gather involved rather a lot of rewriting.

When I said "Unicode characters in the range [131 - 159]", I meant characters whose ASCII number is in the range [131 - 159]. The original problem arose in OS X 10.4.10, where there was no 'character id n' command.


I have lots of notes on this from last year, in the form of email messages, and I've used them to try to recreate the problem in 10.5.4 and 10.4.11. It's just not there! Unicode characters with 'id' in the range [131 - 159] are well behaved: they are displayed as spaces or as a "box around A", depending on where the text is displayed, and they don't cause any misbehavior in my scripts. The code above works perfectly, correctly removing double spaces.

I have to conclude, documented or not, that a bug in 10.4.10 was corrected in 10.4.11.

I simply removed the 'removeBadCharacters' handler. Everything works! End of problem.
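For anyone still running 10.4.10, the guard Mark suggested (exit the loop when splitting on a double space yields only one text item) can be sketched as below. This is a minimal sketch, not code from the thread: the handler name collapseSpaces is hypothetical, and it has not been tested against the 10.4.10 bug itself. It also saves and restores the text item delimiters so that callers are not affected.

```applescript
-- Hypothetical handler (not from the original thread): collapses runs of spaces
-- to single spaces, with an exit guard so the loop cannot run forever if 'offset'
-- reports a pair of spaces that text item delimiters fail to split on.
on collapseSpaces(subjectText)
	set savedDelims to AppleScript's text item delimiters -- restored before returning
	repeat until (offset of (space & space) in subjectText) = 0
		set AppleScript's text item delimiters to {space & space}
		set wordList to text items of subjectText
		-- One item means the delimiters found no double space: stop instead of looping forever
		if (count wordList) = 1 then exit repeat
		set AppleScript's text item delimiters to {space}
		set subjectText to wordList as text
	end repeat
	set AppleScript's text item delimiters to savedDelims
	return subjectText
end collapseSpaces

collapseSpaces("one  two   three") -- → "one two three"
```

If the guard fires, the returned text may still contain the "invisible" double space that offset detected. The guard only prevents the hang; it does not remove the offending characters.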

_______________________________________________
AppleScript-Users mailing list      (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users
References: 
 >Unicode Bad Characters (From: Luther Fuller <email@hidden>)
 >Re: Unicode Bad Characters (From: Luther Fuller <email@hidden>)
 >Re: Unicode Bad Characters (From: "Mark J. Reed" <email@hidden>)
