Re: Chinese Characters


  • Subject: Re: Chinese Characters
  • From: "Mark J. Reed" <email@hidden>
  • Date: Wed, 5 Aug 2009 16:05:13 -0400

On Wed, Aug 5, 2009 at 3:18 PM, Simon
Topliss<email@hidden> wrote:
> Hello all,
>
> I have some text and want to know if a character is a Chinese character.
>
> Using AppleScript, I know that the id of the character "用" is 29992.

Converting decimal to hex and vice versa is pretty straightforward; I
use dc(1), but there are other shell tools that would do the job as
well, or you could code up the conversion manually using AppleScript
to do the math.  Here are some handlers that use dc:

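-- Hex string to decimal: "16i" sets dc's input radix to 16, "p" prints.
-- dc only accepts uppercase hex digits (e.g. "4E00", not "4e00").
-- do shell script returns the result as a string.
on fromHex(someValue)
    do shell script " dc <<<'  16i " & someValue & "p' "
end fromHex

-- Decimal to hex: "16o" sets dc's output radix to 16 before printing.
on toHex(someValue)
    do shell script " dc <<<' " & someValue & " 16op' "
end toHex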
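If you'd rather not shell out at all, the conversion can also be done with plain AppleScript arithmetic. Below is a minimal sketch of that; the handler names hexToDecimal and decimalToHex are hypothetical, it handles non-negative integers only, and it assumes AppleScript 2.0 or later, where `id of` returns a character's Unicode code point.

```applescript
-- Hypothetical handler: hex string (upper- or lowercase) to integer.
on hexToDecimal(hexString)
    set total to 0
    repeat with aChar in (characters of hexString)
        set c to id of (contents of aChar)
        if c >= 48 and c <= 57 then -- "0".."9"
            set digitValue to c - 48
        else if c >= 65 and c <= 70 then -- "A".."F"
            set digitValue to c - 55
        else if c >= 97 and c <= 102 then -- "a".."f"
            set digitValue to c - 87
        else
            error "Not a hex digit: " & (contents of aChar)
        end if
        set total to total * 16 + digitValue
    end repeat
    return total
end hexToDecimal

-- Hypothetical handler: non-negative integer to uppercase hex string.
on decimalToHex(n)
    if n is 0 then return "0"
    set hexChars to "0123456789ABCDEF"
    set hexString to ""
    repeat while n > 0
        -- prepend the lowest hex digit, then drop it
        set hexString to (character ((n mod 16) + 1) of hexChars) & hexString
        set n to n div 16
    end repeat
    return hexString
end decimalToHex
```

For example, decimalToHex(29992) gives "7528", so "用" is U+7528, which is how it appears in the Unicode code charts and in Blocks.txt.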

As far as actually checking the ranges, something like this will work.
It's not speedy, but most of the time goes into the initial load of the
block data (two dc calls per line of Blocks.txt), so making findBlock
use a binary search, for example, wouldn't be a big win:

property UnicodeBlocks: {}

on fromHex(someValue)
    do shell script " dc <<<'  16i " & someValue & "p' "
end fromHex

on toHex(someValue)
    do shell script " dc <<<' " & someValue & " 16op' "
end toHex

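on findBlock(someCharacter)
    if (count UnicodeBlocks) is 0 then
        -- Blocks.txt ships with the system Perl; this path matches Perl 5.8.8
        -- and may differ on other OS X releases.
        set savedDelimiters to AppleScript's text item delimiters
        repeat with aLine in (paragraphs of (read POSIX file "/System/Library/Perl/5.8.8/unicore/Blocks.txt"))
            -- Skip blank lines and comments; data lines look like
            -- "4E00..9FFF; CJK Unified Ideographs"
            if length of aLine is not 0 and character 1 of aLine is not "#" then
                set AppleScript's text item delimiters to "; "
                set blockRange to text item 1 of aLine
                set blockDescription to text item 2 of aLine
                set AppleScript's text item delimiters to ".."
                -- fromHex returns a string; coerce it for numeric comparison
                set blockStart to fromHex(text item 1 of blockRange) as integer
                set blockEnd to fromHex(text item 2 of blockRange) as integer
                set end of UnicodeBlocks to {blockStart, blockEnd, blockDescription}
            end if
        end repeat
        set AppleScript's text item delimiters to savedDelimiters
    end if
    set someCharacterId to id of someCharacter
    repeat with aBlock in UnicodeBlocks
        if someCharacterId >= item 1 of aBlock and someCharacterId <= item 2 of aBlock then
            return item 3 of aBlock
        end if
    end repeat
    return missing value -- no block contains this code point
end findBlock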

findBlock("用")
==> CJK Unified Ideographs
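If all you need is a yes/no answer for Chinese characters, you can skip Blocks.txt and compare the id against a few hardcoded ranges. This is a sketch under some assumptions: "Chinese character" is taken to mean a Han ideograph, and the ranges are CJK Unified Ideographs (U+4E00..U+9FFF), Extension A (U+3400..U+4DBF), CJK Compatibility Ideographs (U+F900..U+FAFF) and Extension B (U+20000..U+2A6DF), written in decimal. The list is not exhaustive (later Unicode versions add further extension blocks), it also matches Japanese kanji and Korean hanja since they share these code points, the Extension B check assumes `id` reports full code points outside the BMP, and isHanCharacter and countHanCharacters are hypothetical names.

```applescript
-- Hypothetical handler: true if the first character of someCharacter
-- falls in one of the hardcoded Han ideograph ranges below.
on isHanCharacter(someCharacter)
    set cp to id of (character 1 of someCharacter)
    -- {start, end} code point pairs, in decimal
    set hanRanges to {{13312, 19903}, {19968, 40959}, {63744, 64255}, {131072, 173791}}
    repeat with aRange in hanRanges
        if cp >= (item 1 of aRange) and cp <= (item 2 of aRange) then return true
    end repeat
    return false
end isHanCharacter

-- Hypothetical handler: how many characters of someText are Han ideographs.
on countHanCharacters(someText)
    set hanCount to 0
    repeat with aChar in (characters of someText)
        if isHanCharacter(contents of aChar) then set hanCount to hanCount + 1
    end repeat
    return hanCount
end countHanCharacters
```

Hardcoding avoids the dc calls entirely, at the cost of having to update the ranges by hand when you care about newer blocks.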



--
Mark J. Reed <email@hidden>
 _______________________________________________
AppleScript-Users mailing list      (email@hidden)
Archives: http://lists.apple.com/archives/applescript-users
