Re: splitting CJK text into "words"

  • Subject: Re: splitting CJK text into "words"
  • From: Markus Spoettl <email@hidden>
  • Date: Thu, 27 Sep 2012 08:33:16 +0200

On 9/26/12 11:12 PM, Martin Wierschin wrote:
> I'm trying to split CJK text using the kind of word boundaries detected by
> -[NSAttributedString doubleClickAtIndex:]. That method does the job
> correctly, but only if the system preferences have the Word Break mode set to
> Japanese. I need to ensure this kind of word splitting independent of the
> user's system preferences.
>
> It was my understanding that I could use CFStringTokenizer for this task, but
> it doesn't seem to be working. Test code that produces improper results:

I have no idea whether the system frameworks expose functions for this. Since
the system knows about these boundaries, it could (and should). If you end up
needing to do it on your own:

There are the Kinsoku rules, which are the line-wrap rules for Japanese. Semantically similar rules exist for Chinese and Korean. A simple implementation is not too difficult; see here for a quick overview:

http://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages
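
To give a rough idea of what such a simple implementation might look like, here is a minimal sketch in Python. The names (NO_LINE_START, NO_LINE_END, can_break_between, wrap) are made up for illustration, and the character sets are a small, incomplete subset of the real Kinsoku tables. Note that this finds permissible line-break positions, which is not the same thing as dictionary-based word splitting.

```python
# Hypothetical, deliberately incomplete Kinsoku character sets (assumption:
# a real implementation would use the full tables for each language).
NO_LINE_START = set("、。，．：；？！）］｝」』】")  # must not begin a line
NO_LINE_END = set("（［｛「『【")                    # must not end a line


def can_break_between(prev_char, next_char):
    """Return True if a line may end after prev_char and the next line
    may begin with next_char."""
    return prev_char not in NO_LINE_END and next_char not in NO_LINE_START


def wrap(text, width):
    """Greedily wrap text into lines of at most `width` characters,
    backing up from the limit until a permitted break is found."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        # Move the break left while it would violate a Kinsoku rule.
        while end > start and not can_break_between(text[end - 1], text[end]):
            end -= 1
        if end == start:
            # No permitted break in this window; force one at the limit.
            end = start + width
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines
```

For example, wrap("彼は「こんにちは」と言った。", 3) never leaves 「 at the end of a line, and never puts 」 or 。 at the start of one. Backing up from the width limit (rather than pushing characters onto the following line) is just one of the strategies described in the overview; it was picked here because it is the shortest to write down.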

Regards
Markus
--
__________________________________________
Markus Spoettl