Re: splitting CJK text into "words"
- Subject: Re: splitting CJK text into "words"
- From: Markus Spoettl <email@hidden>
- Date: Thu, 27 Sep 2012 08:33:16 +0200
On 9/26/12 11:12 PM, Martin Wierschin wrote:
> I'm trying to split CJK text using the kind of word boundaries detected by
> -[NSAttributedString doubleClickAtIndex:]. That method does the job
> correctly, but only if the system preferences have the Word Break mode set to
> Japanese. I need to ensure this kind of word splitting independent of the
> user's system preferences.
>
> It was my understanding that I could use CFStringTokenizer for this task, but
> it doesn't seem to be working. Test code that produces improper results:

I have no idea whether the system frameworks expose functions for this. Since
the system evidently knows how to do it, it could (and should). If you end up
needing to do it on your own:
There are the Kinsoku rules, which are line-wrap rules for Japanese. Semantically
similar rules exist for Chinese and Korean. A simple implementation is not too
difficult; see here for a quick overview:
http://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages
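
To give an idea of what such a simple implementation might look like, here is a rough sketch in Python (chosen only for brevity; the logic ports directly to C or Objective-C). The function name break_opportunities and both character sets are made up for illustration, and the sets are only a small subset of the characters the rules cover. Note that this finds permitted line-break positions, which is not the same as dictionary-based word segmentation, and it assumes the input is purely CJK (runs of Latin text would need separate handling).

```python
# Characters that must not begin a line (illustrative subset, not a complete list).
NOT_AT_LINE_START = set("、。，．）」』】〕ー々ぁぃぅぇぉっゃゅょ！？")
# Characters that must not end a line (illustrative subset, not a complete list).
NOT_AT_LINE_END = set("（「『【〔")


def break_opportunities(text):
    """Return every index i where a break is allowed between text[i-1] and text[i].

    Hypothetical helper: assumes a break is allowed between any two
    characters unless one of the two prohibition sets forbids it.
    """
    allowed = []
    for i in range(1, len(text)):
        if text[i] in NOT_AT_LINE_START:
            continue  # the following character may not start a line
        if text[i - 1] in NOT_AT_LINE_END:
            continue  # the preceding character may not end a line
        allowed.append(i)
    return allowed


if __name__ == "__main__":
    sample = "「日本語」です。"
    print(break_opportunities(sample))
```

A real implementation would take its character classes from a complete table rather than hard-coded strings, and would also have to treat digits, Latin words and grouped punctuation as unbreakable units.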
Regards
Markus
--
__________________________________________
Markus Spoettl