Re: splitting CJK text into "words"

  • Subject: Re: splitting CJK text into "words"
  • From: Martin Wierschin <email@hidden>
  • Date: Thu, 27 Sep 2012 18:33:40 -0700

> There are the Kinsoku rules, which are wrap rules for Japanese. Semantically similar rules exist for Chinese and Korean. A simple implementation is not too difficult, see here for a quick overview:
>
> http://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages

Thanks for the link, Markus, but unless I'm missing something, that only covers line breaking/wrapping, not detecting word boundaries.

So it looks like the Cocoa/CoreFoundation frameworks don't have what's needed for this, but after some digging it seems ICU does:

	http://userguide.icu-project.org/boundaryanalysis

I can just drop down to the libicu C functions (e.g. ubrk_open). Using the "ja_JP" locale there seems to do the trick.

Best,
~Martin


_______________________________________________

Cocoa-dev mailing list (email@hidden)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:

This email sent to email@hidden

References: 
 >splitting CJK text into "words" (From: Martin Wierschin <email@hidden>)
 >Re: splitting CJK text into "words" (From: Markus Spoettl <email@hidden>)

  • Prev by Date: interpretKeyEvents: and insertText: under 10.8
  • Next by Date: Proper KVO with NSTreeController + NSOutlineView
  • Previous by thread: Re: splitting CJK text into "words"
  • Next by thread: NSPredicate / NSArray addObserver:forKeyPath:options:context: exception
  • Index(es):
    • Date
    • Thread