Re: parsing a string into words
- Subject: Re: parsing a string into words
- From: Jeffrey Oleander <email@hidden>
- Date: Sun, 26 Apr 2009 10:50:28 -0700 (PDT)
At Sun, 2009-04-26, 09:01, Alastair Houghton <email@hidden> wrote:
>> At 2009 Apr 26, 04:33, Jeffrey Oleander wrote:
>> NSArray * tokens = [string
>> componentsSeparatedByCharactersInSet:
>> [NSCharacterSet whitespaceCharacterSet]];
> No, no, no. If you read Gerriet's original post,
> you would have noticed that he even explained
> that what you just said won't work, because not
> all languages use whitespace to separate words
> like English does.
>
> You probably want to be using CFStringTokenizer(),
> at least on OS X. For cross-platform code,
> ICU is probably your best bet.
Thanks for that info and the pointer to the ICU.
http://userguide.icu-project.org/boundaryanalysis
BreakIterator
Character, Word, Line or Sentence
"you provide an appropriate CharacterIterator"
UChar *
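For the archive, a minimal sketch of the first()/next() boundary walk that the ICU guide describes. It is written against java.text.BreakIterator, the JDK class whose interface ICU's BreakIterator follows, so that it runs without ICU installed; it is not Cocoa code and not the ICU C API. The names WordBreaks and words are made up for illustration, and the letter-or-digit filter is my own assumption about what should count as a "word".

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: collects the word-like segments of a string.
class WordBreaks {
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Each pair (start, end) is one segment between two boundaries.
        for (int end = it.next(); end != BreakIterator.DONE;
                 start = end, end = it.next()) {
            // Assumption: keep a segment only if it begins with a letter
            // or digit, which drops whitespace and punctuation segments.
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                out.add(text.substring(start, end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Hello, wide world split into words]: [Hello, wide, world]
        System.out.println(words("Hello, wide world", Locale.ENGLISH));
    }
}
```

The point of the pattern is that the iterator, not the caller, decides where a word ends, so nothing in the loop assumes whitespace. How well it segments text in any particular language depends on the rules and dictionaries shipped with the implementation.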
I was half expecting that response because I was
aware that "not all languages use white space to
separate words", but hoping for some magic in
NSString.
Unfortunately, CFStringTokenizer is not available
in 10.3.9, and no, I do not have a chest of silver
or gold behind my pillow to run around buying
newer hardware and software, let alone doing so
every 2 years; we're in re-bootstrapping mode
in the land of the globalized Bush-Clinton-Bush-Obama
depression.
This makes ICU suspect:
"Copyright (c) 2000 - 2008 IBM and Others"
Is Apple one of the "Others"? The "using ICU"
list is a mixed bag of reputable firms and
unethical rogues, and I don't see any
additional info on who is behind "ICU".
As much as I enjoy languages (I've taken a few
in college, and 10 years ago I was on a couple of
Unicode mailing lists, mainly to read the
interesting discussions about the differences),
for now I'll stick with the US+Euro+Japanese+
Latin+Hebrew solution that I have. It uses
Objective-C and doesn't drag me into the
complications of Objective-C++ or of transferring
data between different stores based on
different programming language and framework
conventions. I can use it immediately, I
can trust it, and it seems amenable to reasonable
later modification to handle the remote outliers.
Onward.
_______________________________________________
Cocoa-dev mailing list (email@hidden)