Re: Regular Expressions?
- Subject: Re: Regular Expressions?
- From: James Montgomerie <email@hidden>
- Date: Fri, 6 Jun 2008 11:19:50 -0700
On 6 Jun 2008, at 08:03, Jens Alfke wrote:
On 6 Jun '08, at 3:23 AM, Jason Stephenson wrote:
As a long time UNIX programmer, I'll suggest looking into the
regexp library that already comes with OS X.
Run man regcomp on the command line to find out how to use it.
It doesn't look as though this library is Unicode-aware. The strings
it takes are C strings (char*) with no indication of what encoding is
used, and Unicode or UTF-8 aren't mentioned in the man page. From
that, I'd guess that this library only works with single-byte
encodings (like ISO-Latin-1 or CP-1252, not UTF-8 or the various
non-Roman encodings) and that it will treat all non-ASCII characters as
being not spaces and not letters.
In short, I think it only works correctly with plain ASCII. IMHO
that's much too limited for most purposes nowadays. Even if you
don't touch user-visible text with it, it's still pretty common to
find non-ASCII characters in HTML, XML, even source code.
Of the regex libraries mentioned so far, I recommend RegexKitLite.
It's based on ICU, which is Unicode-savvy, already built into the
OS, and used by lots of Apple apps.
You are correct, but in my casual usage, feeding UTF-8 to the POSIX
regex routines works just fine as long as you keep two things in mind.
First, the predefined character classes only understand ASCII. Second,
the results you get back are byte offsets, not character offsets, so
don't convert them to NSRanges and expect them to be correct against
the NSString you got the UTF-8 from. Similar caveats apply to
repetition counts: ".{3}" will happily match two characters if they
take up three bytes.
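To make the byte-offset caveat concrete, here is a rough sketch in C. The helper names first_match_bytes and utf16_units_in_prefix are made up for illustration and are not part of any library. It assumes the process is in the default "C" locale (no setlocale call), where the POSIX routines work byte by byte; some implementations may match per character once a UTF-8 locale is set. It also assumes the input is well-formed UTF-8.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: finds the first match of a POSIX extended regex in
 * `text` and reports it as a byte offset and a byte length.
 * Returns 0 on a match, 1 if regexec reports no match (or fails),
 * and -1 if the pattern does not compile. */
int first_match_bytes(const char *pattern, const char *text,
                      size_t *start, size_t *length)
{
    regex_t re;
    regmatch_t m;
    int rc;

    assert(pattern != NULL && text != NULL && start != NULL && length != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 1;

    /* rm_so and rm_eo are byte offsets into `text`, not character offsets. */
    *start = (size_t)m.rm_so;
    *length = (size_t)(m.rm_eo - m.rm_so);
    return 0;
}

/* Hypothetical helper: counts the UTF-16 code units (the unit NSRange uses)
 * encoded by the first `nbytes` bytes of `utf8`. Assumes well-formed UTF-8
 * and that `nbytes` ends on a character boundary. */
size_t utf16_units_in_prefix(const char *utf8, size_t nbytes)
{
    size_t units = 0;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char b = (unsigned char)utf8[i];
        if ((b & 0xC0) == 0x80)
            continue;                          /* continuation byte */
        units += ((b & 0xF8) == 0xF0) ? 2 : 1; /* 4-byte lead: surrogate pair */
    }
    return units;
}
```

With "é" (two bytes in UTF-8) followed by "a", the pattern "^.{3}$" should match the whole three-byte string even though it holds only two characters. To build an NSRange from a match, the location would be utf16_units_in_prefix(text, start) and the length utf16_units_in_prefix(text + start, length), rather than the raw byte values.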
I wouldn't want to present the regexes to the user, of course, but for
pre-defined regexes in code, it's okay (not great with those caveats
obviously, but alright).
My main complaint about it is that it's /extremely slow/ compared to
most modern regex libraries, but for casual usage, you at least don't
have to link any extra libraries to use it.
I do think that good regex additions to NSString, or an NSRegex class,
are highly overdue in Cocoa.
Jamie.
_______________________________________________
Cocoa-dev mailing list (email@hidden)