Re: RegEx libraries & unicode support
- Subject: Re: RegEx libraries & unicode support
- From: Allan Odgaard <email@hidden>
- Date: Fri, 14 May 2004 15:17:48 +0200
On 14. May 2004, at 8:43, Nicholas Riley wrote:
>> Not to belittle any of the dozen regular expression libraries recently
>> mentioned, but do any of them support unicode? mainly I am thinking
>> [...]
>
> At least AGRegex (PCRE) and OgreKit (OniGuruma) have solid Unicode
> support; I'm not sure of the individual things you mentioned, but
> they're easy enough to download and try out yourself.

This is what I could find at the PCRE page:

    [...] the characters that PCRE recognizes as digits, spaces,
    or word characters remain the same set as before, all with
    values less than 256.

    Case-insensitive matching applies only to characters whose
    values are less than 256

    PCRE does not support the use of Unicode tables and properties

Also, dot and repeats match single code-points (i.e. a base char or a
combining mark, but never both).
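
To make the code-point behaviour concrete, here is a minimal sketch. It uses Python's built-in re module as a stand-in rather than PCRE itself, on the assumption that both treat "." as matching exactly one code point; the helper name count_dot_matches is made up for this example.

```python
import re
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one user-perceived
# character, but two separate code points.
decomposed = "e\u0301"

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def count_dot_matches(text):
    """Return how many times a lone '.' matches in text (hypothetical helper)."""
    return len(re.findall(r".", text))


if __name__ == "__main__":
    # The decomposed form needs two dots: base char and combining mark
    # are matched separately, never together.
    print(count_dot_matches(decomposed))
    # The precomposed form is a single code point, so one dot suffices.
    print(count_dot_matches(composed))
    # A single dot cannot cover the whole decomposed sequence.
    print(re.fullmatch(r".", decomposed))
```

The two strings render identically, yet a pattern written for one does not match the other in the same way. Normalizing the input before matching is one workaround, though it only helps where a precomposed form exists.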

I believe MOKit is also based on PCRE, so the same limitations would
apply there. That is not what I would call strong Unicode support; for
comparison, see TR18:
http://www.unicode.org/unicode/reports/tr18/

I wasn't able to find a specification of OniGuruma's level of support.