Re: Regular Expressions?
- Subject: Re: Regular Expressions?
- From: Jason Stephenson <email@hidden>
- Date: Fri, 06 Jun 2008 11:29:20 -0400
glenn andreas wrote:
[wrote about why regex is not a good idea, particularly with
NSString and Unicode. Pretty much the same things that Jens wrote earlier.]
Yes, that's all very true. Regex is a poor choice if you're working on
non-ASCII text. I'm generally not doing so, but just yesterday I did have
the unpleasant experience of regexing some UTF-16 files. (See another
email by me in this thread.)
You could kludge it to work using some options that are available in the
Mac OS X and FreeBSD regex libraries. (I don't know whether they are
available elsewhere, but they likely are.) Essentially, you tell regcomp
to ignore NULs, and then you have a lot of fun coding REs that match your
UTF-16 strings, taking endianness and all into account. I've pondered how
it would work and am confident that it would, but I also concede that it
would be a very ugly hack and prone to breakage.
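
To give a feel for that kind of hack, here is a rough sketch in Python
rather than C, since Python's re module accepts byte patterns containing
NULs without any special flags. It is not the regcomp approach itself,
only an illustration of the same idea under stated assumptions: the data
has no BOM, the byte order is known in advance, and the helper names
(utf16_literal_pattern, find_utf16) are made up for this example.

```python
import re


def utf16_literal_pattern(literal: str, byteorder: str = "little") -> "re.Pattern[bytes]":
    """Compile a byte-level pattern matching `literal` as UTF-16 (hypothetical helper)."""
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    # re.escape keeps regex metacharacters in the encoded bytes literal.
    return re.compile(re.escape(literal.encode(codec)))


def find_utf16(data: bytes, literal: str, byteorder: str = "little"):
    """Return the UTF-16 code-unit index of the first aligned match, or None."""
    pattern = utf16_literal_pattern(literal, byteorder)
    pos = 0
    while True:
        match = pattern.search(data, pos)
        if match is None:
            return None
        # A hit at an odd byte offset straddles two code units: a false match.
        if match.start() % 2 == 0:
            return match.start() // 2
        pos = match.start() + 1


little = "h\u00e9llo w\u00f6rld".encode("utf-16-le")
big = "h\u00e9llo w\u00f6rld".encode("utf-16-be")
index_le = find_utf16(little, "w\u00f6rld", "little")
index_be = find_utf16(big, "w\u00f6rld", "big")

# U+6141 U+4200 encodes to 41 61 00 42 in little-endian, so the bytes for
# "a" (61 00) appear at byte offset 1 even though the text has no "a".
tricky = "\u6141\u4200".encode("utf-16-le")
false_hit = find_utf16(tricky, "a", "little")
```

The alignment check is where the breakage creeps in: a plain byte search
happily matches across code-unit boundaries, and the pattern has to be
rebuilt for each byte order.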
One other possible solution is to use the JavaScriptCore and make a
JSStringRef (which works with unichars like NSString), and use
JavaScript's regex support - that way the results will at least have
consistent indices, work well with non-ASCII characters, etc...
That is an excellent option if you're using JavaScriptCore already, or
maybe even if you're not. There's another thing to look into. Anyone for
a Unicode text editor that is scriptable in JavaScript? (Hmm, maybe the
world really doesn't need another text editor.) :P
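
On the "consistent indices" point: a byte-oriented regex run over UTF-8
reports byte offsets, which stop lining up with NSString's unichar
indices as soon as the text contains a non-ASCII character. A small
Python sketch of the mismatch, where the conversion helper
(utf8_offset_to_utf16_index) is a made-up name for illustration and the
offset is assumed to fall on a character boundary:

```python
import re


def utf8_offset_to_utf16_index(text: str, byte_offset: int) -> int:
    """Convert an offset into text's UTF-8 bytes to a UTF-16 code-unit index."""
    # Assumes byte_offset falls on a character boundary in the UTF-8 data.
    prefix = text.encode("utf-8")[:byte_offset].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2


text = "na\u00efve caf\u00e9: foo"

# Byte-level search, as a byte-oriented regex library would do it.
match = re.search(b"foo", text.encode("utf-8"))
byte_offset = match.start()

# The index an NSString-style API (UTF-16 code units) would expect.
unit_index = utf8_offset_to_utf16_index(text, byte_offset)
```

Here the two accented characters take two bytes each in UTF-8, so the
byte offset runs two ahead of the code-unit index. A regex engine that
works on unichars directly avoids the conversion altogether.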
For now, I'm going to look into ICU. I seem to have a couple of copies
of it on my computer.
Cheers,
Jason
_______________________________________________
Cocoa-dev mailing list (email@hidden)