[ANN] RegexKitLite
- Subject: [ANN] RegexKitLite
- From: John Engelhart <email@hidden>
- Date: Sun, 23 Mar 2008 13:16:47 -0400
I've just released what I'm calling 'RegexKitLite'. It targets a
different group of people than the full-fledged RegexKit
(http://regexkit.sourceforge.net/).
I put RegexKitLite together after helping some users with some
RegexKit problems, specifically word-breaking Thai text. After putting
together a quick and dirty wrapper around the ICU regex engine, I
realized I had most of the pieces for a lightweight, no-frills
Objective-C regular expression system. All I needed to do was write
up some docs... which always seems to take the longest.
To give you an idea of how 'lightweight' the whole package is, the
tarball distribution is a scant 27004 (sic) bytes. Almost all of that
is the documentation, which is packaged as a single HTML file that
covers everything. The documentation file weighs in at 135544 (sic)
bytes.
You can view the documentation online at: http://regexkit.sourceforge.net/RegexKitLite/index.html
You can download the distribution via: http://downloads.sourceforge.net/regexkit/RegexKitLite-1.0.tar.bz2
Highlights are:
- Distributed under the terms of the BSD license.
- Links to /usr/lib/libicucore, so no external regex library is
  required.
- Consists of only two files: the RegexKitLite.h header and the
  RegexKitLite.m source file.
- Very small. The header is 4498 bytes and the source is 19625 bytes.
  That's it!
- Multithreading safe.
- Small, pseudo least recently used compiled regex cache.
- Since ICU requires the string it operates on to be in UTF-16 format,
  RegexKitLite first tries to get direct access to the NSString's
  UTF-16 buffer. If it can't, it goes through the full conversion
  process. It caches the conversion result of the last string that
  required conversion, so subsequent matches against that string are
  much faster. The caching is actually a little more complicated than
  this; see the documentation for more details.
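
To illustrate the "direct access first, convert only if needed" idea, here is a rough sketch using CoreFoundation. It is not taken from the RegexKitLite source: the helper name UTF16Buffer and its callerMustFree flag are made up for illustration, and it leaves out the caching of the converted buffer described above.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper, not part of RegexKitLite.
// Returns a pointer to the string's UTF-16 code units. When *callerMustFree
// is YES, the buffer was allocated here and the caller must free() it.
static const UniChar *UTF16Buffer(NSString *string, BOOL *callerMustFree) {
    CFStringRef cfString = (__bridge CFStringRef)string;
    *callerMustFree = NO;

    // Fast path: CFStringGetCharactersPtr returns NULL when the string's
    // internal storage is not directly available as UTF-16.
    const UniChar *direct = CFStringGetCharactersPtr(cfString);
    if (direct != NULL) { return direct; }

    // Slow path: convert the whole string into a buffer we own.
    CFIndex length = CFStringGetLength(cfString);
    UniChar *copy = malloc((size_t)(length > 0 ? length : 1) * sizeof(UniChar));
    if (copy == NULL) { return NULL; }
    CFStringGetCharacters(cfString, CFRangeMake(0, length), copy);
    *callerMustFree = YES;
    return copy;
}
```

A borrowed pointer from the fast path is only valid while the string is alive and unmodified, which is one reason a real implementation has to be careful about what it caches.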
In a nutshell, it provides some glue between Cocoa and the ICU regex
system. It consists of a handful of primitives that are added as a
category extension to NSString. The core methods are:
  + (NSInteger)captureCountForRegex:(NSString *)regexString
        options:(RKLRegexOptions)options error:(NSError **)error;

  - (BOOL)isMatchedByRegex:(NSString *)regexString
        options:(RKLRegexOptions)options inRange:(NSRange)range
        error:(NSError **)error;

  - (NSRange)rangeOfRegex:(NSString *)regexString
        options:(RKLRegexOptions)options inRange:(NSRange)range
        capture:(NSInteger)capture error:(NSError **)error;

  - (NSString *)stringByMatching:(NSString *)regexString
        options:(RKLRegexOptions)options inRange:(NSRange)range
        capture:(NSInteger)capture error:(NSError **)error;

There are also a handful of obvious 'convenience' methods that are just
wrappers for the above. In reality, everything except the regex
capture count is done with the rangeOfRegex: method. The
documentation builds a match enumerator as an example.
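
As a rough idea of how matches can be enumerated with nothing but rangeOfRegex:, here is a minimal sketch. It is not the enumerator from the documentation: AllMatches is a hypothetical helper, and it assumes that RKLNoOptions is the "no options" value and that a failed match comes back with a location of NSNotFound.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper, not part of RegexKitLite: collects the text of every
// match of `regex` in `string` by repeatedly narrowing the search range.
static NSArray *AllMatches(NSString *string, NSString *regex) {
    NSMutableArray *matches = [NSMutableArray array];
    NSUInteger length = [string length];
    NSRange searchRange = NSMakeRange(0, length);
    NSError *error = nil;

    while (searchRange.length > 0) {
        // Capture 0 is the entire match.
        NSRange found = [string rangeOfRegex:regex
                                     options:RKLNoOptions   // assumed constant
                                     inRange:searchRange
                                     capture:0
                                       error:&error];
        if (found.location == NSNotFound) { break; }   // assumed "no match" result
        [matches addObject:[string substringWithRange:found]];

        // Continue after this match; step one extra unit past an empty
        // match so the loop always makes progress.
        NSUInteger next = NSMaxRange(found) + (found.length == 0 ? 1 : 0);
        if (next >= length) { break; }
        searchRange = NSMakeRange(next, length - next);
    }
    return matches;
}
```

Called as AllMatches(@"one two  three", @"\\w+"), this should collect the three words in order.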
As you can see, it's pretty minimal. It's ideal for people who need
to get a bit of regex work done and don't want to have to add a whole
lot of cruft to do it. No new classes are added, either; it's all just
messages to NSString objects, and you supply the regular expressions
as ordinary NSStrings:
  NSString *site = [@"http://www.something.com/link/to/page.html"
      stringByMatching:@"http://(.*?)/(.*)" capture:1];
  // site == @"www.something.com"
Just that easy.
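
When you do want to know about a bad regex, the full-length core methods take an NSError. The following is only a sketch: PathOfURL is a made-up helper, RKLNoOptions is assumed to be the "no options" value, and the code assumes the capture count comes back below 2 when the regex can't be compiled.

```objc
#import <Foundation/Foundation.h>
#import "RegexKitLite.h"

// Hypothetical helper, not part of RegexKitLite: returns everything after
// the host name of an http:// URL string, or nil if there is no match.
static NSString *PathOfURL(NSString *url) {
    NSString *regex = @"http://(.*?)/(.*)";
    NSError *error = nil;

    // Check up front that the regex compiles and has the two capture
    // groups this function relies on.
    NSInteger captures = [NSString captureCountForRegex:regex
                                                options:RKLNoOptions
                                                  error:&error];
    if (captures < 2) {
        NSLog(@"Unusable regex '%@': %@", regex, error);
        return nil;
    }

    // Capture 2 is the second parenthesized group: the part after the host.
    return [url stringByMatching:regex
                         options:RKLNoOptions
                         inRange:NSMakeRange(0, [url length])
                         capture:2
                           error:&error];
}
```

For the URL used in the example above, the second capture is @"link/to/page.html".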