Re: Using Flex/Lex in a Cocoa project
- Subject: Re: Using Flex/Lex in a Cocoa project
- From: "Michael Ash" <email@hidden>
- Date: Sat, 16 Aug 2008 10:43:04 -0400
On Fri, Aug 15, 2008 at 10:53 PM, John Joyce
<email@hidden> wrote:
> Right now, I'm toying with using Flex/Lex in a Cocoa project.
> Unfortunately, I don't see a reliable or easy way to handle NSStrings
> correctly all the time with Flex.
> Does anybody have any suggestions for such text handling and reliable
> Unicode-aware regexes?
> I'm seriously not interested in implementing such details in C with Flex.
> Flex is fast and cool for that, but if it's going to be stupidly difficult
> to use reliably with other languages on a Mac, it's not a good idea for me.
Depending on exactly what you need, Unicode awareness can be fairly
straightforward.
Commonly, Unicode in regexes is only needed to pass through
undifferentiated blobs of text with ASCII delimiters. For example,
imagine parsing a CSV file which may have Unicode text inside
the quotes. For this case, you can convert the file to UTF-8, and then
constructs like '.' will match it (as long as Flex generates an 8-bit
scanner, which is its default unless you use the -Cf or -CF table
options). In UTF-8, every byte of a non-ASCII character falls in the
range 128-255, so none of them can be mistaken for an ASCII delimiter,
and if you just pass those bytes through you'll be fine. But be aware
of some potential problem areas:
- Each non-ASCII character will be more than one byte, and Flex will
think of it as more than one character. Write your regexes
accordingly. In particular, avoid length limits on runs of arbitrary
characters (.{1,10} counts bytes, not characters), and avoid using
non-ASCII characters directly in your regex (a class like [é] is
really a class of two separate bytes, not one character).
- It's very difficult to split UTF-8 strings correctly with
byte-oriented patterns. If you encounter a run of non-ASCII
characters, make sure your pattern follows that run all the way to the
end, until you get back to ASCII. Don't write a regex that stops in
the middle of it and then expects your code to be able to do something
useful with the fragment.
- If you need to do something with non-ASCII characters besides reading
them in one side and writing them out the other (for example, doing
something special with all accented characters), then Flex is probably
not the right answer.
Beyond this, it ought to be pretty straightforward. Since Flex copies
your action code verbatim into the generated C file, you can write
Objective-C in the actions (as long as you compile the generated file
as Objective-C, of course!), convert the matched bytes in yytext and
yyleng from UTF-8 back to an NSString, and take things from there.
Mike
_______________________________________________
Cocoa-dev mailing list (email@hidden)