Re: Using Flex/Lex in a Cocoa project
- Subject: Re: Using Flex/Lex in a Cocoa project
- From: "mm w" <email@hidden>
- Date: Mon, 18 Aug 2008 13:40:33 -0700
To avoid the splitting problem, you can escape every non-ASCII character
before handing the text to Flex:

    printf((c < 128) ? "%c" : "\\u%x", c);
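
A slightly fuller sketch of that one-liner, in case it helps. It assumes `c` is a UTF-16 code unit (for example, what -[NSString characterAtIndex:] returns); the original fragment does not say so. The function name `escape_utf16` and its buffer-based signature are made up for illustration. It copies ASCII through unchanged and writes everything else as a \uXXXX escape, so the lexer only ever sees ASCII:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of Flex or Cocoa.
   Writes an ASCII-only, NUL-terminated copy of `len` UTF-16 code units
   into `out` (capacity `cap` bytes). Units below 128 are copied as-is;
   every other unit becomes a six-character \uXXXX escape.
   Returns the number of bytes written (excluding the NUL),
   or -1 if `out` is too small. */
static int escape_utf16(const uint16_t *units, size_t len,
                        char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        unsigned c = units[i];
        int n = (c < 128)
            ? snprintf(out + used, cap - used, "%c", (int)c)
            : snprintf(out + used, cap - used, "\\u%04x", c);

        /* snprintf reports the length it wanted; treat truncation as failure. */
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

The lexer actions would then have to turn the escapes back into characters, and an input that already contains a literal backslash followed by `u` would be ambiguous, so this is only a starting point.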
On Sat, Aug 16, 2008 at 7:43 AM, Michael Ash <email@hidden> wrote:
> On Fri, Aug 15, 2008 at 10:53 PM, John Joyce
> <email@hidden> wrote:
>> Right now, I'm toying with using Flex/Lex in a Cocoa project.
>> Unfortunately, I don't see a reliable or easy way to handle NSStrings
>> correctly all the time with Flex.
>> Does anybody have any suggestions for such text handling and reliable
>> Unicode-aware regexes?
>> I'm seriously not interested in implementing such details in C with Flex.
>> Flex is fast and cool for that, but if it's going to be stupidly difficult
>> to use reliably with other languages on a Mac, it's not a good idea for me.
>
> Depending on exactly what you need, Unicode awareness can be fairly
> straightforward.
>
> Commonly, Unicode in regexes is only needed to pass through
> undifferentiated blobs of text, with ASCII delimiters. For example,
> imagine parsing a CSV file which potentially has Unicode text inside
> the quotes. For this case, you can convert the file to UTF-8, and then
> constructs like . will accept that text. All non-ASCII characters in
> UTF-8 are represented as sequences of bytes in the range 128-255, so if
> you just pass those through then you'll be fine. But be aware of some
> potential problem areas:
>
> - Each non-ASCII character will be more than one byte, and flex will
> think of it as more than one character. Write your regexes
> accordingly. In particular, avoid length limits on runs of arbitrary
> characters, and avoid using non-ASCII characters directly in your
> regex.
>
> - It's very difficult to split UTF-8 strings correctly. If you
> encounter a run of non-ASCII characters, ensure that you follow that
> run through the end, until you get back to ASCII. Don't have a regex
> that stops in the middle of it and then expects your code to be able
> to do something useful with it.
>
> - If you need to do something with non-ASCII characters besides read
> them in one side and write them out the other, for example doing
> something special with all accented characters, then Flex is probably
> not the right answer.
>
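
The second point above (never stop in the middle of a non-ASCII run) can be sketched in C roughly like this. The helper name `run_safe_split` is hypothetical; it relies only on the fact that every byte of a non-ASCII character in UTF-8 is 128 or greater. Given a position where you would like to cut a buffer, it backs up until the cut no longer falls between two non-ASCII bytes:

```c
#include <stddef.h>

/* Hypothetical helper, not part of Flex.
   Returns the largest split point p <= want such that the cut does not
   fall inside a run of non-ASCII bytes, i.e. buf[p-1] and buf[p] are
   not both >= 0x80. Positions 0 and len are always allowed. */
static size_t run_safe_split(const unsigned char *buf, size_t len,
                             size_t want)
{
    size_t p = (want < len) ? want : len;

    /* Back up while the cut would separate two non-ASCII bytes. */
    while (p > 0 && p < len && buf[p - 1] >= 0x80 && buf[p] >= 0x80)
        p--;
    return p;
}
```

For the bytes of "ab" followed by two copies of U+00E9 (0xC3 0xA9 each) and then "cd", asking for a split at offset 4 backs up to offset 2, just before the run starts. This only keeps a run in one piece; it makes no attempt to validate the UTF-8 itself.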
> Besides this, it ought to be pretty straightforward. Since Flex just
> passes your code straight through to the compiler, you can write
> Objective-C in the actions (as long as you compile the result as
> Objective-C, of course!), convert the text from UTF-8 back to an
> NSString, and take things from there.
>
> Mike
> _______________________________________________
>
> Cocoa-dev mailing list (email@hidden)
>
> Please do not post admin requests or moderator comments to the list.
> Contact the moderators at cocoa-dev-admins(at)lists.apple.com
>
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden
>
--
-mmw