Re: Re[2]: gcc and extended character source code
  • Subject: Re: Re[2]: gcc and extended character source code
  • From: Alastair Houghton <email@hidden>
  • Date: Mon, 11 Feb 2008 15:48:28 +0000

On 11 Feb 2008, at 09:47, Peter Mulholland wrote:

> Monday, February 11, 2008, 6:48:25 AM, you wrote:
>
> > Variable names that don't use ASCII are illegal -- doesn't matter what
> > encoding the source code file has, the compiler won't parse the A-umlaut.

That's true for C89 (sort of: it doesn't talk about ASCII, but rather about the basic source character set, since source code might be expressed in EBCDIC or some other exotic encoding), but not for C99 (see below).
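Because both C89 and C99 spell out the basic source character set explicitly, one quick way to find the code that needs attention when porting is to scan for bytes outside it. Here's a minimal sketch, assuming an ASCII-compatible source encoding (so not EBCDIC); `is_basic_source_char` and `first_nonbasic_offset` are hypothetical names, and the character list follows C99 5.2.1 (letters, digits, 29 graphic characters, space, and the tab, vertical tab, form feed and newline controls).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 29 graphic characters of the basic source character set (C99 5.2.1). */
static const char basic_graphic[] = "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* Returns 1 if byte c is in the basic source character set, else 0.
   Assumes an ASCII-compatible encoding. '\r' is also accepted so that
   files with Windows (CRLF) line endings are not flagged on every line. */
int is_basic_source_char(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
        return 1;
    if (c == ' ' || c == '\t' || c == '\v' || c == '\f' || c == '\n' || c == '\r')
        return 1;
    /* strchr would also "find" the terminating NUL, so rule out c == 0. */
    return c != '\0' && strchr(basic_graphic, c) != NULL;
}

/* Returns the offset of the first byte in buf[0..len) that is outside the
   basic source character set, or len if there is none. */
size_t first_nonbasic_offset(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (!is_basic_source_char((unsigned char)buf[i]))
            return i;
    return len;
}
```

Note that `$`, `@` and the backtick are not in the basic set, and that this flags comments and string literals too, which may be more than you need, but it does pinpoint every umlaut in the file.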


> > If you don't want to change it, have the original author do so,
> > because it's not standard.

> Lame. Typical of *nix to still have its head in the sand.

This isn't anything to do with UNIX.

The ISO C standard (C99 in this case), which defines what is and is not valid C, says:

6.4.2.1, paragraph 3: ...An implementation may allow multibyte characters that are not
part of the basic source character set to appear in identifiers;
which characters and their correspondence to universal character
names is implementation-defined.


For "implementation-defined", read "not guaranteed to be portable".

This whole area is rather more complicated than your off-the-cuff dismissal implies. There is no portable way to specify the character encoding of a C source file. Identifiers with external linkage end up as symbol names in object files and debugging information, so using characters outside the basic set requires support not just from the compiler but from a variety of other tools (e.g. the assembler, the linker, the dynamic linker, debuggers, and potentially others besides). GCC often sits on top of the system assembler and linker, and has no control over which dynamic linker or debugger you might be using, so for GCC it's a potentially tricky problem. Moreover, a mistake in this area could break binary compatibility, which is a very disruptive thing to do.

There are also some nasty gotchas. For instance, the mapping from some source character sets to Unicode might not be what people expect, and some scripts include characters that look just like those in other scripts (e.g. people tend to confuse the German eszett, ß, with the Greek beta, β, and some Cyrillic and Greek characters look exactly like their Latin counterparts).
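The look-alike problem is easy to demonstrate. The sketch below (the function name `sum_lookalikes` is hypothetical) declares two local variables whose names differ only in their first letter: Latin a (U+0061) versus Cyrillic а (U+0430). The Cyrillic one is spelled as a universal character name so the example doesn't depend on the file's encoding, assuming a compiler that accepts UCNs in identifiers; written with the raw character instead, the two declarations would be visually indistinguishable. Locals are used deliberately, so that no non-ASCII symbol names reach the assembler or linker.

```c
#include <assert.h>

/* Two variables that look identical when rendered but are distinct:
   "abc" starts with LATIN SMALL LETTER A (U+0061), while the second name
   starts with CYRILLIC SMALL LETTER A (U+0430), written as a UCN. */
int sum_lookalikes(void)
{
    int abc = 1;
    int \u0430bc = 2;

    /* Distinct objects, so the sum is 1 + 2, not 2 * abc. */
    assert(&abc != &\u0430bc);
    return abc + \u0430bc;
}
```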

Microsoft's compiler may presently allow the use of characters outside the basic set, but because there is no standard behaviour here, doing so is not likely to be portable. Furthermore, it looks to me like the GCC team has been working on this problem for some time, so to say that they have their heads in the sand is rather unfair.

> The original author doesn't care - he's only concerned about the code
> compiling on Windows; as far as he's concerned, it's my job to port it.

Then it's your job either to fix his code or to get him to fix it himself, right? The best fix is to do what Christian Demmer suggested, i.e. replace umlauts and eszetts with their two-character equivalents (ä becomes ae, ö becomes oe, ü becomes ue, ß becomes ss). Your German programmer shouldn't find this too objectionable, since it doesn't change the meaning of anything he wrote.
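For a file with many such identifiers, the substitution is mechanical enough to script. Here's a minimal sketch of a hypothetical helper, `transliterate_german`, assuming the source is UTF-8 (a Windows-1252 file would need converting first, e.g. with iconv). It rewrites every occurrence, including those in comments and string literals, so a real tool would want to restrict itself to identifiers and check that no renamed identifier collides with an existing one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the NUL-terminated UTF-8 string `in` to `out`
   (capacity `outsz` bytes), replacing German umlauts and eszett with their
   two-letter ASCII equivalents. All other bytes are copied unchanged.
   Returns 0 on success, or -1 if `out` is too small. */
int transliterate_german(const char *in, char *out, size_t outsz)
{
    /* In UTF-8 each of these letters is 0xC3 followed by one more byte. */
    static const struct { unsigned char second; const char *ascii; } map[] = {
        { 0xA4, "ae" }, { 0xB6, "oe" }, { 0xBC, "ue" },  /* U+00E4 U+00F6 U+00FC */
        { 0x84, "Ae" }, { 0x96, "Oe" }, { 0x9C, "Ue" },  /* U+00C4 U+00D6 U+00DC */
        { 0x9F, "ss" },                                  /* U+00DF */
    };
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        const char *rep = NULL;
        if ((unsigned char)in[0] == 0xC3) {
            for (size_t k = 0; k < sizeof map / sizeof map[0]; k++)
                if ((unsigned char)in[1] == map[k].second)
                    rep = map[k].ascii;
        }
        if (rep != NULL) {
            if (o + 2 >= outsz)          /* keep room for the final NUL */
                return -1;
            memcpy(out + o, rep, 2);
            o += 2;
            in += 2;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}
```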


I wouldn't bother messing around with universal character names (UCNs) right now, since I think support for them is a work in progress, and in any case they hamper readability. I notice, for instance, that newer FSF GCC versions require the -fextended-identifiers switch if you want to use them in identifiers. Apple's GCC 4.0.1 doesn't seem to (it seems to accept them with just -std=c99).
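To make the readability point concrete, here's a sketch of a UCN-spelled identifier next to its ASCII transliteration. The function names are hypothetical, and it assumes a compiler that accepts UCNs in identifiers (with FSF GCC of that era, that means adding -fextended-identifiers). The `#error` guard is there because UCNs only exist from C99 onwards, so a C89/C90 build fails with a clear message rather than a confusing parse error.

```c
#include <assert.h>

/* UCNs in identifiers were introduced by C99; under C89/C90,
   __STDC_VERSION__ is missing or predates C99. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#error "UCNs in identifiers require C99 or later (e.g. -std=c99)"
#endif

/* Local variable "größe" (size) spelled with UCNs:
   \u00f6 is o with umlaut, \u00df is eszett. */
int area_ucn(int width, int height)
{
    assert(width >= 0 && height >= 0);
    int gr\u00f6\u00dfe = width * height;
    return gr\u00f6\u00dfe;
}

/* The same function with the transliterated ASCII name "groesse",
   which any C compiler accepts. */
int area_ascii(int width, int height)
{
    assert(width >= 0 && height >= 0);
    int groesse = width * height;
    return groesse;
}
```

Both functions compute the same thing; only the second reads naturally and builds everywhere.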

Kind regards,

Alastair.

--
http://alastairs-place.net

_______________________________________________
Xcode-users mailing list      (email@hidden)


References: 
 >gcc and extended character source code (From: Peter Mulholland <email@hidden>)
 >Re: gcc and extended character source code (From: David Dunham <email@hidden>)
 >Re[2]: gcc and extended character source code (From: Peter Mulholland <email@hidden>)
