Re: ASCII vs. MacRoman (was Re: Standard Additions 'read' command - basic questions)


Subject: Re: ASCII vs. MacRoman (was Re: Standard Additions 'read' command - basic questions)
From: Axel Luttgens <email@hidden>
Date: Tue, 20 Jan 2004 13:08:33 +0100

Chris Page wrote:

> On Jan 19, 2004, at 12:36, Christopher Nebel wrote:
>
>> The problem is that your delimiter string is being fetched as Unicode, but then it looks for the Unicode code point in the data, which is *not* 254 (or 255, or whatever), since your primary encoding is probably MacRoman, which doesn't agree with Unicode at all above 127. (127 and below -- that is, ASCII -- is fine.)
>
> I realize this is a separate topic,

Well, perhaps not that much. So, I'll jump through the door you opened ;-)

> but: In fact, it's misleading for "ASCII character" to work with values above 127, which are not ASCII. It's too bad people have played fast-and-loose with the term ASCII lo these many decades, including nearly every mainstream programming language. Really, "ASCII character" should produce an error if the numeric value is not valid ASCII. Either it should have been named "MacRoman character" or some other mechanism should have been created for handling other character encodings.
>
> In fact, it's not too late. It might be useful to rename it to something like "MacRoman character" and provide "ASCII character" as a synonym (so "ASCII character" could still be used, but it would decompile to "MacRoman character").

You're absolutely right in writing that "ASCII character" (as well as its twin "ASCII number") has been a confusing choice of terminology.
But I'm not sure something like "MacRoman character" would be a better one.

After all, the whole matter had (note the past tense) barely anything to do with encodings.
The real semantics of "ASCII character" was: take an 8-bit integer value and treat it as a character (i.e. an element of a string).
If you wanted to, say, construct strings and display them through dialogs then yes, it was a good idea to comply with the MacRoman encoding (or better, the encoding resulting from your language settings).
But if you wanted to read/create files using a Latin-1 or a Windows-xyz encoding, you could do so too.
Or even handle multi-byte chunks (on a byte-by-byte basis, of course) and treat them as, for example, very long integers (perhaps not very efficiently, but it was possible).
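
The point that an 8-bit value carries no encoding of its own can be sketched outside AppleScript. The following is a minimal Python illustration, not AppleScript code; the value 254 and the three-byte chunk are arbitrary examples chosen here, and the codec names are Python's.

```python
# Illustration only: Python stands in for AppleScript here.
value = 254  # an 8-bit value outside the ASCII range (0-127)
raw = bytes([value])

# The same byte decodes to different characters under different encodings.
as_mac_roman = raw.decode("mac_roman")
as_latin_1 = raw.decode("latin_1")

# Below 128 both encodings agree with ASCII, so no ambiguity arises there.
ascii_agrees = all(
    bytes([b]).decode("mac_roman") == bytes([b]).decode("latin_1") == chr(b)
    for b in range(128)
)

# A multi-byte chunk handled byte by byte and treated as one long integer.
chunk = bytes([0x01, 0x00, 0xFF])
as_integer = 0
for byte in chunk:
    as_integer = as_integer * 256 + byte

print(repr(as_mac_roman), repr(as_latin_1), ascii_agrees, as_integer)
```

Nothing in the byte itself says which of these readings is intended; that choice belongs to whoever consumes the data.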

In that sense, "8BIT character" (and "8BIT number of") could be a less confusing choice.

But AppleScript is now facing a big challenge with the introduction of the Unicode text class, which it is trying to make as transparent as possible.
That transparency comes about, among other things, through implicit coercions.
This may have some side-effects.

For example, with regard to the read command, its default behavior was to read an unstructured stream of bytes and to put it into a string (the only structure being the one dictated by the user through the "for", "until"... arguments).
To be more precise, the default was "read ... as text ...", which happened to provide that mapping between characters and bytes.
Now, it seems that the stream may be structured through the Unicode encoding scheme under certain circumstances; should this really be the case, this would be a departure from the default behavior as soon as you are handling bytes with values in the range 128-255.
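
The difference between an unstructured and a structured reading of the same bytes can be sketched as follows. This is a Python illustration under stated assumptions: the four bytes are hypothetical file contents, UTF-8 is picked as one possible Unicode encoding scheme (the thread does not say which one applies), and Latin-1 is used only because it maps each byte to exactly one character.

```python
# Hypothetical file contents: two ASCII bytes around two bytes above 127.
data = bytes([0x41, 0xC3, 0xA9, 0x42])

# Unstructured view: every byte becomes exactly one character.
per_byte = data.decode("latin_1")

# Structured view: bytes above 127 are grouped by the encoding scheme,
# so the character count no longer matches the byte count.
structured = data.decode("utf_8")

print(len(data), len(per_byte), len(structured))
```

As long as all bytes stay below 128 the two views give the same string; they diverge only in the 128-255 range.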

In the above, Chris N. said the bug is that the delimiter is erroneously fetched as Unicode, so that a search for a Unicode sequence is undertaken.
But this also means that the traditional "as text" default was defeated, as the code started a search for groups of bytes...
Should the class of the delimiter define the interpretation of a file's contents?
Should the "as ..." part (defaulting to some class, but which one?) induce a coercion of the delimiter?
Should the "as ..." part and the delimiter necessarily be of the same class?
Should the idea of byte-by-byte access to any file just be abandoned?
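
The kind of delimiter mismatch discussed above can be illustrated with a short Python sketch. It is not how the read command is implemented; the file contents are hypothetical, and UTF-16 big-endian is only an assumed byte form for the Unicode delimiter.

```python
# Hypothetical file contents using the single byte 254 as a delimiter.
data = b"first" + bytes([254]) + b"second"

# Delimiter built the traditional way: an 8-bit value read as MacRoman.
delimiter_char = bytes([254]).decode("mac_roman")

# Once that character is Unicode text, its code point is no longer 254.
code_point = ord(delimiter_char)

# A byte-level search finds the delimiter.
byte_hit = data.find(bytes([254]))

# A search for the Unicode form of the delimiter (assumed UTF-16BE) does not.
unicode_hit = data.find(delimiter_char.encode("utf_16_be"))

print(code_point, byte_hit, unicode_hit)
```

The data and the delimiter are each self-consistent; the search fails only because they are interpreted at different levels.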

Well, as I said, just jumped through the door...

Axel