Re: ASCII vs. MacRoman (was Re: Standard Additions 'read' command - basic questions)
- Subject: Re: ASCII vs. MacRoman (was Re: Standard Additions 'read' command - basic questions)
- From: Axel Luttgens <email@hidden>
- Date: Tue, 20 Jan 2004 13:08:33 +0100
Chris Page wrote:
> On Jan 19, 2004, at 12:36, Christopher Nebel wrote:
>> The problem is that your delimiter string is being fetched as
>> Unicode, but then it looks for the Unicode code point in the data,
>> which is *not* 254 (or 255, or whatever), since your primary encoding
>> is probably MacRoman, which doesn't agree with Unicode at all above
>> 127. (127 and below -- that is, ASCII -- is fine.)
> I realize this is a separate topic,
Well, perhaps not that much. So, I'll jump through the door you opened ;-)
> but: In fact, it's misleading for "ASCII character" to work with
> values above 127, which are not ASCII. It's too bad people have played
> fast-and-loose with the term ASCII lo these many decades, including
> nearly every mainstream programming language. Really, "ASCII
> character" should produce an error if the numeric value is not valid
> ASCII. Either it should have been named "MacRoman character" or some
> other mechanism should have been created for handling other character
> encodings.
> In fact, it's not too late. It might be useful to rename it to
> something like "MacRoman character" and provide "ASCII character" as a
> synonym (so "ASCII character" could still be used, but it would
> decompile to "MacRoman character").
You're absolutely right that "ASCII character" (as well as its twin
"ASCII number") was a confusing choice of terminology.
But I'm not sure something like "MacRoman character" would be a better one.
After all, the whole matter had (note the past tense) very little to do
with encodings.
The real semantics of "ASCII character" was: take an 8-bit integer value
and treat it as a character (i.e., an element of a string).
If you wanted to, say, construct strings and display them through
dialogs then yes, it was a good idea to be compliant with the MacRoman
encoding (or better, the encoding resulting from your language settings).
But if you wanted to read or create files using a Latin-1 or a
windows-xyz encoding, you could do so too.
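
To make the point concrete, here is a minimal sketch in Python (not AppleScript; Python's codecs merely stand in for the encodings named above, and the helper name byte_as_char is made up for the illustration). It shows that one and the same 8-bit value is just a number until some encoding is chosen to display it:

```python
# Illustration only: byte_as_char is a hypothetical helper, and Python's
# built-in codecs stand in for the encodings discussed in the text.
def byte_as_char(value, encoding):
    """Interpret one 8-bit value as a character under the given encoding."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit value")
    return bytes([value]).decode(encoding)

# The same byte value yields a different character per encoding.
for enc in ("mac_roman", "latin_1", "cp1252"):
    ch = byte_as_char(254, enc)
    print(enc, hex(ord(ch)))

# Below 128, these encodings all agree with ASCII.
same_low = all(
    byte_as_char(v, "mac_roman") == byte_as_char(v, "latin_1") == chr(v)
    for v in range(128)
)
print("agree below 128:", same_low)
```

Only Latin-1 happens to map byte 254 to Unicode code point 254; MacRoman does not, which is the disagreement above 127 that Chris N. describes.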
Or even handle multi-byte chunks (on a byte-by-byte basis, of course)
and treat them as, for example, very long integers (perhaps not very
efficiently, but it was possible).
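
As a rough sketch of that byte-by-byte approach, again in Python rather than AppleScript: the function names are invented for the example, and big-endian byte order is an assumption, since nothing above fixes one.

```python
# Illustration only: hypothetical helpers, big-endian order assumed.
def bytes_to_int(chunk):
    """Fold a run of bytes into one integer, one byte at a time."""
    total = 0
    for b in chunk:
        total = total * 256 + b  # shift previous bytes left, add the new one
    return total

def int_to_bytes(value, length):
    """Split a non-negative integer back into `length` big-endian bytes."""
    out = []
    for _ in range(length):
        out.append(value % 256)  # peel off the lowest byte
        value //= 256
    return bytes(reversed(out))

chunk = bytes([0x01, 0x00, 0xFE, 0xFF])
n = bytes_to_int(chunk)
print(n, int_to_bytes(n, len(chunk)) == chunk)
```

No encoding is involved at any point: values 128-255 are handled exactly like values 0-127.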
In that sense, "8BIT character" (and "8BIT number of") could be a less
confusing choice.
But AppleScript is now facing a big challenge with the introduction of
the Unicode text class, and is trying to handle it as transparently as
possible.
That transparency is achieved, among other things, through implicit
coercions.
This may have some side effects.
For example, with regard to the read command, its default behavior was
to read an unstructured stream of bytes and to put it into a string (the
only structure being the one dictated by the user through the "for",
"until"... arguments).
To be more precise, the default was "read ... as text ...", which
happened to provide that mapping between characters and bytes.
Now, it seems that the stream may be structured through the Unicode
encoding scheme under certain circumstances; should this really be the
case, this would be a departure from the default behavior as soon as you
are handling bytes with values in the range 128-255.
In the above, Chris N. said the bug is that the delimiter is erroneously
fetched as Unicode, so that a search for a Unicode sequence is undertaken.
But this also means that the traditional "as text" default was defeated,
as the code started a search for groups of bytes...
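
The difference between the two kinds of search can be sketched as follows, in Python. This is not how the read command is implemented; the function names are hypothetical, and UTF-16 is chosen only as one example of a multi-byte Unicode encoding.

```python
# Illustration only: hypothetical helpers, not the 'read' command's internals.
data = bytes([72, 105, 254, 66, 121, 101])  # "Hi", the byte 254, then "Bye"

def read_until_byte(buf, delimiter_value):
    """Return the bytes before the first occurrence of one byte value."""
    idx = buf.find(bytes([delimiter_value]))
    return buf if idx < 0 else buf[:idx]

def read_until_text(buf, delimiter_char, encoding):
    """Return the bytes before the delimiter's encoded form in `encoding`."""
    idx = buf.find(delimiter_char.encode(encoding))
    return buf if idx < 0 else buf[:idx]

# Byte-per-byte view: the delimiter is the value 254 and it is found.
as_bytes = read_until_byte(data, 254)

# Unicode view (UTF-16 assumed here): code point 254 becomes the two bytes
# 00 FE, a group that never occurs in the data, so nothing is found.
as_utf16 = read_until_text(data, chr(254), "utf-16-be")

print(as_bytes, as_utf16)
```

In the first case the read stops after "Hi"; in the second the delimiter is never matched and the whole buffer comes back.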
Should the class of the delimiter define the interpretation of a file's
contents?
Should the "as ..." part (defaulting to some class - which one?) induce a
coercion of the delimiter?
Should the "as ..." class and the delimiter be required to be of the
same class?
Should the idea of byte-by-byte access to any file just be abandoned?
Well, as I said, I just jumped through the door...
Axel