Re: linebreak in a shell script; was Re: What's wrong with this call to zip?
- Subject: Re: linebreak in a shell script; was Re: What's wrong with this call to zip?
- From: Philip Aker <email@hidden>
- Date: Thu, 28 Feb 2008 18:58:56 -0800
On 08-02-28, at 18:32, Mark J. Reed wrote:
No. The Unicode model defines its terms very specifically. A
character is mapped to one scalar value, which generally maps to one
code point (except for characters outside the BMP in UTF-16), and then
the code point(s) are represented in some number of bytes - 1, 2, 3,
or 4, depending on the code points and transformation format in use.
More than one character may combine to make a single glyph, and
different sequences of characters may be equivalent according to the
canonicalization rules, but no matter how many bytes it takes to
represent a given single Unicode scalar value in a given UTF, it is
still one character.
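To make the distinction above concrete, here is a minimal sketch that counts the different units for a single character. The helper name `sizes` is made up for illustration. One terminology note: in the Unicode Standard's own vocabulary a character outside the BMP is still one code point, and what doubles in UTF-16 is the number of 16-bit code units.

```python
def sizes(ch):
    """Count the units needed for the string ch in several encoding forms.

    Hypothetical helper, for illustration only.
    """
    return {
        "code_points": len(ch),                           # Python 3 str is indexed by code point
        "utf8_bytes": len(ch.encode("utf-8")),            # 1 to 4 bytes per code point
        "utf16_units": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units, no BOM
        "utf32_units": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units, no BOM
    }

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
print(sizes("\U0001D11E"))
# {'code_points': 1, 'utf8_bytes': 4, 'utf16_units': 2, 'utf32_units': 1}
```

However many bytes or code units the encoding form needs, the count of code points stays at one.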
Didn't I just say that in my analogy? MIDI describes its terms very
specifically. Notes are mapped to one scalar value, but that which
makes up a note message is a code point. I'm just making the analogy
to a more general concept that is useful across a spectrum of uses, not
descending into the minutiae of a particular implementation. Actually,
I'm more interested in tracing the origin of the idea, which may lie in
frequency-shift keying for all I know.
OK, so there's some magical weirdness with CRLF in AppleScript 2.0. Don't
conflate it with the Unicode stuff, though. It may have gone in at
the same time, but this behavior is definitely not a Unicode thing.
Well, the idea of one "character" having two code points is certainly
a Unicode thing.
IMO, not really. If you describe the notion more generally, such as an
entity ascribed to one or more values with the meaning of the values
following the first contingent upon the value of the first lying
within certain ranges, then MIDI was at that point in the early '80s.
And one certainly has to ask how the so-called two-byte languages in
vintage Mac OS worked. Unicode simply standardized the concept as it
pertains to character set representations.
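The general notion described above, where the range of the first value decides how the following values are read, can be sketched in a few lines. This is only an illustration, not anyone's actual implementation: the function names `midi_data_length` and `decode_utf16_units` are invented here, and the MIDI part covers channel voice messages only.

```python
def midi_data_length(status):
    """Number of data bytes that follow a MIDI channel voice status byte."""
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel voice status byte")
    # Program Change (0xC0-0xCF) and Channel Pressure (0xD0-0xDF) take one
    # data byte; the other channel voice messages take two.
    return 1 if 0xC0 <= status <= 0xDF else 2


def decode_utf16_units(units):
    """Decode a list of 16-bit code units into a list of code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            lo = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            # Any other unit stands for itself.
            out.append(u)
            i += 1
    return out


print(midi_data_length(0x90))  # Note On, channel 1: prints 2
print([hex(cp) for cp in decode_utf16_units([0x0041, 0xD834, 0xDD1E])])
# prints ['0x41', '0x1d11e']
```

In both cases the reader has to inspect the leading value before it knows how many values follow and what they mean.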
Philip Aker
echo email@hidden@nl | tr a-z@. p-za-o.@