Re: CRLF (was BUG: More on C-header text files)
- Subject: Re: CRLF (was BUG: More on C-header text files)
- From: email@hidden
- Date: Tue, 2 Jul 2002 13:45:10 -0400
On Mon, 01 Jul 2002 13:34:56 -0700, Philip Aker <email@hidden> speculated,
> On Sunday, June 30, 2002, at 03:41 PM, Paul Berkowitz wrote:
>
> > I could never figure out why they needed two separate
> > characters for marking the ends of lines. That now makes sense.
> > Can you, or some else, explain or "remind" us why, when
> > reducing that to one character, Macs (and MS Word, even in
> > Windows) took CR whereas Unix ("Epoch" beginning 1/1/1970) took
> > LF? Why so many different implementations?
>
> I'd like to know the real history too.
>
> I thought that originally CRLF came about because terminals were
> modelled after typewriters and had a hard-wired number of
> columns (64) and a somewhat variable page height (defined by the
> number of rows).
Actually, physical media drove the format. First, there was
the Hollerith card, developed by Herman Hollerith (whose
Tabulating Machine Company later became part of IBM) for the
1890 US Census.
The "IBM card" holds one line of 80 columns. Usually, the
rightmost 8 columns are used to punch a sequence number, so if
you drop the box of cards, you can run them through the collator
and sort them back into order. So the punch card defines a 72
column line, and a page length limited only by the number of
cards you can carry.
The teletype was developed in the early 1900's and came on to
the market in the 1920's.
http://www.thocp.net/hardware/history_of_teletype_development_.htm
It had a 72-column line as well (at least, in the ASR-33 model,
which was enormously popular as a computer terminal in the 1970's).
http://www.rdrop.com/~jimw/ttybr-4.jpg
Now, the teletype had a carriage return character and a line
feed character. Keeping the two separate was useful on computers
if you needed to overstrike characters. For example, you could
underscore text by typing the text, sending carriage return, and
typing the underscores. You could also spew the printed page out
more efficiently by just sending line feeds, without the wear and
tear of banging the print head back against the stops each time.
Remember, the teletype was an electromechanical device, driven by
solenoids. Send a carriage return, and the carriage-returning
solenoid would fire.
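To make the overstrike trick concrete, here is a minimal Python sketch of the character stream involved. The function names are made up for illustration, and it assumes a printer that treats CR and LF as two independent motions, the way the teletype did.

```python
CR = "\r"  # carriage return: head goes back to column 1, paper stays put
LF = "\n"  # line feed: paper advances one line, head stays put


def underline_overstrike(text: str) -> str:
    """Build the stream that prints `text` underlined (hypothetical helper)."""
    # Print the text, return the carriage without advancing the paper,
    # overprint underscores in the same columns, then finish the line.
    return text + CR + "_" * len(text) + CR + LF


def feed_lines(count: int) -> str:
    """Advance the paper `count` lines without any carriage returns."""
    return LF * count
```

Sending underline_overstrike("Title") to such a printer would strike "Title" and then five underscores on the same physical line before the paper moves.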
The conventions for line-end characters predate DOS, Unix, and
Mac. Multics and the DEC PDP-8 and PDP-10 were some systems I
used before Unix came into existence, and they used the line feed
(and the other two paper-moving characters, Vertical Tab and Form
Feed) to indicate the end of line. Why? Basically, the "line"
wasn't finished until the paper advanced. It was driven more by
the buffering of output from the mainframe, through a
communications processor, to the teletype, than by user input.
The OS communications processor would take the user-typed carriage
return and add in the line feed.
DOS derived a lot of its conventions from CP/M, which took a lot
of cues from the PDP-10 operating system, TOPS-10. (Things like
the ASCII character set, the "slant" to separate option switches
from the command, and three-letter file extensions, for example.)
Then Unix came along, and changed a bunch of stuff. They still
needed to use the line feed as the line end character, to
maintain compatibility with these communications processors that
looked for line feed. But the carriage return was just excess
baggage. So they used line feed as their line end character, and
relied on the drivers and comm processors to put in carriage
returns if needed.
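As a rough illustration of what "putting in carriage returns" on the way out amounts to, here is a small Python sketch. It is not taken from any actual driver; the function name is hypothetical, and it assumes the only job is to make sure every LF headed for the terminal has a CR in front of it.

```python
def expand_lf_for_terminal(data: bytes) -> bytes:
    """Insert a CR before each LF that lacks one (illustrative only)."""
    out = bytearray()
    prev = None
    for byte in data:
        # 0x0A is LF, 0x0D is CR; leave existing CRLF pairs alone
        if byte == 0x0A and prev != 0x0D:
            out.append(0x0D)
        out.append(byte)
        prev = byte
    return bytes(out)
```

The stored file keeps its bare LF line ends; only the stream going to the device gets the extra character.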
The Mac also had the opportunity to break from the teletype
history, and they didn't need to maintain compatibility with
old equipment. So they just took the simple approach--the
carriage return--since that's what the keyboard sends.
So what's the implication for AppleScript developers? (Trying
desperately to bring this thread back on topic....) Depending
where you get files from, they may have CR, LF, or CRLF as line
end characters. If it's a text file, you should normalize it to
use whatever convention you prefer, or is best supported by the
tools we have. Up until OS X, we usually relied on our front-ends
(FTP client, MIME-equipped e-mail client, or File Exchange) to
normalize text files to the Mac CR line ends. But now, we'll
see LF line ends routinely, and must deal with them ourselves.
BBEdit is a wonderful example of a program that reads and writes
in all three modes.
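For anyone who wants to do the normalizing outside of an editor, the logic is short. Below is a minimal sketch in Python (not AppleScript); the function name and the default target are my own choices, and it works on raw bytes on the assumption that the file is in an ASCII-compatible encoding, so it would not be safe for UTF-16 text.

```python
def normalize_line_ends(data: bytes, line_end: bytes = b"\n") -> bytes:
    """Convert any mix of CR, LF, and CRLF line ends to `line_end`."""
    # Collapse CRLF first, so its two characters are not counted
    # as two separate line ends, then fold lone CRs into LF.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Finally expand the single internal form to the requested one.
    return unified.replace(b"\n", line_end)
```

Pass line_end=b"\r" for classic Mac line ends or line_end=b"\r\n" for DOS-style ones. When reading the file, open it in binary mode so Python does not translate the line ends before the function sees them.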
--
Scott Norton Phone: +1-703-299-1656
DTI Associates, Inc. Fax: +1-703-706-0476
2920 South Glebe Road Internet: email@hidden
Arlington, VA 22206-2768 or email@hidden