Re: [CocoaProgrammingForAbsoluteBeginners] Accessing 8-bit bytes from a file using Cocoa
- Subject: Re: [CocoaProgrammingForAbsoluteBeginners] Accessing 8-bit bytes from a file using Cocoa
- From: Uli Kusterer <email@hidden>
- Date: Mon, 1 May 2006 22:01:40 +0200
On 01.05.2006 at 14:29, Paul Lynch wrote:
> you have assumed that your data is old style ascii, and have
> discovered by accident that even plain ascii can have many
> different encodings.
That's nonsense. ASCII is an encoding, and it can't "have many
different encodings". That would be like saying that pure red can be
certain shades of green.
As Sherm alluded, ASCII is defined as a certain set of 128
characters (numbered from 0 to 127). There are many encodings that
start with the same 128 characters but add further characters in
the range 128 to 255. The most common of these are ISO Latin-1 and
Windows Latin-1, which use the full 8-bit range of 256 values. A
very common one on the Mac is MacRoman, which matches the others in
the first 128 characters but assigns different characters to the
values that have the high bit set.
So, unless or until you know what encoding your text is in, you
won't be able to make much sense of it.
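
To make that concrete: "å" is byte 0x8C (140) in MacRoman but byte 0xE5 (229) in ISO Latin-1, so the same byte means different things depending on the encoding. Only bytes below 128 are safe to interpret without that knowledge. A minimal sketch in C of such a check (the function name is_pure_ascii is made up for this illustration, not part of any library):

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte lies in the ASCII range
   0..127, and 0 as soon as a byte with the high bit set is found.
   An empty buffer counts as pure ASCII. */
int is_pure_ascii(const unsigned char *bytes, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (bytes[i] >= 128) {
            return 0;   /* meaning depends on the encoding */
        }
    }
    return 1;
}
```

If this returns 0, you need to find out the encoding some other way before you can turn the bytes into characters.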
Phil Faber wrote:
> I want to be able to read from any file type (.txt, .doc, .xls,
> etc) one 'literal byte' (that is, a string of eight ones-and-zeros)
> at a time. The file could be any size. For each byte I want to be
> able to get the actual ASCII value of that byte (a number from zero
> to 255). (I know that some data has to be represented by one, two
> or four bytes but I am particularly interested in accessing each
> 8-bit byte one at a time.)
You want to read either text or data. If you want to read data, use
NSData (hence the name) or any of the other low-level access methods
(NSStream, NSFileHandle, mmap(), fread(), FSRead() ...).
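
As a rough sketch of the fread() route in plain C: the helper below (count_byte_values is a name invented for this example) reads a file in chunks into an unsigned char buffer, so every byte comes out as a number from 0 to 255, and tallies how often each value occurs. It assumes the file can be opened with fopen() and does no more error handling than returning -1.

```c
#include <stdio.h>

/* Hypothetical helper: reads the file at `path` as raw bytes and counts
   how often each byte value (0..255) occurs. Returns the total number of
   bytes read, or -1 if the file cannot be opened or a read error occurs. */
long count_byte_values(const char *path, unsigned long histogram[256])
{
    for (int i = 0; i < 256; i++) {
        histogram[i] = 0;
    }

    FILE *fp = fopen(path, "rb");   /* "rb": binary mode, no translation */
    if (fp == NULL) {
        return -1;
    }

    unsigned char buf[4096];        /* unsigned, so values are 0..255 */
    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            histogram[buf[i]]++;
            total++;
        }
    }

    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : total;
}
```

Reading in chunks keeps memory use constant no matter how large the file is. The important part is that the buffer is declared unsigned char.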
> (a) this only works for basic ASCII characters (for example, "A"
> appears as ASCII 65) but not non-standard characters (for example,
> "å" appears as ASCII -116 ... yes, MINUS 116)
Look up "signedness" in the C language reference of your choice.
-116 is perfectly correct for a signed char.
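
The arithmetic: "å" in MacRoman is byte 140 (0x8C), and 140 - 256 = -116 when the same eight bits are read as a signed char. Converting to unsigned char gives back the 0 to 255 value. A small sketch (byte_value is just an illustrative name):

```c
/* Hypothetical helper: maps a byte held in a signed char (-128..127)
   to its unsigned value (0..255). Conversion to unsigned char is
   defined by C to wrap modulo 256, so -116 becomes 140. */
int byte_value(signed char c)
{
    return (unsigned char)c;
}
```

Declaring your buffer as unsigned char in the first place avoids the issue entirely.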
Cheers,
-- M. Uli Kusterer
http://www.zathras.de