Re: Question about line breaks and file types
- Subject: Re: Question about line breaks and file types
- From: Chuck Soper <email@hidden>
- Date: Mon, 4 Aug 2003 11:07:33 -0700
On the subject of Unicode text files, I have some questions about the
byte order mark (BOM). Unicode text files may or may not contain a
byte order mark at the beginning of the file. The following code
automatically recognizes the encoding as UTF-8 only if the file has a
byte order mark.
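  NSString * myFile = @"~/myUTF8File.txt";
  // Expand the leading "~" and clean up the path before reading.
  myFile = [myFile stringByStandardizingPath];
  // No encoding is specified here, so NSString has to infer it from the file.
  NSString * source = [NSString stringWithContentsOfFile:myFile];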
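Whether the snippet above succeeds depends only on the first few bytes of the file. To show what that check involves, here is a minimal sketch in plain C (which is also valid Objective-C). It assumes the whole file has already been read into a byte buffer. The names `detect_bom` and `bom_kind` are hypothetical and are not part of Cocoa. It recognizes only the UTF-8 and UTF-16 marks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: which byte order mark, if any, starts the buffer. */
typedef enum {
    BOM_NONE,     /* no mark: the encoding must be guessed or specified */
    BOM_UTF8,     /* EF BB BF */
    BOM_UTF16BE,  /* FE FF */
    BOM_UTF16LE   /* FF FE (a UTF-32LE mark, FF FE 00 00, would also match here) */
} bom_kind;

/* Report the BOM at the start of buf and store its length in *bom_len,
   so the caller knows how many bytes to skip before the text itself. */
static bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    assert(bom_len != NULL);
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16LE;
    }
    *bom_len = 0;
    return BOM_NONE;
}
```

A UTF-8 file saved with a BOM yields `BOM_UTF8`. A UTF-8 file saved without one yields `BOM_NONE`, and that is the case the reading code cannot resolve without more information.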
TextEdit does not write a BOM when saving a UTF-8 file, so I use
BBEdit to save mine. The above code fails with UTF-8 files saved by
TextEdit. I assume I could add a line of code to specify the encoding
explicitly, but I want the code to recognize the encoding.
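One common way to recognize a BOM-less UTF-8 file, such as one saved by TextEdit, is a heuristic. If the bytes form well-formed UTF-8, treat the file as UTF-8; otherwise fall back to another encoding. This is not a guarantee. Pure ASCII is valid in many encodings, and a short file in some other 8-bit encoding can happen to be valid UTF-8. The sketch below is in plain C and assumes the file is already in a byte buffer. It checks well-formedness using the UTF-8 rules from the Unicode Standard, so it rejects overlong forms, UTF-16 surrogates, and values above U+10FFFF. The name `is_valid_utf8` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8
   (no overlong forms, no surrogates U+D800..U+DFFF, nothing above U+10FFFF). */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */

        if (b < 0x80) {     /* ASCII byte */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            extra = 1; cp = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) {
            extra = 2; cp = b & 0x0F;
        } else if (b >= 0xF0 && b <= 0xF4) {
            extra = 3; cp = b & 0x07;
        } else {
            return false;   /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
        }

        if (len - i <= extra)
            return false;   /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return false;   /* expected a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong 3/4-byte forms, surrogates, and out-of-range values. */
        if ((extra == 2 && cp < 0x800) ||
            (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

Cocoa offers a comparable check. Read the file into an `NSData` and try `-initWithData:encoding:` with `NSUTF8StringEncoding`. That method returns nil when the data is not valid in the requested encoding, so a nil result is the cue to try a different encoding.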
Should my code be changed to better recognize encodings?
Should TextEdit write a byte order mark for Unicode files?
Chuck
At 10:28 AM -0700 8/4/03, Douglas Davidson wrote:
On Sunday, August 3, 2003, at 10:19 PM, Francisco Tolmasky wrote:
Is there a necessary connection between line breaks and file types?
For example, should a Unicode text file use Unicode line breaks?  I
ask this because programs like BBEdit let you change the line break
style on the fly, seemingly without changing the file type.
There is no necessary connection between the encoding and the line
break types; in particular, just because a file is in UTF-16 or
UTF-8 doesn't mean that it should use Unicode line breaks.  In fact,
the Unicode breaks are probably among the least likely to be
recognized by code that doesn't recognize all of the types, so you
probably shouldn't use them unless you have some compelling reason
for doing so.
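In practice, a line break is just another character in the text. That is why an editor can re-save a file with a different line-break convention while leaving its encoding alone. The sketch below is in plain C (also valid Objective-C). It tallies the conventions found in a UTF-8 buffer: CR, LF, CRLF, and the two Unicode separators U+2028 and U+2029, which UTF-8 encodes as three-byte sequences. The names `line_break_counts` and `count_line_breaks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of the line break conventions found in a UTF-8 buffer. */
typedef struct {
    size_t lf;       /* "\n"   (Unix)                                   */
    size_t cr;       /* "\r"   (classic Mac OS)                         */
    size_t crlf;     /* "\r\n" (DOS/Windows)                            */
    size_t line_sep; /* U+2028 LINE SEPARATOR      = E2 80 A8 in UTF-8 */
    size_t para_sep; /* U+2029 PARAGRAPH SEPARATOR = E2 80 A9 in UTF-8 */
} line_break_counts;

static line_break_counts count_line_breaks(const unsigned char *buf, size_t len)
{
    line_break_counts n = { 0, 0, 0, 0, 0 };
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                n.crlf++;
                i++;            /* consume the '\n' of the CRLF pair */
            } else {
                n.cr++;
            }
        } else if (buf[i] == '\n') {
            n.lf++;
        } else if (buf[i] == 0xE2 && i + 2 < len && buf[i + 1] == 0x80 &&
                   (buf[i + 2] == 0xA8 || buf[i + 2] == 0xA9)) {
            if (buf[i + 2] == 0xA8)
                n.line_sep++;
            else
                n.para_sep++;
            i += 2;             /* skip the rest of the 3-byte sequence */
        }
    }
    return n;
}
```

The same tally could be taken over UTF-16 data by matching the code units 0x000D, 0x000A, 0x2028 and 0x2029 instead of bytes. The choice of line break and the choice of encoding are independent.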
Secondly, if I have a text view with UTF-8 characters in it and
someone pastes something into it, do I have to do anything with the
pasted text, for example convert it to UTF-8, or is it converted
automatically?
The backing store for an NSTextView is an NSTextStorage, which is a
subclass of NSMutableAttributedString.  An NSMutableAttributedString
consists of an NSString plus attributes, and NSString is
conceptually a sequence of unichars (UTF-16 code units); the actual
storage may vary, but it is concealed by the NSString interface.  You need to worry
about encodings only when you are converting your text into some
other form, for example if you are storing it in a plain text file.
As long as your text is within the text system, Cocoa takes care of
it all for you.  This includes handling encodings for all of the
standard pasteboard types that Cocoa recognizes.  If you were
implementing your own custom pasteboard type, you might have to
concern yourself with this.
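In Cocoa, the conversion at that boundary is normally a single call, for example `[[textView string] dataUsingEncoding:NSUTF8StringEncoding]`. To illustrate what such a conversion does to a sequence of unichars, here is a sketch in plain C. `unichar16` is a hypothetical stand-in for Cocoa's 16-bit `unichar`, used so that the example compiles without Foundation. `utf16_to_utf8` is also hypothetical and does not reflect how NSString is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t unichar16;  /* hypothetical stand-in for Cocoa's unichar (UTF-16 code unit) */

/* Hypothetical converter: writes the UTF-8 form of src[0..len) into dst (capacity cap)
   and returns the number of bytes written, or (size_t)-1 on an unpaired surrogate or
   if dst is too small. A capacity of 3 bytes per input unit always suffices. */
static size_t utf16_to_utf8(const unichar16 *src, size_t len,
                            unsigned char *dst, size_t cap)
{
    size_t out = 0;
    assert(src != NULL || len == 0);
    assert(dst != NULL || cap == 0);

    for (size_t i = 0; i < len; i++) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate: needs a low one */
            if (i + 1 >= len || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* stray low surrogate */
            return (size_t)-1;
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (cap - out < need)
            return (size_t)-1;

        switch (need) {
        case 1:
            dst[out++] = (unsigned char)cp;
            break;
        case 2:
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return out;
}
```

Inside the text system, the text stays in its unichar form. The encoding becomes a concern only at a step like this one, when the text is turned into bytes for a plain text file.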
Douglas Davidson
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.