Re: Question about line breaks and file types


  • Subject: Re: Question about line breaks and file types
  • From: Chuck Soper <email@hidden>
  • Date: Mon, 4 Aug 2003 11:07:33 -0700

On the subject of Unicode text files, I have some questions about the byte order mark (BOM). Unicode text files may or may not contain a byte order mark at the beginning of the file. The following code automatically recognizes the encoding as UTF-8 only if the file has a byte order mark.
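NSString * myFile = @"~/myUTF8File.txt";
// Expand the leading "~" and normalize the path.
myFile = [myFile stringByStandardizingPath];
// As described above, UTF-8 is recognized here only if the file starts with a BOM.
NSString * source = [NSString stringWithContentsOfFile:myFile];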
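For reference, a byte order mark is a short, fixed byte sequence at the very start of the file. The following is a minimal sketch in plain C (which also compiles as Objective-C) of how code might check for one. The names `detect_bom` and `bom_kind` are hypothetical, and UTF-32 marks are deliberately not handled (a UTF-32LE file, which starts FF FE 00 00, would be reported as UTF-16LE here).

```c
#include <stddef.h>

/* Hypothetical classification of the byte order marks this sketch recognizes. */
typedef enum {
    BOM_NONE,
    BOM_UTF8,      /* EF BB BF */
    BOM_UTF16_BE,  /* FE FF    */
    BOM_UTF16_LE   /* FF FE    */
} bom_kind;

/* Inspect the first bytes of buf and report which BOM, if any, it starts with.
   *bom_len receives the number of BOM bytes to skip (0 if there is none). */
static bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16_LE;
    }
    *bom_len = 0;
    return BOM_NONE;
}
```

A file saved by TextEdit as UTF-8 would fall into the `BOM_NONE` case, which is exactly the situation where the encoding has to be inferred some other way.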

TextEdit does not write a BOM when saving a UTF-8 file, so I use BBEdit instead. The above code fails with UTF-8 files saved by TextEdit. I assume I could add a line of code to set the encoding explicitly, but I want the code to recognize the encoding on its own.
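One way code could cope with BOM-less UTF-8 files like TextEdit's is to fall back, when no BOM is present, to checking whether the bytes are well-formed UTF-8. Most Latin-1 text containing accented letters, for instance, is not valid UTF-8 and can be ruled out. This is only a heuristic, sketched in plain C with a hypothetical function name: pure ASCII is valid UTF-8 and also valid in many other encodings, so a "true" result means "could be UTF-8", not "was written as UTF-8".

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[0..len) is well-formed UTF-8: correct lead/continuation
   byte structure, no overlong forms, no UTF-16 surrogate code points
   (U+D800..U+DFFF), and nothing above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */

        if (c < 0x80) { i++; continue; }                    /* ASCII byte */
        else if (c >= 0xC2 && c <= 0xDF) { extra = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { extra = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { extra = 3; cp = c & 0x07; }
        else return false;  /* 0x80..0xC1 and 0xF5..0xFF cannot start a sequence */

        if (len - i <= extra) return false;                 /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false;          /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        /* Reject overlong encodings, surrogates, and values past U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
        i += extra + 1;
    }
    return true;
}
```

A caller would typically run this check only after finding no BOM, and still treat the answer as a best guess rather than a certainty.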

Should my code be changed to better recognize encodings?
Should TextEdit write a byte order mark for Unicode files?
Chuck

At 10:28 AM -0700 8/4/03, Douglas Davidson wrote:
On Sunday, August 3, 2003, at 10:19 PM, Francisco Tolmasky wrote:

Is there a necessary connection between line breaks and file types? For example, should a Unicode text file use Unicode line breaks? I ask because programs like BBEdit let you change the line break style on the fly, seemingly without changing the file type.

There is no necessary connection between the encoding and the line break types; in particular, just because a file is in UTF-16 or UTF-8 doesn't mean that it should use Unicode line breaks. In fact, the Unicode breaks are probably among the least likely to be recognized by code that doesn't recognize all of the types, so you probably shouldn't use them unless you have some compelling reason for doing so.
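The independence of line-break style from encoding can be seen at the byte level. The following plain-C sketch (the function name is hypothetical) normalizes CR and CRLF breaks to LF in a UTF-8 buffer. Working byte by byte is safe for UTF-8 because every byte of a multi-byte UTF-8 sequence is 0x80 or above and so can never be mistaken for CR (0x0D) or LF (0x0A); the same approach would not be safe for UTF-16. The Unicode LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029) are left untouched.

```c
#include <stddef.h>

/* Convert CRLF ("\r\n") and lone CR ("\r") line breaks to LF ("\n") in place
   and return the new length. Assumes an ASCII-compatible encoding such as
   UTF-8. U+2028/U+2029 (UTF-8 bytes E2 80 A8 / E2 80 A9) are not changed. */
static size_t normalize_newlines_to_lf(char *buf, size_t len)
{
    size_t out = 0;  /* write position; never passes the read position */
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r') {
            buf[out++] = '\n';
            if (in + 1 < len && buf[in + 1] == '\n')
                in++;  /* skip the LF of a CRLF pair */
        } else {
            buf[out++] = buf[in];
        }
    }
    return out;
}
```

Because the write position never overtakes the read position, the conversion can be done in place without a second buffer.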

Second, if I have a text view with UTF-8 characters in it and someone pastes something into it, do I have to do anything with the pasted text, for example convert it to UTF-8, or is it converted automatically?

The backing store for an NSTextView is an NSTextStorage, which is a subclass of NSMutableAttributedString. An NSMutableAttributedString consists of an NSString plus attributes, and NSString is conceptually a sequence of unichars; the actual storage may vary, but it is concealed by the NSString interface. You need to worry about encodings only when you are converting your text into some other form, for example if you are storing it in a plain text file.
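To illustrate what converting text "into some other form" involves, here is a plain-C sketch that turns a sequence of UTF-16 code units (the `unichar` values an NSString conceptually holds) into UTF-8 bytes such as you might write to a plain text file. It is only an illustration with a hypothetical function name; in Cocoa you would ordinarily let NSString perform this conversion rather than doing it by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n UTF-16 code units as UTF-8 into out (capacity out_cap bytes).
   Returns the number of bytes written, or (size_t)-1 if the input contains
   an unpaired surrogate or the output buffer is too small. */
static size_t utf16_to_utf8(const uint16_t *in, size_t n,
                            unsigned char *out, size_t out_cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            /* high surrogate */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;                     /* no matching low surrogate */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                         /* lone low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out_cap - o < need) return (size_t)-1;     /* output buffer too small */
        switch (need) {
        case 1:
            out[o++] = (unsigned char)cp;
            break;
        case 2:
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return o;
}
```

Note that characters outside the Basic Multilingual Plane occupy two unichars (a surrogate pair) but a single four-byte UTF-8 sequence, which is one reason the in-memory form and the file form should not be confused.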

As long as your text is within the text system, Cocoa takes care of it all for you. This includes handling encodings for all of the standard pasteboard types that Cocoa recognizes. If you were implementing your own custom pasteboard type, you might have to concern yourself with this.

Douglas Davidson
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.
