On 21 Jun 2016, at 2:16 PM, Jim Underwood <email@hidden> wrote:
So, what is the best practice when reading a file in AppleScript that you did not write, and you do not have any authoritative information about its encoding?
You don't have much choice. UTF-16 isn't used much, and AS only reads UTF-8 or MacRoman (actually, I think that's MacRoman on English-language systems -- I'm not sure). So the best you can do is try UTF-8, and if you get an error, fall back to MacRoman and hope. One of the good things about UTF-8 is that any characters outside the base set common to most encodings (the 128 ASCII characters) are encoded as multi-byte sequences that follow a strict pattern. This means that if you try to read a file saved in another encoding as UTF-8, and it uses more than the basic roman characters, there's almost no chance of it succeeding.
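As a rough sketch of that try-then-fall-back approach, using only the standard read command: the variable names are just for illustration, and it assumes that reading invalid UTF-8 raises an error, as described above, and that a plain read uses the primary encoding (MacRoman on English-language systems).

```applescript
set theFile to choose file
try
	-- First attempt: treat the file as UTF-8
	set theText to read theFile as «class utf8»
on error
	-- Not valid UTF-8: fall back to the primary encoding and hope
	set theText to read theFile
end try
```

A pure-ASCII file reads the same either way, so the fallback only matters for files that contain characters outside the ASCII range.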
If you're prepared to use ASObjC, there is a method that tries to guess. It was only introduced in 10.10, and I think it had some problems before 10.11. You read the file as data and pass it to the method.
The method takes a dictionary of encoding options, which are basically hints about how you want the conversion done. For example, you can specify that you never want a lossy conversion. You can also specify the likely language if you know it. You can specify if it's likely to have come from Windows. And you can give it a list of encodings to prefer.
Here's an example with two options: don't do a lossy conversion, and it's probably an English document:
use AppleScript version "2.5"
use scripting additions
use framework "Foundation"

set aPOSIXpath to POSIX path of (choose file)
set anNSData to current application's NSData's dataWithContentsOfFile:aPOSIXpath
set theOptions to current application's NSDictionary's dictionaryWithObjects:{false, "en"} forKeys:{current application's NSStringEncodingDetectionAllowLossyKey, current application's NSStringEncodingDetectionLikelyLanguageKey}
set {theEncoding, theString, wasLossy} to current application's NSString's stringEncodingForData:anNSData encodingOptions:theOptions convertedString:(reference) usedLossyConversion:(reference)
if theEncoding is 0 then
	-- it was an error
end if
If you know the file is from Windows, you could use these options instead:
set theOptions to current application's NSDictionary's dictionaryWithObjects:{false, "en", true} forKeys:{current application's NSStringEncodingDetectionAllowLossyKey, current application's NSStringEncodingDetectionLikelyLanguageKey, current application's NSStringEncodingDetectionFromWindowsKey}
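The list of preferred encodings mentioned earlier can be passed the same way. This is only a sketch: it uses Foundation's NSStringEncodingDetectionSuggestedEncodingsKey, and the two encodings chosen here (UTF-8 and Windows CP1252) are just an assumption about what a typical file might be, not a recommendation.

```applescript
use AppleScript version "2.5"
use scripting additions
use framework "Foundation"

set aPOSIXpath to POSIX path of (choose file)
set anNSData to current application's NSData's dataWithContentsOfFile:aPOSIXpath
-- Encodings to prefer (an illustrative choice, adjust to suit your files)
set preferredEncodings to {current application's NSUTF8StringEncoding, current application's NSWindowsCP1252StringEncoding}
set theOptions to current application's NSDictionary's dictionaryWithObjects:{false, preferredEncodings} forKeys:{current application's NSStringEncodingDetectionAllowLossyKey, current application's NSStringEncodingDetectionSuggestedEncodingsKey}
set {theEncoding, theString, wasLossy} to current application's NSString's stringEncodingForData:anNSData encodingOptions:theOptions convertedString:(reference) usedLossyConversion:(reference)
if theEncoding is 0 then
	-- no encoding could be determined
end if
```

The suggested encodings are hints rather than restrictions, so the method may still return some other encoding if it thinks that fits the data better.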