Hi, Guys--
Thanks for some very interesting observations...
On Jul 29, 2014, at 12:42 PM, Stan Cleveland wrote:
    set srtFile to (choose file with prompt "Select a file:")
    set docLines to paragraphs of (read srtFile)
    repeat with i from 1 to (count docLines)
        set srtLine to item i of docLines
        -- do something with this line
    end repeat
Very nice! I had not thought of that, perhaps because I grew up with limited memory. But can read srtFile diagnose encodings?
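As far as I know, a bare read does no diagnosing at all; you have to tell it which encoding to expect with an "as" clause. Here is a minimal sketch of Stan's loop, on the assumption (mine, not Stan's) that the file is UTF-8:

```applescript
-- Assumption: the chosen file is UTF-8 encoded.
-- read does not guess here; the "as" clause names the encoding explicitly.
set srtFile to (choose file with prompt "Select a file:")
set docLines to paragraphs of (read srtFile as «class utf8»)
repeat with i from 1 to (count docLines)
	set srtLine to item i of docLines
	-- do something with this line
end repeat
```

If the file is really in some other encoding, accented characters will presumably come out garbled, which is why I am asking about diagnosis.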
On Jul 29, 2014, at 2:40 PM, Christopher Stone wrote:
If I was dealing with different encodings I'd just use the Satimage.osax:
    # readtext (verb) read a text file. readtext is aware of the presence of BOM.
    set data_ to paragraphs of (readtext file_)
Ah! Yes! I had forgotten about readtext! Thanks for reminding me.
On Jul 30, 2014, at 1:24 AM, Emmanuel LEVY wrote:
It's a matter of probability. For instance if I write: C'est l'été !
Wow. What an elegant explanation.
I tried making three files with TextEdit.
Into all three I copied the string "C'est l'été !"
In the "Save As..." step I gave each a unique name (and no extension) and changed the Encoding pop-up at the bottom of the box:
One to UTF-8
One to Western (MacOS Roman)
One to Western (Windows Latin 1)
I then opened all three with TextWrangler.
I was frankly surprised to find that all three TW windows displayed exactly the same string, "C'est l'été !", but differed in the Encoding boxes in their bottom frames:
One said UTF-8
One said Western (MacOS Roman)
One said Western (Windows Latin 1)
So can we say that TextWrangler is guessing?
I tried a similar experiment with the string "It is summertime!" (I omitted the apostrophe so as to be sure to offer no clue.) I deleted the three files, then used TextEdit to create three new ones with three different encodings. Then I Quit TextEdit.
I opened all three files (at once) with TextWrangler and lo! Again, all three windows displayed "It is summertime!", and each had its unique encoding in its bottom frame.
Where is TextWrangler getting this information?
--Gil