Re: en-dash and em-dash
- Subject: Re: en-dash and em-dash
- From: Deivy Petrescu <email@hidden>
- Date: Mon, 20 Jun 2016 22:16:30 -0400
Well, I think I know what is going on, but I don’t understand why it happens.
I have the same kind of issues with pages in Portuguese (Spanish, French, Romanian, etc., will have the same problem).
I actually checked whether my clipboard was correct, that is, whether it contained the accented characters, and it did.
But writing it to a file, no matter how, garbles the characters. It does not matter what encoding I am using.
When I open the text in BBEdit these characters are Gremlins.
I actually changed all the possible Portuguese cases, e.g. á to &aacute;, and now it is fine.
So the problem happens as we write the text to a file.
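One way to test a guess about what happens at write time is to look at the bytes. The sketch below is in Python rather than AppleScript only because it makes the bytes easy to inspect. It assumes, and this is an assumption not confirmed anywhere in this thread, that the plain write emits MacRoman and that the file is then read as Latin-1 (ISO 8859-1). Under that assumption it reproduces the garbled PEG.js title quoted below.

```python
# Assumption (not confirmed in this thread): the text is written as MacRoman
# and the resulting file is later interpreted as Latin-1.
title = "PEG.js \u2013 Parser Generator for JavaScript"  # \u2013 is the en dash

mac_bytes = title.encode("mac_roman")   # en dash becomes the single byte 0xD0
garbled = mac_bytes.decode("latin-1")   # byte 0xD0 is the letter Eth in Latin-1

utf8_bytes = title.encode("utf-8")      # en dash becomes the bytes E2 80 93
round_trip = utf8_bytes.decode("utf-8") # same encoding both ways: text is intact

print(garbled)
print(round_trip == title)
```

If the bytes in the real output file match mac_bytes, the problem is the encoding used when writing, not the text obtained from the tab name.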
> On Jun 20, 2016, at 19:32 , Mitchell L Model <email@hidden> wrote:
>
> I let the dashes distract me. They are just the first strange characters I ran into. The problem is general.
>
> Consider this page. Its tab reads
>
> PEG.js – Parser Generator for JavaScript
>
> both in Safari and in Script Debugger’s inspection of that tab’s name.
>
> With each of the attempts described below the output file contains
>
> PEG.js Ð Parser Generator for JavaScript
>
> Or for even more fun this page, whose tab reads
>
> Examples overview — PyObjC — the Python ⟷ Objective-C bridge
>
> but gets written out as
>
> Examples overview — PyObjC — the Python ⟷ Objective-C bridge
>
>
> (1) I append the name of the tab by concatenating to an existing string: “the name of theTab”
> (2) I append the name of the tab by concatenating to an existing string: “the name of theTab
> as «class utf8»”
> (3) I write the file with “write theText to theFile”
> (4) I write the file with “write theText to theFile as «class utf8»”
> (5) I write the file with this handler extracted from Christopher’s code
> (6) In desperation I thought it might be the initial string to which other text was appended
> that had to be UTF8 rather than the way the file was written, so I initialized the string
> with ‘set string to “This is the title of the file” as «class utf8»’. (I never encountered «class utf8»
> before — and I don’t know how I was supposed to know about it — so I don’t know whether
> there is a UTF8 string in AppleScript.)
>
> I can’t make sense of this. I know there are encoding issues. I know how to manage them in other languages, such as Python. In all my years of AppleScript coding I never noticed characters with codes higher than MacRoman’s 256 in files written with “write theString to theFile”, but then again I have only very occasionally used AppleScript to write files, so the weird characters might have been there without my noticing.
>
> Here is Christopher’s handler I referred to:
>
>> On Jun 19, 2016, at 8:17 PM, Christopher Stone <email@hidden> wrote:
>>
>> on writeUTF8(_text, _file)
>>     try
>>         if _file starts with "~/" then
>>             set _file to POSIX path of (path to home folder as text) & text 3 thru -1 of _file
>>         end if
>>         set fRef to open for access _file with write permission
>>         set eof of fRef to 0
>>         write _text to fRef as «class utf8»
>>         close access fRef
>>     on error e number n
>>         try
>>             close access fRef
>>         on error e number n
>>             error "Error in writeUTF8() handler!" & return & return & e
>>         end try
>>     end try
>> end writeUTF8
>
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> AppleScript-Users mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
> Archives: http://lists.apple.com/archives/applescript-users
>
> This email sent to email@hidden
Deivy Petrescu
email@hidden