Re: Encoding

Subject: Re: Encoding
From: Shane Stanley <email@hidden>
Date: Sun, 31 Aug 2014 22:41:26 +1000

On 31 Aug 2014, at 7:14 pm, koenig.yvan <email@hidden> wrote:

> To be complete, when I read the file with an hexadecimal editor,
>
> the first occurrence is displayed as :
>
>
> 63 65 69 63 E3 6F 20  which, according to character ID values means ceicão
>
>
> the second is displayed as :
>
> 63 65 69 63 C3 E3 6F 20   which, according to UTF-8 values means also ceicão

I think you've made an error in transcribing. That last case should be:

  63 65 69 63 C3 A3 6F 20

That's what's showing in the screenshot you sent me off-list, and that's what I get when reading the file "as data". So there's no conflict.

The important missing piece of information is that this is a PDF file. A PDF is made up of several sections, and they are not all encoded the same way. The last case comes from the XMP metadata part of the file, which is XML, presumably encoded as UTF-8; the earlier example looks to be in PDFDocEncoding, which I gather is a variation on ISO-8859-1.
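
To make the byte-level difference concrete, here is a minimal Python sketch (not AppleScript, and not taken from the original files) that decodes the two byte sequences quoted above. It assumes ISO-8859-1 is a close enough stand-in for PDFDocEncoding for these particular bytes; the variable names are just for illustration.

```python
# Bytes as shown in the hex editor for the two occurrences.
first = bytes.fromhex("63 65 69 63 E3 6F 20")      # single byte E3 for the accented letter
second = bytes.fromhex("63 65 69 63 C3 A3 6F 20")  # two bytes C3 A3 for the same letter

# Assumption: ISO-8859-1 (Latin-1) approximates PDFDocEncoding for these bytes.
text_first = first.decode("latin-1")
# The XMP metadata is XML, presumably UTF-8.
text_second = second.decode("utf-8")

print(repr(text_first), repr(text_second))
print(text_first == text_second)  # same text, different bytes on disk

# The sequence as originally transcribed (C3 E3) is not valid UTF-8,
# because E3 cannot follow C3 as a continuation byte.
mistyped = bytes.fromhex("63 65 69 63 C3 E3 6F 20")
try:
    mistyped.decode("utf-8")
    mistyped_is_valid = True
except UnicodeDecodeError:
    mistyped_is_valid = False
print(mistyped_is_valid)
```

Both decodings should give the same string, which is why there is no conflict once each part is read with its own encoding.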

There's no way you can sensibly read a full PDF file using a single encoding -- it has to be treated as a binary file.
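
As a rough illustration of what "treat it as binary" can look like, the Python sketch below reads bytes, locates the XMP packet by its `<?xpacket begin=` and `<?xpacket end=` markers, and decodes only that slice as UTF-8. The helper name `extract_xmp_text` and the tiny `sample` buffer are hypothetical stand-ins, not a real PDF, and the sketch assumes the metadata stream is stored uncompressed and in UTF-8, which is not guaranteed for every file.

```python
import re

# XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?> markers.
XMP_PACKET = re.compile(rb"<\?xpacket begin=.*?<\?xpacket end=.*?\?>", re.DOTALL)

def extract_xmp_text(pdf_bytes):
    """Return the first XMP packet decoded as UTF-8, or None if none is found.

    Hypothetical helper: assumes the packet is uncompressed and UTF-8 encoded.
    """
    match = XMP_PACKET.search(pdf_bytes)
    if match is None:
        return None
    return match.group(0).decode("utf-8")

# Hypothetical miniature stand-in for a PDF that mixes two encodings.
sample = (
    b"%PDF-1.4\n"
    b"(ceic\xe3o ) "                       # single-byte form, as in the first hex dump
    b'<?xpacket begin="" id="x"?>'
    + "<dc:title>ceicão </dc:title>".encode("utf-8")  # UTF-8 form, as in the second
    + b'<?xpacket end="w"?>\n'
    b"%%EOF\n"
)

xmp = extract_xmp_text(sample)
print(xmp)

# Decoding the whole buffer with one encoding fails on the single-byte section.
try:
    sample.decode("utf-8")
    whole_file_is_utf8 = True
except UnicodeDecodeError:
    whole_file_is_utf8 = False
print(whole_file_is_utf8)
```

For a real file the bytes would come from something like `open(path, "rb").read()`. This only pulls out the metadata; getting at the page text itself needs a proper PDF parser.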

I think we canvassed the options for reading the text in a PDF here a while back.

--
Shane Stanley <email@hidden>
<www.macosxautomation.com/applescript/apps/>

