Re: Bad Characters from Unicode
- Subject: Re: Bad Characters from Unicode
- From: Sander Tekelenburg <email@hidden>
- Date: Fri, 5 Oct 2007 02:13:23 +0200
["meta http-equiv" in <http://www.joelonsoftware.com/articles/Unicode.html>]
At 09:29 -0400 UTC, on 2007-10-03, Mark J. Reed wrote:
> On 10/3/07, Sander Tekelenburg <email@hidden> wrote:
>> Nice article. The simplification about meta http-equiv is IMO on the edge
>> though:
>
> Well, I think the point isn't that it's often possible to arrange for
> a proper header; it's more that there's no standard way of doing it.
I don't follow. That applies to *all* local stuff. The way you write a utf-8
file in AppleScript is different from how you do that in some other
programming language. The point of these types of standards is to be able to
exchange data with other systems, not to prescribe how to generate or store
data on a local system.
[...]
> Sure, you can make
> a .htaccess file that says "all HTML files in this directory are
> UTF-8", but that doesn't help you have different files of the same
> type with different encodings, unless you define different file
> suffixes for each one.
Right. (And in fact, saying "all files in this directory are utf-8" is
dangerous anyway if you don't ensure that they in fact *are* utf-8.) But
whether you give each HTML file its own meta http-equiv, or specify its
character encoding through some other mechanism, you still need to bother to
define it per file. And then I'd think that it's easier for most people to
use something like file names ("index.utf8.html"). The syntax is easier to
remember, and it can be used for all file types, not just HTML files.
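To illustrate the caveat about declaring a whole directory to be utf-8: here is a minimal sketch, in Python, of checking that claim before making it. The function name and the "*.html" pattern are made up for the example. Note that a file which decodes cleanly as utf-8 is only *consistent* with utf-8 (pure ASCII passes too); the check can prove a file wrong, not prove it right.

```python
from pathlib import Path


def non_utf8_files(directory, pattern="*.html"):
    """Return the names of files in `directory` whose bytes are not valid UTF-8.

    Hypothetical helper for illustration; `pattern` is an assumed default.
    """
    bad = []
    for path in sorted(Path(directory).glob(pattern)):
        try:
            # Strict decoding raises on any byte sequence that is not UTF-8.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path.name)
    return bad
```

Running something like this before adding a blanket "everything here is utf-8" rule at least catches the files that would be served with a wrong label.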
Returning to AppleScript: when you write text to a file, you could save it as
filename.utf-16.txt (utf-16 being the default since Mac OS X). Or when you
explicitly use some other character encoding, call it filename.utf-8.txt, or
filename.mac-roman.txt. If you do that consistently you'll have an easier
time working with those files later.
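As a rough sketch of why that pays off later (in Python rather than AppleScript, purely for illustration): a script can pick the encoding straight out of the file name. The helper names are hypothetical, and the convention assumed is "name.ENCODING.extension" as in the examples above.

```python
import codecs
from pathlib import Path


def encoding_from_name(filename, default=None):
    """Return the codec named in a file name like 'notes.utf-8.txt'.

    Hypothetical helper; assumes the 'name.ENCODING.ext' convention.
    """
    parts = Path(filename).name.split(".")
    # Need at least a base name, an encoding tag, and an extension.
    if len(parts) < 3:
        return default
    tag = parts[-2]
    try:
        # codecs.lookup accepts common spellings ('utf8', 'UTF-8', ...).
        return codecs.lookup(tag).name
    except LookupError:
        return default


def read_tagged_text(path):
    """Read a text file using the encoding named in its file name."""
    encoding = encoding_from_name(path)
    if encoding is None:
        raise ValueError(f"no recognizable encoding tag in {path!r}")
    return Path(path).read_text(encoding=encoding)
```

A name without a recognizable tag (say "archive.tar.gz") falls back to the default instead of guessing, which is the whole point of making the meta data visible.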
I realize that there is a certain ugliness to such "visible meta data". But
on the other hand, for most people it is easier to work with than "invisible
meta data".
> And it also doesn't work if your web server
> happens to be IIS or SunOne instead of Apache.
Maybe. I don't know what specific problems those servers have. But note that
the article is very much a rant about programmers not 'getting' character
encoding. He names programmers of text editors specifically, but I don't see
why the same wouldn't apply to programmers of web servers. Whether it's a
text editor or a web server that doesn't allow users to easily set the
correct character encoding is basically the same problem.
>> So far it hasn't been defined what a browser should do in that case
> [of conflict between HTTP header and <meta> tag].
>
> Interesting. I thought the current behavior (HTML overrides HTTP) was
> defined as correct.
Actually, AFAIK the only thing that was ever officially defined about the
meta http-equiv is that it was to be used by the server to generate an HTTP
Content-Type header, and that it was to be ignored by UAs. (I guess it somewhat
makes sense that, in practice, behaviour is actually the opposite. After all,
it's not that practical to have servers read the contents of files before
serving them, whereas the UA already needs to parse the file anyway.)
> Is the W3C also going to address the case of a
> three-way mismatch with XHTML and the <?xml?> processing directive?
Sorry, I'm not sure exactly what you're referring to.
--
Sander Tekelenburg, <http://www.euronet.nl/~tekelenb/>