Re: [OT] File Encoding question again
- Subject: Re: [OT] File Encoding question again
- From: Ondra Cada <email@hidden>
- Date: Mon, 5 Dec 2005 20:37:01 +0100
Hi,
On 5.12.2005, at 20:23, Chuck Hill wrote:
Is there a test I can run on the string (or the file) which gives
back the current encoding? I could not find it in Javadoc. I can
change the encoding but I need to know which encoding the file
came with, as far as I understand it.
Is there a way to know? When the user exports from Excel on a
Windows machine he neither knows nor cares (nor could I find a way
to tell Excel which encoding to use). But in Germany we have all
those Umlauts .... so we need to be careful with the encoding.
This is not really (at all) my area of expertise, but I don't think
it is possible to know by examining the file. AFAIK, all the 8-bit
encodings (Mac Roman whatever, UTF-8, Win Latin, Western Latin,
etc.) are too similar to guess at correctly.
There are heuristics which can give *comparatively* good results, but
they are *never* absolutely dependable, and they tend to be complex
(and, you generally need to know the language the text is written
in). We are speaking of uglies like frequency analysis of the text,
or even trying to run texts created by using different encodings
through a spellchecker, selecting the encoding which causes the
smallest number of unknown words.
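To give an idea of what such a heuristic looks like, here is a minimal sketch in Java. It is not a dependable detector, just an illustration under stated assumptions: the class name EncodingGuess, the scoring rule, and the list of "expected" characters (German umlauts and sharp s) are all made up for this example. It first tries a strict UTF-8 decode, which fails on most legacy 8-bit data that contains umlauts, and otherwise picks the candidate charset whose decoding contains the most expected letters and the fewest other non-ASCII characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingGuess {
    // Assumption for this sketch: the text is German, so we expect these
    // letters (a/o/u umlauts in both cases and sharp s), written as escapes
    // so the source file itself stays pure ASCII.
    private static final String EXPECTED =
        "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00df";

    /** True if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** +1 for each expected letter, -1 for any other non-ASCII character. */
    static int score(String text) {
        int s = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                continue; // ASCII looks the same in all candidates
            }
            s += (EXPECTED.indexOf(c) >= 0) ? 1 : -1;
        }
        return s;
    }

    /**
     * Returns UTF-8 if the data is well-formed UTF-8 (this includes pure
     * ASCII), otherwise the best-scoring candidate; ties go to the earlier
     * candidate. Returns null if no candidates are given.
     */
    static Charset guess(byte[] data, Charset... candidates) {
        if (isValidUtf8(data)) {
            return StandardCharsets.UTF_8;
        }
        Charset best = null;
        int bestScore = Integer.MIN_VALUE;
        for (Charset cs : candidates) {
            int sc = score(new String(data, cs));
            if (sc > bestScore) {
                bestScore = sc;
                best = cs;
            }
        }
        return best;
    }
}
```

One would call it like EncodingGuess.guess(bytes, Charset.forName("windows-1252"), Charset.forName("x-MacRoman")), assuming the runtime supports those charset names. Note that it cannot tell apart encodings which map the umlauts to the same bytes (ISO-8859-1 and windows-1252, for instance), and a text with few or no umlauts gives it nothing to work with.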
Unless one has to do this, it is much better to let the user select
the encoding freely, with feedback (are the data displayed all right?
If not, try another...).
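The feedback approach can be sketched roughly as follows, again in Java. The class name EncodingPreview and the method previews are hypothetical names for this example; the idea is simply to decode the beginning of the file with each candidate charset and show the results side by side, so the user can pick the one where the umlauts look right.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

class EncodingPreview {
    /**
     * Decodes at most maxBytes of data with each candidate charset and
     * returns charset name -> decoded preview, in the order given.
     * Cutting at maxBytes may split a multi-byte character, so the last
     * character of a preview can come out as a replacement character.
     */
    static Map<String, String> previews(byte[] data, int maxBytes,
                                        Charset... candidates) {
        int n = Math.min(data.length, maxBytes);
        Map<String, String> out = new LinkedHashMap<>();
        for (Charset cs : candidates) {
            // new String(...) substitutes a replacement character for
            // bytes that are not valid in the given charset.
            out.put(cs.name(), new String(data, 0, n, cs));
        }
        return out;
    }
}
```

The application would display each entry of the returned map next to its charset name, and then decode the whole file with whichever charset the user confirms.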
---
Ondra Čada
OCSoftware: email@hidden http://www.ocs.cz
private email@hidden http://www.ocs.cz/oc