Re: NSXML and invalid UTF8 characters
- Subject: Re: NSXML and invalid UTF8 characters
- From: Andrew Thompson <email@hidden>
- Date: Sun, 31 Jan 2010 12:42:05 -0500
I'm a little surprised that no one else mentioned it, but are you sure that you actually want to strip the characters?
As Sixten Otto said:
> For what it's worth, another common cause of problems with stuff
> pasted from Word (at least on the web), is Word docs that contain
> characters from the Windows-1252 character set that are invalid UTF-8
> byte sequences. Most commonly, 0x80-0x9F, which is the range where
> Windows-1252 differs from ISO-Latin-1
The range 0x80 to 0x9F in code page 1252 includes the Euro sign, the bullet (option-8 on the Mac), the en-dash and the em-dash, i.e. all things that will be found even in English text.
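To make that concrete, here is a small sketch in Python (chosen only for brevity; it is not Cocoa code, and the helper names `describe` and `is_valid_utf8` are made up for this illustration). It decodes a few single bytes from that range as Windows-1252 and checks that the same bytes on their own are not valid UTF-8.

```python
import unicodedata


def describe(byte_value):
    """Return the Unicode name of one byte interpreted as Windows-1252."""
    ch = bytes([byte_value]).decode("cp1252")
    return unicodedata.name(ch)


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # The four characters named above; other bytes in the range map to
    # further punctuation and letters.
    for b in (0x80, 0x95, 0x96, 0x97):
        print(hex(b), describe(b), "valid UTF-8 alone:", is_valid_utf8(bytes([b])))
```

So the bytes are perfectly meaningful text in one encoding and garbage in the other, which is why stripping them loses real content.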
(Reference http://msdn.microsoft.com/en-us/goglobal/cc305145.aspx)
These can all be represented in Unicode, but you'd have to run the text through a converter, which leads to the question: how do you know the encoding of what was pasted in?
Generally you wouldn't, but you can play guessing games based on probabilities if you see these byte values.
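One very crude version of that guessing game, sketched in Python rather than Cocoa: try strict UTF-8 first, and only if that fails assume the bytes are Windows-1252. The function name `decode_pasted` and the choice of fallback are my assumptions, not anything NSXML does for you; in Cocoa the rough equivalent would be retrying with `NSWindowsCP1252StringEncoding` when the UTF-8 attempt fails.

```python
def decode_pasted(data):
    """Guess the encoding of pasted bytes.

    Returns a (text, encoding_name) tuple. This is a heuristic: text that
    fails strict UTF-8 decoding is assumed to be Windows-1252, which is only
    a likely guess for text pasted from Word, not a certainty.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # A handful of bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in
    # Python's cp1252 codec; errors="replace" substitutes U+FFFD for them
    # instead of raising.
    return data.decode("cp1252", errors="replace"), "cp1252"


if __name__ == "__main__":
    print(decode_pasted("5 \u20ac \u2013 ok".encode("utf-8")))
    print(decode_pasted(b"5 \x80 \x96 ok"))
```

This keeps the Euro sign and the dashes instead of dropping them, at the cost of occasionally guessing wrong when the input was really some other 8-bit encoding.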
My only point is, you may end up annoying users by dropping part of what they tried to paste in. This may or may not be acceptable in your case.
AndyT (lordpixel - the cat who walks through walls)
A little bigger on the inside
(see you later space cowboy, you can't take the sky from me)