Re: Convert Characters
- Subject: Re: Convert Charatcers
- From: cricket <email@hidden>
- Date: Tue, 9 Mar 2004 09:32:47 -0800
A (probably) simpler way to do this is to use the
-initWithHTML:documentAttributes: method on NSAttributedString. If you
know the encoding of the web page (which is usually declared in the raw
source, in a <meta http-equiv="Content-Type"> tag), you can do something
like this:
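#import <Cocoa/Cocoa.h> // initWithHTML:documentAttributes: is declared in AppKit

// rawSourceOfHTMLPage, in this case, would be an NSString containing the raw
// source of the web page in question. Replace NSISOLatin1StringEncoding with
// the page's actual encoding.
NSDictionary *attributes = nil;
NSData *data = [rawSourceOfHTMLPage
    dataUsingEncoding:NSISOLatin1StringEncoding];
NSAttributedString *attrString = [[NSAttributedString alloc]
    initWithHTML:data documentAttributes:&attributes];
// attrString was created with +alloc, so send it -release (or -autorelease)
// when you are done with it.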
At this point, [attrString string] will be the page's text as a plain
NSString, with the character references decoded and the markup stripped,
and attributes will be a dictionary of document-wide attributes.
- cricket
On Mar 9, 2004, at 9:09 AM, Alastair Houghton wrote:
On 9 Mar 2004, at 16:12, Lorenzo wrote:
Hi,
I download an HTML file from the web containing symbols like
",  , ’, etc, etc.
How can I convert these symbols to the right characters?
Look at this document to find out about character references (the
&<whatever>; strings):
http://www.w3.org/TR/html4/charset.html#h-5.3
(BTW, ISO 10646 is equivalent to Unicode)
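As a rough illustration of the numeric forms described on that page (&#D; decimal and &#xH; hexadecimal), here is a minimal sketch in plain C, which compiles unchanged in an Objective-C (.m) file. The function names parse_numeric_ref and utf8_encode are hypothetical, not part of Cocoa; the sketch assumes NUL-terminated input and UTF-8 output.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Encode a Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* UTF-16 surrogates */
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Parse a numeric character reference at s, e.g. "&#8217;" (decimal) or
   "&#x2019;" (hex). On success stores the code point in *cp and returns
   the number of characters consumed; returns 0 if s does not start with
   a well-formed numeric reference. */
size_t parse_numeric_ref(const char *s, unsigned long *cp)
{
    const char *p;
    char *end;
    int base = 10;
    unsigned long value;

    if (s[0] != '&' || s[1] != '#') return 0;
    p = s + 2;
    if (*p == 'x' || *p == 'X') { base = 16; p++; }
    /* Require a digit so strtoul cannot skip whitespace or accept a sign. */
    if (base == 10 ? !isdigit((unsigned char)*p) : !isxdigit((unsigned char)*p))
        return 0;
    value = strtoul(p, &end, base);
    if (*end != ';' || value > 0x10FFFF) return 0;
    *cp = value;
    return (size_t)(end + 1 - s);
}
```

A decoding loop would copy bytes through until it sees '&', try parse_numeric_ref, and on success append the utf8_encode bytes and skip the consumed characters. The finished UTF-8 buffer can then be turned into an NSString with +stringWithUTF8String:.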
Then use this page to decode the named ones:
http://www.w3.org/TR/html4/sgml/entities.html
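A minimal sketch of a named-reference lookup in plain C (usable as-is in an Objective-C file). The table holds only a handful of entries copied from the HTML 4 entity list, so it is an assumption that these cover your pages; the full list is on the entities page above. The names struct entity, kEntities and parse_named_ref are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A few entries from the HTML 4 entity list. Names are case-sensitive
   (&eacute; and &Eacute; differ). Extend as needed. */
struct entity { const char *name; unsigned long cp; };

static const struct entity kEntities[] = {
    { "quot",     34 }, { "amp",      38 }, { "lt",       60 },
    { "gt",       62 }, { "nbsp",    160 }, { "copy",    169 },
    { "eacute",  233 }, { "ndash",  8211 }, { "mdash",  8212 },
    { "lsquo",  8216 }, { "rsquo",  8217 }, { "ldquo",  8220 },
    { "rdquo",  8221 }, { "hellip", 8230 },
};

/* Parse a named reference such as "&rsquo;" at s. On success stores the
   code point in *cp and returns the characters consumed (including the
   '&' and ';'); returns 0 for an unknown or unterminated reference. */
size_t parse_named_ref(const char *s, unsigned long *cp)
{
    const char *semi;
    size_t len, i;

    if (s[0] != '&') return 0;
    semi = strchr(s + 1, ';');
    if (semi == NULL) return 0;
    len = (size_t)(semi - (s + 1));
    /* Linear search is fine for a short table; sort it and use bsearch()
       if you load the full entity list. */
    for (i = 0; i < sizeof kEntities / sizeof kEntities[0]; i++) {
        if (strlen(kEntities[i].name) == len &&
            strncmp(kEntities[i].name, s + 1, len) == 0) {
            *cp = kEntities[i].cp;
            return len + 2;
        }
    }
    return 0;
}
```

Note that &nbsp; maps to U+00A0 (a non-breaking space), not an ordinary space, so you may want to replace it with ' ' afterwards if you are producing plain text.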
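If you're rendering pages from the web, another problem you have to
contend with is the ubiquity of Windows... a lot of web pages
inadvertently use the Windows (ANSI) character set, Windows-1252, often
while declaring ISO-8859-1. Windows-1252 puts printable characters such
as curly quotes, the euro sign and the trademark sign in bytes 0x80-0x9F,
where ISO-8859-1 has only control codes, so this can cause confusion if
you aren't expecting it.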
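To make the Windows-1252 point concrete, here is a minimal sketch in plain C of the 0x80-0x9F part of the Windows-1252 mapping (outside that range it agrees with ISO-8859-1, and therefore with the first 256 Unicode code points). The name cp1252_to_unicode is hypothetical. Applying it to numeric references such as &#146; is a common heuristic, not what HTML 4 specifies: by the spec, &#146; is the control character U+0092.

```c
/* Unicode code points for Windows-1252 bytes 0x80-0x9F.
   0 marks the five bytes Windows-1252 leaves unassigned. */
static const unsigned short kCP1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, /* 80-87 */
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,      /* 88-8F */
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, /* 90-97 */
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178  /* 98-9F */
};

/* Map a Windows-1252 byte value (or the number from a reference like
   "&#146;") to a Unicode code point. Values outside 0x80-0x9F are returned
   unchanged: below 0x100 Windows-1252 matches ISO-8859-1 there, and larger
   values are assumed to be Unicode already. Unassigned bytes become
   U+FFFD REPLACEMENT CHARACTER. */
unsigned long cp1252_to_unicode(unsigned long c)
{
    if (c >= 0x80 && c <= 0x9F) {
        unsigned short u = kCP1252High[c - 0x80];
        return u ? u : 0xFFFD;
    }
    return c;
}
```

If the whole page is Windows-1252 rather than just a few references, it is simpler to let Foundation decode the raw bytes, e.g. [[NSString alloc] initWithData:data encoding:NSWindowsCP1252StringEncoding].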
The actual mechanics of converting the character data are just normal
string manipulation.
Kind regards,
Alastair.
_______________________________________________
cocoa-dev mailing list | email@hidden