Re: Best way to parse XML data of non-ASCII encoding...?


  • Subject: Re: Best way to parse XML data of non-ASCII encoding...?
  • From: Kevin Viggers <email@hidden>
  • Date: Wed, 6 Apr 2005 23:08:19 -0600

Hi Simon,

You may want to look into XML4C (http://www.alphaworks.ibm.com/tech/xml4c). XML4C was once an IBM alphaWorks project and is now the combination of the Apache Xerces-C++ validating XML parser and IBM's International Components for Unicode (ICU). Both are open source and written in portable C++.

http://xml.apache.org/xerces-c/
http://www-306.ibm.com/software/globalization/icu/index.jsp

As a point of interest, here is a relevant passage from the XML specification (http://www.w3.org/TR/2004/REC-xml11-20040204/#charencoding):

"Each external parsed entity in an XML document MAY use a different encoding for its characters. All XML processors MUST be able to read entities in both the UTF-8 and UTF-16 encodings."

So, strictly speaking, XML processors must support UTF-8 and UTF-16, and *may* optionally accept other encodings.
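Because every processor must handle both UTF-8 and UTF-16, a parser typically sniffs the first few bytes of an entity before it can even read the encoding declaration. The following is a minimal sketch in C (which also compiles as Objective-C) of that first step, loosely following the non-normative "Autodetection of Character Encodings" appendix of the XML specification. The function and enum names are hypothetical, only the UTF-8/UTF-16 cases are covered, and UCS-4 and EBCDIC are deliberately ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: which encoding family the first bytes suggest. */
typedef enum {
    XML_ENC_UTF8_BOM,        /* EF BB BF */
    XML_ENC_UTF16BE_BOM,     /* FE FF */
    XML_ENC_UTF16LE_BOM,     /* FF FE */
    XML_ENC_UTF16BE,         /* 00 3C 00 3F: "<?" in UTF-16BE, no BOM */
    XML_ENC_UTF16LE,         /* 3C 00 3F 00: "<?" in UTF-16LE, no BOM */
    XML_ENC_ASCII_COMPATIBLE /* UTF-8 without BOM, Shift_JIS, ISO-8859-x, ... */
} xml_enc_family;

/* Hypothetical helper: inspect up to four leading bytes of an XML entity.
   For ASCII-compatible results, the caller still has to read the
   encoding="..." declaration to learn the actual encoding. */
xml_enc_family xml_sniff_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return XML_ENC_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return XML_ENC_UTF16BE_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return XML_ENC_UTF16LE_BOM;
    if (len >= 4 && memcmp(buf, "\x00\x3C\x00\x3F", 4) == 0)
        return XML_ENC_UTF16BE;
    if (len >= 4 && memcmp(buf, "\x3C\x00\x3F\x00", 4) == 0)
        return XML_ENC_UTF16LE;
    return XML_ENC_ASCII_COMPATIBLE;
}
```

A Shift_JIS document like the one in the original question falls into the ASCII-compatible case, which is why its declaration must be read before the rest of the file can be decoded.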

Hope this helps!
Kevin

On 5-Apr-05, at 11:46 AM, Simon Liu wrote:

I know of NSXMLParser, but unfortunately it requires 10.3 or later, and I need
to support 10.2 machines as well...

On Apr 5, 2005 6:40 PM, Kevin Viggers <email@hidden> wrote:
If you can get away with an event-driven (SAX-style) XML parsing
approach, you may want to give NSXMLParser a try. I haven't used it
yet, but it appears to have at least some constants related to
encodings.


http://developer.apple.com/documentation/Cocoa/Reference/Foundation/ObjC_classic/Classes/NSXMLParser.html

Kevin


On 5-Apr-05, at 11:12 AM, Simon Liu wrote:

Hi,

I am doing some XML parsing for 10.2 and later, so I am using Core
Foundation's XML functions, such as CFXMLTreeCreateFromData().

Things are working fine except for XML files with non-ASCII
characters. The functions seem to ignore the encoding attribute of
the XML declaration, as in:

<?xml version="1.0" encoding="shift_jis" standalone="yes"?>

Given an XML file in the above encoding, with Japanese characters as
values between tags, the routines crash.

However, if I first convert the file to UTF-8, things work fine...

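NSString *s = [NSString stringWithContentsOfURL:sourceURL];
NSData *xmlData = [s dataUsingEncoding:NSUTF8StringEncoding];
// NSData and NSURL are toll-free bridged to CFDataRef and CFURLRef
CFXMLTreeRef tree = CFXMLTreeCreateFromData(kCFAllocatorDefault,
    (CFDataRef)xmlData, (CFURLRef)sourceURL,
    kCFXMLParserSkipWhitespace, kCFXMLNodeCurrentVersion); // CFRelease(tree) when done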

Is this the expected behaviour?  Is there a more elegant way to parse
non-ASCII XML files?

Regards,
Simon
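The UTF-8 workaround in Simon's message reads the file through NSString, which does not consult the encoding named in the XML declaration. A more explicit variant would read that name from the declaration first. The following C sketch (C also compiles as Objective-C) extracts it; the function name is hypothetical, and it assumes an ASCII-compatible encoding such as Shift_JIS. It is not a full parser of the XML declaration grammar. On Mac OS X, the resulting IANA name could then be mapped with CFStringConvertIANACharSetNameToEncoding() and the bytes converted to UTF-8 before calling CFXMLTreeCreateFromData(); that step is not shown.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of the encoding pseudo-attribute from
   an XML declaration such as
       <?xml version="1.0" encoding="shift_jis" standalone="yes"?>
   into out as a NUL-terminated string. Returns 1 on success, or 0 if there
   is no declaration, no encoding attribute, the syntax is not recognised,
   or the name does not fit in outlen bytes. */
int xml_declared_encoding(const char *buf, size_t len, char *out, size_t outlen)
{
    const char *end = NULL, *p, *q;
    char quote;
    size_t n;

    /* The declaration must start with "<?xml" followed by whitespace. */
    if (len < 6 || memcmp(buf, "<?xml", 5) != 0 || !isspace((unsigned char)buf[5]))
        return 0;
    /* Limit the search to the declaration itself, i.e. up to "?>". */
    for (p = buf; p + 1 < buf + len; p++) {
        if (p[0] == '?' && p[1] == '>') { end = p; break; }
    }
    if (end == NULL)
        return 0;
    for (p = buf + 5; p + 8 <= end; p++) {
        if (memcmp(p, "encoding", 8) != 0)
            continue;
        q = p + 8;
        while (q < end && isspace((unsigned char)*q)) q++;   /* spaces before '=' */
        if (q >= end || *q != '=') return 0;
        q++;
        while (q < end && isspace((unsigned char)*q)) q++;   /* spaces after '=' */
        if (q >= end || (*q != '"' && *q != '\'')) return 0;
        quote = *q++;                                        /* either quote style */
        n = 0;
        while (q + n < end && q[n] != quote) n++;
        if (q + n >= end || n + 1 > outlen) return 0;        /* unterminated or too long */
        memcpy(out, q, n);
        out[n] = '\0';
        return 1;
    }
    return 0;
}
```

For the declaration quoted in the question, this yields the string "shift_jis"; a file with no declaration, or one without an encoding attribute, returns 0, in which case the specification's UTF-8/UTF-16 defaults apply.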
 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden




