NSXMLDocument & friends coalesce runs of spaces to one
- Subject: NSXMLDocument & friends coalesce runs of spaces to one
- From: Fritz Anderson <email@hidden>
- Date: Thu, 19 Nov 2015 10:31:00 -0600
I need to calculate offsets into a Word document XML (.docx) archive using two methods: counting characters in NSAttributedString’s interpretation of the document, and counting them in the text nodes (etc.) of the document XML itself. The offsets have to match. They don’t, mostly because of the way the parser treats runs of space characters.
I can explain the need, but let’s keep this brief.
NSXMLDocument (Node, Element…) collapses runs of spaces in text nodes to single spaces. This is a problem, because the scholars who produced the Word source files learned in 1975 to double-space at the end of sentences. NSAttributedString renders the multiple spaces as such; thus the character counts diverge.
"what Socrates wanted.  Plato implies"
(two spaces) comes through as
"what Socrates wanted. Plato implies"
(one space).
Already tried:
* Passing the NSXMLDocumentTidyXML option to NSXMLDocument(data:options:) takes care of single-space elements, but not this.
* NSXMLNodePreserveWhitespace sounds useful, but makes no difference.
* The nodes themselves already have the attribute `xml:space="preserve"`.
* Intercepting every `<w:t/>` element and _forcing_ `xml:space="preserve"`, need it or not, makes no difference.
* If there’s a way for my code, as an NSXMLDocument (etc.) client, to examine the source text before filtering, I haven’t found it.
* I assume it doesn’t matter that I’m working in Swift.
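A minimal sketch for reproducing the attempts above, in case anyone wants to try it. It uses the current Swift spellings of the API (XMLDocument, XMLNode.Options) rather than the NS-prefixed names in this post; the fragment, the helper name parsedText, and the configuration labels are made up for illustration and are not from my real code. It only prints what each option combination yields and makes no claim about which one, if any, keeps both spaces.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLDocument lives here on non-Apple platforms
#endif

// Hypothetical stand-in for one run of text from word/document.xml.
// Note the two spaces after "wanted."
let expectedText = "what Socrates wanted.  Plato implies"
let fragment = "<w:t xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\" xml:space=\"preserve\">\(expectedText)</w:t>"

/// Parses the fragment with the given options and returns the root
/// element's string value, or nil if the fragment could not be parsed.
func parsedText(options: XMLNode.Options) -> String? {
    guard let data = fragment.data(using: .utf8),
          let document = try? XMLDocument(data: data, options: options) else {
        return nil
    }
    return document.rootElement()?.stringValue
}

// The option combinations described in the list above.
let configurations: [(String, XMLNode.Options)] = [
    ("no options", []),
    ("nodePreserveWhitespace", [.nodePreserveWhitespace]),
    ("documentTidyXML", [.documentTidyXML]),
    ("documentTidyXML + nodePreserveWhitespace", [.documentTidyXML, .nodePreserveWhitespace]),
]

// Print the character count each configuration yields next to the
// count of the source text, so any divergence is visible at a glance.
for (label, options) in configurations {
    let text = parsedText(options: options) ?? "<parse failed>"
    print("\(label): \(text.count) characters (source has \(expectedText.count)): \"\(text)\"")
}
```

Any configuration whose count is lower than the source count is one where the run of spaces was collapsed.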
Ideas?
Further details can be found in Stack Overflow at http://stackoverflow.com/questions/33770055/nsxmldocument-family-runs-of-whitespace-collapsed-to-one
— F
_______________________________________________
Cocoa-dev mailing list (email@hidden)