Re: using AppKit additions in background threads
- Subject: Re: using AppKit additions in background threads
- From: Michael Thon <email@hidden>
- Date: Wed, 07 Sep 2011 07:24:36 +0200
On Sep 6, 2011, at 9:23 PM, Douglas Davidson wrote:
>
> On Sep 6, 2011, at 11:53 AM, Jens Alfke wrote:
>
>> On Sep 6, 2011, at 11:11 AM, Michael Thon wrote:
>>
>>> Yup, they're HTML, all right. Now I'm thinking of moving this code to a separate command line app that I can call from the main application. It should work, but I'm not sure if I'd need to provide a runloop for the HTML importing to work.
>>
>> The background tool will need to link against WebKit and AppKit, so it won’t be strictly-speaking ‘background’. You can mark its bundle with a special key (LSBackgroundOnly?) to keep it from showing up in the Dock or getting a menu-bar though.
>>
>> The bigger problem is how this tool sends results back to the main app. An NSAttributedString is an in-memory object, and the tool has a separate address space. I guess you could try archiving the string and sending back the data, but I’m not sure whether all the different attribute values used in parsed HTML are archivable.
>>
>> What do you use the attributed string for, if this is a background-only operation? Maybe there’s a less expensive way to accomplish it.
>
> One possibility would be to convert the HTML to RTF or RTFD, which could be loaded in the background. For that sort of conversion we already have a tool on the system, /usr/bin/textutil. There are also other potential methods for parsing HTML, if the intent is for something other than full editable rich text support.
>
> Douglas Davidson
>
The app is meant to find potential cases of plagiarism. I'm importing documents that users have on their local computers, as well as web pages that they select from the web. In the case of HTML, the user does not need to edit it, and it will be presented to the user in a WebView. I need to extract plain text from the HTML in order to compare it to the user's document. The HTML has not been loaded into the WebView instance yet, so I can't extract the plain text from there. So, in a background thread (an NSOperation), I fetch the URL using NSURLConnection, convert the resulting NSData object to an NSAttributedString, and from there to an NSString. Later, if a potential case of plagiarism is detected, the user views the web page by loading the original URL into a WebView. The earlier conversion to plain text is accurate enough that if I select a substring of it (a potentially plagiarized sentence) and use [webView searchFor:subString ...] in the WebView instance, the WebView can find the string and highlight it for the user.
So, I don't need editable rich text, but I do need a plain text string that faithfully represents the text content of the original HTML as displayed in a WebView. I've found that NSAttributedString's initWithData:... method does a good job of the conversion, and now I see why: it's using WebKit to do the actual conversion. There might be a lighter-weight way to do the conversion, but I haven't investigated alternatives yet. One drawback to my approach is that I can't get the web page title using NSAttributedString. I don't know how to get to that yet. I've considered trying to parse the HTML myself, but I think that will eventually just cause me a lot of grief.
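For reference, the conversion described above could look roughly like the sketch below. The helper name PlainTextFromHTMLData, the UTF-8 encoding option, and the use of ARC are assumptions made for illustration, not details from the actual app. The sketch also reads NSTitleDocumentAttribute from the returned document attributes; whether the importer fills that in for a given page is something to verify rather than rely on. Given the subject of this thread, it is probably safest to call the helper on the main thread.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper (not from the original app): converts raw HTML bytes to
// plain text using AppKit's HTML importer. Assumes ARC. The importer uses
// WebKit internally, so calling it on the main thread is the cautious choice.
static NSString *PlainTextFromHTMLData(NSData *htmlData, NSString **outTitle)
{
    NSDictionary *options = @{
        NSDocumentTypeDocumentOption : NSHTMLTextDocumentType,
        // Assumption: the page is UTF-8 encoded.
        NSCharacterEncodingDocumentOption : @(NSUTF8StringEncoding)
    };

    NSDictionary *docAttributes = nil;
    NSError *error = nil;
    NSAttributedString *attributed =
        [[NSAttributedString alloc] initWithData:htmlData
                                         options:options
                              documentAttributes:&docAttributes
                                           error:&error];
    if (attributed == nil) {
        NSLog(@"HTML import failed: %@", error);
        return nil;
    }

    // The title may be absent; the caller gets nil in that case.
    if (outTitle != NULL) {
        *outTitle = docAttributes[NSTitleDocumentAttribute];
    }

    // Drop all attributes and keep only the character content.
    return [attributed string];
}
```

A caller would pass the NSData received from NSURLConnection and keep the returned NSString for the comparison step.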
Mike
_______________________________________________
Cocoa-dev mailing list (email@hidden)