Re: Removing extra blank lines from generated html?
- Subject: Re: Removing extra blank lines from generated html?
- From: Hugi Þórðarson <email@hidden>
- Date: Fri, 18 Apr 2008 23:34:37 +0000
I've been using Tidy for a couple of years to dynamically clean up
HTML and it works like a charm. Yes, it's strict - but that's one of
the things I like about it :-). Just for kicks, you can check out the
effect of using Tidy by viewing the source of these two links:
http://hugi.karlmenn.is/?useTidy=false
http://hugi.karlmenn.is/?useTidy=true
But recently, I'm more interested in using the DOM to manipulate the
response. There's just something perversely delightful about
manipulating pages as object hierarchies rather than strings.
Unfortunately, reality usually requires working with some badly formed
HTML, and in that respect, Tidy is nice - it is forgiving and will
attempt to fix your terrible, disgusting HTML. So, an example of what
you can do:
---
public WOResponse dispatchRequest( WORequest request ) {
    WOResponse response = super.dispatchRequest( request );
    ByteArrayInputStream in = response.content().stream();
    // No output stream is passed; only the parsed DOM is needed here.
    Document d = tidy().parseDOM( in, null );
    NodeList divNodes = d.getElementsByTagName( "div" );
    int i = divNodes.getLength();
    while( i > 0 ) {
        Node n = divNodes.item( --i );
        n.appendChild( d.createTextNode( "YARRRR, A MIGHTY FINE DIV I WAS" ) );
    }
    String prettyPrintedDocument = convertDOMDocumentToString( d );
    response.setContent( prettyPrintedDocument );
    return response;
}
---
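The convertDOMDocumentToString helper is not shown in the snippet above. A minimal sketch of what such a helper might look like, using only the JDK's javax.xml.transform API, follows. The class name DomSketch and the parse and appendToDivs helpers are made up for illustration, and the input here is well-formed XHTML parsed by the JDK parser rather than by Tidy, so treat it as a starting point, not as the code actually running on the site.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomSketch {

    // Hypothetical stand-in for convertDOMDocumentToString: serializes a DOM
    // tree to a String with an identity transform.
    public static String convertDOMDocumentToString(Document d) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(d), new StreamResult(w));
        return w.toString();
    }

    // Parses well-formed markup with the JDK parser (assumption: no Tidy here,
    // so badly formed HTML will throw instead of being repaired).
    public static Document parse(String xhtml) throws Exception {
        byte[] bytes = xhtml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    // Same idea as the loop above: append a text node to every div.
    public static void appendToDivs(Document d, String text) {
        NodeList divs = d.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            divs.item(i).appendChild(d.createTextNode(text));
        }
    }

    public static void main(String[] args) throws Exception {
        Document d = parse("<html><body><div>one</div><div>two</div></body></html>");
        appendToDivs(d, " (touched)");
        System.out.println(convertDOMDocumentToString(d));
    }
}
```

The exact whitespace of the indented output depends on the JDK version, so it is best not to rely on it byte for byte.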
Which is nice. Apart from the facts that (a) Tidy seems to provide a
rather lacklustre implementation of org.w3c.dom.Document, which I have no idea
how to work around, and (b) I don't know **** about the DOM yet
(although I know enough to guess the API sucks - Right?). I'm still
just experimenting.
Sorry for the long post about nothing... I guess I just like to touch
my keyboard.
- hugi
// Hugi Thordarson
// http://hugi.karlmenn.is/
On 18.4.2008, at 22:24, Mike Schrag wrote:
How extreme do you want to get? You can work some wonders with your
HTML in dispatchRequest.
Personally, I find that this generates the cleanest possible
response:
public WOResponse dispatchRequest( WORequest request ) {
    WOResponse response = super.dispatchRequest( request );
    response.setContent( "" );
    return response;
}
I experimented last year with running output through tidy, but it
ends up breaking all kinds of things (tidy is just too strict for
most HTML):
public WOResponse dispatchRequest(WORequest request) {
    WOResponse response = super.dispatchRequest(request);
    if (MDTApplication.contentTypeHTML(response)) {
        ByteArrayInputStream in = response.content().stream();
        ERXRefByteArrayOutputStream out = new ERXRefByteArrayOutputStream();
        tidy().parseDOM(in, out);
        response.setContent(out.toNSData());
    }
    return response;
}
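MDTApplication.contentTypeHTML is not shown in the code above. As a rough guess at what such a check could do, the sketch below tests a Content-Type header value for an HTML media type. The class and method names are hypothetical, and in a WebObjects app the header string would presumably come from something like response.headerForKey("content-type").

```java
import java.util.Locale;

public class ContentTypeSketch {

    // Returns true when the header value names an HTML media type.
    // Parameters such as "; charset=UTF-8" are ignored.
    public static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mediaType = contentType.split(";", 2)[0]
                .trim()
                .toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html")
                || mediaType.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        System.out.println(isHtml("text/html; charset=UTF-8"));
        System.out.println(isHtml("image/png"));
    }
}
```

Skipping non-HTML responses matters here because running images or other binary content through Tidy would corrupt them.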
Possibly exporting the formatter from WOLips to an external jar
might give better results because it's designed to be very forgiving
about how it interprets your HTML, but in the scheme of things, this
is probably not worth the effort.
ms
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden