Mailing Lists: Apple Mailing Lists


Re: Extracting HTML tags in Java



<email@hidden> wrote:

>How does one go about extracting HTML tags from a source
>file in Java? Is there a pre-canned package, or do I need to
>roll my own with either regex or substring?
>
>These are "pretty normal" tags I am parsing, that I have
>either generated myself or have been involved with the
>system development.

It depends:
- on the meaning of "extract".
- on the meaning of "source file".
- on the meaning of "pretty normal".
- on what you intend to do after extracting the markup.

Do you want to render the HTML? Convert it to something else, say, XHTML
or RTF? Create a document object model? Summarize the content? Display
the markup semantics?

There's also the question of how much HTML markup you want to parse. All
of HTML 1.0? A subset of 1.0? All of 4.0? A subset? Does it include
images, scripting, frames, hrefs, etc.?

It's fairly simple to "extract" and manipulate specific tags for specific
purposes, and without using regex or substring. For example, see my open
source Fancy-API2Mac converter:
<http://www.amug.org/~glguerin/sw/#doc-convert>
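As a rough illustration of the point above (not taken from the converter,
and not necessarily how it works), here is a minimal sketch of a
character-by-character scan that collects tag names without regex or
substring. The class name TagScanner and the method tagNames are made up
for this example. It assumes "pretty normal" markup: no '>' inside
attribute values or comments, and no script or CDATA content.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; assumes simple, well-behaved markup.
public class TagScanner {

    /**
     * Returns the tag names found in html, in document order and lowercased.
     * End tags keep a leading '/'. Attributes, comments, and text are skipped.
     */
    public static List<String> tagNames(String html) {
        List<String> names = new ArrayList<String>();
        int n = html.length();
        int i = 0;
        while (i < n) {
            if (html.charAt(i) != '<') {
                i++;                      // ordinary text: skip it
                continue;
            }
            i++;                          // step past '<'
            StringBuilder name = new StringBuilder();
            boolean isEndTag = false;
            if (i < n && html.charAt(i) == '/') {
                isEndTag = true;
                i++;
            }
            // Collect the tag name: letters and digits only.
            while (i < n && Character.isLetterOrDigit(html.charAt(i))) {
                name.append(Character.toLowerCase(html.charAt(i)));
                i++;
            }
            // Skip attributes and anything else up to the closing '>'.
            while (i < n && html.charAt(i) != '>') {
                i++;
            }
            if (i < n) {
                i++;                      // step past '>'
            }
            // Comments and declarations ("<!...") yield an empty name; ignore them.
            if (name.length() > 0) {
                names.add(isEndTag ? "/" + name : name.toString());
            }
        }
        return names;
    }
}
```

For example, TagScanner.tagNames("<p>Hi <b>there</b></p>") returns the
list [p, b, /b, /p]. Whether something this small is enough depends
entirely on the answers to the questions above.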

Without knowing what you're trying to do, any suggestions are more likely
to be insufficient or overkill than appropriate.

-- GG
_______________________________________________
java-dev mailing list | email@hidden
Help/Unsubscribe/Archives: http://www.lists.apple.com/mailman/listinfo/java-dev