Re: AUGD: MUG Newsletter Indexing

Subject: Re: AUGD: MUG Newsletter Indexing
From: Jo Booth <email@hidden>
Date: Wed, 15 Feb 2006 16:29:11 +1300


On 15/02/2006, at 15:50 , Paul Richards wrote:

On Feb 7, 2006, at 12:41 AM, Greg Sharp wrote:
These newsletters are retained in PDF format.
I am wondering if anyone has any suggestions on how one might be able to develop an Index for all this material, preferably something that would be accessible online.
This functionality is also built in to a lot of servers.
I got to thinking about this more over the past couple of days and remembered another thing about my previous explorations into PDF indexing. One of the workarounds I tried was converting the PDFs to text and indexing the text. The thing that tripped me up on that was things like space padding and other visual text treatments that were placed in the PDF to make it look good, either by the DTP software or by the PDF process itself, but which disrupted the natural flow of text when it was converted back. I am curious about whether this also poses a problem for the various PDF indexing methods that have been mentioned in this thread.

Unless you are searching for a phrase (and probably even then), most search engines treat a PDF, text, or HTML file as a bunch of keywords and ignore whitespace. The excerpts they present in the search results may look a little weird due to the PDF-to-HTML conversion ;) but when you open a search result, the document is in its native PDF format.

I think ;)
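
To illustrate the keyword idea, here is a rough Python sketch. It is not how any particular search engine works; the names tokenize, build_index and contains_phrase, the file names, and the sample text are all made up for the example. It only shows that once text is split into words, space padding left over from a PDF-to-text conversion no longer matters, even for a simple phrase match.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and pull out word tokens; all whitespace is discarded."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(docs):
    """Map each keyword to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def contains_phrase(text, phrase):
    """True if the phrase's tokens appear consecutively among the text's tokens."""
    words, target = tokenize(text), tokenize(phrase)
    return any(
        words[i:i + len(target)] == target
        for i in range(len(words) - len(target) + 1)
    )


# Hypothetical sample: the same text with and without layout padding.
padded = "MUG   Newsletter\n\n   February      2006\n  Indexing    PDF   archives"
clean = "MUG Newsletter February 2006 Indexing PDF archives"

index = build_index({"padded.pdf": padded, "clean.pdf": clean})

print(tokenize(padded) == tokenize(clean))
print(sorted(index["newsletter"]))
print(contains_phrase(padded, "newsletter february"))
```

One caveat: this only covers extra spaces and line breaks. Words that the layout hyphenated across lines, or letter-spaced headings, would still come out as separate tokens and would need extra cleanup.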

-Jo.
WelMac VP / NMGite
http://forums.welmac.org.nz
