WO & Lucene


  • Subject: WO & Lucene
  • From: Michael Parlee <email@hidden>
  • Date: Tue, 26 Oct 2004 10:59:33 -0700

Chuck and Dov,

Very cool to see that both of you are using Lucene. I've just begun integrating it into a small project I'm working on, so hearing that others are using it gives me a bit more confidence that I'm doing the right thing.

A couple of questions, though. Are you using it to index records in a relational database? If so, how are you keeping your index up to date? It took my 600 MHz iBook about 6 hours to create my index! FYI, I'm flattening several to-many relationships into a "body" field which is indexed but not stored. Also, I did my initial indexing in straight-up JDBC.
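
As a rough illustration of the flattening described above, the sketch below joins the values reached through several to-many relationships into one string suitable for a single "body" field. The class name, method name, and sample data are hypothetical, and the step that hands the string to Lucene as an indexed, unstored field is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: each inner list holds the text values reached
// through one to-many relationship of a single parent record.
class BodyFlattener {

    /** Joins every non-blank related value into one space-separated string. */
    static String flatten(List<List<String>> relatedValues) {
        List<String> parts = new ArrayList<>();
        for (List<String> relation : relatedValues) {
            for (String value : relation) {
                // Skip nulls and blank strings so they add no noise to the body.
                if (value != null && !value.trim().isEmpty()) {
                    parts.add(value.trim());
                }
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for rows fetched over JDBC.
        List<String> authors = Arrays.asList("Ada Lovelace", "Alan Turing");
        List<String> keywords = Arrays.asList("search", null, " indexing ");
        String body = flatten(Arrays.asList(authors, keywords));
        // In a real indexer this string would become the "body" field.
        System.out.println(body);
    }
}
```

Because the body is only searched and never displayed, the indexer needs to keep just the record's key alongside it; the original rows can be looked up in the database when a hit is shown.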

I'd love to hear a little about how each of you has approached your Lucene-WO integration.

Thanks,

Mike

On Oct 25, 2004, at 2:12 PM, Dov Rosenberg wrote:

I agree with Chuck; we have integrated Lucene into our CMS. It
ROCKS!!

The best part about it is that you don't have the EOF overhead of dealing
with large numbers of records. Lucene's capabilities seem limited only by
disk space.


You can also index any binary document if you can extract the text from it.
We use PDFBox to grab the text out of attached PDFs for indexing.



--
Dov Rosenberg
Conviveon Corporation
http://www.conviveon.com


On 10/25/04 5:04 PM, "Chuck Hill" <email@hidden> wrote:

I'm not certain that you need to get the content out of the database, but
option (2) and Lucene will certainly boost performance and drop memory usage.
I'd strongly suggest going with something that includes Lucene in the mix.
Lucene is awesome!


Chuck


At 10:17 AM 25/10/2004 -0700, David Holt wrote:

I have just upped my test database of documents from 1500 records to
17,000. I have a WOComponent with a WODisplayGroup and the appropriate
qualifier fields for searching. If I qualify the data source by putting a
value in one of the search fields I get a list of documents as expected. If
I submit the form without information in any of the qualifier fields (this
used to return the whole data set divided into paged results), I get the
following exception from the application after a minute or so of waiting:
Error:
com.webobjects.foundation.NSForwardException [java.lang.OutOfMemoryError]
null


It is a MySQL database, WO 5.2.3, OS X Server 10.2
I have a blob field that holds the text content of the documents (for
searching) as well as a URL field pointing to the original document on the
file system. One of the qualifier fields is used to search the content field.


Three strategies I can think of to fix the problem are:
1. Increase system memory (not a good long term solution as the documents
will grow over time)
2. Put the blob field in a separate table so it is not loaded with the
WODisplayGroup (not sure if I can still do searches on that field if I do
that)
3. Get the content out of the database and use a combination of PDFBox and
Lucene to provide the content searching separate from my database.
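
To make the third strategy more concrete, here is a minimal sketch of content searching kept separate from the database. It uses a toy in-memory inverted index as a stand-in for Lucene (it is not Lucene's API), and the class and method names are hypothetical. The point it illustrates is that a search yields only primary keys, so the application can fetch just the matching rows without loading any blob content.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for a search library: maps each term to the primary keys
// of the documents that contain it. Not Lucene's API.
class ContentIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes the extracted text of one document under its primary key. */
    void add(int documentId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(documentId);
            }
        }
    }

    /** Returns the primary keys of documents containing the term, in ascending order. */
    Set<Integer> search(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        ContentIndex index = new ContentIndex();
        // Made-up documents; real text would come from the files on disk.
        index.add(1, "Lucene rocks");
        index.add(2, "Text extracted from a PDF");
        index.add(3, "Searching PDF content with Lucene");
        // Only these keys would be passed to the database fetch.
        System.out.println(index.search("lucene"));
    }
}
```

With this split, the database table no longer needs the text blob for searching, and an empty search can page through keys rather than loading every record at once.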


What would you suggest is the best strategy? Or have I misidentified the
problem?


Thanks,
David


 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden


  • Follow-Ups:
    • Re: WO & Lucene
      • From: Petite Abeille <email@hidden>
    • Re: WO & Lucene
      • From: Dov Rosenberg <email@hidden>
References: 
 >Re: Search results in an out of memory exception (From: Dov Rosenberg <email@hidden>)
