WO & Lucene
- Subject: WO & Lucene
- From: Michael Parlee <email@hidden>
- Date: Tue, 26 Oct 2004 10:59:33 -0700
Chuck and Dov,
Very cool to see that both of you are using Lucene. I've just begun
integrating it into a small project I'm working on, so hearing that
others are using it gives me a bit more confidence that I'm doing the
right thing.
A couple of questions though. Are you using it to index records in a
relational database? If so, how are you keeping your index up to date?
It took my 600 MHz iBook about 6 hours to create my index! FYI, I'm
flattening several to-many relationships into a "body" field which is
indexed but not stored. Also, I did my initial index in straight up
JDBC.
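A minimal sketch of the flattening step, assuming the to-many values arrive as (record id, text) rows from a join query. The `BodyFlattener` and `Row` names are hypothetical and do not come from the project described here; the Lucene call that would add the resulting string as an indexed, unstored "body" field is left out because its signature depends on the Lucene version.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BodyFlattener {

    /** One row of a hypothetical join: a parent record id plus one related text value. */
    public static final class Row {
        final long recordId;
        final String text;

        public Row(long recordId, String text) {
            this.recordId = recordId;
            this.text = text;
        }
    }

    /**
     * Groups rows by record id and joins each group's text values with single
     * spaces, producing one "body" string per record. Null or blank values are
     * skipped, so a record with no usable text does not appear in the result.
     */
    public static Map<Long, String> flatten(List<Row> rows) {
        Map<Long, StringBuilder> grouped = new LinkedHashMap<>();
        for (Row row : rows) {
            if (row.text == null || row.text.trim().isEmpty()) {
                continue; // nothing to index for this related value
            }
            StringBuilder body = grouped.computeIfAbsent(row.recordId, id -> new StringBuilder());
            if (body.length() > 0) {
                body.append(' ');
            }
            body.append(row.text.trim());
        }
        // Convert the builders to plain strings, keeping first-seen record order.
        Map<Long, String> bodies = new LinkedHashMap<>();
        for (Map.Entry<Long, StringBuilder> entry : grouped.entrySet()) {
            bodies.put(entry.getKey(), entry.getValue().toString());
        }
        return bodies;
    }
}
```

Each value in the returned map would become the "body" text of one index document, with the record id kept in a separate field so a search hit can be mapped back to its database row.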
I'd love to hear a little about how each of you has approached your
Lucene-WO integration.
Thanks,
Mike
On Oct 25, 2004, at 2:12 PM, Dov Rosenberg wrote:
I agree with Chuck, we have integrated Lucene into our CMS system. It
ROCKS!!
The best part about it is that you don't have the EOF overhead of dealing
with large numbers of records. Lucene's capabilities seem limited only by
disk space.
You can also index any binary document if you can extract the text from it.
We use PDFBox to grab the text out of attached PDFs for indexing.
--
Dov Rosenberg
Conviveon Corporation
http://www.conviveon.com
On 10/25/04 5:04 PM, "Chuck Hill" <email@hidden> wrote:
I'm not certain that you need to get the content out of the database, but
(2) and Lucene will certainly boost performance and drop memory usage.
I'll strongly suggest going with something that includes Lucene in the mix.
Lucene is awesome!
Chuck
At 10:17 AM 25/10/2004 -0700, David Holt wrote:
I have just upped my test database of documents from 1500 records to
17,000. I have a WOComponent with a WODisplayGroup and the appropriate
qualifier fields for searching. If I qualify the data source by putting a
value in one of the search fields I get a list of documents as expected. If
I submit the form without information in any of the qualifier fields (this
used to return the whole data set divided into paged results), I get the
following exception from the application after a minute or so of waiting:
Error:
com.webobjects.foundation.NSForwardException
[java.lang.OutOfMemoryError]
null
It is a MySQL database, WO 5.2.3, OS X Server 10.2
I have a blob field that holds the text content of the documents (for
searching) as well as a URL field pointing to the original document on the
file system. One of the qualifier fields is used to search the content field.
Three strategies I can think of to fix the problem are:
1. Increase system memory (not a good long term solution as the documents
will grow over time)
2. Put the blob field in a separate table so it is not loaded with the
WODisplayGroup (not sure if I can still do searches on that field if I do
that)
3. Get the content out of the database and use a combination of PDFBox and
Lucene to provide the content searching separate from my database.
What would you suggest is the best strategy? Or have I misidentified the
problem?
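The idea behind strategy 2, keeping the large text column out of the rows fetched for display, can be illustrated outside EOF with a plain SQL builder. This is only a sketch under stated assumptions: the `PagedQuery` class, the `document` table and the `id`, `title` and `url` columns are hypothetical, and it relies on MySQL's `LIMIT ... OFFSET ...` clause rather than on anything WODisplayGroup does internally.

```java
import java.util.List;

public class PagedQuery {

    /**
     * Builds a MySQL SELECT that names only the given columns and returns a
     * single page of rows, so a large text or blob column that is not listed
     * is never pulled into memory. Pages are zero-based and ordered by the
     * first column. Table and column names are concatenated directly, so they
     * must be trusted constants, never user input.
     */
    public static String build(String table, List<String> columns, int page, int pageSize) {
        if (columns.isEmpty() || page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("need at least one column, page >= 0, pageSize > 0");
        }
        long offset = (long) page * pageSize; // long avoids int overflow on deep pages
        return "SELECT " + String.join(", ", columns)
                + " FROM " + table
                + " ORDER BY " + columns.get(0)
                + " LIMIT " + pageSize
                + " OFFSET " + offset;
    }
}
```

With an empty search form, a query of this shape returns 50 lightweight rows per page instead of all 17,000 rows with their text content attached.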
Thanks,
David
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden