Re: Spellchecking queries to a database
  • Subject: Re: Spellchecking queries to a database
  • From: Arturo Pérez <email@hidden>
  • Date: Thu, 23 Oct 2003 19:14:04 -0400

On Thursday, October 23, 2003, at 05:49  AM, petite_abeille wrote:

Hi Arturo,

Arturo Pérez wrote:
Yep. Google must be the best spellchecker out there by far... sigh...

I think it's the only one on the web. Or maybe the only one attached to a search engine.



WordNet doesn't have all the information necessary to duplicate the functionality.

Right... but it's a start. And short of indexing the entire Internet I'm not aware of any "extensive" source of data ;) Your best bet is to combine whatever you can find.

In any case WordNet is best suited for finding synonyms. Do you know of anything that can break a word up into phonemes? That would be better.
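WordNet does not record pronunciation, but a phonetic key can get part of the way there. The sketch below is American Soundex, a swapped-in technique that nobody in this thread proposed: it does not actually split a word into phonemes, it only maps similar-sounding spellings to the same four-character code. Finer-grained schemes such as Metaphone exist; the function name `soundex` and the `_CODES` table are just illustrative.

```python
# American Soundex: a crude phonetic key, not a real phoneme analysis.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()          # the first letter is kept as-is
    prev = _CODES.get(first, "")    # so a same-coded second letter is dropped
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code:
            if code != prev:        # collapse runs of the same code
                result += code
            prev = code
        elif ch not in "hw":
            prev = ""               # a vowel (or y) separates repeated codes
        # 'h' and 'w' leave prev untouched, so codes on either side merge
        if len(result) == 4:
            break
    return result.ljust(4, "0")     # pad short keys with zeros
```

For example, `soundex("receive")` and `soundex("recieve")` both give `R210`. As a crude first pass, a precomputed key like this could sit in an indexed column of an ordinary RDBMS table and be matched with plain equality; it will not rank or catch every misspelling.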



No taxonomy that I'm aware of does.

As far as "taxonomy" goes, one thing one could leverage is the dmoz catalog:


http://rdf.dmoz.org/

But... practically speaking... how would a taxonomy fit in the picture?

WordNet is a taxonomy. So taxonomies in general fit into the picture if WordNet is of interest to you.



The functionality can't be duplicated with any RDBMS.

Or it would be too cumbersome to do so.

We tried. Or rather some confused developers at my former place of occupation did. It was a disaster from the get-go.



Trying to do it with a natural language search engine like Lucene would negatively impact search performance, to put it mildly.

Well... it depends on how many resources you can throw at it. Memory is cheap ;)

Memory isn't the problem. It's actually computationally expensive. Of course, you can always trade off memory for computes. Off the cuff I'd say you'd need something like 500GB. We had 120GB-150GB and it wasn't enough.
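One well-known way to trade memory for computes in spelling suggestion is to precompute every single-character deletion of each dictionary word (the "symmetric delete" idea behind tools such as SymSpell). This is an illustrative sketch, not what either poster built; `deletes`, `build_index` and `lookup` are hypothetical names. Each word adds roughly one index entry per character, and in exchange a query costs only a handful of hash lookups instead of a scan of the whole vocabulary.

```python
from collections import defaultdict


def deletes(word: str) -> set[str]:
    """All strings obtained by deleting exactly one character from word."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def build_index(vocabulary):
    # Memory-heavy step, done once: map each word and each of its
    # single-character deletions back to the original word(s).
    index = defaultdict(set)
    for word in vocabulary:
        index[word].add(word)
        for d in deletes(word):
            index[d].add(word)
    return index


def lookup(index, query: str) -> set[str]:
    # Cheap step, done per query: len(query) + 1 dictionary probes.
    # Every word one insertion, deletion or substitution away is found,
    # but a few words two edits away can also match (e.g. "xab" / "abx"),
    # so the hits are candidates to verify with a real distance function.
    hits = set(index.get(query, ()))
    for d in deletes(query):
        hits |= index.get(d, set())
    return hits
```

With `idx = build_index(["spell", "spelling", "shell"])`, the query `"sbell"` returns both `"spell"` and `"shell"`, since each is one substitution away.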



There's a reason that Google has 50 PhDs in mathematics and natural language processing. To make the rest of us miserable. :-)

Or happy, as one could use the Google API to access this functionality programmatically when practical.

Now there's a solution. Would they permit its use in this case?

None of the stemming algorithms will do that. It must be some sort of distance metric. But (optimal) string transformations of that sort are NP-complete IIRC. So you need massive amounts of computes to do it.
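The usual distance metric here is Levenshtein edit distance. For the plain version (unit-cost insertions, deletions and substitutions), the optimal distance between two strings is computed in O(len(a) · len(b)) time by the Wagner-Fischer dynamic program sketched below, so in practice the expense comes from comparing a query against a large vocabulary rather than from any single comparison. The names `levenshtein` and `suggest` are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer DP)."""
    # prev[j] = distance between the first i-1 chars of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest(query: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    # Brute force: one DP per vocabulary word, which is what gets expensive.
    scored = sorted((levenshtein(query, w), w) for w in vocabulary)
    return [w for d, w in scored if d <= max_distance]
```

For example, `levenshtein("kitten", "sitting")` is 3, and `suggest("speling", ["spelling", "spewing", "sapling", "dog"])` returns `["spelling", "spewing", "sapling"]`.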

Personally, I would use a statistical approach to improve the suggestions.

Statistical approaches bug me. But it would work if you had enough data and enough computes to distill the raw data into a useful form. Last time I tried something like this it kept an SGI Origin 2000 busy for a month and did not complete. And it only needed a working set of 24GB.
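A very small statistical approach, shown only as an illustration of the idea and not as anything the posters ran: count word frequencies in some corpus (hypothetically, past queries or the text already in the database), generate every string one edit away from the query, keep those the corpus has seen, and pick the most frequent. For a word of length n this generates 54n + 25 strings before deduplication, which is cheap per query; the expensive part is collecting and distilling enough data for the counts to be meaningful. `train`, `edits1` and `correct` are hypothetical names.

```python
import re
import string
from collections import Counter

LETTERS = string.ascii_lowercase


def train(text: str) -> Counter:
    # Word frequencies from a sample corpus (assumed input, e.g. past queries).
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word: str) -> set[str]:
    # Every string one deletion, adjacent transposition, replacement or insertion away.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str, counts: Counter) -> str:
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing known nearby; leave the query alone
    # Most frequent candidate wins; ties go to the alphabetically last one
    # so results are repeatable.
    return max(candidates, key=lambda w: (counts[w], w))
```

With counts trained on a sample such as `"the spelling of the word the database the query spelling"`, `correct("teh", counts)` gives `"the"` and `correct("databse", counts)` gives `"database"`.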



Cheers,

PA.

-------
WebObjects in Philadelphia.  You want a cheesesteak with that?
Visit http://webobjects.meetup.com
_______________________________________________
webobjects-dev mailing list | email@hidden
