• Open Menu Close Menu
  • Apple
  • Shopping Bag
  • Apple
  • Mac
  • iPad
  • iPhone
  • Watch
  • TV
  • Music
  • Support
  • Search apple.com
  • Shopping Bag

Lists

Open Menu Close Menu
  • Terms and Conditions
  • Lists hosted on this site
  • Email the Postmaster
  • Tips for posting to public mailing lists
Re: Fuzzy string searching. String distance algorithm based on a tree technique?
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Fuzzy string searching. String distance algorithm based on a tree technique?


  • Subject: Re: Fuzzy string searching. String distance algorithm based on a tree technique?
  • From: Tim Ramsey <email@hidden>
  • Date: Thu, 21 Oct 2004 22:59:48 -0500


On Thursday, October 21, 2004, at 12:03 PM, Theodore H. Smith wrote:

Does anyone know a string distance algorithm that is based on a tree technique?

You can use Hamming distance to compute distance between trees. Erect 2 bit strings (or some equivalent). Each bit string has 1 element per node. Light bits in one string corresponding to nodes that are connected in one tree and bits in the other so they correspond to nodes connected in the other tree. The Hamming distance between the bit strings is the distance between the 2 trees. The trick for using this is to define the trees appropriately for your problem. A fuzzy search can be computed by specifying the maximum Hamming distance that can separate similar strings. BTW, you can build the bit strings directly from test strings or whatever, if that is what you are comparing. You don't necessarily have to bother with trees unless you have other reasons besides the distance calculation (like phrase structure modeling).

Or one that is recursive (stack based), but also calls itself in such a way that its operation could be drawn as a tree, cutting off branches which are too dissimilar and not going down those paths.

I know of Levenshtein for doing string distances, I've used it sucessfully in the past, but that is not what I need. I need a tree based one.

Why?

I need to adapt it for a different purpose, which is "fuzzy string searching". Levenshtein is simply not adaptable for this purpose. At least if it is, I haven't figured out how ;o)

Any smart people have any ideas?

--
Theodore H. Smith - Software Developer.
http://www.elfdata.com

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:

This email sent to email@hidden



Tim

--
"No one has yet presented a single scrap of creditable evidence
to support the proposition that life should be taken seriously."
-H.L. Mencken

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:

This email sent to email@hidden
References: 
 >Fuzzy string searching. String distance algorithm based on a tree technique? (From: "Theodore H. Smith" <email@hidden>)

  • Prev by Date: Re: Retains and bindings
  • Next by Date: Re: Bug in NSArrayController? (immutable instead of mutable dictionaries)
  • Previous by thread: Fuzzy string searching. String distance algorithm based on a tree technique?
  • Next by thread: Toolbar Item Validation in a Panel
  • Index(es):
    • Date
    • Thread