Re: Fuzzy string searching. String distance algorithm based on a tree technique?
Re: Fuzzy string searching. String distance algorithm based on a tree technique?
- Subject: Re: Fuzzy string searching. String distance algorithm based on a tree technique?
- From: Tim Ramsey <email@hidden>
- Date: Thu, 21 Oct 2004 22:59:48 -0500
On Thursday, October 21, 2004, at 12:03 PM, Theodore H. Smith wrote:
Does anyone know a string distance algorithm that is based on a tree
technique?
You can use Hamming distance to compute distance between trees. Erect 2
bit strings (or some equivalent). Each bit string has 1 element per
node. Light bits in one string corresponding to nodes that are
connected in one tree and bits in the other so they correspond to nodes
connected in the other tree. The Hamming distance between the bit
strings is the distance between the 2 trees. The trick for using this
is to define the trees appropriately for your problem. A fuzzy search
can be computed by specifying the maximum Hamming distance that can
separate similar strings. BTW, you can build the bit strings directly
from test strings or whatever, if that is what you are comparing. You
don't necessarily have to bother with trees unless you have other
reasons besides the distance calculation (like phrase structure
modeling).
Or one that is recursive (stack based), but also calls itself in such
a way that its operation could be drawn as a tree, cutting off
branches which are too dissimilar and not going down those paths.
I know of Levenshtein for doing string distances, I've used it
sucessfully in the past, but that is not what I need. I need a tree
based one.
Why?
I need to adapt it for a different purpose, which is "fuzzy string
searching". Levenshtein is simply not adaptable for this purpose. At
least if it is, I haven't figured out how ;o)
Any smart people have any ideas?
--
Theodore H. Smith - Software Developer.
http://www.elfdata.com
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden
Tim
--
"No one has yet presented a single scrap of creditable evidence
to support the proposition that life should be taken seriously."
-H.L. Mencken
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Cocoa-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden