Re: similar strings
- Subject: Re: similar strings
- From: "Gary (Lists)" <email@hidden>
- Date: Mon, 09 Jan 2006 12:05:26 -0500
"Feat" wrote:
> Hi, list!
>
> I need to sort strings by their degree of similarity, ignoring case. Is there
> a quick way to tell that "xxx abc zzz" and "ppp abc qqq" share the "abc"
> segment?
>
> --
> Jym Feat -- Paris FR 75018
Jym,
In general, the standard measure of how much two strings differ (and so how
similar they are) is called the Levenshtein Distance.
Luckily, PHP provides it as the built-in function levenshtein(), and that is
easily accessible from AppleScript via 'do shell script', presuming you are
using OS X. You do not have to implement the algorithm yourself.
The basic explanation of the Levenshtein Distance (LD) is that the value tells
you the minimum number of single-character edits (insertions, deletions or
substitutions) needed to make one string identical to the other.
So, the word "the" and the word "they" have an LD of 1 (add the "y").
The strings "hello world" and "hell worm" have an LD of 3: drop the "o",
change the "l" to "m", and drop the "d".
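For anyone curious what the function is doing under the hood, here is a minimal sketch of the usual dynamic-programming approach, written in Python purely for illustration. It is not PHP's implementation; the function name and the ignore_case parameter are my own assumptions, the latter added because the original question asked for case-insensitive matching.

```python
def levenshtein(a: str, b: str, ignore_case: bool = False) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if ignore_case:
        # Hypothetical convenience flag: fold case before comparing.
        a, b = a.lower(), b.lower()

    # previous[j] = distance between the prefix of a handled so far
    # and the first j characters of b. Row 0: build b from nothing.
    previous = list(range(len(b) + 1))

    for i, ch_a in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current

    return previous[-1]


print(levenshtein("the", "they"))               # 1
print(levenshtein("hello world", "hell worm"))  # 3
```

The two calls reproduce the figures quoted above. Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.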
Here is a very simple working example that calls the PHP function from
AppleScript. This could be cleaned up and/or condensed, but I've clipped it
from my test sheet just as it is (a bit windy).
----------------------------------------------------------------------
-- CALCULATING LEVENSHTEIN DISTANCE VIA PHP+APPLESCRIPT
set thisText to "hello"
set thatText to "hell"
-- Single Line, No Wrap!
set phpScr to "$lev=levenshtein('" & thisText & "', '" & thatText & "');echo $lev;"
set shCmdStub to "php -r "
set sh to shCmdStub & (quoted form of phpScr)
set res to do shell script sh
--> "1"
----------------------------------------------------------------------
Note that in my simple sample, the result is returned as text, not an
integer. Coerce it (res as integer) if you want to do some other math or a
numeric comparison with the value. Note also that levenshtein() is
case-sensitive, so to ignore case, wrap both arguments in PHP's strtolower()
first.
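To show why the coercion matters when you rank strings by distance, here is a small sketch, again in Python purely for illustration. The candidate names and distance values are made up; they stand in for whatever text your shell calls hand back.

```python
# Hypothetical shell output: candidate string -> distance, still as text.
raw = {"candidate a": "9", "candidate b": "10", "candidate c": "3"}

# Sorting on the text values compares character by character,
# so "10" lands before "3" and "9".
by_text = sorted(raw, key=lambda name: raw[name])

# Converting to an integer first gives the numeric order
# that a similarity ranking actually needs.
by_number = sorted(raw, key=lambda name: int(raw[name]))

print(by_text)
print(by_number)
```

The first list puts the distance-10 candidate at the front, which is the wrong way round for a "most similar first" ranking. The second list orders the candidates by distance 3, 9, 10.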
I hope that helps.
--
Gary