RE: duplicates in a file
- Subject: RE: duplicates in a file
- From: email@hidden
- Date: Fri, 19 Jul 2002 16:00:23 EDT
Sounds like you need to write a mailing address 'cleaner'
first: something that automatically searches for the patterns
below and replaces each with its standardized form (where \t is a tab):
\tP.O.[space] -> \tPO[space]
\tPO[space,space] -> \tPO[space]
\tSt.[space] -> \tSaint[space]
\tST.[space] -> \tSaint[space]
\tSt[space] -> \tSaint[space]
\tST[space] -> \tSaint[space]
[space]Str.[space or \t] -> [space]ST[space or \t]
(and so on; it takes about 42 search & replace operations)
You get the idea. Once that is done, you can comb
through for duplicates automatically, based on matching
addresses and keeping whichever name is longest. How many addresses
are in this file? For files under 3,000 addresses, combing
through manually (in Excel, BBEdit, Nisus, etc.) will
take about an hour to an hour and a half, which may be less
time than writing a script to do it. If you have more than
3,000 addresses, consider sending the file out for
address verification by the post office, which will
automatically return the file with the addresses in a
standardized format (which can then be cleared of
duplicates very quickly).
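To make the 'cleaner' idea concrete, here is a rough sketch in Python
rather than AppleScript. It covers only the handful of rules listed
above, not the full set of roughly 42, and the names RULES and
clean_line are made up for illustration. Treat it as a starting point
under those assumptions, not a finished tool.

```python
import re

# Hypothetical subset of the search & replace rules listed above.
# "\t" is a tab. Order matters: "P.O. " is rewritten first, so a
# leftover double space after "PO" is collapsed by the second rule.
RULES = [
    (re.compile(r"\tP\.O\. "), "\tPO "),
    (re.compile(r"\tPO  "), "\tPO "),
    # Covers St. / ST. / St / ST at the start of a field.
    (re.compile(r"\tS[tT]\.? "), "\tSaint "),
    # "Str." followed by a space or tab; the lookahead leaves
    # the following character untouched.
    (re.compile(r" Str\.(?=[ \t])"), " ST"),
]


def clean_line(line: str) -> str:
    """Apply every rule, in order, to one tab-delimited record."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line
```

Run each record through clean_line before you compare anything, so
that variants such as "St. Louis" and "ST Louis" end up identical.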
Once the file has been 'cleaned', and given that the files are
ASCII, tab-delimited, with return as the end of line, you could
remove the duplicates without ever opening an
application, just by reading the file into AS one line at a
time (using read with the until return parameter to load a
variable), then breaking it down by tabs and checking it against
the previous line (this assumes the file is sorted so that
duplicates sit next to each other). This ought to be quite fast. Meanwhile,
BBEdit or Nisus Writer will both offer you the same
capability at high speeds (and it might be worth noting
that Nisus Writer's GREP features are much more user
friendly, allowing you to choose standard expressions in
plain English from a drop-down menu, if you want to run
this by hand or aren't extremely familiar with GREP).
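The line-by-line comparison could look something like the sketch
below, again in Python rather than AppleScript. It assumes the name is
in the first column and the address in the second (I have not seen the
real layout, so name_col and addr_col are guesses), that every
non-blank row has both columns, and that the file is already cleaned
and sorted by address so duplicates are adjacent. The function name
dedupe_sorted is made up.

```python
def dedupe_sorted(lines, name_col=0, addr_col=1):
    """Drop adjacent rows with the same address, keeping the longest name.

    Assumes the rows are already sorted by address and that each
    non-blank row has at least the name and address columns.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if kept and kept[-1][addr_col] == fields[addr_col]:
            # Same address as the previous row: keep the longer name.
            if len(fields[name_col]) > len(kept[-1][name_col]):
                kept[-1] = fields
        else:
            kept.append(fields)
    return ["\t".join(fields) for fields in kept]


# Example with return (\r) as the end-of-line character.
sample = "R. Smith\t12 Main ST\rRick Smith\t12 Main ST\rAnn Lee\t4 Oak ST\r"
result = dedupe_sorted(sample.splitlines())
```

Here splitlines() handles the return-terminated lines, and only the
previous kept row is ever compared, which is why the sort matters.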
Best Wishes,
=-= Marc Glasgow
Rick wrote:
> Anybody got any ideas on removing duplicate entries in a
> tab delimited file. I used the following script comparing
> "C1R1" against "C1R2" and so on. After looking a little
> closer, I realize that there needs to be a little more
> considered here, see the snip of the data to see what I'm
> referring to. This Excel script is excruciatingly slow, I'm
> hoping that this can be done with BBEdit.
_______________________________________________
applescript-users mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/applescript-users
Do not post admin requests to the list. They will be ignored.