Re: What is Best Method To Determine Duplicate Items in a Large List?
- Subject: Re: What is Best Method To Determine Duplicate Items in a Large List?
- From: Jim Weisbin <email@hidden>
- Date: Sun, 13 Aug 2017 08:59:26 -0400
Jim Underwood <email@hidden> wrote:
> Anyone have any bright ideas and/or tools to speed up identification of dup
> text items in a large list (~30,000)?
I assume you are looking for consecutive repeated words?
It’s easy with egrep:
egrep "\b([a-zA-Z0-9]+) \1\b" test.txt
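To see what that pattern picks up, here is a small sketch run against a throwaway file. The sample lines and the variable names (tmpfile, matches) are made up for illustration, and it assumes a grep whose -E mode accepts \b and backreferences, which GNU and BSD grep do as extensions.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
tmpfile=$(mktemp)
printf '%s\n' \
  'this is is a test' \
  'nothing repeated here' \
  'It is true that that is the case' > "$tmpfile"

# Keep every line that contains a word immediately followed by itself.
# The leading \b stops the "is" inside "this" from pairing with the next "is".
matches=$(grep -E '\b([a-zA-Z0-9]+) \1\b' "$tmpfile")
printf '%s\n' "$matches"

rm -f "$tmpfile"
```

Single quotes are used here only so the shell passes the backslashes through untouched; grep -E is the same thing as egrep.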
However, my attempts to do this with ASObjC using NSRegularExpression have so
far failed. I suspect the backslashes in \b (word boundary) and \1
(backreference to the repeated item) have to be doubled inside the AppleScript
string, but I'm not sure.
One caveat is that this will also flag legitimately repeated words, such as
‘It’s true that that is the case…’
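Since legitimate repeats have to be weeded out by eye, it may help to print only the repeated pair and its line number rather than the whole line. A rough sketch, again with made-up sample text and variable names (sample, pairs), and assuming a grep that supports -o and -n alongside -E:

```shell
# Hypothetical sample text, for illustration only.
sample=$(mktemp)
printf '%s\n' \
  'this is is a test' \
  'It is true that that is the case' > "$sample"

# -o prints only the matched text, -n prefixes it with the line number,
# giving a short list of candidate pairs to review by hand.
pairs=$(grep -Eon '\b([a-zA-Z0-9]+) \1\b' "$sample")
printf '%s\n' "$pairs"

rm -f "$sample"
```

Each output line has the form linenumber:word word, so a pair like "that that" can be checked in context and skipped if it is intentional.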
Jim Weisbin | C.T.O. | Human | Post Human | 27 West 20th Street | Suite 801 |
New York, NY | 10011 | (212) 352-0211 | (917) 375-2272 | 2046 Broadway |
Santa Monica, CA | 90404 | (310) 264-0211 telephone | www.humanworldwide.com
<http://www.humanworldwide.com/>
Click here <http://www.humanworldwide.com/#commercials> to view our online reel