Avoiding duplicate records


  • Subject: Avoiding duplicate records
  • From: Miguel Arroz <email@hidden>
  • Date: Tue, 15 Jan 2008 14:55:27 +0000

Hi!

I'm thinking about how to approach the following problem and would like to hear opinions on it, because I may be overcomplicating this, as I often do.

I need to manage contact lists. A contact is an object with an email, first name, last name, and some flags. The important thing is the email: that's what makes a contact unique.

A contact list may have tens of thousands of contacts (this is a requirement, not a theoretical limit), and cannot have duplicate records (i.e., two contacts with the same email).

Well, my first approach is to create a UNIQUE constraint on the DB that will prevent two records with the same email from existing on the same contact list.

Then, let's suppose I have a contact list with 10k contacts, and I'm adding another 10k contacts. The basic approach is:

1) Divide the 10k into batches of 100, to make this manageable.
2) Try to insert the 100 contacts.
3) If an exception is raised due to the UNIQUE constraint, remove the offending object and try again.
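
The three steps above can be sketched roughly as follows. This is only an illustration in plain Java, not WebObjects/EOF code: `ContactStore`, `DuplicateEmailException`, and `BatchImporter` are hypothetical names, the store is an in-memory stand-in for a table with a UNIQUE constraint on email, and it assumes the exception identifies the first offending email and that a failed insert leaves nothing behind.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical exception carrying the email that violated the constraint.
class DuplicateEmailException extends RuntimeException {
    final String email;

    DuplicateEmailException(String email) {
        super("duplicate email: " + email);
        this.email = email;
    }
}

// Hypothetical in-memory stand-in for a table with UNIQUE(email).
class ContactStore {
    private final Set<String> emails = new HashSet<>();

    // All-or-nothing insert: throws at the first duplicate, store unchanged.
    void insertAll(List<String> batch) {
        Set<String> staged = new HashSet<>();
        for (String email : batch) {
            if (emails.contains(email) || !staged.add(email)) {
                throw new DuplicateEmailException(email);
            }
        }
        emails.addAll(staged);
    }

    int size() {
        return emails.size();
    }
}

class BatchImporter {
    static final int BATCH_SIZE = 100;

    // Inserts in batches of 100; returns how many insert attempts failed.
    static int importAll(ContactStore store, List<String> incoming) {
        int failedAttempts = 0;
        for (int start = 0; start < incoming.size(); start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, incoming.size());
            List<String> batch = new ArrayList<>(incoming.subList(start, end));
            while (!batch.isEmpty()) {
                try {
                    store.insertAll(batch);
                    break; // batch saved
                } catch (DuplicateEmailException e) {
                    failedAttempts++;
                    batch.remove(e.email); // drop one offending entry, retry
                }
            }
        }
        return failedAttempts;
    }
}
```

The returned count of failed attempts makes the cost of retrying one object at a time visible: a batch whose 100 emails all exist already fails 100 times before it is given up on.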


This has an obvious problem: in the worst case, all 100 contacts in a batch may be duplicates, so the batch fails and is retried once per contact, making this very inefficient.

  So, what I thought was, if I have a failure:

1) Do a fetch request to get the contacts with the emails of the 100 contacts batch (i.e., blablabla where email = email1 or email = email2 or email = email3 ...).
2) Remove duplicates in memory using a fast method, like putting the stuff in NSSets or whatever.
3) Try to save again. Of course, it may still fail (concurrency sucks) but the probability is much lower.
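
Step 2 of this fallback could look something like the sketch below, again in plain Java rather than with NSSets. `DuplicateFilter` and `withoutDuplicates` are hypothetical names; `existingEmails` stands for whatever the fetch in step 1 returned (the fetch itself is not shown), and emails are assumed to be compared as exact strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateFilter {
    // Returns the batch minus emails already in the DB and minus repeats
    // inside the batch itself, keeping the original order.
    static List<String> withoutDuplicates(List<String> batch,
                                          Collection<String> existingEmails) {
        Set<String> seen = new HashSet<>(existingEmails);
        List<String> fresh = new ArrayList<>();
        for (String email : batch) {
            // Set.add returns false when the email was already present.
            if (seen.add(email)) {
                fresh.add(email);
            }
        }
        return fresh;
    }
}
```

The filtered list is what would be saved in step 3. A save can still hit the UNIQUE constraint if another session inserts one of those emails between the fetch and the save, so the retry logic is still needed.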


This all assumes that the UNIQUE-related exception is thrown when the first offending object is inserted, so I won't get all the information I need in one single exception. I'm not 100% sure yet that this is true.

  So... suggestions! Is this too crappy? :)

  Yours

Miguel Arroz

http://www.terminalapp.net
http://www.ipragma.com



