Re: Avoiding duplicate records

  • Subject: Re: Avoiding duplicate records
  • From: Miguel Arroz <email@hidden>
  • Date: Wed, 16 Jan 2008 01:15:19 +0000

Hi!

Well, here are the results on my PowerBook G4 1.67 GHz (no conflicts yet):

1050 records saved one by one: 14 secs
1050 records saved in 50-record batches: 6 secs
1050 records saved in 500-record batches: 5 secs
10000 records saved in 50-record batches: 31 secs
10000 records saved in 500-record batches: 26 secs
[Curiosity] pasting a 10k-line text block into a Safari text area: >1 minute!
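Converting the timings above into a per-record cost makes them easier to compare. This is a minimal Java sketch; the figures are the ones reported above, and because they were measured in whole seconds the per-record values are only approximate.

```java
public class BatchTimings {
    /** Milliseconds per record, given a total time in seconds and a record count. */
    static double msPerRecord(double totalSeconds, int records) {
        return totalSeconds * 1000.0 / records;
    }

    public static void main(String[] args) {
        // Figures copied from the measurements above (no conflicts, PowerBook G4).
        String[] labels = {
            "1050 one by one", "1050 in batches of 50", "1050 in batches of 500",
            "10000 in batches of 50", "10000 in batches of 500"
        };
        double[] seconds = {14, 6, 5, 31, 26};
        int[] records = {1050, 1050, 1050, 10000, 10000};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-24s %6.2f ms/record%n", labels[i], msPerRecord(seconds[i], records[i]));
        }
    }
}
```

It prints roughly 13.33 ms per record one by one, 5.71 and 4.76 ms for the 1050-record batch runs, and 3.10 and 2.60 ms for the 10000-record runs.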


Given these results, some preliminary conclusions:

1) This takes much less time than I expected. If it runs at this speed on my old PowerPC, with a slow drive, lots of processes running (including Eclipse), the WO app running in development mode, etc., then on a real server (with Intel processors) it should run even faster (much faster, from what I've seen of Java on Intel).

2) There is no significant difference between batch sizes of 50 and 500. I was NSLogging every time I saved a batch, so the 50-record tests did a lot more logging. Logging takes a lot of time, so I think overall the two are not that different.

3) Inserting one by one is noticeably slower, but not THAT much slower (I logged every 50 inserts, so there is no extra logging time here).

So I think I'll write in batches of 50 or so, and if a batch fails, I'll write that batch's contacts one by one. It's probably a bit slower than fetching, removing duplicates and saving, but it's not that bad, it's much easier to code, and it won't fail a second time if concurrent updates are being made (each contact is either saved or not, period). It's actually fast enough that it doesn't need a background process; it can run in an AJAX-driven long response instead.
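The batch-then-fallback plan can be sketched in Java. This is a hedged illustration, not WebObjects code: `ContactStore`, `InMemoryStore` and `DuplicateException` are hypothetical stand-ins for the real EOF save path, and the in-memory store simulates a UNIQUE constraint on email with all-or-nothing batch commits. A real EOF version would also have to discard the failed batch's inserted objects from the editing context before retrying, which this sketch does not model.

```java
import java.util.*;

public class BatchWithFallback {
    /** Hypothetical stand-in for the persistence layer (not a WebObjects API). */
    interface ContactStore {
        /** Saves all contacts atomically; throws if any email conflicts. */
        void saveAll(List<String> emails) throws DuplicateException;
    }

    static class DuplicateException extends Exception {
        DuplicateException(String email) { super("duplicate email: " + email); }
    }

    /** Simulates a UNIQUE constraint on email and an all-or-nothing commit. */
    static class InMemoryStore implements ContactStore {
        final Set<String> emails = new HashSet<>();

        public void saveAll(List<String> batch) throws DuplicateException {
            Set<String> pending = new HashSet<>();
            for (String e : batch) {
                // Conflict with a stored row or with an earlier row in the same batch.
                if (emails.contains(e) || !pending.add(e)) throw new DuplicateException(e);
            }
            emails.addAll(pending); // commit only if the whole batch is clean
        }
    }

    /** Saves in batches; if a batch fails, retries its contacts one at a time. Returns the count saved. */
    static int saveInBatches(ContactStore store, List<String> contacts, int batchSize) {
        int saved = 0;
        for (int start = 0; start < contacts.size(); start += batchSize) {
            List<String> batch = contacts.subList(start, Math.min(start + batchSize, contacts.size()));
            try {
                store.saveAll(batch);
                saved += batch.size();
            } catch (DuplicateException batchFailed) {
                for (String contact : batch) {
                    try {
                        store.saveAll(List.of(contact));
                        saved++;
                    } catch (DuplicateException duplicate) {
                        // This contact already exists; skip it.
                    }
                }
            }
        }
        return saved;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.emails.add("b@example.com"); // pretend another process already saved this one
        List<String> contacts = List.of(
            "a@example.com", "b@example.com", "c@example.com", "a@example.com", "d@example.com");
        System.out.println(saveInBatches(store, contacts, 2) + " saved");
    }
}
```

With a batch size of 2, the first two batches conflict and are retried one by one, so a, c and d are saved, the two duplicates are skipped, and it prints `3 saved`.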

Thanks to all of you for the help!

  Yours

Miguel Arroz

On 2008/01/15, at 15:19, Mike Schrag wrote:

1) Do a fetch request to get the contacts with the emails of the 100-contact batch (i.e., blablabla where email = email1 or email = email2 or email = email3 ...).
2) Remove duplicates in memory using a fast method, like putting the stuff in NSSets or whatever.
3) Try to save again. Of course, it may still fail (concurrency sucks) but the probability is much lower.
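Steps 1 and 2 can be sketched with plain Java sets. `fetchExisting` is a hypothetical stand-in for the OR-qualified fetch, simulated here against an in-memory set rather than a database; `removeDuplicates` drops both already-stored emails and repeats within the batch in a single pass.

```java
import java.util.*;

public class DedupeBeforeSave {
    /**
     * Step 1 (simulated): return the stored emails that match any email in the batch,
     * the equivalent of "email = e1 or email = e2 or ...". The "database" is an in-memory set.
     */
    static Set<String> fetchExisting(Set<String> database, Collection<String> batch) {
        Set<String> matches = new HashSet<>();
        for (String email : batch) {
            if (database.contains(email)) matches.add(email);
        }
        return matches;
    }

    /** Step 2: drop already-stored emails and in-batch repeats, keeping first occurrences in order. */
    static List<String> removeDuplicates(List<String> batch, Set<String> alreadyStored) {
        Set<String> seen = new HashSet<>(alreadyStored);
        List<String> toInsert = new ArrayList<>();
        for (String email : batch) {
            if (seen.add(email)) toInsert.add(email); // add() returns false if seen before
        }
        return toInsert;
    }

    public static void main(String[] args) {
        Set<String> database = new HashSet<>(List.of("b@example.com"));
        List<String> batch = List.of("a@example.com", "b@example.com", "a@example.com", "c@example.com");
        List<String> toInsert = removeDuplicates(batch, fetchExisting(database, batch));
        System.out.println(toInsert); // [a@example.com, c@example.com]
        // Step 3 would save toInsert; a concurrent writer can still cause a conflict.
    }
}
```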


All this assumes that the UNIQUE-related exception is thrown when the first offending object is inserted, so I won't get all the information I need in one single exception, which I'm not 100% sure is true yet.
Depending on how your unique constraint is configured, it may throw when the first conflicting insert happens or at the end of the commit (this is the DEFERRABLE INITIALLY DEFERRED thing, which I've honestly never tried on a unique constraint, but presumably it works the same way).

The only thing I would consider is how frequent conflicts will be. If conflicts are frequent, it may be cheaper to fetch dupes first to weed them out (so you're not constantly failing out of 100-insert blocks).
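One way to make "how frequent conflicts will be" concrete is a simple break-even model. This is a hedged sketch under assumptions that are not from the thread: a failed batch is retried one record at a time, fetching first always lets the batch succeed, and in-memory deduping costs nothing. The inputs `fetchMs` and `oneByOneMs` are hypothetical and would have to be measured on real data.

```java
public class ConflictBreakEven {
    /**
     * Batch failure rate above which fetching first is cheaper, under this model:
     *   optimistic:  batchInsert + p * (batchSize * oneByOneMs)
     *   fetch first: fetchMs + batchInsert
     * The batch insert cost appears on both sides and cancels, giving
     *   p = fetchMs / (batchSize * oneByOneMs).
     * A result above 1 means the optimistic approach wins even if every batch fails.
     */
    static double breakEvenFailureRate(double fetchMs, int batchSize, double oneByOneMs) {
        return fetchMs / (batchSize * oneByOneMs);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: a 50 ms fetch, 50-record batches, 10 ms per one-by-one save.
        System.out.println(breakEvenFailureRate(50.0, 50, 10.0));
    }
}
```

With those hypothetical inputs it prints `0.1`: fetching first would pay off only if more than about one batch in ten hits a conflict.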

I think if I were in your position I would just benchmark:
1) committing one at a time: this is logically the easiest, but the overhead may be way too high ... though WO doesn't batch inserts ANYWAY, so who knows
2) fetching 100, comparing, deduping, then inserting and committing
3) inserting 100, committing, catching the exception (fetching 100, comparing, deduping, inserting, rinse and repeat)
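The three strategies can be wired into one benchmark harness. A hedged Java sketch: `commit` and `dedupe` are hypothetical in-memory stand-ins for a real EOF save and fetch, and the sample data is invented, so the printed times say nothing about database cost. The point is the structure: each strategy runs over the same batches on a fresh store, and the final contents are compared so a faster strategy cannot silently save different rows.

```java
import java.util.*;

public class StrategyBenchmark {
    /** Simulated atomic commit with a UNIQUE email constraint; saves nothing on conflict. */
    static boolean commit(Set<String> db, List<String> batch) {
        Set<String> pending = new HashSet<>();
        for (String e : batch) {
            if (db.contains(e) || !pending.add(e)) return false;
        }
        db.addAll(pending);
        return true;
    }

    /** Simulated fetch + dedupe: the batch minus stored emails and in-batch repeats. */
    static List<String> dedupe(Set<String> db, List<String> batch) {
        Set<String> seen = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String e : batch) {
            if (!db.contains(e) && seen.add(e)) out.add(e);
        }
        return out;
    }

    interface Strategy { void save(Set<String> db, List<String> batch); }

    // 1) Commit one record at a time; conflicting records are simply not saved.
    static void oneAtATime(Set<String> db, List<String> batch) {
        for (String e : batch) commit(db, List.of(e));
    }

    // 2) Fetch, dedupe, then insert and commit.
    static void fetchFirst(Set<String> db, List<String> batch) {
        commit(db, dedupe(db, batch));
    }

    // 3) Insert and commit; on failure fetch, dedupe and try again.
    static void insertFirst(Set<String> db, List<String> batch) {
        while (!commit(db, batch)) batch = dedupe(db, batch);
    }

    /** Runs one strategy over all batches on a fresh store, prints the time, returns the store. */
    static Set<String> run(String name, Strategy strategy, List<List<String>> batches) {
        Set<String> db = new HashSet<>();
        long start = System.nanoTime();
        for (List<String> batch : batches) strategy.save(db, batch);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(name + ": " + db.size() + " rows, " + micros + " us");
        return db;
    }

    /** Invented data: from the second batch on, every tenth email repeats one from the previous batch. */
    static List<List<String>> sampleBatches(int batchCount, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        int next = 0;
        for (int b = 0; b < batchCount; b++) {
            List<String> batch = new ArrayList<>();
            for (int i = 0; i < batchSize; i++, next++) {
                int id = (b > 0 && i % 10 == 9) ? next - 1 - batchSize : next;
                batch.add("user" + id + "@example.com");
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<List<String>> batches = sampleBatches(10, 100);
        Set<String> a = run("1) one at a time", StrategyBenchmark::oneAtATime, batches);
        Set<String> b = run("2) fetch first", StrategyBenchmark::fetchFirst, batches);
        Set<String> c = run("3) insert first", StrategyBenchmark::insertFirst, batches);
        System.out.println("same result: " + (a.equals(b) && b.equals(c)));
    }
}
```

Swapping real fetch and save calls into `commit` and `dedupe` would turn the same loop into the benchmark suggested here, and timing `dedupe` and `commit` separately gives the relative cost of fetching versus inserting.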


You might also just benchmark the fetching and the inserting independently so you know the relative cost of 100 of each for your average data.

ms

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:


This email sent to email@hidden

Miguel Arroz http://www.terminalapp.net http://www.ipragma.com





References: 
 >Avoiding duplicate records (From: Miguel Arroz <email@hidden>)
 >Re: Avoiding duplicate records (From: Mike Schrag <email@hidden>)
