Mailing Lists: Apple Mailing Lists

benchmarking statistics




One problem that we (in the Vector & Numerics Group) have been struggling with is how to report meaningful statistics on run times for microbenchmarks. What we see is that 98% of the run times come in at the same low value, ±3%, which lies very near the minimum time. Then there is a population of statistical outliers that take a LOT longer (like 1000 times as long), which we explain away as a side effect of running on a preemptive multitasking operating system: some other task interrupts us and runs for up to 10 milliseconds. Unfortunately, that interruption lands between our two clock measurements and so is incorrectly billed to our test as extra time taken. Our tests need more precision than something like getrusage is able to deliver.
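
To make the measurement pattern above concrete, here is a minimal sketch in Python (not the code our tests use). The function names `collect_run_times` and `fraction_near_minimum` are made up for illustration, and `time.perf_counter_ns` stands in for whatever high-resolution clock the real tests read.

```python
import time

def collect_run_times(func, repeats=1000):
    """Time `func` `repeats` times; return per-call durations in nanoseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()   # first clock reading
        func()
        stop = time.perf_counter_ns()    # second clock reading
        # Anything that preempts us between the two readings
        # gets billed to this sample as extra time.
        samples.append(stop - start)
    return samples

def fraction_near_minimum(samples, tolerance=0.03):
    """Fraction of samples within `tolerance` (relative) of the fastest sample."""
    fastest = min(samples)
    limit = fastest * (1.0 + tolerance)
    return sum(1 for s in samples if s <= limit) / len(samples)
```

Calling `fraction_near_minimum(collect_run_times(some_kernel))` gives a rough idea of how much of the population sits in the fast cluster; the 3% tolerance is simply the figure quoted above, not a recommendation.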


So far we've been getting around this by just reporting the minimum time. However, that doesn't accurately report situations where our performance really is better described by a bimodal distribution due to an optimization error that we'd like to find and fix. The outliers are so big that other estimates of performance, like the mean and standard deviation, land very far from the typical case and don't give representative numbers.
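
A small sketch of that effect, on made-up numbers rather than real measurements: 98 fast runs plus 2 preempted ones. The median and the median absolute deviation (MAD) are included only for comparison. They are a swapped-in idea, not something we currently report, and `summarize_run_times` is a hypothetical helper.

```python
import statistics

def summarize_run_times(samples):
    """Return classical and rank-based summaries of a list of run times."""
    median = statistics.median(samples)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median([abs(s - median) for s in samples])
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
        "median": median,
        "mad": mad,
    }

# Made-up data: 98 fast runs between 97 and 103 time units, plus 2 preempted runs.
fast = [100 + (i % 7) - 3 for i in range(98)]
samples = fast + [100_000, 100_000]
summary = summarize_run_times(samples)
for name, value in summary.items():
    print(f"{name:>6}: {value:.1f}")
```

On this data the two outliers drag the mean to 2098 and push the standard deviation into the thousands, while the median stays at 100 with a MAD of 2. None of these single numbers would reveal a second mode, though, which is the case we actually care about.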

I am curious whether any of you out there have developed a robust statistical treatment for the data in the face of this sort of problem. I've been thinking of comparing the maximum-likelihood fits of a normal distribution and a bimodal normal distribution with something like an F-test to see if the bimodal case is a better fit at some threshold of statistical significance... but then I'm just an amateur statistician who knows just enough to get himself into big trouble using the wrong test. It's also a lot of work, so I was wondering if there was anything simpler, or at least more likely to work, that I might try first.
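
For the sake of discussion, here is a rough sketch of the kind of comparison I have in mind. It fits one normal and a two-component normal mixture (by plain EM) and compares them with BIC instead of an F-test; that substitution is my own assumption, as are the function names, the variance floor, and the margin of 10. It also assumes the gross preemption outliers have already been trimmed from the samples.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2.0 * math.pi * sigma * sigma)
            - (x - mu) ** 2 / (2.0 * sigma * sigma))

def loglik_one_normal(xs, sigma_floor):
    """Maximized log-likelihood of a single normal distribution."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = max(math.sqrt(sum((x - mu) ** 2 for x in xs) / n), sigma_floor)
    return sum(normal_logpdf(x, mu, sigma) for x in xs)

def loglik_two_normals(xs, sigma_floor, iterations=200):
    """Log-likelihood of a two-component normal mixture fitted by plain EM."""
    s = sorted(xs)
    n = len(s)
    mu = [s[n // 4], s[(3 * n) // 4]]        # start the means at the quartiles
    mean = sum(s) / n
    spread = max(math.sqrt(sum((x - mean) ** 2 for x in s) / n), sigma_floor)
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    loglik = float("-inf")
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        # loglik is evaluated at the parameters entering this iteration.
        resp = []
        loglik = 0.0
        for x in s:
            logs = [math.log(weight[k]) + normal_logpdf(x, mu[k], sigma[k])
                    for k in (0, 1)]
            top = max(logs)
            total = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += total
            resp.append([math.exp(v - total) for v in logs])
        # M-step: re-estimate weights, means, and (floored) standard deviations.
        for k in (0, 1):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weight[k] = max(nk / n, 1e-12)
            mu[k] = sum(r[k] * x for r, x in zip(resp, s)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, s)) / nk
            sigma[k] = max(math.sqrt(var), sigma_floor)
    return loglik

def prefers_bimodal(xs, margin=10.0):
    """True if the mixture beats the single normal by more than `margin` in BIC."""
    n = len(xs)
    # The floor keeps a component from collapsing onto a single sample.
    floor = max(1e-3 * (max(xs) - min(xs)), 1e-9)
    bic_one = 2 * math.log(n) - 2 * loglik_one_normal(xs, floor)   # mu, sigma
    bic_two = 5 * math.log(n) - 2 * loglik_two_normals(xs, floor)  # 2 mu, 2 sigma, 1 weight
    return bic_two + margin < bic_one
```

A regression test could flag a benchmark only when `prefers_bimodal(samples)` is true. Raising `margin` trades sensitivity for fewer false positives; I have not validated any particular value on real timing data.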

Mostly we want to apply this to our internal regression tests. They produce a lot of data, so we're looking for some highly automated method that won't waste our time with a lot of false positives.

Ian
Scitech mailing list      (email@hidden)