Mailing Lists: Apple Mailing Lists

Image of Mac OS face in stamp
 
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[apple scitech] Re: Scitech Digest, Vol 4, Issue 166



Questions:

1. PPC or Intel (or both)?
2. Real or Complex matrices?
3. Single precision or double precision (or integer / multi- precision / exotic format)?


The answer to those in mind, I'd make the following suggestions, in order of increasing specificity:

(All of the cases)
For a 4x4 determinant, I would use a purely algebraic determinant function, and inline it into your code if possible. Gaussian elimination is unquestionably the right solution for larger matrices, but even in the best case scenario, GE will still be choosing pivots by the time an algebraic 4x4 determinant has finished executing. Tests and branches (and the accompanying branch mispredictions) are quite expensive relative to the cost of a few multiplies with no branching.


(Single precision real arithmetic, Intel or non-G3 PPC)
You must be doing a *lot* of 4x4 determinants for it to be the dominant term in your calculation. Therefore, I would take a serious look at Eric Postpischil's suggestion of gathering up 4 matrices, and computing their 4 determinants simultaneously.


(Intel only)
If you *must* do the determinants one at a time, and are on Intel, *strongly* consider moving your project to x86_64 (if it isn't already). Using the most naïve scalar 4x4 determinant function I could come up with, I saw a speedup in throughput from 1/64 cycles to 1/34 cycles on a Core2 Duo laptop, just from moving to 64-bit (-Os, pretty much stock xcode build settings).


(Single precision real arithmetic on Intel)
If you really, really want to get fiddly, then yes, it is possible to improve the 4x4 determinant somewhat using the SSE packed arithmetic instructions, but not by a whole lot – it's simply too permute-y to achieve the kind of speedup you might hope for (this is why Eric's suggestion is your best bet - no permutes required). With a packed arithmetic determinant routine in assembly, I'm seeing throughput timings of 1/28 cycles – a bit under 20% faster than the scalar code. It's probably possible to do better, but I believe it will only be a few cycles. If you genuinely can't use the approach that Eric suggests, I'm happy to provide you with this code, though I suggest you test it thoroughly for correctness, as I have given it only superficial testing.


Cheers,
– Steve


On Nov 1, 2007, at 12:08 PM, email@hidden wrote:

Hello,

I want to know the fastest way to calculate the determinant of a 4x4
matrix on the Mac? Can the accelerate framework help here?

Currently, I link to GNU Scientific Library's (GSL) cblas, which
operate on gsl_matrix data-types. Would linking to the Apple library
be a better idea?

From profiling, my code spends ~30% of it's time calculating
determinants.

Cheers,

Dan.

_______________________________________________ Do not post admin requests to the list. They will be ignored. Scitech mailing list (email@hidden) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/scitech/email@hidden

This email sent to email@hidden


Visit the Apple Store online or at retail locations.
1-800-MY-APPLE

Contact Apple | Terms of Use | Privacy Policy

Copyright © 2007 Apple Inc. All rights reserved.