Mailing Lists: Apple Mailing Lists


Re: Intel Core architecture SSE implementation




On Apr 11, 2006, at 12:10 PM, Rick Berry wrote:

> The diagram implies that ports 2, 3 & 4 are for load/store and that ports 0 & 1 are for computation. It shows that ports 0 & 1 each have both a floating point unit (x87 scalar? SSE scalar?) and an SSE unit. Does this mean that Core can dispatch two independent SSE instructions per cycle (although presumably still with a 2 cycle latency)? Can it execute an independent (scalar) floating point multiply and add per cycle?

Yes, in principle, at least for short periods of time. Other things like loop overhead, limits on the number of register file read ports, decode bandwidth, floating point operand edge-case stalls, etc. can get in the way of sustaining 2 flops/cycle in scalar arithmetic on real-world problems.
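To make "independent" concrete, here is a minimal sketch in plain C of a scalar dot product split into two accumulator chains. The function name `dot_two_chains` is hypothetical, not from any Apple or Intel sample. The idea is only that the multiply feeding one chain does not depend on the add finishing in the other, so the hardware has a chance to overlap them. Whether it actually does depends on the compiler output and on the stalls listed above.

```c
#include <stddef.h>

/* Scalar dot product with two independent accumulator chains.
 * sum0 and sum1 never depend on each other inside the loop, so a
 * multiply for one chain can in principle issue while the add for
 * the other chain is still in flight. This is a sketch of the
 * dependency structure only; it guarantees nothing about the
 * throughput actually reached on a given core. */
double dot_two_chains(const double *a, const double *b, size_t n)
{
    double sum0 = 0.0;
    double sum1 = 0.0;
    size_t i = 0;

    /* Main loop: two elements per iteration, one per chain. */
    for (; i + 1 < n; i += 2) {
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }

    /* Odd element left over when n is not a multiple of 2. */
    if (i < n) {
        sum0 += a[i] * b[i];
    }

    /* Combine the chains once, outside the loop. */
    return sum0 + sum1;
}
```

Note that splitting the sum changes the order of the floating point additions, so for general inputs the result can differ in the last bits from a single-accumulator loop.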


> Additionally, a lot of the optimisation literature says to avoid instructions which generate multiple micro-ops. However, I haven't been able to find anything that lists how many micro-ops each instruction decodes to. I assume that this will probably only apply to the more convoluted instructions (e.g. operating system support), but it would still be nice to know.

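That is not a good assumption. If you play around with Shark a bit, you'll notice that there are performance monitor counters (PMCs) for instructions retired and µops retired. Comparing the two counts over a small test loop gives a rough measure of how many µops its instructions decode to.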
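As a rough sketch of the arithmetic involved, the two helpers below turn counter readings into µop estimates. Both function names and the differential-measurement approach (profile a baseline loop, then the same loop with extra copies of the instruction under test) are assumptions for illustration. The counter values are whatever you read off the profiler by hand; nothing here calls a Shark API.

```c
#include <stdint.h>

/* Average µops per retired instruction over a profiled region.
 * Returns 0.0 when no instructions were counted. */
double uops_per_instruction(uint64_t uops_retired,
                            uint64_t instructions_retired)
{
    if (instructions_retired == 0) {
        return 0.0;
    }
    return (double)uops_retired / (double)instructions_retired;
}

/* Estimated µops for one instruction, from two runs:
 *   base_uops: µops retired by a baseline loop
 *   test_uops: µops retired by the same loop with `copies` extra
 *              copies of the instruction under test
 * Returns 0.0 when copies is zero. The estimate is only as good as
 * the counters and assumes the two runs otherwise do the same work. */
double uops_for_instruction(uint64_t base_uops,
                            uint64_t test_uops,
                            uint64_t copies)
{
    if (copies == 0) {
        return 0.0;
    }
    return ((double)test_uops - (double)base_uops) / (double)copies;
}
```

For example, if the baseline retires 1000 µops and adding 1000 copies of an instruction raises that to 3000, `uops_for_instruction(1000, 3000, 1000)` gives 2.0, which suggests the instruction decodes to two µops.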


> Also, are there plans to release documentation about optimising for the Intel architecture, by which I mean something comparable to the current copious (and excellent) AltiVec documentation?

Hmmm... I think maybe that falls under the area of questions about future products. I can certainly tell you that we understand the value of such documentation, needing that sort of info ourselves to vectorize/optimize our own code! There is some stuff up already...


	http://developer.apple.com/hardware/ve/sse.html

...but the critical throughput/latency table and µop counts are, as you point out, missing.

Ian
PerfOptimization-dev mailing list