On Fri, 25 Nov 2005, Rustam Muginov wrote:
> Thank you :)
> I really went the overcomplicated way.
> Subtraction from zero is the way to go.
>
Depending on the degree of utilization of the vector permute unit, there
might be a way to increase overall throughput. This can only work if there
is more work being done in addition to the sign change.
1. take two complex input vectors and reorder the data into one real
vector and one imaginary vector with two permutes
2. flip the sign of the imaginary vector with subtraction
3. restore the original order of data with another two permutes
This still takes at least two clock cycles per vector (now limited by the
permute unit), and uses 2.5 instructions per input vector (four permutes
plus one subtraction for two vectors). But if the permute unit was
otherwise idle, you have now offloaded real work to it, and you have
gained three issue slots for further computational instructions.
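
The three steps above can be sketched in plain C. This is only a scalar
model, not AltiVec code: the vec4 type and the perm(), sub() and
conj_two_vectors() helpers are hypothetical names made up for illustration,
and the interleaved layout {re0, im0, re1, im1} is an assumption. On real
hardware each perm() call would correspond to one permute instruction and
sub() to one vector subtraction, which gives the five instructions per two
input vectors counted above.

```c
#include <stddef.h>

/* Scalar stand-in for one 4-float vector register (hypothetical type). */
typedef struct { float f[4]; } vec4;

/* Model of a two-source permute: picks 4 floats out of the 8 in a:b.
   Index 0..3 selects from a, index 4..7 selects from b. */
static vec4 perm(vec4 a, vec4 b, const int idx[4])
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.f[i] = (idx[i] < 4) ? a.f[idx[i]] : b.f[idx[i] - 4];
    return r;
}

/* Model of an element-wise vector subtraction. */
static vec4 sub(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.f[i] = a.f[i] - b.f[i];
    return r;
}

/* Flip the sign of the imaginary parts of four complex numbers stored
   interleaved in two vectors (assumed layout: re0, im0, re1, im1). */
void conj_two_vectors(vec4 *v0, vec4 *v1)
{
    static const int even[4] = {0, 2, 4, 6};  /* gather real parts      */
    static const int odd[4]  = {1, 3, 5, 7};  /* gather imaginary parts */
    static const int lo[4]   = {0, 4, 1, 5};  /* re-interleave, first   */
    static const int hi[4]   = {2, 6, 3, 7};  /* re-interleave, second  */
    const vec4 zero = {{0.0f, 0.0f, 0.0f, 0.0f}};

    vec4 re = perm(*v0, *v1, even);   /* step 1: two permutes       */
    vec4 im = perm(*v0, *v1, odd);
    im = sub(zero, im);               /* step 2: one subtraction    */
    *v0 = perm(re, im, lo);           /* step 3: two more permutes  */
    *v1 = perm(re, im, hi);
}
```

Calling conj_two_vectors() on {1, 2, 3, 4} and {5, 6, 7, 8} leaves the real
parts untouched and negates the imaginary parts.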
Holger
P.S.: If you can keep imaginary and real components in separate vectors
over the course of more computation, you might see even more
performance improvement.
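
To illustrate the point about separate vectors, here is a minimal scalar
sketch of a multiply by a conjugate, z = x * conj(y), on data that is kept
in separate real and imaginary arrays. The function name mul_conj_split and
the choice of operation are assumptions picked for illustration, not
something from the original code. With this layout the sign change costs
nothing: it is absorbed into the add/subtract of the multiply, and no
permutes are needed at all.

```c
#include <stddef.h>

/* z[k] = x[k] * conj(y[k]) for n complex numbers stored as separate
   real (xr, yr, zr) and imaginary (xi, yi, zi) arrays.
   Hypothetical helper; a vectorized version would process four
   elements of each array per iteration. */
void mul_conj_split(const float *xr, const float *xi,
                    const float *yr, const float *yi,
                    float *zr, float *zi, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        /* (xr + i*xi) * (yr - i*yi) */
        zr[k] = xr[k] * yr[k] + xi[k] * yi[k];
        zi[k] = xi[k] * yr[k] - xr[k] * yi[k];
    }
}
```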