A few small steps in the right direction, plus one big step into the
field of vectorized string processing.
So we'll get back our vec_sel (yeah!), vec_min, and vec_max, plus a new
test-and-set instruction which will facilitate atomic bitfield
operations (i.e. cmpxchg16b using XMM), a float dot product!!!, a
32x32->64 bit multiply, and lots of packed GPR<->XMM manipulations that
will ease the lack of a permute unit.
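
For anyone who hasn't used the AltiVec versions, here is a rough scalar
sketch in plain C of what a few of these operations compute. Everything
in it is an assumption made for illustration: the vec4u/vec4f types and
the model_* names are hypothetical, and none of this is the new
instructions or their intrinsics. It only models per-lane semantics on
a 4 x 32-bit vector.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for one 128-bit register viewed as 4 lanes. */
typedef struct { uint32_t lane[4]; } vec4u;
typedef struct { float lane[4]; } vec4f;

/* Bitwise select in the style of vec_sel: each result bit comes from b
 * where the mask bit is 1, and from a where the mask bit is 0. */
static inline vec4u model_sel(vec4u a, vec4u b, vec4u mask)
{
    vec4u r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = (a.lane[i] & ~mask.lane[i]) | (b.lane[i] & mask.lane[i]);
    return r;
}

/* Per-lane unsigned minimum, in the style of vec_min. */
static inline vec4u model_min_u32(vec4u a, vec4u b)
{
    vec4u r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] < b.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

/* Per-lane unsigned maximum, in the style of vec_max. */
static inline vec4u model_max_u32(vec4u a, vec4u b)
{
    vec4u r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] > b.lane[i] ? a.lane[i] : b.lane[i];
    return r;
}

/* Four-element float dot product: multiply lane by lane, then sum. */
static inline float model_dot4(vec4f a, vec4f b)
{
    float sum = 0.0f;
    for (size_t i = 0; i < 4; i++)
        sum += a.lane[i] * b.lane[i];
    return sum;
}

/* Signed 32x32->64 bit multiply: widen first so the product cannot
 * overflow. */
static inline int64_t model_mul_32x32_64(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```

The select is the interesting one: with a compare result as the mask,
it replaces a branch per element, which is why it was missed so much.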
Where's the "new" register file described? No need to add more GPRs,
but hopefully they'll increase XMM to 32 (or more) registers.
It would be nice even if there were the same 16 XMM registers for use
as src/dest with any instruction, plus 16 to 48 CXMM registers to hold
constant values, i.e. src-only, with special MOV instructions to set
their values. Better yet, just give us 64 XMMs. ;)
PerfOptimization-dev mailing list (email@hidden)