BTW, I think you can call an optimized RGB->YUV function via QuickTime.
(Or even more optimized via OpenGL, if you write a pixel shader.)
You can also call vImageMatrixMultiply_<fmt>.
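In any case, the answer to your question is that _mm_mulhrs_epi16
(SSSE3) + _mm_adds_epi16 is *almost* the same thing as vec_mradds.
vec_mradds saturates once, on the final sum; the SSE pair has to
squeeze the rounded product into 16 bits before the add, which gives
wrong results in the rare case at the limit (both multiplicands equal
to -32768).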
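To make the "almost" concrete, here is a minimal sketch of the SSSE3 pairing next to a scalar model of one vec_mradds lane. The scalar model is my reading of the AltiVec definition, and the names mradds_ref, mradds_ssse3_approx, mradds_ssse3_lane0 and mradds_ssse3_demo are made up for illustration. The target attribute is a GCC/Clang extension and is only there so the snippet builds without -mssse3.

```c
#include <assert.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3: _mm_mulhrs_epi16 */

/* Scalar model of one vec_mradds lane (assumed definition):
   saturate(((a * b + 0x4000) >> 15) + c).
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t mradds_ref(int16_t a, int16_t b, int16_t c)
{
    int32_t sum = (((int32_t)a * b + 0x4000) >> 15) + c;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* The SSSE3 pairing: rounded high multiply, then saturated add. */
__attribute__((target("ssse3")))
static __m128i mradds_ssse3_approx(__m128i a, __m128i b, __m128i c)
{
    return _mm_adds_epi16(_mm_mulhrs_epi16(a, b), c);
}

/* Hypothetical helper: broadcast one triple, return lane 0. */
__attribute__((target("ssse3")))
static int16_t mradds_ssse3_lane0(int16_t a, int16_t b, int16_t c)
{
    __m128i r = mradds_ssse3_approx(_mm_set1_epi16(a),
                                    _mm_set1_epi16(b),
                                    _mm_set1_epi16(c));
    return (int16_t)_mm_extract_epi16(r, 0);
}

/* One input where the pairing agrees with the model, one where it does not. */
static void mradds_ssse3_demo(void)
{
    assert(mradds_ssse3_lane0(16384, 16384, 5) == mradds_ref(16384, 16384, 5));

    /* a = b = -32768: the model saturates to 32767, while the 16-bit
       multiply result wraps to -32768 before the add ever sees it. */
    assert(mradds_ref(INT16_MIN, INT16_MIN, 0) == INT16_MAX);
    assert(mradds_ssse3_lane0(INT16_MIN, INT16_MIN, 0) == INT16_MIN);
}
```

If your multiplicands can never both be -32768 (fixed coefficients, for example), the two-instruction version may be all you need.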
For SSE2, this is the best I know of. It should be exactly correct:

static inline vSInt16 vec_mradds_epi16( vSInt16 a, vSInt16 b, vSInt16 c ) __attribute__ ((always_inline));
static inline vSInt16 vec_mradds_epi16( vSInt16 a, vSInt16 b, vSInt16 c )
{
    static const vUInt16 overflowVal = (const vUInt16) {0x4000, 0x4000, 0x4000, 0x4000,
                                                        0x4000, 0x4000, 0x4000, 0x4000};

    vSInt16 prodHi = _mm_mulhi_epi16( a, b );   //High 16 bits of a * b
    vUInt16 prodLo = _mm_mullo_epi16( a, b );   //Low 16 bits of a * b
    vSInt16 overflow = _mm_cmpeq_epi16( prodHi, overflowVal );

    //This operation is exact, except for the case where a and b are both
    //-32768, in which case overflow = -1.
    //So, the real top 15 bits of the product is prodHi - overflow.
    prodHi = _mm_adds_epi16( prodHi, prodHi );

    //In the overflow case, prodLo is zero, so it is always safe to add in
    //the next bit of the product (rounded up).
    //Calculate the next bit of the rounded-up product
    //(the xor of prodLo with itself is just a zero vector)...
    prodLo = _mm_avg_epu16( _mm_srli_epi16( prodLo, 14 ), _mm_xor_si128( prodLo, prodLo ) );
    //...and add it in
    prodHi = _mm_add_epi16( prodHi, prodLo );

    //Add the overflow into c. The overflow is only a worry if the result
    //is large positive. In this case, all positive c yield the same
    //result, so we don't need to worry about numerical errors due to
    //overflowing c when we do this.
    c = _mm_subs_epi16( c, overflow );

    //Saturated add of c to the product gives the result
    return _mm_adds_epi16( prodHi, c );
}
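If you want to convince yourself of the "exactly correct" part, something like the following should do. It is only a sketch: it restates the routine with plain __m128i instead of the vecLib types so it builds anywhere SSE2 is available, and checks it against a scalar model of vec_mradds that reflects my reading of the AltiVec definition. The names mradds_scalar, mradds_sse2, mradds_lane0 and mradds_self_check are made up for the example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar model of one vec_mradds lane (assumed definition):
   saturate(((a * b + 0x4000) >> 15) + c).
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t mradds_scalar(int16_t a, int16_t b, int16_t c)
{
    int32_t sum = (((int32_t)a * b + 0x4000) >> 15) + c;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Same steps as the SSE2 routine above, written with __m128i. */
static __m128i mradds_sse2(__m128i a, __m128i b, __m128i c)
{
    __m128i prodHi   = _mm_mulhi_epi16(a, b);   /* high 16 bits of a * b */
    __m128i prodLo   = _mm_mullo_epi16(a, b);   /* low 16 bits of a * b  */
    /* All ones only when a == b == -32768 (product 0x40000000). */
    __m128i overflow = _mm_cmpeq_epi16(prodHi, _mm_set1_epi16(0x4000));

    prodHi = _mm_adds_epi16(prodHi, prodHi);    /* 2 * prodHi, saturated */
    /* avg(x, 0) = (x + 1) >> 1 with x = top two bits of prodLo:
       this is (prodLo + 0x4000) >> 15, a value in 0..2. */
    __m128i roundBits = _mm_avg_epu16(_mm_srli_epi16(prodLo, 14),
                                      _mm_setzero_si128());
    prodHi = _mm_add_epi16(prodHi, roundBits);

    c = _mm_subs_epi16(c, overflow);            /* c + 1 in the overflow case */
    return _mm_adds_epi16(prodHi, c);
}

/* Hypothetical helper: broadcast one triple, return lane 0. */
static int16_t mradds_lane0(int16_t a, int16_t b, int16_t c)
{
    __m128i r = mradds_sse2(_mm_set1_epi16(a), _mm_set1_epi16(b),
                            _mm_set1_epi16(c));
    return (int16_t)_mm_extract_epi16(r, 0);
}

/* Compare against the scalar model on every triple of edge values. */
static void mradds_self_check(void)
{
    static const int16_t edge[] = { INT16_MIN, -32767, -16384, -1, 0, 1,
                                    0x4000, 0x5A82, INT16_MAX };
    const int n = (int)(sizeof edge / sizeof edge[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                assert(mradds_lane0(edge[i], edge[j], edge[k]) ==
                       mradds_scalar(edge[i], edge[j], edge[k]));
}
```

The edge list is deliberately small; widening the loops to a strided sweep over the full 16-bit range is a cheap way to gain more confidence.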
Ian