Mailing Lists: Apple Mailing Lists
Re: vec_mradds -> _mm_maddubs_epi16 ???

On Nov 15, 2006, at 9:30 AM, Ian Ollmann wrote:


On Nov 15, 2006, at 7:13 AM, John Stiles wrote:

BTW, I think you can call an optimized RGB->YUV function via QuickTime.
(Or even more optimized via OpenGL, if you write a pixel shader.)

You can also call vImageMatrixMultiply_<fmt>.

In any case, the answer to your question is that _mm_mulhrs_epi16 (SSSE3) + _mm_adds_epi16 is *almost* the same thing as vec_mradds. The double saturation will cause off-by-one errors in some rare cases near the limits of saturation.
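For comparison, a scalar model of one 16-bit lane of vec_mradds pins down what "the same thing" means. It follows the AltiVec definition of vec_mradds (vmhraddshs): form the full 32-bit product, add 0x4000, shift right by 15, add c, and saturate once at the end. This is a sketch, not code from the thread; the helper names sat16 and mradds_ref are hypothetical, and it assumes >> on a negative int32_t is an arithmetic shift, as it is with GCC and Clang.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range (signed saturation). */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Scalar model of one 16-bit lane of vec_mradds (hypothetical helper name). */
static int16_t mradds_ref(int16_t a, int16_t b, int16_t c)
{
    /* Full 32-bit product, rounded to nearest in Q15: add 2^14, then shift by 15.
       Assumes >> on a negative int32_t is an arithmetic shift (GCC/Clang behavior). */
    int32_t prod = ((int32_t)a * b + 0x4000) >> 15;
    /* prod lies in [-32768, 32768]; only the final sum is saturated. */
    return sat16(prod + c);
}
```

For example, mradds_ref(16384, 16384, 100) is 8292 (0.5 * 0.5 = 0.25 in Q15, plus 100), and mradds_ref(-32768, -32768, 0) saturates to 32767 because the rounded product, 32768, does not fit in 16 bits.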

For plain SSE2, this is the best I know of. It should give exactly the same results as vec_mradds:

#include <emmintrin.h>  //SSE2 intrinsics

//Computes saturate( ((a * b + 0x4000) >> 15) + c ) in each signed 16-bit lane, like AltiVec vec_mradds
static inline __m128i vec_mradds_epi16( __m128i a, __m128i b, __m128i c ) __attribute__ ((always_inline));
static inline __m128i vec_mradds_epi16( __m128i a, __m128i b, __m128i c )
{
    const __m128i overflowVal = _mm_set1_epi16( 0x4000 );
    __m128i prodHi = _mm_mulhi_epi16( a, b );   //High 16 bits of a * b (signed)
    __m128i prodLo = _mm_mullo_epi16( a, b );   //Low 16 bits of a * b
    __m128i overflow = _mm_cmpeq_epi16( prodHi, overflowVal );   //-1 only where a * b == 0x40000000, i.e. a == b == -32768

    //This doubling is exact, except for the case where a and b are both -32768, in which case overflow = -1
    //and the result saturates to 32767, so the true doubled high half is prodHi - overflow (32768).
    //In the overflow case, prodLo is zero, so it is always safe to add in the next bit of the product (rounded up).
    prodHi = _mm_adds_epi16( prodHi, prodHi );

    //Calculate the next bit of the rounded product: avg(x, 0) = (x + 1) >> 1, applied to bits 15:14 of prodLo
    prodLo = _mm_avg_epu16( _mm_srli_epi16( prodLo, 14 ), _mm_setzero_si128() );
    prodHi = _mm_add_epi16( prodHi, prodLo );   //and add it in

    //Add the overflow into c. The overflow only matters when the result is large and positive.
    //In that case, all positive c yield the same (saturated) result, so saturating c here cannot introduce an error.
    c = _mm_subs_epi16( c, overflow );

    //Saturated add of c to the product gives the result
    return _mm_adds_epi16( prodHi, c );
}
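To check the reasoning in those comments, the same sequence can be transcribed into scalar C, one 16-bit lane at a time, and compared against a direct 32-bit computation of vec_mradds. This is a hedged sketch rather than code from the thread: the helper names (sat16, mradds_ref, mradds_sse2_lane, count_mismatches) are hypothetical, each line of mradds_sse2_lane mirrors the intrinsic named in its comment, and it assumes >> on a negative int32_t is an arithmetic shift (GCC/Clang). The identity being exercised is that, writing the product as a * b = 65536 * hi + lo with lo unsigned, (a * b + 0x4000) >> 15 equals 2 * hi + (((lo >> 14) + 1) >> 1), which is what the _mm_adds_epi16 doubling and the _mm_avg_epu16 against zero compute.

```c
#include <stdint.h>

/* Signed saturation of a 32-bit value to the int16_t range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Direct model of one vec_mradds lane: 32-bit product, round, add c, saturate once. */
static int16_t mradds_ref(int16_t a, int16_t b, int16_t c)
{
    return sat16((((int32_t)a * b + 0x4000) >> 15) + c);
}

/* Scalar transcription of the SSE2 sequence, one lane at a time (hypothetical helper). */
static int16_t mradds_sse2_lane(int16_t a, int16_t b, int16_t c)
{
    int32_t p = (int32_t)a * b;
    int16_t prodHi = (int16_t)(p >> 16);                    /* _mm_mulhi_epi16 */
    uint16_t prodLo = (uint16_t)p;                          /* _mm_mullo_epi16, read as unsigned */
    int16_t overflow = (prodHi == 0x4000) ? -1 : 0;         /* _mm_cmpeq_epi16 against 0x4000 */

    prodHi = sat16((int32_t)prodHi + prodHi);               /* _mm_adds_epi16( prodHi, prodHi ) */
    uint16_t next = (uint16_t)(((prodLo >> 14) + 1) >> 1);  /* _mm_avg_epu16( prodLo >> 14, 0 ) */
    prodHi = (int16_t)((int32_t)prodHi + next);             /* _mm_add_epi16; the sum always fits */
    c = sat16((int32_t)c - overflow);                       /* _mm_subs_epi16( c, overflow ) */
    return sat16((int32_t)prodHi + c);                      /* _mm_adds_epi16( prodHi, c ) */
}

/* Counts disagreements over all edge-value triples plus a strided sample of (a, b). */
static long count_mismatches(void)
{
    static const int16_t edges[] = { INT16_MIN, INT16_MIN + 1, -16384, -1, 0, 1,
                                     16384, INT16_MAX - 1, INT16_MAX };
    const int n = (int)(sizeof edges / sizeof edges[0]);
    long bad = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                bad += mradds_ref(edges[i], edges[j], edges[k]) !=
                       mradds_sse2_lane(edges[i], edges[j], edges[k]);

    for (int32_t a = INT16_MIN; a <= INT16_MAX; a += 97)
        for (int32_t b = INT16_MIN; b <= INT16_MAX; b += 97)
            for (int k = 0; k < n; k++)
                bad += mradds_ref((int16_t)a, (int16_t)b, edges[k]) !=
                       mradds_sse2_lane((int16_t)a, (int16_t)b, edges[k]);
    return bad;
}
```

count_mismatches() samples the inputs rather than trying all 2^48 triples, so it is a sanity check, not a proof; the exact match rests on the argument in the comments, namely that the doubled high half plus the rounding bit always fits in 16 bits except when a and b are both -32768.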

Ian
_______________________________________________
PerfOptimization-dev mailing list      (email@hidden)


References: 
 >vec_mradds -> _mm_maddubs_epi16 ??? (From: Marc Van Olmen <email@hidden>)
 >Re: vec_mradds -> _mm_maddubs_epi16 ??? (From: Paul Russell <email@hidden>)
 >Re: vec_mradds -> _mm_maddubs_epi16 ??? (From: John Stiles <email@hidden>)
 >Re: vec_mradds -> _mm_maddubs_epi16 ??? (From: Ian Ollmann <email@hidden>)




Copyright © 2011 Apple Inc. All rights reserved.