Hi
I'm translating some AltiVec code I wrote three years ago, an RGB to YUV
conversion, into SSE code, and I'm scratching my head because this is my
first week with SSE.
// Y = ( 8414 R + 16519 G + 3208 B)/32768 + 16
// Cb = (-4857 R - 9535 G + 14392 B)/32768 + 128
// Cr = (14392 R - 12052 G - 2341 B)/32768 + 128
// Convert the first three input vectors. vec_mradds keeps the rounded
// high part of each 32-bit product, ((a * b) + 0x4000) >> 15, and then
// adds the third operand with saturation. The shift is the divide by 32768.
v07_signed_short = vec_mradds( v04_signed_short, v12_Const_8414,
(vSInt16)v12_Const_0 ); // Y = (R0 .. R7) * 8414
v08_signed_short = vec_mradds( v04_signed_short, v12_Const_NEG_4857,
(vSInt16)v12_Const_0 ); // Cb = (R0 .. R7) * -4857
v09_signed_short = vec_mradds( v04_signed_short, v12_Const_14392,
(vSInt16)v12_Const_0 ); // Cr = (R0 .. R7) * 14392
v07_signed_short = vec_mradds( v05_signed_short, v12_Const_16519,
v07_signed_short ); // Y += (G0 .. G7) * 16519
v08_signed_short = vec_mradds( v05_signed_short, v12_Const_NEG_9535,
v08_signed_short ); // Cb += (G0 .. G7) * -9535
v09_signed_short = vec_mradds( v05_signed_short, v12_Const_NEG_12052,
v09_signed_short ); // Cr += (G0 .. G7) * -12052
v07_signed_short = vec_mradds( v06_signed_short, v12_Const_3208,
v07_signed_short ); // Y += (B0 .. B7) * 3208
v08_signed_short = vec_mradds( v06_signed_short, v12_Const_14392,
v08_signed_short ); // Cb += (B0 .. B7) * 14392
v09_signed_short = vec_mradds( v06_signed_short, v12_Const_NEG_2341,
v09_signed_short ); // Cr += (B0 .. B7) * -2341
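To check any SSE port lane by lane, it may help to have a scalar model of what the listing computes. The sketch below follows the PowerPC definition of vec_mradds (vmhraddshs): the rounded 17-bit high part of a * b, i.e. (a * b + 0x4000) >> 15, is added to c and the sum is saturated once to int16. The names sat16, mradds_ref and rgb_to_ycbcr_ref are hypothetical. Two further assumptions: the inputs are 8-bit R, G, B values widened to 16 bits, and the +16 / +128 offsets (not shown in the excerpt) are applied after the nine multiply-adds.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit value to the int16 range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Scalar model of one lane of vec_mradds(a, b, c):
   rounded (a * b) / 32768, plus c, saturated once at the end.
   Assumes >> on a negative int32_t is an arithmetic shift (GCC, Clang, MSVC). */
static int16_t mradds_ref(int16_t a, int16_t b, int16_t c)
{
    int32_t hi = ((int32_t)a * b + 0x4000) >> 15;
    return sat16(hi + c);
}

/* Same accumulation order as the AltiVec listing: R term, then G, then B.
   Assumption: inputs are 0..255 and the offsets are added at the end. */
static void rgb_to_ycbcr_ref(int16_t r, int16_t g, int16_t b,
                             int16_t *y, int16_t *cb, int16_t *cr)
{
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    int16_t ty  = mradds_ref(r,   8414, 0);
    int16_t tcb = mradds_ref(r,  -4857, 0);
    int16_t tcr = mradds_ref(r,  14392, 0);
    ty  = mradds_ref(g,  16519, ty);
    tcb = mradds_ref(g,  -9535, tcb);
    tcr = mradds_ref(g, -12052, tcr);
    ty  = mradds_ref(b,   3208, ty);
    tcb = mradds_ref(b,  14392, tcb);
    tcr = mradds_ref(b,  -2341, tcr);
    *y  = (int16_t)(ty  + 16);
    *cb = (int16_t)(tcb + 128);
    *cr = (int16_t)(tcr + 128);
}
```

Because each term is rounded separately before it is accumulated, the result can differ by one from a version that sums the full-precision products and rounds only once. Matching this order is what makes the model bit-exact with the vector code.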
I was wondering whether vec_mradds can be translated to the SSSE3 intrinsic
_mm_maddubs_epi16. SSSE3 is available on the Core 2 Duo and Xeon 5100,
which is fine for me.
Or does anyone have suggestions for how to translate the code above?
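One possible direction, offered as a sketch rather than a tested port. _mm_maddubs_epi16 (pmaddubsw) multiplies unsigned 8-bit values by signed 8-bit values and adds adjacent pairs into saturated 16-bit results, so it does not do what vec_mradds does. A closer match is the SSSE3 intrinsic _mm_mulhrs_epi16 (pmulhrsw), which computes (a * b + 0x4000) >> 15 per signed 16-bit lane, followed by the SSE2 saturating add _mm_adds_epi16. The two differ only when both multiplicands are -32768, which cannot happen with these coefficients. The planar int16 arrays, the function names mradds_sse and rgb_to_ycbcr_ssse3, and the placement of the +16 / +128 offsets are assumptions, not part of the original code. The target("ssse3") attribute is GCC/Clang-specific. Without it, compile with -mssse3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <tmmintrin.h> /* SSSE3 intrinsics (includes the SSE2 ones) */

/* Possible stand-in for vec_mradds(a, b, c) on eight int16 lanes. */
__attribute__((target("ssse3")))
static inline __m128i mradds_sse(__m128i a, __m128i b, __m128i c)
{
    return _mm_adds_epi16(_mm_mulhrs_epi16(a, b), c);
}

/* Hypothetical planar layout: each array holds n int16 samples, with
   R, G, B assumed to be 8-bit values widened to 16 bits; n % 8 == 0. */
__attribute__((target("ssse3")))
static void rgb_to_ycbcr_ssse3(const int16_t *r, const int16_t *g,
                               const int16_t *b, int16_t *y, int16_t *cb,
                               int16_t *cr, size_t n)
{
    assert(n % 8 == 0);
    const __m128i zero    = _mm_setzero_si128();
    const __m128i c8414   = _mm_set1_epi16(8414);
    const __m128i c16519  = _mm_set1_epi16(16519);
    const __m128i c3208   = _mm_set1_epi16(3208);
    const __m128i cn4857  = _mm_set1_epi16(-4857);
    const __m128i cn9535  = _mm_set1_epi16(-9535);
    const __m128i c14392  = _mm_set1_epi16(14392);
    const __m128i cn12052 = _mm_set1_epi16(-12052);
    const __m128i cn2341  = _mm_set1_epi16(-2341);
    const __m128i off16   = _mm_set1_epi16(16);
    const __m128i off128  = _mm_set1_epi16(128);

    for (size_t i = 0; i < n; i += 8) {
        __m128i vr = _mm_loadu_si128((const __m128i *)(r + i));
        __m128i vg = _mm_loadu_si128((const __m128i *)(g + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

        /* Same accumulation order as the AltiVec listing. */
        __m128i vy  = mradds_sse(vr, c8414,  zero);
        __m128i vcb = mradds_sse(vr, cn4857, zero);
        __m128i vcr = mradds_sse(vr, c14392, zero);
        vy  = mradds_sse(vg, c16519,  vy);
        vcb = mradds_sse(vg, cn9535,  vcb);
        vcr = mradds_sse(vg, cn12052, vcr);
        vy  = mradds_sse(vb, c3208,  vy);
        vcb = mradds_sse(vb, c14392, vcb);
        vcr = mradds_sse(vb, cn2341, vcr);

        /* Offsets are an assumption: the excerpt does not show them. */
        _mm_storeu_si128((__m128i *)(y + i),  _mm_adds_epi16(vy,  off16));
        _mm_storeu_si128((__m128i *)(cb + i), _mm_adds_epi16(vcb, off128));
        _mm_storeu_si128((__m128i *)(cr + i), _mm_adds_epi16(vcr, off128));
    }
}
```

The constants are hoisted out of the loop to mirror the v12_Const_* vectors in the AltiVec version. If the real data is interleaved 8-bit RGB, it would first need deinterleaving and widening to 16 bits (for example with _mm_unpacklo_epi8 against zero), which this sketch leaves out.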