llvm-gcc-4.2 generates incorrect code for certain SSE intrinsics (RADAR #11934110)
- Subject: llvm-gcc-4.2 generates incorrect code for certain SSE intrinsics (RADAR #11934110)
- From: Paul Russell <email@hidden>
- Date: Mon, 23 Jul 2012 10:47:33 +0100
Just an FYI, and I'm wondering whether anyone else has seen this problem or anything similar: llvm-gcc-4.2 seems to generate incorrect code for certain SSE intrinsics. The following code demonstrates the problem:
#include <stdio.h>
#include <tmmintrin.h> // SSSE3
#include <Accelerate/Accelerate.h>

// Horizontal max: every byte of the result should hold the largest byte of v.
// _mm_alignr_epi8(x, x, n) rotates x by n bytes, so four max/rotate steps
// (n = 1, 2, 4, 8) cover all 16 lanes.
vUInt8 _mm_hmax_epu8(const vUInt8 v)
{
    vUInt8 vmax = v;

    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 1));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 2));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 4));
    vmax = _mm_max_epu8(vmax, _mm_alignr_epi8(vmax, vmax, 8));
    return vmax;
}

int main(void)
{
    vUInt8 v1 = _mm_setr_epi8(34, 201, 96, 11, 28, 149, 66, 87, 12, 56, 76, 84, 51, 175, 91, 45);
    vUInt8 v2;

    printf("v1 = %vu\n", v1);
    v2 = _mm_hmax_epu8(v1);
    printf("v2 = %vu\n", v2);
    return 0;
}
$ gcc -Wall -mssse3 _mm_hmax_epu8.c -framework Accelerate -o _mm_hmax_epu8 && ./_mm_hmax_epu8
gives:
v1 = 34 201 96 11 28 149 66 87 12 56 76 84 51 175 91 45
v2 = 201 201 201 175 201 201 201 175 201 201 201 175 201 201 201 175
Compiling this with regular gcc 4.2 gives the correct result:
v1 = 34 201 96 11 28 149 66 87 12 56 76 84 51 175 91 45
v2 = 201 201 201 201 201 201 201 201 201 201 201 201 201 201 201 201
Looking at the generated code, it appears that llvm-gcc is trying to convert _mm_alignr_epi8 to something other than PALIGNR for certain shift values, but the logic for this conversion is incorrect.
Paul
_______________________________________________
Xcode-users mailing list (email@hidden)