On May 12, 2006, at 2:44 PM, Ben Weiss wrote:

On May 12, 2006, at 4:57 AM, Sanjay Patel wrote:

3. When a pointer (to, say, a char) is cast to a vUInt16* pointer, gcc implicitly assumes that it is 16-byte aligned, and loads data with movdqa. Is it legal for gcc to assume this? Do misaligned loads always have to be coded explicitly?
Minimal test case:
#include <xmmintrin.h>
__m128 foo( char* a ) { return *( __m128* )( a ); }
Codegen:
...
    movl   8(%ebp), %eax
    movaps (%eax), %xmm0
...
Exactly; 'movaps' is an aligned load instruction, and will crash if char* a is misaligned. Strictly speaking, casting your pointer to a different type is not legal 'C'. So I think the answer is going to be "your code is wrong, and the compiler can generate whatever it wants."
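For reference, a minimal sketch of an unaligned-safe version of the test case above: instead of dereferencing a cast pointer (which lets the compiler emit an aligned movaps), it goes through _mm_loadu_ps, which maps to the unaligned movups. The helper names here are mine, not from the thread.

```c
#include <xmmintrin.h>

/* Unaligned-safe version of foo: _mm_loadu_ps compiles to movups,
   which works for any address, instead of the movaps that the
   compiler may emit for a dereferenced (__m128*) cast. */
__m128 foo_unaligned(const char* a)
{
    return _mm_loadu_ps((const float*)a);
}

/* Helper to pull out lane 0 so the result can be inspected. */
float first_lane(__m128 v)
{
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0];
}
```

Note that movups pays a speed penalty relative to movaps on hardware of this era, so it is a correctness fix, not a free one.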
(envisioning the chaos that would ensue if compilers suddenly decided to enforce this...) If it's illegal C, why doesn't gcc generate an error (or at least a warning)?
I think this is a semantic issue. The syntax is legal C, but I'm not sure if the result is defined. See, for example, the section on -fstrict-aliasing in the gcc man page. -fstrict-aliasing is not turned on by default at any optimization level in Apple's compiler, despite what the man page says, but that section is still a good guide to how unexpected type conversions can affect optimizers.

Always use the _mm_load* intrinsics with SSE code. You should *never* cast MMX/SSE data - not even from __m128 to __m128i, etc. Some compilers (ICC) will not accept the syntax.
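A sketch of what "use the intrinsics, not casts" looks like in practice, assuming a compiler that provides the SSE2 _mm_cast* intrinsics (the function names here are mine):

```c
#include <emmintrin.h>  /* SSE2: __m128i and the _mm_cast* intrinsics */

/* Reinterpret the bits of a float vector as an integer vector via
   the cast intrinsics rather than a C-style (__m128i) cast, which
   ICC rejects. These intrinsics generate no instructions; they only
   change the static type. */
__m128i bits_as_int(__m128 v)
{
    return _mm_castps_si128(v);
}

__m128 bits_as_float(__m128i v)
{
    return _mm_castsi128_ps(v);
}
```

Because the cast intrinsics are pure type changes, there is no performance cost relative to the C-style cast; you just gain portability across compilers.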
Apple's Altivec/SSE page says:
"SSE has a wide variety of data type conversions. Like AltiVec, if you wish to simply use a vector of one type (e.g. vFloat) as a vector of another type (e.g. vSInt32) without changing the bits, you can do that with a simple typecast:

vFloat one = { 1.0f, 1.0f, 1.0f, 1.0f };
vSInt32 oneAsSInt = (vSInt32) one;"

Are they wrong? I've never seen fancy intrinsics used for casts like this...
They should probably update that page, yeah. Can you file a bug about it?

Anyway, using the _mm_loadu intrinsics for unaligned loads is clearly the best approach; I was more wondering why gcc generates aligned loads as a default, and whether that's the "right thing" for the compiler to do.

The compiler should generate aligned loads when it has reason to believe the data is aligned, which it does in most cases on Mac OS X. The question of whether that's true in this particular case is probably something best settled by filing a bug report, which'll end up with one of the compiler engineers....
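When the alignment of a pointer really is unknown at the call site, one option is to check it at runtime and pick the load accordingly. A minimal sketch, with hypothetical helper names:

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: true if p is 16-byte aligned, i.e. safe
   for movaps / _mm_load_ps. */
static int is_aligned_16(const void* p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Choose the aligned or unaligned load based on a runtime check. */
__m128 load_any(const float* p)
{
    return is_aligned_16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}
```

In hot loops the branch itself costs something, so code that can guarantee 16-byte alignment up front (e.g. via aligned allocation) is still preferable to checking every load.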
-Eric