
[apple scitech] Re: SGEMV slow in Leopard?




Indeed I would read the Apple documentation as JavaMode always being off on PPC and on on Intel.

This discussion is going to be rife with double negatives, since it is a non-Java bit in the VSCR:


	non-Java bit on  -> denormals off
	non-Java bit off ->  denormals on

So to simplify things, I'll just talk about denormals being on or off.

Originally (MacOS 9) denormals were on by default for everything. Then we introduced MacOS X, which had a new kernel. Due to some confusion on our part related to the above-described double negatives, we shipped that with AltiVec denormals off by mistake. Nobody noticed for a couple of years. Around MacOS X.2 or so, we spotted the mistake and tried to turn them back on. However, we found that enough time had gone by that a whole class of applications (mostly audio) had come to depend on denormals being off, and when we turned them on, the denorm penalty was enough to cause the audio to break. I don't recall whether we actually shipped the system to customers in this state, or if it was just dev seed releases, but we eventually were forced to turn the denormals back off so as not to break these apps. That is the way I think they are going to stay on PowerPC for the foreseeable future.

To summarize: for PowerPC scalar code, denormals are on by default. (The FPSCR Non-IEEE bit is off.) For AltiVec code, denormals are off on MacOS X. (The VSCR NJ bit is on.)

I think we would have happily kept things that way for Intel. Unfortunately, on our platform, Intel scalar code is largely done on the vector unit. That means we had to turn denormals on for Intel on the vector unit so that the scalar code doesn't break. If you want to turn denormals off on Intel, I urge you to do it this way:

	#include <fenv.h>
	#include <AvailabilityMacros.h>
	#if defined( __i386__ ) || defined( __x86_64__ )
		#include <xmmintrin.h>		// _mm_getcsr(), _mm_setcsr() for the Tiger workaround
	#endif

	#pragma STDC FENV_ACCESS ON
	
	#if defined( __i386__ ) || defined( __x86_64__ )
		fenv_t	oldEnv;
		fegetenv( &oldEnv );						// save the old environment for later
		fesetenv( FE_DFL_DISABLE_SSE_DENORMS_ENV );	// turn denormals off

		#if  MAC_OS_X_VERSION_MIN_REQUIRED <= MAC_OS_X_VERSION_10_4
			// Tiger workaround -- make sure that worked (0x8040 = FTZ and DAZ bits)
			int mxcsr = _mm_getcsr();
			if( (mxcsr & 0x8040) != 0x8040 )
				_mm_setcsr( mxcsr | 0x8040 );
		#endif
	#endif

	...		// do some work with denorms off

	#if defined( __i386__ ) || defined( __x86_64__ )
		fesetenv( &oldEnv );						// restore floating point state
	#endif	


...rather than monkeying with the mxcsr.
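For reference, the 0x8040 mask in the Tiger workaround is the combination of the two MXCSR bits that control denormal handling: flush-to-zero (bit 15, 0x8000) and denormals-are-zero (bit 6, 0x0040). The sketch below only decodes a saved MXCSR value; the names mxcsr_denormals_off and check_mxcsr_bits are made up for illustration and are not part of any Apple or Intel header, and 0x1F80 is used as the usual power-on MXCSR value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MXCSR control bits that affect denormal handling. */
#define MXCSR_DAZ 0x0040u   /* bit 6:  treat denormal inputs as zero   */
#define MXCSR_FTZ 0x8000u   /* bit 15: flush denormal results to zero  */

/* Hypothetical helper: true only when both bits are set, which is the
   state the Tiger workaround above tests for. */
bool mxcsr_denormals_off(uint32_t mxcsr)
{
    const uint32_t mask = MXCSR_DAZ | MXCSR_FTZ;   /* 0x8040 */
    return (mxcsr & mask) == mask;
}

/* Hypothetical self-check on a few hand-written register values. */
void check_mxcsr_bits(void)
{
    const uint32_t power_on_default = 0x1F80u;  /* exceptions masked, denormals on */

    assert(!mxcsr_denormals_off(power_on_default));
    assert(!mxcsr_denormals_off(power_on_default | MXCSR_FTZ));  /* only one bit set */
    assert(mxcsr_denormals_off(power_on_default | 0x8040u));
}
```

This is only for reading a value you already have. To change the state, use the fenv calls shown above.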

Why?

</begin FUD>

We are evaluating (but have not yet completely delivered) methods to provide faster math library entrypoints for the 99% of apps out there that never look at the floating point environment. This relies on you correctly using fenv (per the C99 requirement) to change the floating point state. Failure to comply may result in unpredictable math library behavior on a future OS.

The groundwork for that is already in Leopard. You may have noticed that many of your math library symbols show up in Shark as _powf$fenv_access_off rather than just _powf. This happens because, as a backward compatibility shim, we've aliased the I-don't-care-about-the-floating-point-state entrypoints to the safe ones for now (two names point to the same place). For some reason, Shark and some other tools pick up the alias name rather than the normal name. On a future OS, these may become two different functions. Most (new) code will get the faster fenv_access_off variants. Legacy code and code that relies on the math library doing the right thing to/with the floating point state will get the classic entrypoints. Which one you get depends on whether or not you include fenv.h, use the pragma above, and maybe set some compiler flags.

...which brings us to the hammer:

The $fenv_access_off entrypoints assume two things:

1) You aren't going to look at the flags, so we don't need to waste time making sure they are set properly.
2) The floating point environment (rounding mode, denorm behavior) is in the default state,
so we don't have to go through expensive workarounds just in case the rounding mode
might be set to -inf or denormals are turned off.


If you change these things behind our back, using for example _mm_setcsr(), then you might get the wrong entrypoint and the math library might return the wrong answer

</end FUD>

...on some hypothetical future OS + compiler.

At least for now, you don't need to worry. The current compiler + linker resolves all math library calls to the safe entrypoints. The names you see in Shark are just an illusion of Christmas future. Our current math library routines are carefully checked not to set spurious underflow, so it is unlikely that we are using denormals in internal arithmetic except where appropriate. In most cases, you'll likely just get the denorm result flushed to zero, or maybe the correct denorm result returned in some cases. There are, however, some functions for which f(denorm) and f(0) are very different. The former might return some normal number whereas the latter might return NaN or Inf and maybe set invalid. In these cases, you may not like the answer you get.
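As a small illustration of how far apart f(denorm) and f(0) can be, the sketch below uses a plain reciprocal as a stand-in for a library function (that substitution is my assumption, not a statement about any particular math library routine). It runs with denormals on, the default state, and compares the result for a denormal input with the result for a true zero, which is what the input would effectively become with denormals off. The names reciprocal and check_denorm_vs_zero are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical stand-in for a function where f(denorm) and f(0) differ. */
float reciprocal(float x)
{
    volatile float v = x;   /* volatile keeps the divide at run time */
    return 1.0f / v;
}

/* Hypothetical self-check, assuming the default environment (denormals on). */
void check_denorm_vs_zero(void)
{
    volatile float half = 0.5f;
    float denorm = FLT_MIN * half;   /* 2^-127, just below the smallest normal float */

    assert(fpclassify(denorm) == FP_SUBNORMAL);
    assert(isfinite(reciprocal(denorm)));   /* 2^127, an ordinary normal number */
    assert(isinf(reciprocal(0.0f)));        /* division by zero gives +Inf */
}
```

With denormal inputs treated as zero, the first reciprocal would behave like the second.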

For some future compiler and some future set of $fenv_access_off entrypoints, all bets are off and real chaos may occur.

Ian

Copyright © 2007 Apple Inc. All rights reserved.