Your output callback is called in more or less "real time": the OS invokes it whenever the output device needs more data.
See https://developer.apple.com/library/mac/documentation/musicaudio/Conceptual/AudioUnitProgrammingGuide/AudioUnitDevelopmentFundamentals/AudioUnitDevelopmentFundamentals.html and read the section on the "Pull Model".
libxmp and similar decoders typically generate data faster than real time. On the other hand, they are usually too expensive to call on a real-time thread, and their buffer sizes generally differ from your generator's.
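One common way to decouple the decoder's block size from the callback's (an approach assumed here, not spelled out above) is a single-producer/single-consumer ring buffer: the decoder thread writes whatever block size it produces, and the output callback reads exactly what the device asks for, padding with silence on underrun rather than blocking. A minimal C sketch, where `Ring`, `ring_write`, `ring_read`, `ring_fill` and `RING_CAPACITY` are hypothetical names, samples are assumed to be mono `float`, and exactly one thread writes while one thread reads:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity in samples; tune for your application. */
#define RING_CAPACITY 8192
/* Power of two keeps the modulo indexing consistent if counters ever wrap. */
static_assert((RING_CAPACITY & (RING_CAPACITY - 1)) == 0,
              "RING_CAPACITY must be a power of two");

typedef struct {
    float data[RING_CAPACITY];
    atomic_size_t write_count; /* total samples ever written (producer only) */
    atomic_size_t read_count;  /* total samples ever read (consumer only) */
} Ring;

static void ring_init(Ring *r) {
    memset(r->data, 0, sizeof r->data);
    atomic_init(&r->write_count, 0);
    atomic_init(&r->read_count, 0);
}

/* Samples currently buffered and not yet consumed. */
static size_t ring_fill(Ring *r) {
    return atomic_load(&r->write_count) - atomic_load(&r->read_count);
}

/* Producer (decoder thread): copy as much of src as fits, return count. */
static size_t ring_write(Ring *r, const float *src, size_t n) {
    size_t w = atomic_load_explicit(&r->write_count, memory_order_relaxed);
    size_t rd = atomic_load_explicit(&r->read_count, memory_order_acquire);
    size_t space = RING_CAPACITY - (w - rd);
    if (n > space) n = space;
    for (size_t i = 0; i < n; i++)
        r->data[(w + i) % RING_CAPACITY] = src[i];
    /* Release: the consumer sees the samples before the new count. */
    atomic_store_explicit(&r->write_count, w + n, memory_order_release);
    return n;
}

/* Consumer (output callback): always fills n samples, padding with silence
 * on underrun instead of waiting. Returns how many came from the buffer. */
static size_t ring_read(Ring *r, float *dst, size_t n) {
    size_t rd = atomic_load_explicit(&r->read_count, memory_order_relaxed);
    size_t w = atomic_load_explicit(&r->write_count, memory_order_acquire);
    size_t avail = w - rd;
    size_t got = n < avail ? n : avail;
    for (size_t i = 0; i < got; i++)
        dst[i] = r->data[(rd + i) % RING_CAPACITY];
    for (size_t i = got; i < n; i++)
        dst[i] = 0.0f; /* underrun: output silence */
    atomic_store_explicit(&r->read_count, rd + got, memory_order_release);
    return got;
}
```

Because the callback never waits on the decoder, the expensive decoding stays off the real-time thread, and a 1000-sample decoder block can be drained by, say, 512-sample callback requests without either side knowing about the other's size.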
Basically, your "main" method is already running on a thread separate from the callback.
You need to detect accurately when the buffer is getting full, and then sleep before generating the next block. You can more or less count on your buffer being drained in real time, so if you have a 2-second buffer, you should sleep for less than 2 seconds. The margin of safety you need may vary with the hardware.
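To make the "sleep for less than the buffer length" rule concrete, here is a hypothetical helper, `sleep_budget_us`, that turns the current fill level into a sleep duration. It assumes the device drains exactly `sample_rate` frames per second, that `fill` and `low_water` (an assumed refill threshold) are measured in frames, and that `safety` (an assumed factor in (0, 1]) leaves headroom:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: microseconds the producer may sleep before playback
 * drains the buffer down to low_water frames, scaled by a safety factor. */
static unsigned long sleep_budget_us(size_t fill, size_t low_water,
                                     double sample_rate, double safety) {
    assert(sample_rate > 0.0 && safety > 0.0 && safety <= 1.0);
    if (fill <= low_water)
        return 0; /* already at or below the threshold: refill now */
    double seconds = (double)(fill - low_water) / sample_rate;
    return (unsigned long)(seconds * safety * 1e6);
}
```

For example, a full 2-second buffer at 44100 Hz (88200 frames) with `low_water = 0` and `safety = 0.5` gives 1000000 µs, i.e. 1 second. Since `usleep` may oversleep, the safety factor is what absorbs that slack; how small it must be depends on the hardware.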
Remember that usleep sleeps for AT LEAST the amount of time you specify, possibly longer. You should consider using a proper thread-synchronization mechanism instead. Have a look at NSCondition, which has the "waitUntilDate:" method.
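NSCondition's `waitUntilDate:` is a condition wait with a deadline; the plain POSIX analogue is `pthread_cond_timedwait`. The C sketch below uses it so the producer wakes when the consumer signals that it freed space or when a deadline passes, whichever comes first, instead of sleeping blindly. `Shared`, `shared_init`, `shared_consume` and `wait_for_room` are hypothetical names. Taking a mutex inside a real Core Audio render callback risks priority inversion, so treat `shared_consume` as illustrative; the timeout at least bounds how long the producer can miss a wakeup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical shared state; fill is in frames and guarded by lock. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t drained; /* signalled when the consumer frees space */
    size_t fill;
} Shared;

static void shared_init(Shared *s, size_t fill) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->drained, NULL);
    s->fill = fill;
}

/* Consumer side: record that n frames were played and wake the producer. */
static void shared_consume(Shared *s, size_t n) {
    pthread_mutex_lock(&s->lock);
    s->fill = n < s->fill ? s->fill - n : 0;
    pthread_cond_signal(&s->drained);
    pthread_mutex_unlock(&s->lock);
}

/* Producer side: wait until fill <= high_water or timeout_ms elapses.
 * Returns true if there is room to write another block. */
static bool wait_for_room(Shared *s, size_t high_water, long timeout_ms) {
    assert(timeout_ms >= 0);
    struct timespec deadline;
    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    pthread_mutex_lock(&s->lock);
    int rc = 0;
    /* Loop to re-check the predicate after spurious wakeups. */
    while (s->fill > high_water && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&s->drained, &s->lock, &deadline);
    bool room = s->fill <= high_water;
    pthread_mutex_unlock(&s->lock);
    return room;
}
```

In a decoder loop, `wait_for_room(&s, high_water, 100)` returns `true` as soon as the fill drops to `high_water` or below, and `false` after roughly 100 ms if nothing has drained, at which point the loop can simply check again.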
-Kevin