Max simultaneous source count with Remote I/O Callback much lower than with OpenAL
- Subject: Max simultaneous source count with Remote I/O Callback much lower than with OpenAL
- From: Hari Karam Singh <email@hidden>
- Date: Fri, 10 Feb 2012 11:01:34 +0000
- Expiry-date: Mon, 09 Apr 2012 23:00:00 +0000
- Organization: AncientBohemia Ltd.
Hi,
I'm working on an iPhone 4+ app which plays multiple long samples simultaneously, streaming them from disk. I originally had it working pretty smoothly in OpenAL with 32 sources. After discovering that OpenAL is a wrapper for the AU 3D Mixer, I thought I'd get better performance by rewriting the engine (minus all the 3D stuff) as a Remote I/O unit with an input render callback. I was wrong. Even though I'm just summing the sources, I get about 1/5 of the performance: heavy CPU use, and the audio breaks up at 10-15 simultaneous sources.
Can anyone help me understand this disparity in performance? Is the AU 3D Mixer super optimised with assembly, or is there something I should be doing better in my callback? "register" variables maybe? Or adding more than one source to the total on each loop iteration? I'm getting desperate here!
I've pasted the callback code below (is this encouraged on this list??) and here as well: http://pastebin.com/AJDVY1j0
A few quick notes first...
The C++ classes (SECAAudioRenderer and SECAPlaybackSource) which hold the source vars and audio data are fairly streamlined. There are no locks, disk reads, memory allocations or Obj-C messages. All C++ method calls are inlined. OSAtomicCompareAndSwap32 is used to update flags which are read by the buffer-feeding thread. "readFramesFromBuffer" doesn't actually read in or copy any data; it simply returns a pointer into a pre-filled ring buffer and updates its read head (again via OSAtomic...).
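To illustrate the kind of zero-copy read described above, here is a minimal single-producer/single-consumer sketch. It is not the SECAPlaybackSource code: `FrameRingBuffer`, `StereoFrameInt16`, `peek` and `advance` are hypothetical names, `std::atomic` stands in for the OSAtomic calls, and the capacity is assumed to be a power of two. Unlike the description above, it moves the read head in a separate `advance` step after the frames have been consumed, so the feeding thread cannot overwrite frames that are still being mixed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical interleaved stereo frame (stands in for SECAAudioSampleInt16).
struct StereoFrameInt16 {
    std::int16_t left;
    std::int16_t right;
};

// Hypothetical lock-free ring buffer: one writer (disk thread), one reader (callback).
class FrameRingBuffer {
public:
    explicit FrameRingBuffer(std::size_t capacity)
        : frames_(capacity), mask_(capacity - 1), readHead_(0), writeHead_(0) {
        // Assumption: power-of-two capacity, so indices can be masked.
        assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
    }

    // Writer side: copies in up to `count` frames, returns how many fit.
    std::size_t write(const StereoFrameInt16 *src, std::size_t count) {
        const std::size_t w = writeHead_.load(std::memory_order_relaxed);
        const std::size_t r = readHead_.load(std::memory_order_acquire);
        const std::size_t freeSpace = frames_.size() - (w - r);
        if (count > freeSpace) count = freeSpace;
        for (std::size_t i = 0; i < count; ++i)
            frames_[(w + i) & mask_] = src[i];
        writeHead_.store(w + count, std::memory_order_release);
        return count;
    }

    // Reader side: no copy. Points `*out` at a contiguous run of readable
    // frames and returns its length (at most `wanted`, cut at the wrap point).
    std::size_t peek(const StereoFrameInt16 **out, std::size_t wanted) const {
        const std::size_t r = readHead_.load(std::memory_order_relaxed);
        const std::size_t w = writeHead_.load(std::memory_order_acquire);
        const std::size_t idx = r & mask_;
        std::size_t avail = w - r;
        const std::size_t untilWrap = frames_.size() - idx;
        if (avail > untilWrap) avail = untilWrap;
        if (wanted > avail) wanted = avail;
        *out = frames_.data() + idx;
        return wanted;
    }

    // Reader side: release `count` frames once they have been mixed.
    void advance(std::size_t count) {
        const std::size_t r = readHead_.load(std::memory_order_relaxed);
        readHead_.store(r + count, std::memory_order_release);
    }

private:
    std::vector<StereoFrameInt16> frames_;
    std::size_t mask_;
    std::atomic<std::size_t> readHead_;
    std::atomic<std::size_t> writeHead_;
};
```

In a render callback this would be used as `peek`, mix, then `advance`; a short return from `peek` means either the wrap point or an underrun, and the caller has to tell the two apart.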
The source and output format is interleaved 16-bit integer PCM. In the middle, I convert to Float32 via inlined operator overloads. I thought maybe this was the problem, but a quick hack to use only ints seemed to indicate that this wasn't a major bottleneck.
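For readers unfamiliar with the pattern, a conversion of this kind might look roughly like the sketch below. The type and function names are hypothetical, not the actual SECAAudioSample types, and it assumes the float samples keep the 16-bit scale (no normalisation to ±1.0) with clipping applied only on the way back to Int16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for SECAAudioSampleInt16 / SECAAudioSampleFloat32.
struct StereoFrameInt16 {
    std::int16_t left;
    std::int16_t right;
};

// Clamp to the Int16 range, then truncate toward zero.
static inline std::int16_t clipToInt16(float v) {
    if (v > 32767.0f) return 32767;
    if (v < -32768.0f) return -32768;
    return static_cast<std::int16_t>(v);
}

struct StereoFrameFloat32 {
    float left;
    float right;

    StereoFrameFloat32() : left(0.0f), right(0.0f) {}

    // Int16 -> Float32: widening only, same numeric scale (assumption).
    StereoFrameFloat32(const StereoFrameInt16 &s)
        : left(s.left), right(s.right) {}

    // Float32 -> Int16: clipping happens here, once per output frame.
    operator StereoFrameInt16() const {
        StereoFrameInt16 out = { clipToInt16(left), clipToInt16(right) };
        return out;
    }
};

// Convert a whole summed buffer to the Int16 output format.
static inline void convertToInt16(const StereoFrameFloat32 *in,
                                  StereoFrameInt16 *out, std::size_t count) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = in[i];  // uses the conversion operator above
}
```

Written this way, the per-source inner loop does one int-to-float widening per sample read, and the clipping cost is paid once per output frame rather than once per source.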
It also doesn't seem to matter significantly how large the ioBufferDuration is, nor what the interval is for the thread (currently main) which feeds the ring buffers from disk (currently called every 0.25s). The bottleneck really seems to be in the summing portion. Also, it's just as slow with or without pitched playback.
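Stripped of the bookkeeping, the summing portion is a linear-interpolated read of each source, scaled by volume and accumulated into a float buffer. The sketch below is a simplified, mono illustration of that step, not the callback itself: `mixSourceLinear` and its parameters are hypothetical names. It takes the integer part of the read position with a cast instead of calling floor and ceil per sample, which is a common variation; whether that changes anything on the device has not been measured here.

```cpp
#include <cassert>
#include <cstddef>

// Adds one source into `mix`, stepping through `src` at `pitch` source
// frames per output frame. Returns the number of output frames written.
// Hypothetical helper: mono samples, positions assumed non-negative.
std::size_t mixSourceLinear(const float *src, std::size_t srcCount,
                            float startOffset, float pitch, float volume,
                            float *mix, std::size_t mixCount) {
    assert(pitch > 0.0f && startOffset >= 0.0f);
    std::size_t produced = 0;
    for (std::size_t f = 0; f < mixCount; ++f) {
        const float pos = startOffset + static_cast<float>(f) * pitch;
        // For pos >= 0, truncation gives the same result as floor().
        const std::size_t i0 = static_cast<std::size_t>(pos);
        // Stop once the right-hand neighbour would be out of range.
        if (i0 + 1 >= srcCount) break;
        const float frac = pos - static_cast<float>(i0);
        // Linear interpolation between the two neighbouring samples.
        const float s = src[i0] + (src[i0 + 1] - src[i0]) * frac;
        mix[f] += volume * s;
        ++produced;
    }
    return produced;
}
```

For example, with a source of {0, 10, 20, 30}, a pitch of 0.5 and a volume of 2, the first four output frames receive 0, 10, 20 and 30 on top of whatever was already in `mix`.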
Here's the callback code. Thanks in advance for any help. I really appreciate it as I'm starting to tear my hair out!
Gratefully,
Hari Karam Singh
static OSStatus SECARenderCallback (void *inRefCon,
                                    AudioUnitRenderActionFlags *ioActionFlags,
                                    const AudioTimeStamp *inTimeStamp,
                                    UInt32 inBusNumber,
                                    UInt32 inNumberFrames,
                                    AudioBufferList *ioData
                                    )
{
    /////////////////////////////////////////
    // BUFFERS & CONTROL VARS
    /////////////////////////////////////////
    SECAAudioRenderer *renderer = (SECAAudioRenderer *)inRefCon;
    NSInteger highestSourceIdx = renderer->highestSourceIdx;
    MDTimeProfiler *t = renderer->t;

    // Output vars
    SECAAudioSampleInt16 *outputFrames = (SECAAudioSampleInt16 *)ioData->mBuffers[0].mData; // single interleaved
    UInt32 outputFramesReqd = inNumberFrames; // Note "1 frame" === "1 interleaved L/R sample"

    // If nothing playing then set flag and return
    if (renderer->noSourcesArePlaying()) {
        *ioActionFlags |= kAudioUnitRenderAction_OutputIsSilence;
        return noErr;
    }

    /////////////////////////////////////////
    // RENDER THE OUTPUT
    /////////////////////////////////////////
    UInt32 numOutputFramesCovered = 0;

    // Reset the summing buffer to 0 and ensure it's big enough
    // (EDITOR's NOTE: memsets to 0 and is currently set large enough to not require the failsafe)
    renderer->clearSummingBufferAndCheckSize(outputFramesReqd);
    SECAAudioSampleFloat32 *summingBuffer = renderer->summingBuffer.data();

    //t->start();

    // Loop through the sources, scale for pitch and sum them
    for (int s=0; s<=highestSourceIdx; s++) {
        SECAPlaybackSource *source = &renderer->playbackSources[s];
        if (!source->isPlaying())
            continue;

        // Set finished for those queued to stop
        if (source->isQueuedToStop()) {
            source->callbackSetFinished();
            continue;
        }

        SEAudioParameter pitch = source->pitch();
        SEAudioParameter volume = source->volume();

        // Get frames required wrt pitch scaling.
        // Calculate the starting read offset (before readFrames updates it).
        // It's = decimal frame position plus 1 frame * scaling for pitch
        Float32 sourceFramesReqd = (Float32)pitch * (Float32)outputFramesReqd;
        Float32 frameStartOffset = source->frameReadPos - floor(source->frameReadPos);

        // Read the frames and get the "real" (ie output) frames it corresponds to
        // in case it's less than requested. It should be a whole number but
        // round j.i.c.
        SECAAudioSampleInt16 *sourceFrames;
        NSUInteger framesAvailableForOutput = round(source->readFramesFromBuffer((void **)&sourceFrames, sourceFramesReqd) / pitch);
        // SELOG1("CB: Source %p: vol=%.1f pitch=%.1f framesAvail=%u", source, volume, pitch, framesAvailableForOutput);

        // The read start position for the source = lastPos + 1 * pitch
        Float32 relativeFrameReadPos;

        // Calculate the frames...
        for (int f=0; f<framesAvailableForOutput; f++) {
            // Read position for source scaled as per pitch
            relativeFrameReadPos = (Float32)f * pitch + frameStartOffset;

            // If right on an integer frame then use that value
            Float32 frameFraction = relativeFrameReadPos - floor(relativeFrameReadPos);
            SECAAudioSampleFloat32 sOut; // sources sample output
            if (frameFraction == 0) {
                // Convert to float sample first
                sOut = (SECAAudioSampleFloat32)sourceFrames[(int)relativeFrameReadPos];
            } else {
                // otherwise scale between the 2
                SECAAudioSampleFloat32 s1 = sourceFrames[(int)floor(relativeFrameReadPos)];
                SECAAudioSampleFloat32 s2 = sourceFrames[(int)ceil(relativeFrameReadPos)];
                sOut.left = (s2.left - s1.left) * frameFraction + s1.left;
                sOut.right = (s2.right - s1.right) * frameFraction + s1.right;
            }
            summingBuffer[f].left += volume * sOut.left;
            summingBuffer[f].right += volume * sOut.right;
            //
            // TODO - LIMITER
            //
            /// Clipping is handled in the {@see SECAAudioSampleFloat32::operator SECAAudioSampleInt16()}
        }

        // End-of-Buffer check. Check EOF first otherwise it's a dropout!
        if (framesAvailableForOutput != outputFramesReqd && source->isEOF()) {
            source->callbackSetFinished();
        } else if (framesAvailableForOutput != outputFramesReqd) {
            //SELOG1(@"DROPOUT! The disk reader didn't fill the buffer in time.");
        }

        // Update the maximum output frames which we've covered thus far
        if (numOutputFramesCovered < framesAvailableForOutput) {
            numOutputFramesCovered = framesAvailableForOutput;
        }
    }

    //t->mark(1);

    // Convert into our output buffer and zero pad any remainder
    for (int f=0; f<numOutputFramesCovered; f++) {
        outputFrames[f] = (SECAAudioSampleInt16)summingBuffer[f];
    }
    if (numOutputFramesCovered < outputFramesReqd) {
        memset(outputFrames + numOutputFramesCovered, 0, (outputFramesReqd - numOutputFramesCovered) * sizeof(outputFrames[0]));
    }
    return noErr;
}