Max simultaneous source count with Remote I/O Callback much lower than with OpenAL

Subject: Max simultaneous source count with Remote I/O Callback much lower than with OpenAL
From: Hari Karam Singh <email@hidden>
Date: Fri, 10 Feb 2012 11:01:34 +0000
Expiry-date: Mon, 09 Apr 2012 23:00:00 +0000
Organization: AncientBohemia Ltd.

Hi,

I'm working on an iPhone 4+ app that plays multiple long samples simultaneously, streaming them from disk.  I originally had it working pretty smoothly in OpenAL with 32 sources.  After discovering that OpenAL is a wrapper for the AU 3D Mixer, I thought I'd get better performance by rewriting the engine (minus all the 3D stuff) as a Remote I/O unit with an input render callback.  I was wrong.  Even though I'm just summing the sources, I get about 1/5 of the performance: heavy CPU use and the audio breaking up at 10-15 simultaneous sources.

Can anyone help me understand this disparity in performance?  Is the AU 3D Mixer super-optimised with assembly, or is there something I should be doing better in my callback?  "register" variables, maybe?  Or adding more than one source to the total on each pass of the loop?  I'm getting desperate here!

I've pasted the callback code below (is this encouraged on this list??) and here as well:  http://pastebin.com/AJDVY1j0

A few quick notes first...

The C++ classes (SECAAudioRenderer and SECAPlaybackSource) that hold the source vars and audio data are fairly streamlined. There are no locks, disk reads, memory allocations or Obj-C messages.  All C++ method calls are inlined.  OSAtomicCompareAndSwap32 is used to update flags which are read by the buffer-feeding thread. "readFramesFromBuffer" doesn't actually read in or copy any data; it simply returns a pointer into a pre-filled ring buffer and updates its read head (again via OSAtomic...).
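To make the ring buffer arrangement concrete, here is a minimal sketch of the idea, not the actual SECAPlaybackSource code. Everything in it is hypothetical: the names `FrameRing`, `write`, `peek` and `consume` are made up for illustration, and it uses C++11 `std::atomic` in place of the OSAtomic calls. It also splits the read into a `peek` that hands back a pointer and a separate `consume` that advances the read head, which is a choice of the sketch and not necessarily how "readFramesFromBuffer" behaves.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame type: one interleaved L/R 16-bit sample pair.
struct StereoInt16 { int16_t left; int16_t right; };

// Hypothetical single-producer/single-consumer ring of frames.
// The feeder thread calls write(); the render callback calls peek()/consume().
// Positions are running totals; the slot index is the total modulo the capacity.
class FrameRing {
public:
    explicit FrameRing(std::size_t capacityFrames)
        : buf_(capacityFrames), written_(0), consumed_(0) {
        assert(capacityFrames > 0);
    }

    // Feeder side: copy in up to `count` frames, return how many fitted.
    std::size_t write(const StereoInt16 *src, std::size_t count) {
        const std::size_t w = written_.load(std::memory_order_relaxed);
        const std::size_t c = consumed_.load(std::memory_order_acquire);
        const std::size_t n = std::min(count, buf_.size() - (w - c));
        for (std::size_t i = 0; i < n; ++i)
            buf_[(w + i) % buf_.size()] = src[i];
        written_.store(w + n, std::memory_order_release);
        return n;
    }

    // Callback side: no copy, just a pointer to the next contiguous run of
    // frames. May return fewer than `wanted` at the wrap point or on underrun.
    std::size_t peek(const StereoInt16 **out, std::size_t wanted) const {
        const std::size_t c = consumed_.load(std::memory_order_relaxed);
        const std::size_t w = written_.load(std::memory_order_acquire);
        const std::size_t start = c % buf_.size();
        *out = buf_.data() + start;
        return std::min(wanted, std::min(w - c, buf_.size() - start));
    }

    // Callback side: release `count` frames back to the feeder once used.
    void consume(std::size_t count) {
        const std::size_t c = consumed_.load(std::memory_order_relaxed);
        assert(count <= written_.load(std::memory_order_acquire) - c);
        consumed_.store(c + count, std::memory_order_release);
    }

private:
    std::vector<StereoInt16> buf_;
    std::atomic<std::size_t> written_;   // total frames ever written
    std::atomic<std::size_t> consumed_;  // total frames ever consumed
};
```

Advancing the read head only after the frames have been used means the feeder cannot overwrite memory the callback is still reading from.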

The source and output format is interleaved 16-bit integer PCM.  In the middle, I convert to Float32 via inlined operator overloads.  I thought maybe this was the problem, but a quick hack to use only ints seemed to indicate that this wasn't a major bottleneck.
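For anyone wondering what the conversion looks like, here is a rough sketch of the pattern rather than the real SECAAudioSampleInt16/SECAAudioSampleFloat32 types. The names `SampleInt16`, `SampleFloat32` and `clipToInt16` are hypothetical, and the sketch assumes the float values stay in the 16-bit integer range (not normalised to +/-1.0), with clipping applied on the way back to integers.

```cpp
#include <cstdint>

// Hypothetical interleaved stereo frame in 16-bit integer form.
struct SampleInt16 { int16_t left; int16_t right; };

// Clamp a float to the representable Int16 range before truncating.
static inline int16_t clipToInt16(float v) {
    if (v > 32767.0f)  return 32767;
    if (v < -32768.0f) return -32768;
    return static_cast<int16_t>(v);
}

// Hypothetical float frame used for summing. Values are assumed to stay in
// Int16 units rather than being scaled to +/-1.0.
struct SampleFloat32 {
    float left;
    float right;

    SampleFloat32() : left(0.0f), right(0.0f) {}

    // Implicit widening from the integer frame.
    SampleFloat32(const SampleInt16 &s)
        : left(static_cast<float>(s.left)), right(static_cast<float>(s.right)) {}

    // Conversion back to the integer frame, clipping any overshoot.
    operator SampleInt16() const {
        SampleInt16 out;
        out.left = clipToInt16(left);
        out.right = clipToInt16(right);
        return out;
    }
};
```

With both conversions inline, the compiler sees only two int-to-float and two float-to-int conversions per frame, plus the clamp on output.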

It also doesn't seem to matter significantly how large the ioBufferDuration is, nor how often the thread (currently main) that feeds the ring buffers from disk is called (currently every 0.25s).  The bottleneck really seems to be in the summing portion.  Also, it's just as slow with or without the pitched playback.

Here's the callback code.  Thanks in advance for any help.  I really appreciate it as I'm starting to tear my hair out!

Gratefully,
Hari Karam Singh


static OSStatus SECARenderCallback (void                        *inRefCon,
                                    AudioUnitRenderActionFlags  *ioActionFlags,
                                    const AudioTimeStamp        *inTimeStamp,
                                    UInt32                      inBusNumber,
                                    UInt32                      inNumberFrames,
                                    AudioBufferList             *ioData
                                    )
{
    /////////////////////////////////////////
    // BUFFERS & CONTROL VARS
    /////////////////////////////////////////

    SECAAudioRenderer *renderer = (SECAAudioRenderer *)inRefCon;
    NSInteger highestSourceIdx = renderer->highestSourceIdx;

    MDTimeProfiler *t = renderer->t;

    // Output vars
    SECAAudioSampleInt16 *outputFrames = (SECAAudioSampleInt16 *)ioData->mBuffers[0].mData;  // single interleaved
    UInt32 outputFramesReqd = inNumberFrames; // Note "1 frame" === "1 interleaved L/R sample"


    // If nothing playing then set flag and return
    if (renderer->noSourcesArePlaying()) {
        *ioActionFlags |= kAudioUnitRenderAction_OutputIsSilence;
        return noErr;
    }

    /////////////////////////////////////////
    // RENDER THE OUTPUT
    /////////////////////////////////////////

    UInt32 numOutputFramesCovered = 0;

    // Reset the summing buffer to 0 and ensure it's big enough
    // (EDITOR's NOTE: memsets to 0 and is currently set large enough to not require the failsafe)
    renderer->clearSummingBufferAndCheckSize(outputFramesReqd);

    SECAAudioSampleFloat32 *summingBuffer = renderer->summingBuffer.data();

    //t->start();

    // Loop through the sources, scale for pitch and sum them
    for (int s=0; s<=highestSourceIdx; s++) {

        SECAPlaybackSource *source = &renderer->playbackSources[s];

        if (!source->isPlaying())
            continue;


        // Set finished for those queued to stop
        if (source->isQueuedToStop()) {
            source->callbackSetFinished();
            continue;
        }

        SEAudioParameter pitch = source->pitch();
        SEAudioParameter volume = source->volume();

        // Get the number of source frames required, scaled for pitch.
        // Calculate the starting read offset (before readFrames updates it):
        // the fractional part of the decimal frame read position.
        Float32 sourceFramesReqd = (Float32)pitch * (Float32)outputFramesReqd;
        Float32 frameStartOffset = source->frameReadPos - floor(source->frameReadPos);

        // Read the frames and get the "real" (ie output) frames it corresponds to
        // in case it's less than requested.  It should be a whole number but
        // round j.i.c.
        SECAAudioSampleInt16 *sourceFrames;
        NSUInteger framesAvailableForOutput = round(source->readFramesFromBuffer((void **)&sourceFrames, sourceFramesReqd) / pitch);

      //  SELOG1("CB: Source %p: vol=%.1f pitch=%.1f framesAvail=%u", source, volume, pitch, framesAvailableForOutput);
        // The read start position for the source = lastPos + 1 * pitch
        Float32 relativeFrameReadPos;

        // Calculate the frames...
        for (int f=0; f<framesAvailableForOutput; f++) {


            // Read position for source scaled as per pitch
            relativeFrameReadPos = (Float32)f * pitch + frameStartOffset;

            // If right on an integer frame then use that value
            Float32 frameFraction = relativeFrameReadPos - floor(relativeFrameReadPos);

            SECAAudioSampleFloat32 sOut;    // sources sample output

            if (frameFraction == 0) {
                // Convert to float sample first
                sOut = (SECAAudioSampleFloat32)sourceFrames[(int)relativeFrameReadPos];

            } else {
                // otherwise scale between the 2
                SECAAudioSampleFloat32 s1 = sourceFrames[(int)floor(relativeFrameReadPos)];
                SECAAudioSampleFloat32 s2 = sourceFrames[(int)ceil(relativeFrameReadPos)];

                sOut.left = (s2.left - s1.left) * frameFraction + s1.left;
                sOut.right = (s2.right - s1.right) * frameFraction + s1.right;
            }

            summingBuffer[f].left += volume * sOut.left;
            summingBuffer[f].right += volume * sOut.right;

            //
            // TODO - LIMITER
            //

            /// Clipping is handled in the {@see SECAAudioSampleFloat32::operator SECAAudioSampleInt16()}
        }

        // End-of-Buffer check. Check EOF first otherwise it's a dropout!
        if (framesAvailableForOutput != outputFramesReqd && source->isEOF()) {
            source->callbackSetFinished();

        } else if (framesAvailableForOutput != outputFramesReqd) {
            //SELOG1(@"DROPOUT! The disk reader didn't fill the buffer in time.");
        }

        // Update the maximum output frames which we've covered thus far
        if (numOutputFramesCovered < framesAvailableForOutput) {
            numOutputFramesCovered = framesAvailableForOutput;
        }
    }

    //t->mark(1);

    // Convert into our output buffer and zero pad any remainder
    for (int f=0; f<numOutputFramesCovered; f++) {
        outputFrames[f] = (SECAAudioSampleInt16)summingBuffer[f];
    }


    if (numOutputFramesCovered < outputFramesReqd) {
        memset(outputFrames + numOutputFramesCovered, 0, (outputFramesReqd - numOutputFramesCovered) * sizeof(outputFrames[0]));
    }

    return noErr;
}
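
Stripped of the renderer plumbing, the inner loop above is linear interpolation between neighbouring source frames followed by a scaled add into the summing buffer. The following is a minimal standalone sketch of just that arithmetic, under stated assumptions: `StereoInt16`, `StereoFloat` and `mixSourceLinear` are hypothetical names that do not appear in the callback, read positions are assumed non-negative, and the second frame index is clamped to the last available source frame. The sketch folds the integer-position case into the same formula, since a zero fraction yields the first frame unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical frame types standing in for the SECAAudioSample* classes.
struct StereoInt16 { int16_t left; int16_t right; };
struct StereoFloat { float left; float right; };

// Mix one source into `sum` using linear interpolation.
//   pitch       - source frames advanced per output frame
//   startOffset - fractional read position carried over from the last call
//   volume      - gain applied before summing
inline void mixSourceLinear(const StereoInt16 *src, std::size_t srcFrames,
                            float pitch, float startOffset, float volume,
                            StereoFloat *sum, std::size_t outFrames)
{
    for (std::size_t f = 0; f < outFrames; ++f) {
        // Read position in the source for this output frame.
        const float pos = static_cast<float>(f) * pitch + startOffset;

        // Truncation equals floor() because pos is assumed non-negative.
        const std::size_t i0 = static_cast<std::size_t>(pos);
        assert(i0 < srcFrames);
        const float frac = pos - static_cast<float>(i0);

        // Neighbouring frame, clamped so we never read past the end.
        const std::size_t i1 = (i0 + 1 < srcFrames) ? i0 + 1 : i0;

        const float l0 = src[i0].left,  l1 = src[i1].left;
        const float r0 = src[i0].right, r1 = src[i1].right;

        sum[f].left  += volume * ((l1 - l0) * frac + l0);
        sum[f].right += volume * ((r1 - r0) * frac + r0);
    }
}
```

For example, a three-frame source with left values 0, 100, 200 mixed at a pitch of 0.5 and a volume of 2.0 adds 0, 100, 200, 300 to the left channel of the first four output frames.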





Follow-Ups:
- Re: Max simultaneous source count with Remote I/O Callback much lower than with OpenAL
  - From: Michael Tyson <email@hidden>
