On 15/08/2010, at 3:47 PM, Bob Ingraham wrote:
Fair enough (yes, I should have run a few numbers through that one.)
So, here's the issue: I have signed 16-bit integers (-32768 to 32767) representing a quantized waveform (below/above the x-axis) and I want to add these "waves" together.
So, I started out with (assuming 2:1 down-mix):
Mixed-Sample = Sample1 + Sample2
But this produces overflow/underflow, and I assumed that just allowing the result to "wrap" according to the rules of signed 16-bit integer arithmetic wouldn't produce a correct result.
Quite right, your waveform will clip and you'll hear static. Wrapping will give you aliasing, which won't sound very nice either.
So I tried, in an attempt to "re-scale" the resultant sum back into the 16-bit domain:
Mixed-Sample = (Sample1 + Sample2) / 2
But this doesn't work either.
In what way won't it work? This logic seems sound if the sample rates match. You're reducing the volume of both signals to 50% so that there won't be clipping.
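In C that just means doing the sum in a wider type before you halve it, something like this sketch (names here are mine, not your code):

#include <stdint.h>

static inline int16_t mix2(int16_t s1, int16_t s2)
{
    int32_t sum = (int32_t)s1 + (int32_t)s2;   // widen first so the sum can't wrap
    return (int16_t)(sum / 2);                 // halving puts it back in 16-bit range
}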
So, how does one properly model the addition of two-or-more sound waves, represented in digital LPCM?
How do you handle overflow/underflow?
In a digital system, overflow (i.e. a level above 0 dBFS, which is 1.0/-1.0 for float samples) will give you clipping.
Just clamp values outside the range to the max value (i.e., values over 32767 are set to 32767)?
This will sound terrible.
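For reference, the hard clamp you're describing is just this (a sketch, my identifiers): it stops the value wrapping, but every sample that hits the limit gets flattened, and that flattening is the distortion you'll hear.

#include <stdint.h>

static inline int16_t mix2_clamped(int16_t s1, int16_t s2)
{
    int32_t sum = (int32_t)s1 + (int32_t)s2;   // widened sum, may exceed 16-bit range
    if (sum > INT16_MAX) sum = INT16_MAX;      // clamp positive overflow
    if (sum < INT16_MIN) sum = INT16_MIN;      // clamp negative overflow
    return (int16_t)sum;
}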
That doesn't seem to scale well when mixing more channels. In the app I'm working on, I need to mix up to 8 channels together. In that extreme case, the resultant summed "waveform" will certainly be outside the valid sample-value range.
Or does one apply some scaling factor based on the total number of channels being down-mixed together?
If so, how does one derive said scaling factor?
The scaling factor should be 1/N, but note that you're then reducing the volume of each input signal by a factor of N.
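For an arbitrary number of inputs the same idea looks roughly like this (a sketch only, all names are hypothetical): accumulate each output sample in a wide type, then divide by the channel count.

#include <stdint.h>

void mix_n(const int16_t *const *inputs, int16_t *out,
           uint32_t numInputs, uint32_t frameCount)
{
    for (uint32_t i = 0; i < frameCount; i++) {
        int32_t sum = 0;
        for (uint32_t n = 0; n < numInputs; n++)
            sum += inputs[n][i];                       // widen; even 8 channels can't overflow int32
        out[i] = (int16_t)(sum / (int32_t)numInputs);  // scale by 1/N, result guaranteed in range
    }
}

Dividing by N guarantees you never clip, but as you add inputs each one gets quieter, which is the trade-off I mean above.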
I note you mentioned silence detection. You may want to check that algorithm, as you may be distorting your waveform by sampling over too short a time period.
As a working example, I downmix 5.1 to stereo in one of my apps; the code looks like this:
// 5.1 - SMPTE order - ie L R C LFE Ls Rs
for (UInt32 i = 0; i < frameCount; i++) {
    // Left
    floatBuffer[i*2]    = floatInputBuffer[i*_channelCount];                // L -> L
    floatBuffer[i*2]   += floatInputBuffer[i*_channelCount+2] * minus3dB;   // C  -3dB -> L
    floatBuffer[i*2]   += floatInputBuffer[i*_channelCount+4] * minus3dB;   // Ls -3dB -> L
    // Right
    floatBuffer[i*2+1]  = floatInputBuffer[i*_channelCount+1];              // R -> R
    floatBuffer[i*2+1] += floatInputBuffer[i*_channelCount+2] * minus3dB;   // C  -3dB -> R
    floatBuffer[i*2+1] += floatInputBuffer[i*_channelCount+5] * minus3dB;   // Rs -3dB -> R
}
minus3dB is the square root of 0.5, and floatInputBuffer is interleaved Float32 samples in L R C LFE Ls Rs (SMPTE) order. In theory this could clip if the L or R channel is very loud at any point in time, but I haven't hit an issue in testing. You could easily multiply your summed value by 0.5 or 0.7 if you strike clipping.
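If you do strike clipping, the guard is just to attenuate and/or clamp the summed float samples on the way out, e.g. (a sketch, not the code from my app):

static inline float clipGuard(float s)
{
    s *= 0.7f;                    // back the summed level off (roughly -3 dB)
    if (s >  1.0f) s =  1.0f;     // hard limit at 0 dBFS as a last resort
    if (s < -1.0f) s = -1.0f;
    return s;
}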
Ryan.