RE: embedding sound problem
- Subject: RE: embedding sound problem
- From: "Edwards, Waverly" <email@hidden>
- Date: Mon, 12 Jul 2010 08:49:17 -0500
- Thread-topic: embedding sound problem
- James Chandler wrote in a previous email -
<<
I wonder what would be 'close enough for rock'n'roll'? Would it be too imprecise to either normalize both signals, or use dynamic processing to auto-gain both signals, and then do the mixing and assume that the relationship is close enough?
>>
After doing a lot of homework on the subject, I have concluded that I should do this offline rather than in real time. I have also concluded that my best option is to normalize both sound sources, reduce the level of the speech source, mix, and then reduce the amplitude of the overall product.
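A minimal sketch of that offline chain, assuming mono float samples in the range -1.0 to 1.0 held in plain Python lists. The function names, the -6 dB speech reduction, and the -1 dB final headroom are hypothetical placeholders, not values from this thread, and peak normalization is only one possible reading of "normalize".

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def mix_offline(other, speech, speech_reduction_db=-6.0, headroom_db=-1.0):
    """Normalize both sources, reduce the speech, mix, then scale the result.

    speech_reduction_db and headroom_db are assumed values for illustration.
    """
    other_n = peak_normalize(other)
    speech_gain = db_to_gain(speech_reduction_db)
    speech_n = [s * speech_gain for s in peak_normalize(speech)]

    # Pad the shorter signal with silence so both have the same length.
    length = max(len(other_n), len(speech_n))
    other_n += [0.0] * (length - len(other_n))
    speech_n += [0.0] * (length - len(speech_n))

    mixed = [a + b for a, b in zip(other_n, speech_n)]

    # "Reduce the amplitude of the overall product": here the mix is simply
    # rescaled so its peak sits at headroom_db below full scale.
    return peak_normalize(mixed, db_to_gain(headroom_db))
```

Because the whole file is available offline, the peak of each signal is known before any gain is applied, which is what makes this simpler than a real-time approach.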
I looked at weighted measures, AGC, replay gain, companders and a lot more.
Ultimately, I surmised that your ideas were right on target for what I need. It took a few more days of research to come to the same conclusion that you did. I do think the independent research that intersected with your conclusion was a valuable use of my time.
Thank you kindly for your insights,
Waverly
-----Original Message-----
From: James Chandler Jr [mailto:email@hidden]
Sent: Wednesday, July 07, 2010 4:42 PM
To: Edwards, Waverly
Cc: 'email@hidden'
Subject: Re: embedding sound problem
Hi Waverly
The phrase "close enough for rock'n'roll" is a slang phrase which means a solution that may be far from perfect, but it is good enough to suffice. Another common variant is "close enough for jazz." But jazz has generally high performance standards, and so the jazz variant of the phrase doesn't imply enough slop in the tolerated imperfect solution <g>. Also common-- "close enough for government work."
Among dynamic range processors, there is the common 'compressor', which does not adjust audio below a threshold but reduces dynamic range above that threshold. For instance, a compressor might leave untouched any phrases in a signal whose current level is below -12 dB, or whatever is selected as the threshold.
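The threshold behavior can be sketched as a static gain curve. This is only an illustration: the -12 dB threshold is the example figure from the paragraph above, while the 4:1 ratio and the function name are assumptions added here.

```python
def compressor_gain_db(level_db, threshold_db=-12.0, ratio=4.0):
    """Gain change (in dB) a hard-knee compressor applies at a given level.

    Below the threshold the signal is untouched (0 dB gain change).
    Above it, every `ratio` dB of input yields only 1 dB of output.
    The ratio of 4.0 is an assumed example value.
    """
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db      # how far the level exceeds threshold
    return over / ratio - over          # negative: gain reduction in dB


# A level of -20 dB is below threshold and is left alone; a level of 0 dB
# is 12 dB over, so it comes out only 3 dB over (9 dB of gain reduction).
quiet_change = compressor_gain_db(-20.0)
loud_change = compressor_gain_db(0.0)
```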
Then there is Automatic Gain Control, which has a much wider range of signals processed.
http://en.wikipedia.org/wiki/Automatic_gain_control
There isn't any threshold in an AGC or leveling amp. The process just attempts to keep the output at a desired level regardless of whether the input is loud or soft.
AGC as often implemented, for example in the cheap portable tape recorders of the past, would use fairly long attack and release time constants, so that short-term dynamics of less than one or two seconds would be preserved, but in the long term very quiet phrases in the audio would come out at about the same output level as very loud phrases.
Your envelope Attack and Release time constants can affect what the compressor or AGC is responding to. With a short attack time constant, it would tend to work as a peak-responding AGC, and level all peaks in the program material. With longer attack and release time constants, it would allow short-term amplitude variations, but tend to work as an average-responding AGC.
So once you have an AGC algorithm working (one AGC to operate on the source, and another AGC to operate on the mask), you could easily experiment with the time constants to see which works better: leveling against peaks (short time constants), leveling against short-term averages (medium time constants), or leveling against long-term averages (long time constants).
Perhaps for your use, an attack time in the ballpark of 20 to 100 ms, and a release time in the ballpark of 1 or 2 seconds? It is just a guess, based on common AGC settings for old electronic gear. Maybe something different would be better.
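A rough sketch of such an AGC, assuming mono float samples in a plain Python list. The one-pole envelope follower, the target level of 0.25, and the gain cap are assumptions for illustration; the 50 ms attack and 1.5 s release defaults are just picked from inside the ballpark ranges guessed at above.

```python
import math


def time_constant_coeff(time_s, sample_rate):
    """One-pole smoothing coefficient for a given time constant in seconds."""
    return math.exp(-1.0 / (time_s * sample_rate))


def agc(samples, sample_rate, target=0.25,
        attack_s=0.05, release_s=1.5, max_gain=20.0):
    """Threshold-free leveling: steer the envelope toward `target`.

    target and max_gain are hypothetical values; max_gain keeps near-silence
    from being amplified without limit.
    """
    attack = time_constant_coeff(attack_s, sample_rate)
    release = time_constant_coeff(release_s, sample_rate)
    envelope = 0.0
    out = []
    for s in samples:
        rectified = abs(s)
        # Rising input uses the attack constant, falling input the release.
        coeff = attack if rectified > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * rectified
        if envelope > 0.0:
            gain = min(target / envelope, max_gain)
        else:
            gain = max_gain
        out.append(s * gain)
    return out
```

Running one instance on the source and another on the mask, then varying attack_s and release_s, is one way to try the peak versus short-term versus long-term leveling comparison described above.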
You could also do RMS sensing for the control envelope, but I'd guess that average sensing might be 'close enough for rock'n'roll'.
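The difference between the two sensing methods comes down to what gets smoothed: the absolute value of the signal, or its square (followed by a square root). A small sketch, where the function name and the 0.3 second smoothing time are assumptions chosen only for the demonstration:

```python
import math


def smoothed_level(samples, sample_rate, time_s=0.3, rms=False):
    """Final value of a one-pole level detector run over `samples`.

    rms=False smooths |x| (average sensing).
    rms=True smooths x*x and returns its square root (RMS sensing).
    """
    coeff = math.exp(-1.0 / (time_s * sample_rate))
    state = 0.0
    for s in samples:
        x = s * s if rms else abs(s)
        state = coeff * state + (1.0 - coeff) * x
    return math.sqrt(state) if rms else state


# Three seconds of a full-scale 100 Hz sine at an assumed 8 kHz sample rate.
sample_rate = 8000
sine = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate)
        for n in range(3 * sample_rate)]

average_reading = smoothed_level(sine, sample_rate)        # near 2/pi
rms_reading = smoothed_level(sine, sample_rate, rms=True)  # near 1/sqrt(2)
```

For a sine wave the two readings differ by a fixed factor (roughly 0.64 versus 0.71 of the peak), which is under 1 dB. The gap grows for peakier material, so whether average sensing is close enough depends on the program material.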
One other trick which MIGHT be useful: it might be beneficial to use frequency-shaping to feed the envelope detection. It may make the auto-level more 'constant to the ear' to use an inverse Fletcher-Munson curve or some other kind of frequency weighting. For instance, on USA sound level meters, one often has a choice of flat, A weighting, or C weighting. Also, a choice of fast or slow response.
http://en.wikipedia.org/wiki/Sound_level_meter
http://en.wikipedia.org/wiki/A-weighting
Perhaps it could make the digital algorithm more 'scientifically sound' if one could steal weightings and time constants from one of the sound level metering standards?
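As one example of a weighting that could be borrowed, here is a sketch of the standard analog A-weighting magnitude response, evaluated at a single frequency. The function name is made up for illustration. Using the curve in a detector's sidechain would still require designing a digital filter that approximates it, which is not shown here.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz (must be greater than zero).

    Uses the usual analog pole frequencies (20.6, 107.7, 737.9, 12194 Hz)
    and the +2.00 dB offset that puts 1 kHz at approximately 0 dB.
    """
    f2 = freq_hz * freq_hz
    numerator = (12194.0 ** 2) * f2 * f2
    denominator = ((f2 + 20.6 ** 2)
                   * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
                   * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(numerator / denominator) + 2.00


# Low frequencies are strongly de-emphasized relative to 1 kHz.
weight_100 = a_weighting_db(100.0)
weight_1k = a_weighting_db(1000.0)
```

On the time-constant side, the conventional sound level meter responses are 125 ms for "fast" and 1 second for "slow", which could serve as starting points for the envelope smoothing.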
Apologies if these ideas are too far in left field. Actually applying these ideas is not uber-complicated, once one decides what should be done.
jcjr
Coreaudio-api mailing list (email@hidden)