Re: Audio recording bitdepth
- Subject: Re: Audio recording bitdepth
- From: Brian Willoughby <email@hidden>
- Date: Wed, 2 Dec 2009 18:34:09 -0800
On Dec 2, 2009, at 15:13, Paul Davis wrote:
> IMHO, your notes about "bit transparency" are pushing the point
> of honesty.
> The techniques that generate bit transparency (i.e. int->float->int ==
> original value) fail to map the full dynamic range during at least one
> transformation between the two formats.
This is not true. The only range which is not fully reached is the
float range of -1.0 to +1.0, and Bjorn points out that this is a
nominal range. The definition of nominal is something which exists
"in name only." The actual float range does not include +1.0, a
distinction which many miss.
> You yourself note that the
> problem is that two's complement format generates 1 more negative
> value than positive values for the full dynamic range, and thus can't
> be represented perfectly by a symmetric floating point range.
Except that there is no problem at all representing the same with a
non-symmetric floating point range. Check the archives of this
mailing list for proof.
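
For anyone who would rather check this than search the archives, here is a minimal Python sketch. The helper names are hypothetical, and the 32768.0 scale factor is simply the power-of-two convention discussed in this thread, not code taken from CoreAudio. Python floats are 64-bit doubles, but every 16-bit code also fits in the 24-bit significand of a 32-bit float, so the same reasoning applies there.

```python
SCALE = 32768.0  # 2**15, assumed power-of-two conversion factor

def int16_to_float(sample: int) -> float:
    """Map a 16-bit code onto the non-symmetric range [-1.0, +1.0)."""
    return sample / SCALE

def float_to_int16(value: float) -> int:
    """Inverse mapping; round() is a no-op for values that came from int16_to_float."""
    return round(value * SCALE)

codes = range(-32768, 32768)               # every 16-bit two's complement code
floats = [int16_to_float(c) for c in codes]

# True when every code survives int -> float -> int unchanged.
round_trip_ok = all(float_to_int16(f) == c for f, c in zip(floats, codes))

lowest, highest = min(floats), max(floats)  # -1.0 and 32767/32768
```

The lowest value is exactly -1.0 and the highest is 32767/32768, just short of +1.0, so every code is represented without the float range ever needing to include +1.0.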
> This therefore creates a conundrum: either you ensure perfect bit
> transparency for an int->float->int conversion by introducing
> distortion on the inbound transform OR you avoid distortion on the
> inbound conversion and add it on the outbound.
You seem to be confused here. There is no distortion if you simply
ignore +1.0 and leave it out.
> Your table suggests that transparency is the only virtue, whereas in
> fact there are two virtues, and you have to choose one. Bit
> transparency would presumably be the right goal if you do no
> processing whatsoever on the signal - appropriate for reading integer
> format samples from disk, pushing them into CoreAudio or another
> float-only audio API, and then back out through an integer format
> audio interface.
Upon closer examination, this turns out to be the only case.
> However, if you intend to do any kind of processing
> on the samples, one can make a case for preserving every bit of the
> dynamic range when converting to float and before applying the
> computational processing to the values, and then putting up with a
> tiny amount of distortion on the outbound.
All analog conversions are done with integer codes. Focusing upon
the arbitrary float range is meaningless because you cannot utilize
the values without converting to integer.
While it's true that synthetic and processed samples can reach +/-1.0
in float representation, there is no possibility of converting such
signals to analog without either clipping or a missed code. Thus,
the only way to preserve dynamic range is to avoid +1.0 in the first
place.
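
The two outcomes named above, clipping or a missed code, can be sketched for the 16-bit case. This is only an illustration under assumed names: both conversion functions are hypothetical, not any library's API.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def float_to_int16_pow2(x: float) -> int:
    """Scale by 2**15, then clip into the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, round(x * 32768.0)))

def float_to_int16_32767(x: float) -> int:
    """Scale by 32767 so that +1.0 lands on the top code."""
    return round(x * 32767.0)

unclipped_top = round(1.0 * 32768.0)            # one past the largest code
clipped_top = float_to_int16_pow2(1.0)          # forced down to INT16_MAX
largest_exact = float_to_int16_pow2(32767 / 32768)
lowest_with_32767 = float_to_int16_32767(-1.0)  # INT16_MIN is never produced
```

With the power-of-two factor, +1.0 and 32767/32768 both end up on code 32767, which is clipping. With the 32767 factor, -1.0 maps to -32767, so code -32768 is never produced for inputs within -1.0 to +1.0, which is the missed code.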
Paul, I think you've confused yourself by focusing on the limits and
not on the content itself.
When you say "preserving every bit of the dynamic range," I believe
that you're completely mistaken. When converting from int to float,
if you multiply by any constant other than a pure power of two, then
you are smearing the dynamic range across multiple bits. In other
words, in binary, 32768.0 is 1000000000000000, but 32767.0 is
0111111111111111. Whenever you multiply or divide an integer sample
by 32767.0, you're making 15 copies of the original dynamic range and
summing them, which actually destroys the original value. It's only
by rounding that you get back to the original value. The only way to
preserve the original is to use a conversion factor which has only
one significant bit in binary. Using 32768.0 merely changes the
format of the bits once, without altering their dynamic range at
all. Keep in mind that the dynamic range of float is nearly
boundless; the -1.0 to +1.0 range is an arbitrary selection, not a
true limit.
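
One way to see the "one significant bit" point is to compare what each divisor does to the floating-point significand. The sketch below is an illustration only: it uses Python's `math.frexp`, which splits a float into a significand and a power-of-two exponent, and the variable names are made up for this example.

```python
import math

def significand(x: float) -> float:
    """Return the significand m from x == m * 2**e, with 0.5 <= |m| < 1."""
    return math.frexp(x)[0]

# Every nonzero 16-bit code (zero has no meaningful significand).
codes = [c for c in range(-32768, 32768) if c != 0]

# Count the codes whose significand bits are altered by each conversion factor.
changed_by_32768 = sum(significand(c / 32768.0) != significand(float(c)) for c in codes)
changed_by_32767 = sum(significand(c / 32767.0) != significand(float(c)) for c in codes)

# Dividing by 2**15 only moves the exponent, by exactly 15.
exponent_shift = math.frexp(12345.0)[1] - math.frexp(12345.0 / 32768.0)[1]
```

Dividing by 32768.0 leaves the significand of every code untouched and only lowers the exponent, so `changed_by_32768` is zero. Dividing by 32767.0 rewrites the significand bits, and the original value only comes back after rounding on the return trip.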
An important factor to consider is that your actual content will
never extend across the entire dynamic range of an integer format
unless there is distortion or DC biasing, both of which are not
terribly important to preserve. In other words, your desire to
preserve the "dynamic range" of a theoretical signal which somehow
extends precisely from -32768 to +32767 is pointless because there is
no such signal. Only content with compressed dynamics and/or
clipping fits that range. If someone were to record or generate a
truly undistorted signal with that range, it would be pure luck. A
real analog sine wave would need a negative DC bias of one half of 1
LSB, and audio gear nearly always removes DC bias. Even getting
close to this theoretical maximum dynamic range would risk clipping
distortion, and thus you'll find that it isn't done except where
distortion and dynamic range compression are accepted. Without the DC
bias, if the signal actually hits -32768, then it must be clipped
when captured as an integer because 32767 is the positive limit. You
cannot preserve dynamic range that isn't coded.
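
The half-LSB figure follows from simple arithmetic on the code limits. A small sketch, with variable names invented for illustration:

```python
INT16_MIN, INT16_MAX = -32768, 32767

# A sine that spans the full code range exactly must be centered between the limits.
dc_bias = (INT16_MAX + INT16_MIN) / 2       # -0.5, i.e. minus one half of 1 LSB
amplitude = (INT16_MAX - INT16_MIN) / 2     # 32767.5

# An unbiased sine that touches INT16_MIN also swings this far positive.
unbiased_peak = -INT16_MIN                  # 32768, one code beyond INT16_MAX
```

So an undistorted full-range sine needs a center of -0.5 and an amplitude of 32767.5, while an unbiased one that reaches -32768 would also have to reach +32768, which has no code.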
One way to think of the whole issue is this: A float sample of
exactly +1.0 is basically impossible, and should be avoided
completely. It will never exist coming from an integer A/D (which is
the only kind available) and it will never be preserved without
clipping for integer D/A. Thus, it is completely pointless to focus
on any kind of conversion factor which attempts to include +1.0 among
the possible float values.
All non-synthesized audio originates in twos-complement integer, and
therefore the missing positive extreme code is already missing.
There is no point in focusing on creating a +1.0 code in float
representations when it does not correspond to anything.
Obviously, synthetic waveforms could easily touch +1.0, but those
examples represent instances where the sample generator should be
altered to avoid creating those values in the first place. It's
hardly a justification for a flawed conversion factor.
To bring this back to the topic of the mailing list, the choice for
CoreAudio to use pure powers of two for conversions between integer
and floating point formats is mathematically sound. There is no
provable benefit to artificially inducing the occurrence of the +1.0
code in the float representation. You cannot increase dynamic range
during conversion, nor can you preserve "more" dynamic range than
what is in the original.
My apologies if I am not explaining this clearly.
Brian Willoughby
Sound Consulting