Re: Audio recording bitdepth

  • Subject: Re: Audio recording bitdepth
  • From: Brian Willoughby <email@hidden>
  • Date: Wed, 2 Dec 2009 18:34:09 -0800

On Dec 2, 2009, at 15:13, Paul Davis wrote:
> IMHO, your notes about "bit transparency" are pushing the point of honesty.
>
> The techniques that generate bit transparency (i.e. int->float->int ==
> original value) fail to map the full dynamic range during at least one
> transformation between the two formats.

This is not true. The only range which is not fully reached is the float range of -1.0 to +1.0, and Bjorn points out that this is a nominal range. The definition of nominal is something which exists "in name only." The actual float range does not include +1.0, a distinction which many miss.

> You yourself note that the
> problem is that two's complement format generates 1 more negative
> value than positive values for the full dynamic range, and thus can't
> be represented perfectly by a symmetric floating point range.

Except that there is no problem at all representing the same values with a non-symmetric floating point range. Check the archives of this mailing list for proof.

> This
> therefore creates a conundrum: either you ensure perfect bit
> transparency for an int->float->int conversion by introducing
> distortion on the inbound transform OR you avoid distortion on the
> inbound conversion and add it on the outbound.

You seem to be confused here. There is no distortion if you simply ignore +1.0 and leave it out.
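
To make the claim concrete, here is a minimal sketch of the round trip with a power-of-two scale. The names SCALE, int16_to_float and float_to_int16 are made up for illustration and are not CoreAudio API; rounding to nearest on the way out is also an assumption.

```python
# Sketch: 16-bit int -> float -> int round trip with a power-of-two scale.
# All names here are hypothetical helpers, not part of any audio API.
SCALE = 32768.0  # 2**15

def int16_to_float(sample: int) -> float:
    """Map a 16-bit code into the nominal float range."""
    return sample / SCALE

def float_to_int16(value: float) -> int:
    """Map a float back to a 16-bit code (round to nearest, no clamping)."""
    return int(round(value * SCALE))

codes = range(-32768, 32768)  # every 16-bit two's complement code
floats = [int16_to_float(c) for c in codes]

# Every code survives the round trip unchanged.
round_trip_ok = all(float_to_int16(f) == c for f, c in zip(floats, codes))

# The floats actually produced span -1.0 up to 1.0 - 2**-15; +1.0 never appears.
lowest, highest = min(floats), max(floats)
print(round_trip_ok, lowest, highest)
```

The float range that results is non-symmetric, exactly like the integer range it came from, and nothing is lost in either direction.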


> Your table suggests that transparency is the only virtue, whereas in
> fact there are two virtues, and you have to choose one. Bit
> transparency would presumably be the right goal if you do no
> processing whatsoever on the signal - appropriate for reading integer
> format samples from disk, pushing them into CoreAudio or another
> float-only audio API, and then back out through an integer format
> audio interface.

Upon closer examination, this turns out to be the only case.

> However, if you intend to do any kind of processing
> on the samples, one can make a case for preserving every bit of the
> dynamic range when converting to float and before applying the
> computational processing to the values, and then putting up with a
> tiny amount of distortion on the outbound.

All analog conversions are done with integer codes. Focusing upon the arbitrary float range is meaningless because you cannot utilize the values without converting to integer.

While it's true that synthetic and processed samples can reach +/-1.0 in float representation, there is no possibility of converting such signals to analog without either clipping or a missed code. Thus, the only way to preserve dynamic range is to avoid +1.0 in the first place.
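
The "clipping or a missed code" trade-off can be sketched as follows. This assumes a round-to-nearest conversion followed by a clamp; to_int16 is a hypothetical helper, not any particular library's converter.

```python
# Sketch: what happens to the extremes on the way out to 16-bit integer.
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(value: float, scale: float) -> int:
    """Scale, round to nearest, then clamp to the 16-bit range."""
    code = int(round(value * scale))
    return max(INT16_MIN, min(INT16_MAX, code))

# Scale of 32768: +1.0 maps to 32768, which has no 16-bit code, so it clips.
unclipped_plus_one = int(round(1.0 * 32768.0))  # 32768, out of range
clipped_plus_one = to_int16(1.0, 32768.0)       # 32767 after clamping

# Scale of 32767: +1.0 fits, but -1.0 only reaches -32767, so the
# code -32768 is never produced by any input in [-1.0, +1.0].
lowest_code_32767 = to_int16(-1.0, 32767.0)     # -32767
print(unclipped_plus_one, clipped_plus_one, lowest_code_32767)
```

Either way, a float sample of +1.0 costs something at the converter; staying below it costs nothing.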


Paul, I think you've confused yourself by focusing on the limits and not on the content itself.


When you say "preserving every bit of the dynamic range," I believe that you're completely mistaken. When converting from int to float, if you multiply by any constant other than a pure power of two, then you are smearing the dynamic range across multiple bits. In other words, in binary, 32768.0 is 1000000000000000 (a single set bit), but 32767.0 is 0111111111111111 (fifteen set bits). Whenever you multiply or divide an integer sample by 32767.0, you're making 15 copies of the original dynamic range and summing them, which actually destroys the original value. It's only by rounding that you get back to the original value. The only way to preserve the original is to use a conversion factor which has only one significant bit in binary. Using 32768.0 merely changes the format of the bits once, without altering their dynamic range at all. Keep in mind that the dynamic range of float is nearly boundless; the -1.0 to +1.0 range is an arbitrary selection, not a true limit.
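
One way to check the power-of-two argument is to count how many 16-bit codes convert to float with no rounding at all. The sketch below uses exact rational arithmetic as the reference; fits_float32 is a hypothetical helper that tests whether a value survives IEEE 754 single precision.

```python
# Sketch: which int -> float conversions are exact, for each divisor.
from fractions import Fraction
import struct

def fits_float32(x: float) -> bool:
    """True if x survives a round trip through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0] == x

codes = range(-32768, 32768)

# Dividing by 2**15 only changes the exponent, so every result is exact,
# even when stored as a 32-bit float.
exact_pow2 = sum(
    1 for c in codes
    if Fraction(c / 32768.0) == Fraction(c, 32768) and fits_float32(c / 32768.0)
)

# Dividing by 32767 needs rounding for every code except 0 and +/-32767.
exact_32767 = sum(
    1 for c in codes if Fraction(c / 32767.0) == Fraction(c, 32767)
)

print(exact_pow2, exact_32767)  # 65536 3
```

With 32767 as the divisor, nearly every float is an approximation, and the original integer only comes back because the outbound conversion rounds.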

An important factor to consider is that your actual content will never extend across the entire dynamic range of an integer format unless there is distortion or DC biasing, neither of which is terribly important to preserve. In other words, your desire to preserve the "dynamic range" of a theoretical signal which somehow extends precisely from -32768 to +32767 is pointless because there is no such signal. Only content with compressed dynamics and/or clipping fits that range. If someone were to record or generate a truly undistorted signal with that range, it would be pure luck. A real analog sine wave would need a negative DC bias of one half of 1 LSB, and audio gear nearly always removes DC bias. Even getting close to this theoretical maximum dynamic range would risk clipping distortion, and thus you'll find that it isn't done except where distortion and dynamic range compression are accepted. Without the DC bias, if the signal actually hits -32768, then it must be clipped when captured as an integer because 32767 is the positive limit. You cannot preserve dynamic range that isn't coded.
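
The half-LSB bias can be illustrated with a toy quantizer. This is only a sketch: quantize_sine is a hypothetical helper, and 8 samples per cycle is chosen so that the peaks land exactly on sample points.

```python
# Sketch: a full-range sine only fits 16 bits with a -0.5 LSB DC bias.
import math

INT16_MIN, INT16_MAX = -32768, 32767
N = 8  # samples per cycle, so the peak and trough fall on sample points

def quantize_sine(amplitude: float, dc_bias: float) -> list:
    """Round one cycle of a biased sine to integers, without clamping."""
    return [
        round(dc_bias + amplitude * math.sin(2 * math.pi * k / N))
        for k in range(N)
    ]

# No bias: reaching -32768 on the trough means +32768 on the peak,
# which has no 16-bit code and would have to be clipped.
unbiased = quantize_sine(32768.0, 0.0)

# Bias of -0.5 LSB with amplitude 32767.5: spans exactly -32768..+32767.
biased = quantize_sine(32767.5, -0.5)

print(min(unbiased), max(unbiased), min(biased), max(biased))
```

Only the biased case uses every code without overflow, which is the contrived situation described above.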

One way to think of the whole issue is this: A float sample of exactly +1.0 is basically impossible, and should be avoided completely. It will never exist coming from an integer A/D (which is the only kind available) and it will never be preserved without clipping for integer D/A. Thus, it is completely pointless to focus on any kind of conversion factor which attempts to include +1.0 among the possible float values.

All non-synthesized audio originates in twos-complement integer, and therefore the missing positive extreme code is already missing. There is no point in focusing on creating a +1.0 code in float representations when it does not correspond to anything

Obviously, synthetic waveforms could easily touch +1.0, but those examples represent instances where the sample generator should be altered to avoid creating those values in the first place. It's hardly a justification for a flawed conversion factor.

To bring this back to the topic of the mailing list, the choice for CoreAudio to use pure powers of two for conversions between integer and floating point formats is mathematically sound. There is no provable benefit to artificially inducing the occurrence of the +1.0 code in the float representation. You cannot increase dynamic range during conversion, nor can you preserve "more" dynamic range than what is in the original.

My apologies if I am not explaining this clearly.

Brian Willoughby
Sound Consulting
