Re: AudioFile -> AudioConverter -> float PCM
- Subject: Re: AudioFile -> AudioConverter -> float PCM
- From: William Stewart <email@hidden>
- Date: Thu, 4 Aug 2005 12:27:03 -0700
Just to broaden this discussion a bit
On 04/08/2005, at 7:39 AM, email@hidden wrote:
Marc,
Regarding packets vs. frames in relation to channels vs. samples,
here's my current understanding from what documentation exists:
We'll handle frames first, and discuss packets at the end of this
note.
Remember that Version 1 of Audio Units was architected around
INTERLEAVED streams.

Let's just completely forget that V1 audio units ever existed - they
just never happened and we don't care about them anymore :-)
This is also how PCM data is laid out in files, for example. Also,
many devices will present interleaved streams (built-in audio, MOTU's
FW interfaces - these present interleaved streams for the different
logical elements of the device, e.g. analog ins/outs, digital ins/outs).
So for a 6-CHANNEL (5.1) surround stream, the
individual SAMPLES would be in a FRAME as follows (sample "times"
are A, B, C...) for INTERLEAVED data, in ONE buffer:
BUFFER 1:
A1 A2 A3 A4 A5 A6 B1 B2 B3 B4 B5 B6 C1 C2 C3 C4 C5 C6 ...
<--+---frameA---> <-----frameB----> <-----frameC---->
   ^
   |
   individual sample (A2) for channel 2
This is intuitive for the interleaved case: each interleaved sample
"lives in" a surrounding frame, and there is one frame per "sample
point across all channels".
For stereo, it would be LRLRLR of course, where each LR pair of
samples is a frame.
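To make the indexing concrete, here is a minimal sketch (my own names,
not from the original post) of how you address one sample in an
interleaved buffer; for the 5.1 buffer above, SampleAt(buf, 6, 0, 1)
returns A2 if you count both frames and channels from zero:

    #include <stddef.h>

    /* Hypothetical helper: fetch the sample for one channel at one frame
     * from a single interleaved buffer of float samples. */
    static float SampleAt(const float *interleaved, size_t numChannels,
                          size_t frame, size_t channel)
    {
        /* Frame f occupies numChannels consecutive samples starting at
         * f * numChannels; the channel index is an offset within that frame. */
        return interleaved[frame * numChannels + channel];
    }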
In Version 2 of Audio Units, the convention moved to having
NON-interleaved buffers because these proved to be more convenient
to process.

For AudioUnits the canonical format is to use deinterleaved buffers;
AudioUnits can (and in fact do) handle interleaved data as well, using
the AudioConverter and the AUConverter.

V2 can still handle interleaved data "at the boundaries"
via the converter, but the canonical format internally is linear PCM
32-bit floating point with SEPARATE buffers for each channel.
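As a concrete illustration (my own sketch, not from the post), the
canonical V2 AudioUnit stream format for a 44.1 kHz signal could be
filled out like this; an ASBD of this shape is also the sort of thing
you would typically hand to AudioConverterNew as the destination format
when going from a file's format to float PCM:

    #include <string.h>
    #include <CoreAudio/CoreAudioTypes.h>

    /* Hypothetical helper: fill out the canonical V2 AudioUnit format
     * (deinterleaved, 32-bit float linear PCM) for a given channel count. */
    static AudioStreamBasicDescription CanonicalASBD(Float64 sampleRate,
                                                     UInt32 channels)
    {
        AudioStreamBasicDescription asbd;
        memset(&asbd, 0, sizeof(asbd));
        asbd.mSampleRate       = sampleRate;
        asbd.mFormatID         = kAudioFormatLinearPCM;
        asbd.mFormatFlags      = kAudioFormatFlagsNativeFloatPacked
                               | kAudioFormatFlagIsNonInterleaved;
        asbd.mBitsPerChannel   = 32;              /* Float32 samples         */
        asbd.mChannelsPerFrame = channels;
        asbd.mFramesPerPacket  = 1;               /* PCM: 1 frame per packet */
        /* Non-interleaved: bytes-per-frame/packet describe ONE channel only. */
        asbd.mBytesPerFrame    = sizeof(Float32);
        asbd.mBytesPerPacket   = sizeof(Float32);
        return asbd;
    }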
So the NON-interleaved version of the above example would look like:
BUFFER 1: A1 B1 C1 ...
BUFFER 2: A2 B2 C2 ...
BUFFER 3: A3 B3 C3 ...
BUFFER 4: A4 B4 C4 ...
BUFFER 5: A5 B5 C5 ...
BUFFER 6: A6 B6 C6 ...
          frameA frameB frameC ...
Right, in the stereo case LLLL RRRR
So the frame has now "turned" 90-degrees and cuts ACROSS the buffers.
In both cases, a frame holds all the data points, across all channels,
for a given instant in time.
In the 1-channel case, a frame IS a single sample.
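In code, that 90-degree turn just means you pick the channel's buffer
first and then index by frame. A small sketch (again mine, assuming a
non-interleaved AudioBufferList with one Float32 buffer per channel):

    #include <CoreAudio/CoreAudioTypes.h>

    /* Hypothetical helper: fetch one channel's sample at one frame from a
     * NON-interleaved AudioBufferList (mNumberBuffers == channel count,
     * each AudioBuffer holding Float32 data for a single channel). */
    static Float32 DeinterleavedSampleAt(const AudioBufferList *abl,
                                         UInt32 channel, UInt32 frame)
    {
        const Float32 *channelData =
            (const Float32 *)abl->mBuffers[channel].mData;
        return channelData[frame];  /* the frame index cuts across buffers */
    }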
All of the above is for the case where you have ACCESS to individual
samples.
This is also signified in the ASBD, where the mBitsPerChannel field is
NON-zero - see below
What about compressed or variable bitrate formats where
the smallest manipulable chunk of data contains multiple frames?
That's where the "packet" comes in.
If you have some encoding scheme whereby, say, 1024 frames are
combined
together in some manner, such that the individual frames are
algorithmically commingled and cannot be separated from each other
by simple array indexing, then you need another name for this chunk,
and that name is "packet".
That's why you'll see "frames per packet" in the Audio Stream Basic
Description.
With separable samples, a packet is the same size as a frame, since
you can easily separate data into discrete chunks at the frame level.
(In the degenerate case of just 1 channel with separable samples,
a packet IS a frame IS a sample.)
In the surround stream example above, frameA and packetA are the
same thing.
It's only when the frames get commingled in the encoding that Apple
makes the distinction between packets and frames, since you can't
access the individual frames without decoding.
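A pair of ASBDs makes the distinction visible. This is just a sketch of
my own (the AAC value of 1024 frames per packet is the usual one, but
treat the specific numbers as assumptions): for linear PCM one packet is
one frame and mBitsPerChannel is set, while for a packetized format the
packet carries many frames and the per-sample fields go to zero.

    #include <string.h>
    #include <CoreAudio/CoreAudioTypes.h>

    static void FillExampleFormats(AudioStreamBasicDescription *pcm,
                                   AudioStreamBasicDescription *aac)
    {
        /* Interleaved 16-bit linear PCM, 5.1: samples are directly
         * addressable by array indexing. */
        memset(pcm, 0, sizeof(*pcm));
        pcm->mSampleRate       = 48000.0;
        pcm->mFormatID         = kAudioFormatLinearPCM;
        pcm->mFormatFlags      = kAudioFormatFlagIsSignedInteger
                               | kAudioFormatFlagIsPacked
                               | kAudioFormatFlagsNativeEndian;
        pcm->mChannelsPerFrame = 6;
        pcm->mBitsPerChannel   = 16;                 /* non-zero: sample-addressable */
        pcm->mFramesPerPacket  = 1;                  /* packet == frame              */
        pcm->mBytesPerFrame    = 6 * sizeof(SInt16); /* all channels, interleaved    */
        pcm->mBytesPerPacket   = pcm->mBytesPerFrame;

        /* AAC: frames are commingled inside a packet, so there are no
         * per-sample or per-frame byte counts. */
        memset(aac, 0, sizeof(*aac));
        aac->mSampleRate       = 44100.0;
        aac->mFormatID         = kAudioFormatMPEG4AAC;
        aac->mChannelsPerFrame = 2;
        aac->mFramesPerPacket  = 1024;  /* many frames per packet            */
        aac->mBitsPerChannel   = 0;     /* zero: can't index individual samples */
        aac->mBytesPerFrame    = 0;
        aac->mBytesPerPacket   = 0;     /* variable packet sizes             */
    }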
Hope this helps,
Peter
There are some additional comments about all of this with CAF Files
(and in the docs for them - http://developer.apple.com/audio). By
using a mathematical representation like this, it is possible to
describe audio data in a complete manner so that a parser needs to
know nothing about the format to be able to read and write audio data
to a CAF file.
It does this by reading/writing data using packets (this is also why
the AudioFile API is defined in terms of packets, even for PCM files).
The frames per packet tells you the duration of each packet (when
combined with the sample rate). If mBitsPerChannel is NON-zero, then
you know that you can go down to the individual digital sample level
- which you can for linear PCM, of course. If mBitsPerChannel is zero,
then you can't - the packet is then an indivisible chunk of data that
would need "decoding" or "decompressing" before you would have
individual samples you can then deal with.
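For example, a small sketch of that duration arithmetic (my code, not
from the post); it assumes a format with a fixed frames-per-packet
value, as variable-frame formats would need the per-packet descriptions
instead:

    #include <CoreAudio/CoreAudioTypes.h>

    /* Duration of one packet, from frames-per-packet and the sample rate.
     * For 1024 frames per packet at 44100 Hz this is ~23.2 ms; for linear
     * PCM it is simply one sample period. */
    static Float64 PacketDurationSeconds(const AudioStreamBasicDescription *asbd)
    {
        return (Float64)asbd->mFramesPerPacket / asbd->mSampleRate;
    }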
Bill
--
mailto:email@hidden
tel: +1 408 974 4056
__________________________________________________________________________
"Much human ingenuity has gone into finding the ultimate Before.
The current state of knowledge can be summarized thus:
In the beginning, there was nothing, which exploded" - Terry Pratchett
__________________________________________________________________________