Proposal for S/PDIF extensions, comments welcome
- Subject: Proposal for S/PDIF extensions, comments welcome
- From: Brian Willoughby <email@hidden>
- Date: Fri, 25 Jan 2002 08:27:01 -0800
The topic of MP3 files and support for variable bit rate data has come up
recently on this list. I have also been putting some thought into
CoreAudio support for S/PDIF, and how that might be done sensibly,
preferably by defining a Format ID (e.g. kAudioFormatDigitalInterface) to
indicate these standard streams. I decided to read the latest
documentation (header files) to see how this idea might fit into the
CoreAudio design. Please reference CoreAudioTypes.h:
PROPOSAL
Seeing as how "extensions are required for variable bit rate data and for
constant bit rate data where the channels have unequal sizes," I would
like to propose that these extensions to CoreAudio also allow for
constant bit rate data with additional non-audio bits attached - e.g.
AES/EBU, IEC 958, S/PDIF, & EIAJ CP-340.
The exact format for these audio-with-auxiliary-data streams is not
important, but I would like to make a suggestion which might make my
request clearer in the process.
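To make the proposal concrete, here is a rough sketch of how such a stream might be described using the existing AudioStreamBasicDescription from CoreAudioTypes.h, anticipating the 32-bit subframe representation suggested in the EXAMPLE below. The kAudioFormatDigitalInterface constant and the packing choices (one 32-bit subframe per channel per frame) are hypothetical suggestions only, not anything currently defined by CoreAudio:

#include <CoreAudio/CoreAudioTypes.h>

/* Hypothetical format ID for raw AES3/S-PDIF subframe data; this constant
   does not exist in CoreAudioTypes.h and is only the proposed extension. */
enum { kAudioFormatDigitalInterface = 'aes3' };

/* One possible way to describe a 2-channel AES3 stream: each "sample" is a
   full 32-bit subframe, so a frame (both channels) occupies 8 bytes. */
static AudioStreamBasicDescription MakeAES3Description(Float64 sampleRate)
{
    AudioStreamBasicDescription desc = { 0 };
    desc.mSampleRate       = sampleRate;              /* e.g. 44100.0 or 48000.0 */
    desc.mFormatID         = kAudioFormatDigitalInterface;
    desc.mFormatFlags      = 0;                       /* flags would need defining */
    desc.mBytesPerPacket   = 8;                       /* 2 subframes x 4 bytes */
    desc.mFramesPerPacket  = 1;
    desc.mBytesPerFrame    = 8;
    desc.mChannelsPerFrame = 2;
    desc.mBitsPerChannel   = 32;                      /* whole subframe, not just audio */
    return desc;
}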
EXAMPLE
AES3 streams are built of 32-bit subframes which are collected together
into 192-sample blocks (typically 2-channel - i.e. a 64-bit frame /
sample). Each 32-bit subframe carries a 4-bit preamble, the audio sample
itself, a 1-bit audio sample validity flag, 1 bit of user data, 1 bit of
audio channel status, 1 parity bit, and optionally 4 bits of auxiliary
data. The auxiliary data is only available with the 20-bit
sample option, since the 24-bit sample option collides with the auxiliary
data. It would seem logical to present the AES/EBU stream in software as
a sequence of 32-bit subframes, probably grouped into 192-sample blocks.
Arguably, the usable subframe data could be represented in a 27-bit
value, but it would seem to be more efficient to just use full 32-bit
words. The 4-bit preamble violates biphase rules and cannot be directly
represented, so codes 1, 2, and 3 could be substituted for Preamble 1,
Preamble 2, and Preamble 3. Codes 0 and 4 through 15 would be reserved
in this 4-bit field. It is questionable whether to allow the application
to set parity for output, but it should at least be presented to the
application on input for full disclosure of the digital audio stream
state.
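As a rough illustration of the 32-bit word representation suggested above, the field positions might look something like the following C sketch. The exact bit assignments are arbitrary placeholders (nothing in AES3 or CoreAudio mandates this particular packing); the point is simply that the preamble code, audio and auxiliary bits, validity, user data, channel status, and parity all fit in one 32-bit word:

#include <stdint.h>

/* Hypothetical packing of one AES3 subframe into a 32-bit word.  Bits 0-3
   hold the substitute preamble code, bits 4-27 hold the 24-bit audio sample
   (or 20 bits of audio in bits 8-27 plus 4 auxiliary bits in bits 4-7), and
   bits 28-31 hold the validity, user data, channel status, and parity bits. */
#define SUBFRAME_PREAMBLE(w)  ((uint32_t)(w) & 0x0Fu)                /* bits 0-3  */
#define SUBFRAME_AUX(w)       (((uint32_t)(w) >> 4)  & 0x0Fu)        /* bits 4-7  */
#define SUBFRAME_AUDIO24(w)   (((uint32_t)(w) >> 4)  & 0x00FFFFFFu)  /* bits 4-27 */
#define SUBFRAME_AUDIO20(w)   (((uint32_t)(w) >> 8)  & 0x000FFFFFu)  /* bits 8-27 */
#define SUBFRAME_VALIDITY(w)  (((uint32_t)(w) >> 28) & 0x1u)         /* bit 28    */
#define SUBFRAME_USER(w)      (((uint32_t)(w) >> 29) & 0x1u)         /* bit 29    */
#define SUBFRAME_CHANSTAT(w)  (((uint32_t)(w) >> 30) & 0x1u)         /* bit 30    */
#define SUBFRAME_PARITY(w)    (((uint32_t)(w) >> 31) & 0x1u)         /* bit 31    */

/* Substitute codes for the three preambles, as suggested above;
   0 and 4 through 15 remain reserved. */
enum { kPreamble1 = 1, kPreamble2 = 2, kPreamble3 = 3 };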
If the data stream were *always* presented in blocks, then the audio
channel status could be decoded by the driver and presented as a 24-byte
decoded array. The user data could be presented similarly, except that
the specification allows for blocking other than the 192-sample block,
and the application would be forced to assemble the user bytes. However,
given that an AES3 block lasts 4.35 ms at a 44.1 kHz sample rate, forcing
the data to be presented in these large chunks would cause more latency
than the other CoreAudio
formats. Fortunately, the "convenient" representation of AES3 subframes
as 32-bit words allows for block boundaries to be detected even when the
data is presented in smaller chunks. The application would merely need
to detect the appropriate preamble codes (similar to MIDI parsing, only
at much higher bandwidth!).
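For example, a parser that re-synchronizes on the block-start preamble and accumulates the 192 channel status bits into a 24-byte array might look roughly like this. It reuses the hypothetical SUBFRAME_* macros above, assumes a deinterleaved stream of subframes for a single channel, and assumes kPreamble1 is the substitute code for the preamble that begins a block:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint8_t channelStatus[24];  /* 192 channel status bits, one per frame      */
    int     frameIndex;         /* 0..191; initialize to -1 before first use   */
} AES3BlockParser;

static void ParseSubframes(AES3BlockParser *p, const uint32_t *subframes, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t w = subframes[i];
        if (SUBFRAME_PREAMBLE(w) == kPreamble1) {
            /* Block boundary detected: start a fresh 24-byte array. */
            memset(p->channelStatus, 0, sizeof p->channelStatus);
            p->frameIndex = 0;
        }
        if (p->frameIndex < 0 || p->frameIndex >= 192)
            continue;           /* not yet synchronized to a block start */
        /* Accumulate this frame's channel status bit, first bit in the LSB. */
        p->channelStatus[p->frameIndex / 8] |=
            (uint8_t)(SUBFRAME_CHANSTAT(w) << (p->frameIndex % 8));
        p->frameIndex++;        /* after 192 frames the array is complete */
    }
}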
The representation suggested above would allow CoreAudio applications to
examine and preserve potentially useful meta-data in an AES3 stream.
Examples include: auxiliary audio data samples or other data in the 4-bit
extension, user bit data as defined by other equipment using AES3, and
access to audio channel status data. Audio channel status data includes:
emphasis, frequency lock, alphanumeric channel origin data, alphanumeric
channel destination data, device local sample address code, time-of-day
sample address code, non-audio mode indicators for DTS and Dolby
Digital, copyright assertion flag, category code (broadcast with country
code, digital convertor/DSP type, laser-optical/CD, musical instrument
synth/mic, tape/disk DAT/VCR), source number (15), channel number (15),
and clock accuracy (Level I, II, III). I could not find a reference for
SMPTE DAT, but I understand that these non-audio bits are used to place
drop-frame time codes onto DAT in the film industry.
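Once a driver (or application) has the decoded 24-byte channel status array in hand, pulling out individual fields is straightforward. As a small, hedged example, only two well-known flags are decoded below; the bit positions of the remaining fields (emphasis, category code, sample address codes, and so on) are defined by AES3 and IEC 958 and are omitted here. This assumes the first channel status bit lands in the least significant bit of byte 0, matching the parser sketch above:

#include <stdint.h>

typedef struct {
    int professional;  /* byte 0, bit 0: professional (AES/EBU) vs. consumer (S/PDIF) */
    int nonAudio;      /* byte 0, bit 1: non-PCM payload such as Dolby Digital or DTS */
} ChannelStatusSummary;

static ChannelStatusSummary SummarizeChannelStatus(const uint8_t status[24])
{
    ChannelStatusSummary s;
    s.professional = (status[0] & 0x01) != 0;
    s.nonAudio     = (status[0] & 0x02) != 0;
    return s;
}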
I'm sure not all of the data options above are utilized by digital audio
equipment, but these formats represent formidable standards that have
been accepted worldwide. It would be great if CoreAudio defined a
system-level convention for passing this external data between
applications and drivers without having each hardware vendor define a
different custom extension to their driver. Since these formats exist
independently of the hardware adaptors, and are accepted in the industry,
it makes sense that applications should be able to access the data in a
generic way without requiring driver-specific code.
P.S. I realize that some meta-data, such as SMPTE Timecode, could be
interpreted by the driver and provided via existing CoreAudio mechanisms.
However, even adding a lot of data structures and parameters to CoreAudio
could not fully express the capabilities of AES/EBU, since some of the
auxiliary data can change on a per-sample, per-channel basis.
P.P.S. Another idea would be to attach the meta-data as additional
"channels" in parallel with the standard 32-bit floating point samples
that CoreAudio prefers. Drivers would convert the "normal" channels to
24-bit audio integers and then merge the bits from the meta-data channels
sample by sample.
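A rough sketch of what that per-sample merge might look like inside a driver follows; the function name, the metadata mask, and the bit positions (matching the hypothetical 32-bit subframe packing above) are all illustrative:

#include <stdint.h>
#include <stddef.h>

/* Sketch of the P.P.S. idea: the driver receives a "normal" Float32 audio
   channel plus a parallel meta-data "channel", converts the audio to 24-bit
   integers, and merges the per-sample non-audio bits (preamble code, V, U,
   C, P) into the outgoing subframe words. */
static void MergeAudioAndMetadata(const float    *audio,     /* Float32 samples        */
                                  const uint32_t *metadata,  /* per-sample meta bits   */
                                  uint32_t       *subframes, /* packed output words    */
                                  size_t          count)
{
    for (size_t i = 0; i < count; i++) {
        /* Clip and convert one Float32 sample to a signed 24-bit integer. */
        float x = audio[i];
        if (x >  1.0f) x =  1.0f;
        if (x < -1.0f) x = -1.0f;
        int32_t sample24 = (int32_t)(x * 8388607.0f);    /* 2^23 - 1 */

        /* Place the 24 audio bits at bits 4-27 and OR in the preamble code
           (bits 0-3) and V/U/C/P bits (bits 28-31) from the meta-data channel. */
        uint32_t audioBits = ((uint32_t)sample24 & 0x00FFFFFFu) << 4;
        uint32_t metaBits  = metadata[i] & 0xF000000Fu;
        subframes[i] = audioBits | metaBits;
    }
}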
Brian Willoughby
Sound Consulting