Re: ExtAudioFileSeek and ExtAudioFileTell?
- Subject: Re: ExtAudioFileSeek and ExtAudioFileTell?
- From: Stephen Davis <email@hidden>
- Date: Wed, 1 Apr 2009 01:20:33 -0700
[warning: off-the-cuff answer written while dosed with cold medicine
so likely to contain errors but hopefully will give you enough info to
stop beating your head against the wall of confusion and make some
progress, or at least give you enough info to ask more questions]
Compressed files like AAC and MP3 have one or more packets at the
beginning of the file that represent the internal delay through the
psychoacoustic encoder and, as such, are supposed to be removed from
the resulting decoded sample frames before getting sent to the
destination. The decoder needs to be fed these packets to prime the
pump, as it were, but the sample frames involved don't contribute to
the final output. To clarify, the number of frames to drop from the
beginning of the file is generally not an integral multiple of the
frames-per-packet value, and it varies by encoder. The packet size for
AAC is most often
1024 frames per packet and I believe the "drop this many leading
frames" value for the Nero encoder in question is 1600 while the Apple
encoder is usually 2112. BTW, other versions of the Nero encoder have
used 1024 as the value.
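
To make the arithmetic concrete, here is a minimal sketch of where the first "real" sample frame sits in the raw decoded stream, assuming 1024 frames per packet and the delay values quoted above. The names `PacketPos` and `locate_frame` are hypothetical and not part of any Apple API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: position of a frame in the raw decoded stream. */
typedef struct {
    int64_t packet;         /* index of the packet that holds the frame */
    int64_t offsetInPacket; /* frame offset inside that packet */
} PacketPos;

/* Map a listener-visible frame number to its raw position, given the
 * encoder delay (priming frames) and the frames-per-packet value. */
PacketPos locate_frame(int64_t validFrame, int64_t primingFrames,
                       int64_t framesPerPacket)
{
    assert(validFrame >= 0 && primingFrames >= 0 && framesPerPacket > 0);
    int64_t raw = validFrame + primingFrames; /* skip the encoder delay */
    PacketPos pos;
    pos.packet = raw / framesPerPacket;
    pos.offsetInPacket = raw % framesPerPacket;
    return pos;
}
```

With a delay of 1600, frame 0 of the real audio lands in packet 1 at offset 576; with 2112 it lands in packet 2 at offset 64. Neither falls on a packet boundary, which is why the decoder has to drop part of a packet rather than whole packets.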
Historically speaking, the Nero encoder has used slightly "illegal"
mechanisms (in the strict MPEG-4 sense b/c there is no official
mechanism that works reliably, not to start a flame war on *that*
subject) to convey this information so 3rd party software attempting
to interpret the information might be getting it wrong. In this case,
the AudioFile implementation is a 3rd party interpretation. Note that
the iTunes gapless information storage method is also "illegal".
I don't have the headers in front of me at the moment but there is
probably a property you can query to get the audio file packet table
info which specifies the number of valid sample frames in the file,
the encoder delay value (e.g. 1600) and a corresponding encoder
"drain" value for the end of the file which is the number of sample
frames at the end which should be cut off early.
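
As a rough illustration of how those three numbers fit together, the sketch below uses a local stand-in struct so it compiles anywhere. In AudioToolbox the property is likely `kAudioFilePropertyPacketTableInfo`, returning an `AudioFilePacketTableInfo`, but that should be checked against AudioFile.h. The struct, field, and function names here are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the packet table info described above. */
typedef struct {
    int64_t validFrames;     /* sample frames of real audio */
    int32_t primingFrames;   /* encoder delay at the start, e.g. 1600 */
    int32_t remainderFrames; /* "drain" frames to cut off at the end */
} PacketTableInfo;

/* Total frames the decoder emits if every packet is decoded untrimmed. */
int64_t total_decoded_frames(const PacketTableInfo *info)
{
    return (int64_t)info->primingFrames + info->validFrames
         + (int64_t)info->remainderFrames;
}

typedef struct {
    int64_t skip; /* frames to discard from the front of the run */
    int64_t keep; /* frames of real audio that follow */
} Trim;

/* For a run of raw decoded frames [rawStart, rawStart + count), work out
 * which part is real audio rather than priming or drain. */
Trim trim_decoded_run(const PacketTableInfo *info, int64_t rawStart,
                      int64_t count)
{
    assert(rawStart >= 0 && count >= 0);
    int64_t validBegin = info->primingFrames;
    int64_t validEnd = validBegin + info->validFrames;
    int64_t begin = rawStart < validBegin ? validBegin : rawStart;
    int64_t end = rawStart + count;
    if (end > validEnd) end = validEnd;

    Trim t;
    if (end <= begin) {      /* the run holds no real audio at all */
        t.skip = count;
        t.keep = 0;
    } else {
        t.skip = begin - rawStart;
        t.keep = end - begin;
    }
    return t;
}
```

For a made-up file with 1600 priming frames, 100000 valid frames and 800 remainder frames (100 packets of 1024), the first 16384 decoded frames contain 1600 frames to skip and 14784 to keep.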
Hope that made some sense. I'm not sure if what you're seeing is a
bug in the implementation or if you just need to account for the
additional gapless info to get what you want. You might want to try
with an .m4a file generated from the WAV/AIFF source by afconvert or
iTunes and see if those files behave properly with your code as it is
now.
hth,
stephen
On Mar 31, 2009, at 11:15 PM, James Chandler Jr wrote:
Hi, yet another dumb question. Mac OS 10.5.6 Build 9G55. Same
behavior noted on multiple Macs.
We have a music app which splices small pieces of source audio files
into memory-resident composite multi-tracks. Works great for WAV and
AIF, and ALMOST works great for M4a. The source audio tracks are
huge and we usually deliver the source tracks in compressed format.
WMA on PC, and planned M4a audio files for Mac.
ExtAudioFileSeek documentation says:
"Sets the file’s read position to the specified sample frame number.
A subsequent call to the ExtAudioFileRead function returns samples
from precisely this location, even if it is located in the middle of
a packet."
The seek does sample-accurate positioning, but with M4a there are
fixed offsets from the true requested position.
I have tested several encoders' M4a files but the current test M4a's
were all compressed from WAV by the NERO MP4 command-line codec.
Reported symptoms may vary if seeking into files made by other
codecs. I want to discover a way to accurately seek into ANY M4a
file. It would be a bummer to only be able to seek into 'special
blessed M4a files'.
The behavior is difficult to clearly explain. Apologies if I don't
explain it well.
Our code, after opening the ExtAudioFile, does an ExtAudioFileSeek,
and then it does a read loop using ExtAudioFileRead until the
desired number of frames have been retrieved. Each ExtAudioFileRead
returns 16384 frames until the final partial buffer read to satisfy
the request. The code appears to be correct, and functions perfectly
on WAV and AIF.
Seeking within < 1024 frames of file beginning gives different
symptoms than ExtAudioFileSeek with inFrameOffset >= 1024.
The behavior is identical on all the Nero-encoded M4a test files--
If the ExtAudioFileSeek inFrameOffset >= 1024, then the seek and
subsequent reads will actually load into memory the sample frame
(inFrameOffset + 1600).
If I seek to sample 44100, it actually starts reading at (44100 +
1600). If I ask it to seek to sample 441,000, then it actually
starts reading at (441,000 + 1600).
With the Nero M4a files, as long as I seek somewhere past 2624
frames, I can get the sample I REALLY want, by REQUESTING
[DesiredSampleFrame - 1600]!!!
Debugging with ExtAudioFileTell functions, I get results like this:
//in the case of DesiredSampleFrame >= 1024
ExtAudioFileSeek(FileRef, DesiredSampleFrame); //input DesiredSampleFrame
ExtAudioFileTell(FileRef, &ActualSampleFrame); //returned ActualSampleFrame = (DesiredSampleFrame - 2112)
ExtAudioFileRead(FileRef, &theNumFrames, FillBufList); //returned theNumFrames = 16384
ExtAudioFileTell(FileRef, &ActualSampleFrame); //returned ActualSampleFrame = (Original DesiredSampleFrame + 16448)
The numbers almost kinda sorta make sense, but not quite.
Is there a way to reconcile Tell against Seek to somehow guesstimate
the real position, and always load audio from the desired offset?
Our audio output has surprising degradation of 'musical feel' even
with 1 ms of random slop on individual file seeks. I wouldn't expect
such small slop to make an audible difference, but it is easily
audible.
For this set of Nero-encoded files, I can fudge the requested
numbers to always get the desired results, but the magic numbers may
vary with differently-compressed M4a files.
With the Nero files, I can reliably get the desired sample with this
fudging method--
If DesiredSampleFrame >= 2624, then I request (DesiredSampleFrame -
1600)
Otherwise, I request sample 512, and memory move down the returned
audio, by DesiredSampleFrame.
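
The fudging method above can be sketched as a small planning function. The constants 1600, 2624 and 512 are only the values observed with this set of Nero-encoded files, and `SeekPlan` and `plan_nero_seek` are hypothetical names, not part of the ExtAudioFile API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type for the workaround described above. */
typedef struct {
    int64_t seekFrame; /* frame number to request from the seek call */
    int64_t shift;     /* frames to drop from the front of what was read */
} SeekPlan;

/* Translate the frame actually wanted into the frame to request.
 * Assumes the offsets observed on the Nero-encoded test files. */
SeekPlan plan_nero_seek(int64_t desiredFrame)
{
    assert(desiredFrame >= 0);
    SeekPlan plan;
    if (desiredFrame >= 2624) {
        plan.seekFrame = desiredFrame - 1600; /* cancel the fixed offset */
        plan.shift = 0;
    } else {
        plan.seekFrame = 512;      /* requesting 512 yields file frame 0 */
        plan.shift = desiredFrame; /* then move the audio down in memory */
    }
    return plan;
}
```

In the second branch the read has to fetch `shift` extra frames so that the desired number remain after the memory move.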
That is the other ugly thing-- If I request sample 0, the beginning
of the file will load to location 512 in memory.
If I request sample 512, the beginning of the file will load to
location 0 in memory.
If I request sample 1023, then sample 511 will load to location 0 in
memory.
If I request sample 1024, then sample 2624 will load to location 0
in memory. Eeek! Ack!
There is one consistent offset for all seeks < 1024, and a different
consistent offset for all seeks >= 1024. Maddening.
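
The two offsets can be written down as a tiny model of the observed behavior. This is only a summary of the measurements on the Nero test files, not a description of how the seek is implemented, and `observed_first_frame` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the observation: the file frame that ends up at location 0
 * in memory for a given requested frame. A negative result means that
 * many frames of padding come before the start of the file. */
int64_t observed_first_frame(int64_t requestedFrame)
{
    assert(requestedFrame >= 0);
    if (requestedFrame < 1024)
        return requestedFrame - 512;  /* one offset below 1024 */
    return requestedFrame + 1600;     /* a different one at and above 1024 */
}
```

The model reproduces the four cases listed above: 0 gives -512 (the file begins at location 512), 512 gives 0, 1023 gives 511, and 1024 gives 2624.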
Thanks for any ideas.
James Chandler Jr.
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden