Re: ExtAudioFileSeek and ExtAudioFileTell?

Subject: Re: ExtAudioFileSeek and ExtAudioFileTell?
From: Stephen Davis <email@hidden>
Date: Wed, 1 Apr 2009 01:20:33 -0700

[warning: off-the-cuff answer written while dosed with cold medicine so likely to contain errors but hopefully will give you enough info to stop beating your head against the wall of confusion and make some progress, or at least give you enough info to ask more questions]

Compressed files like AAC and MP3 have one or more packets at the beginning of the file that represent the internal delay through the psychoacoustic encoder and, as such, are supposed to be removed from the resulting decoded sample frames before getting sent to the destination. The decoder needs to be fed these packets to prime the pump, as it were, but the sample frames involved don't contribute to the final output. To clarify, a non-integral multiple of the frames-per-packet value should get dropped from the beginning of the file and that value varies by encoder. The packet size for AAC is most often 1024 frames per packet and I believe the "drop this many leading frames" value for the Nero encoder in question is 1600 while the Apple encoder is usually 2112. BTW, other versions of the Nero encoder have used 1024 as the value.
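As a rough illustration of the arithmetic above (a sketch only, not anything taken from the AudioFile implementation), the helper below works out how many leading packets a decoder would have to be fed to cover a given priming value, and how many audible frames survive from the last of those packets. `PrimingPlan` and `plan_priming` are hypothetical names invented for this example; the 1024, 1600 and 2112 figures are the ones quoted in the paragraph.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch; not part of any Apple API. */
typedef struct {
    int64_t packets_to_prime;  /* leading packets that contain priming frames */
    int64_t frames_to_discard; /* decoded frames dropped from the start */
    int64_t kept_from_last;    /* audible frames kept from the last priming packet */
} PrimingPlan;

static PrimingPlan plan_priming(int64_t priming_frames, int64_t frames_per_packet)
{
    assert(priming_frames >= 0 && frames_per_packet > 0);

    PrimingPlan plan;
    /* Round up: a packet that is only partly priming must still be decoded. */
    plan.packets_to_prime =
        (priming_frames + frames_per_packet - 1) / frames_per_packet;
    plan.frames_to_discard = priming_frames;
    /* Whatever is left of the last priming packet is real audio. */
    plan.kept_from_last =
        plan.packets_to_prime * frames_per_packet - priming_frames;
    return plan;
}
```

With 1024 frames per packet, a priming value of 1600 spans two packets and leaves 448 audible frames in the second one; 2112 spans three packets and leaves 960; a priming value of exactly 1024 is one whole packet with nothing left over.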

Historically speaking, the Nero encoder has used slightly "illegal" mechanisms (in the strict MPEG-4 sense b/c there is no official mechanism that works reliably, not to start a flame war on *that* subject) to convey this information so 3rd party software attempting to interpret the information might be getting it wrong. In this case, the AudioFile implementation is a 3rd party interpretation. Note that the iTunes gapless information storage method is also "illegal".

I don't have the headers in front of me at the moment but there is probably a property you can query to get the audio file packet table info which specifies the number of valid sample frames in the file, the encoder delay value (e.g. 1600) and a corresponding encoder "drain" value for the end of the file which is the number of sample frames at the end which should be cut off early.
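For reference, the property alluded to here is most likely `kAudioFilePropertyPacketTableInfo`, which fills an `AudioFilePacketTableInfo` with `mNumberValidFrames`, `mPrimingFrames` and `mRemainderFrames`. The sketch below does not call Core Audio at all: `PacketTable` is a local stand-in that mirrors those three fields so the bookkeeping can be shown in portable C, and both helper functions are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the three fields of AudioFilePacketTableInfo.
   In a real program the values would come from the file, not from here. */
typedef struct {
    int64_t number_valid_frames; /* audible frames in the file */
    int32_t priming_frames;      /* encoder delay at the start, e.g. 1600 */
    int32_t remainder_frames;    /* "drain" padding at the end */
} PacketTable;

/* Total frames the decoder produces, including the parts to throw away. */
static int64_t total_decoded_frames(const PacketTable *pt)
{
    assert(pt != NULL);
    return (int64_t)pt->priming_frames + pt->number_valid_frames
         + (int64_t)pt->remainder_frames;
}

/* Position of an audible frame within the raw decoded stream,
   or -1 if the requested frame is outside the valid range. */
static int64_t raw_frame_for_audible(const PacketTable *pt, int64_t audible_frame)
{
    assert(pt != NULL);
    if (audible_frame < 0 || audible_frame >= pt->number_valid_frames) {
        return -1;
    }
    return audible_frame + pt->priming_frames;
}
```

For example, with made-up values of 441000 valid frames, 1600 priming frames and 792 remainder frames, the decoder would emit 443392 frames (433 packets of 1024), and audible frame 44100 would sit at raw position 45700.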

Hope that made some sense. I'm not sure if what you're seeing is a bug in the implementation or if you just need to account for the additional gapless info to get what you want. You might want to try with an .m4a file generated from the WAV/AIFF source by afconvert or iTunes and see if those files behave properly with your code as it is now.

hth,
stephen

On Mar 31, 2009, at 11:15 PM, James Chandler Jr wrote:

Hi, yet another dumb question. Mac OS 10.5.6 Build 9G55. Same behavior noted on multiple Macs.

We have a music app which splices small pieces of source audio files into memory-resident composite multi-tracks. Works great for WAV and AIF, and ALMOST works great for M4a. The source audio tracks are huge and we usually deliver the source tracks in compressed format. WMA on PC, and planned M4a audio files for Mac.
ExtAudioFileSeek documentation says:
"Sets the file’s read position to the specified sample frame number. A subsequent call to the ExtAudioFileRead function returns samples from precisely this location, even if it is located in the middle of a packet."

The seek does sample-accurate positioning, but with M4a there are fixed offsets from the true requested position.

I have tested several encoders' M4a files but the current test M4a's were all compressed from WAV by the NERO MP4 command-line codec.

Reported symptoms may vary if seeking into files made by other codecs. I want to discover a way to accurately seek into ANY M4a file. It would be a bummer to only be able to seek into 'special blessed M4a files'.

The behavior is difficult to clearly explain. Apologies if I don't explain it well.

Our code, after opening the ExtAudioFile, does an ExtAudioFileSeek, and then it does a read loop using ExtAudioFileRead until the desired number of frames have been retrieved. Each ExtAudioFileRead returns 16384 frames until the final partial buffer read to satisfy the request. The code appears to be correct, and functions perfectly on WAV and AIF.

Seeking within < 1024 frames of file beginning gives different symptoms than ExtAudioFileSeek with inFrameOffset >= 1024.
The behavior is identical on all the Nero-encoded M4a test files--
If the ExtAudioFileSeek inFrameOffset >= 1024, then the seek and subsequent reads will actually load into memory the sample frame (inFrameOffset + 1600).

If I seek to sample 44100, it actually starts reading at (44100 + 1600). If I ask it to seek to sample 441,000, then it actually starts reading at (441,000 + 1600).

With the Nero M4a files, as long as I seek somewhere past 2624 frames, I can get the sample I REALLY want, by REQUESTING [DesiredSampleFrame - 1600]!!!
Debugging with ExtAudioFileTell functions, I get results like this:
//in the case of DesiredSampleFrame >= 1024
ExtAudioFileSeek(FileRef, DesiredSampleFrame); //input DesiredSampleFrame
ExtAudioFileTell(FileRef, ActualSampleFrame); //returned ActualSampleFrame = (DesiredSampleFrame - 2112)

ExtAudioFileRead(FileRef, theNumFrames, FillBufList); //returned theNumFrames = 16384
ExtAudioFileTell(FileRef, ActualSampleFrame); //returned ActualSampleFrame = (Original DesiredSampleFrame + 16448)
The numbers almost kinda sorta make sense, but not quite.

Is there a way to reconcile Tell against Seek to somehow guesstimate the real position, and always load audio from the desired offset? Our audio output has surprising degradation of 'musical feel' even with 1 ms of random slop on individual file seeks. I wouldn't expect such small slop to make an audible difference, but it is easily audible.
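To put the offsets on the same scale as the 1 ms figure, here is a small conversion sketch. The 44100 Hz sample rate is an assumption (the message never states it; it is only suggested by the seek positions used as examples), and `frames_to_ms` is a hypothetical helper name.

```c
#include <assert.h>

/* Convert a frame count to milliseconds at a given sample rate. */
static double frames_to_ms(long frames, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return 1000.0 * (double)frames / sample_rate_hz;
}
```

Under that assumption, 1 ms is about 44 frames, while a 1600-frame offset is roughly 36 ms and a 2112-frame offset roughly 48 ms, so the offsets discussed in this thread are far larger than the slop already described as audible.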

For this set of Nero-encoded files, I can fudge the requested numbers to always get the desired results, but the magic numbers may vary with differently-compressed M4a files.

With the Nero files, I can reliably get the desired sample with this fudging method--

If DesiredSampleFrame >= 2624, then I request (DesiredSampleFrame - 1600)

Otherwise, I request sample 512, and memory move down the returned audio, by DesiredSampleFrame.

That is the other ugly thing-- If I request sample 0, the beginning of the file will load to location 512 in memory.

If I request sample 512, the beginning of the file will load to location 0 in memory.

If I request sample 1023, then sample 511 will load to location 0 in memory.

If I request sample 1024, then sample 2624 will load to location 0 in memory. Eeek! Ack!

There is one consistent offset for all seeks < 1024, and a different consistent offset for all seeks >= 1024. Maddening.
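The observations and the workaround above can be written down as a small model, which makes it easy to check that the fudge is self-consistent. This is only a sketch of the behavior reported in this message for these particular Nero-encoded files; it makes no Core Audio calls, and `observed_start`, `plan_seek`, `delivered_frame` and the constant names are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { NERO_DELAY = 1600, PACKET = 1024, LOW_OFFSET = 512 };

/* Model of the reported behavior: for a frame number passed to
   ExtAudioFileSeek, which file frame ends up at buffer location 0.
   A negative result means the buffer starts that many frames before
   the first sample of the file. */
static int64_t observed_start(int64_t requested)
{
    assert(requested >= 0);
    return requested < PACKET ? requested - LOW_OFFSET
                              : requested + NERO_DELAY;
}

/* The workaround: what to request, and how many returned frames to skip. */
typedef struct {
    int64_t request; /* frame number to pass to the seek */
    int64_t skip;    /* frames to drop from the front of the returned audio */
} SeekPlan;

static SeekPlan plan_seek(int64_t desired)
{
    assert(desired >= 0);
    SeekPlan plan;
    if (desired >= PACKET + NERO_DELAY) { /* 2624 and up */
        plan.request = desired - NERO_DELAY;
        plan.skip = 0;
    } else {
        plan.request = LOW_OFFSET;        /* loads the file from frame 0 */
        plan.skip = desired;              /* "memory move down" by this much */
    }
    return plan;
}

/* File frame that ends up first after applying the workaround. */
static int64_t delivered_frame(int64_t desired)
{
    SeekPlan plan = plan_seek(desired);
    return observed_start(plan.request) + plan.skip;
}
```

In this model `delivered_frame(d)` equals `d` on both sides of the 2624 boundary, which matches the claim that the fudge works for these files. The constants are specific to this encoder, so the model says nothing about M4a files produced by other tools.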
Thanks for any ideas.
James Chandler Jr.
_______________________________________________
Do not post admin requests to the list. They will be ignored.
Coreaudio-api mailing list      (email@hidden)
Help/Unsubscribe/Update your Subscription:
This email sent to email@hidden



