Reduced storage for speech quality audio
- Subject: Reduced storage for speech quality audio
- From: Brian Willoughby <email@hidden>
- Date: Fri, 1 Feb 2008 06:29:45 -0800
Hi Paul,
I took the liberty of changing the subject, since your question
deviates somewhat from the original questions. You deserve your own
thread here. ;-)
You could save a great deal of storage by using the AudioConverter
API to reduce the audio to as little as 16-bit / 8 kHz, if you are
happy with telephone-quality speech. You would also cut your memory
requirements in half by storing monophonic speech instead of stereo.
During playback, you would simply allow the Default Output Unit to
automatically handle the format conversion from your reduced-quality
storage format to whatever the device is running at. In this design,
your storage format is completely independent of the hardware sample
rate.
To further reduce storage requirements, you could even use the
compressed format options of AudioConverter if psychoacoustic
techniques are not objectionable. In this case, you might even be
able to maintain better overall frequency response than telephone
quality speech while taking less memory.
So long as you correctly specify the 16-bit / 44.1 kHz source and
your intermediate storage format when creating your AudioConverter
and the Default Output Unit, CoreAudio will handle this translation
on the fly.
I have a feeling that your users will not be concerned with the
quality of SRC, even with two conversions.
Another advantage of this design is that you do not have to take over
the hardware to minimize your storage space, and so your application
plays nicely with others. Think about it: iTunes regularly reduces
the storage requirements of audio CDs and plays them back without
taking over the hardware settings, so why should your application
need to be any more heavy-handed, especially considering that your
quality requirements are lower than those of iTunes?
Brian Willoughby
Sound Consulting
On Feb 1, 2008, at 04:38, Paul Fredlein wrote:
My app is designed for foreign language learning at high school. As
most schools prefer to run the software from the CD rather than
installing it on the hard disk, I read all the audio, including that
recorded by the student, for each 'page' into RAM (because the CD
spins down).
As it is speech, there is no point in the audio being 44100 Hz @
24-bit, so 22050 Hz @ 16-bit is fine; but if the hardware defaults
to a high-quality format, such as 'Built-in Audio', then I'm filling
up buffers unnecessarily. I would much prefer to set the hardware to
what I want at runtime, as I do on Windows, and return it to what it
was afterwards, but it seems that this is undesirable on OS X. Any
suggestions? Teachers don't want students wasting class time
fiddling with audio settings.
I suppose for each page there would be a maximum of 20 MB of audio in
RAM. I know that's not much today (it would have been 10 years ago),
but it just seems a waste of resources.
Thanks,
Paul
_______________________________________________
Coreaudio-api mailing list (email@hidden)