Re: IOAudioMixerEngine.cpp
- Subject: Re: IOAudioMixerEngine.cpp
- From: Bill Stewart <email@hidden>
- Date: Thu, 08 Aug 2002 00:32:31 -0700
on 8/7/02 11:10 PM, Nathan Aschbacher wrote:
> Well I can think of one quick way to boost performance of the default
> mixOutputSamples function. You're doing 4 scalar single precision floating
> point additions in order. This seems like a perfect candidate for loading
> the four values from the mixBuf into a 128-bit vector and loading the four
> values from sourceBuf into another 128-bit vector and then just doing a
> vector addition operation on them and storing the result back to the mixBuf.
> At the very least you'd be moving some computation from a general purpose
> processing unit off to a more specialized unit that sits there idle plenty
> of the time. I may try changing the code to function like that and
> recompile the base system audio drivers from CVS and see how it works.
There are some situations where AltiVec won't help us - e.g. a small
enough chunk of data, data that isn't a multiple of 4 floats, etc. - so we
haven't done enough work with this to know when it will help and when it
will hinder. But sure, this is definitely something worth doing (though not
as clear cut as it would first seem).
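For what it's worth, here's a minimal sketch of that idea - assuming
16-byte-aligned buffers (vec_ld/vec_st silently ignore the low 4 address
bits, so misaligned data would load wrong) and falling back to scalar code
for a tail that isn't a multiple of 4 floats. The function name and
signature are illustrative, not the actual IOAudioEngine method:

#include <stddef.h>
#include <altivec.h>

// Hypothetical vectorized mix: mixBuf[i] += sourceBuf[i] for numSamples
// floats, 4 at a time through the vector unit.
static void mixSamplesAltiVec(float *mixBuf, const float *sourceBuf,
                              size_t numSamples)
{
    size_t i = 0;
    size_t vectorEnd = numSamples & ~(size_t)3;  // round down to multiple of 4

    for (; i < vectorEnd; i += 4) {
        vector float mix = vec_ld(0, mixBuf + i);
        vector float src = vec_ld(0, sourceBuf + i);
        vec_st(vec_add(mix, src), 0, mixBuf + i);
    }

    // Scalar tail for the last 0-3 samples - one reason a small or oddly
    // sized chunk may see little or no benefit from the vector unit.
    for (; i < numSamples; i++)
        mixBuf[i] += sourceBuf[i];
}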
> Anyhow that's beside the point. Matt is right. What I'm looking to do is
> be able to take load off the CPU.
Sure, understand that.
> Whether the sound processor is faster or
> not isn't as important to the process as taking the load off the CPU.
In this particular situation, sure.
> What I'm trying to build here is what's called "hardware accelerated audio" in
> the Windows world, where the audio hardware can reach into the system's
> audio kernel buffers very early in the process and perform much of the
> required work so the CPU doesn't have to. It makes a measurable performance
> difference on Intel PCs even in non-gaming scenarios, simply because the CPU
> is freed up during the playing of audio. So what I've been trying to
> determine is: 1) Is this even possible? 2) If not, why not? What's the
> limiter in Mac OS X that prevents this kind of functionality? 3) If so, then
> where ought I to be looking to hook into Mac OS X's audio processing stages
> to have the most success using an external audio processor to do some work
> and free up the main CPU?
>
> So if IOAudioEngine::mixOutputSamples is the last stage of the CPU handling
> the audio buffers, then I'm going to want to attack this problem from higher
> up the chain. My question then becomes: where? I'm given to understand that
> this has never been done before on the Mac, and I and an eager group of
> other developers are interested in seeing it happen. However, where the
> Windows driver development documentation for a DirectSound driver makes this
> process and its purpose very clear, the lack of an obvious parallel in the
> Mac OS X sound system is making things complicated.
This really is a difference because Windows has always shipped on hardware
that has external sound cards, and most of these are SoundBlaster-type
cards with these kinds of capabilities. Apple has never shipped hardware that
can do this (and most of our CPUs don't have PCI slots to add it) - not
excusing this, just trying to explain why...
> It also sounds like Jaguar may provide me with some better tools to work
> with, however. The native format capability ought to be very handy. My
> concern was that the CPU's burden of running the float -> int conversions
> (so that the card, which only works on 32-bit integers, can do some of the
> work) would add more overhead than would be saved by doing the mixing on
> the card.
The native format (as with ANY usage of this feature) is there for apps to
CHOOSE to use. Apps that don't use it will incur the overhead of the
float-to-int conversion - but presumably they can accept this, because that
overhead doesn't kill their ability to do what they need to do.
The native format, then, is a nice optimisation for those that need it. Your
driver doesn't really care whether the client is dealing with floats or
ints, because the hardware engine will only see the int data (and of course
inherits the loss of headroom that dealing with int data entails).
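Just to show how mechanical that conversion is, here's a hedged sketch of a
float-to-SInt32 clip-and-convert loop (illustrative only - not the actual
clipOutputSamples code):

#include <stdint.h>
#include <stddef.h>

// Clip floats in [-1.0, 1.0) to signed 32-bit ints. Anything outside that
// range gets pinned to the rails - this is the headroom you give up once
// the data is int.
static void clipFloatToSInt32(const float *in, int32_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        float s = in[i];
        if (s >= 1.0f)       out[i] = INT32_MAX;
        else if (s <= -1.0f) out[i] = INT32_MIN;
        else                 out[i] = (int32_t)(s * 2147483648.0f);  // 2^31
    }
}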
So, what is it that your card does? We should concentrate the discussion on
this - the float->int is really the LEAST of your problems...
Let's presume that your card can take, say, 32 channels of input, mix them,
and apply some kind of 3D sound to them - then it outputs 4 channels...
Let's also presume that your card is doing this work and outputting the data
- i.e. there is NO need or capability to have the card process the samples
and then re-present them back to the CPU for further processing...
My first approach would be to publish those 32 channels as the format of the
device - you could also publish the 4 real channels of the device as an
alternative output format, if you can turn off these hardware-based processes
and just go straight to the DACs... (You should do this if you can.)
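In IOAudioFamily terms, the mechanism for publishing both formats would be
IOAudioStream::addAvailableFormat() - a rough sketch, where the sample rate
and all the field values are assumptions about your hardware rather than
anything definitive:

// Sketch: publish the 32 virtual (hardware-processed) channels as one
// format and the 4 physical DAC channels as an alternative. Assumes
// 'stream' is an IOAudioStream the driver has already created.
IOAudioSampleRate rate;
rate.whole = 44100;
rate.fraction = 0;

IOAudioStreamFormat format;
format.fNumChannels = 32;        // the pre-mix channels your card processes
format.fSampleFormat = kIOAudioStreamSampleFormatLinearPCM;
format.fNumericRepresentation = kIOAudioStreamNumericRepresentationSignedInt;
format.fBitDepth = 32;           // the card works on 32-bit integers
format.fBitWidth = 32;
format.fAlignment = kIOAudioStreamAlignmentHighByte;
format.fByteOrder = kIOAudioStreamByteOrderBigEndian;
format.fIsMixable = true;
format.fDriverTag = 0;
stream->addAvailableFormat(&format, &rate, &rate);

format.fNumChannels = 4;         // straight to the DACs, processing bypassed
stream->addAvailableFormat(&format, &rate, &rate);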
The trick is to get the control information or parameters to the card...
One potential way to do this is to publish another, non-mixable stream
(alongside your 32-channel version) from the card that is your control
stream. It should be a CBR stream (i.e. the same number of bytes every I/O
cycle), sized to the maximum number of bytes that would represent the
maximum number of commands the device can accept for the maximum number of
sample frames you can process. It's arguable here that this format is ONLY
usable when the app has your device in hog mode - but I'm not sure that we
actually support that semantic (and maybe we should)...
The PCM streams should be mono buffers - with Jaguar, a client can turn
streams on and off on a device (for instance, if it wants to use only 8
channels for output, it can turn off the streams of the unused ones)...
Then - the data contents of that stream could be an array of structs:

struct ControlEvent {     // struct name is illustrative
    short channel;
    short frameNumber;    // an offset into the current buffer
    short controlNumber;
    short controlValue;
};
The HAL will always zero out the IOProc's data before it passes it to a
client, so as long as you don't use a channel number of zero, you can
determine where the valid list ends: when you see channel == 0, you're done!
(This would actually also tie in nicely with the HAL's concept of numbering
channels - it reserves channel == 0 to mean the device, and channel 1 is the
first real channel in the device's streams...)
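To make that termination convention concrete, here's a hedged sketch of how
the driver might walk one I/O cycle's worth of the control buffer -
applyControl() and the parameters are hypothetical stand-ins for your
card-specific code:

#include <stddef.h>

// Hypothetical card-specific call that makes a parameter change take
// effect at the given frame offset within the cycle.
extern void applyControl(short channel, short frame,
                         short controlNumber, short controlValue);

static void processControlBuffer(const ControlEvent *events, size_t maxEvents)
{
    // The HAL zeroes the IOProc buffer before the client fills it in, so
    // the first entry with channel == 0 ends the valid command list.
    for (size_t i = 0; i < maxEvents && events[i].channel != 0; i++) {
        applyControl(events[i].channel, events[i].frameNumber,
                     events[i].controlNumber, events[i].controlValue);
    }
}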
(I'd also prefer this to publishing separate streams of control data on a
per-channel basis, as the density of the control data is in most cases very
sparse and there'd be too much data going over the bus if you did it that
way.)
You also know (and the client knows) that the data it supplies and the
control data are paired to that I/O cycle... So there's a nice locality of
the PCM data with the control data.
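From the client side, that pairing might look something like this inside an
HAL IOProc - a hedged sketch that assumes the control stream shows up as the
last buffer in the output AudioBufferList (which buffer is which is a
convention your driver and client would have to agree on):

#include <CoreAudio/AudioHardware.h>

// Sketch of a client IOProc writing one control event alongside its PCM
// data for the same cycle.
static OSStatus myIOProc(AudioDeviceID device,
                         const AudioTimeStamp *now,
                         const AudioBufferList *inputData,
                         const AudioTimeStamp *inputTime,
                         AudioBufferList *outputData,
                         const AudioTimeStamp *outputTime,
                         void *clientData)
{
    // ... fill the PCM buffers (outputData->mBuffers[0 .. N-2]) here ...

    AudioBuffer *ctl = &outputData->mBuffers[outputData->mNumberBuffers - 1];
    ControlEvent *events = (ControlEvent *)ctl->mData;

    // The HAL handed us a zeroed buffer, so any entry we don't write keeps
    // channel == 0 and terminates the list for the driver.
    events[0].channel = 1;        // first real channel
    events[0].frameNumber = 0;    // apply at the start of this cycle
    events[0].controlNumber = 0;  // some card-specific control, hypothetically
    events[0].controlValue = 64;

    return noErr;
}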
Make any sense?
Bill
> Anyhow I'm still trying to piece together a clear picture of how what I
> desire to do fits into the Mac OS X CoreAudio API's stages and capabilities.
> Though I VERY much appreciated the thoughtful responses.
>
> Thank you,
>
> Nathan
--
mailto:email@hidden
tel: +1 408 974 4056
__________________________________________________________________________
"...Been havin' some trouble lately in the sausage business," C.M.O.T.
Dibbler replied.
"What, having trouble making both ends meat?"
__________________________________________________________________________
_______________________________________________
coreaudio-api mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/coreaudio-api
Do not post admin requests to the list. They will be ignored.