Re: IOAudioMixerEngine.cpp
- Subject: Re: IOAudioMixerEngine.cpp
- From: Bill Stewart <email@hidden>
- Date: Thu, 08 Aug 2002 15:18:20 -0700
Another thought on this...
You would also not have to publish the format of your control stream if you
didn't want people messing with it directly...
So, instead you could write your own version of a HAL output Unit (that
implements the Start and Stop calls so the client can start/stop your
device) - and of course, your unit will only talk to your device...
Then, the client can use a more natural syntax for scheduling events - if
you want to use the basic AudioUnit API, then you could just use the
parameter events (and with the new V2 units that are available in Jaguar,
there is also the ability to schedule the ramping of a parameter value)...
Then, the client just schedules these events, and when you're preparing the
output for the device you would turn these parameter events into the control
format you need to give to the device...
This would work in a very similar way to how the audio units work today
(including the new 3DMixer unit that is available in Jaguar) - so the app
code becomes a simpler proposition for you to support, because there's no
tweaky customised format stuff you have to document - just parameters that
become scoped on each input of your unit and that mechanism is already
published and documented in the standard audio unit API that Apple ships.
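For example (just a sketch - the unit handle and parameter ID here are
hypothetical, but the types and calls are the standard V2 AudioUnit ones),
scheduling a ramp on one input might look like:

#include <AudioUnit/AudioUnit.h>

// Sketch: ramp a gain-style parameter on input bus 3 of the given unit,
// starting 128 frames into the next render cycle and lasting 512 frames.
// The parameter ID is hypothetical - use whatever IDs your unit publishes.
static OSStatus ScheduleGainRamp(AudioUnit unit, AudioUnitParameterID gainParamID)
{
    AudioUnitParameterEvent event;
    event.scope     = kAudioUnitScope_Input;
    event.element   = 3;                        // input bus 3
    event.parameter = gainParamID;
    event.eventType = kParameterEvent_Ramped;
    event.eventValues.ramp.startBufferOffset = 128;
    event.eventValues.ramp.durationInFrames  = 512;
    event.eventValues.ramp.startValue        = 1.0f;
    event.eventValues.ramp.endValue          = 0.5f;

    return AudioUnitScheduleParameters(unit, &event, 1);
}

Your unit would then translate those scheduled events into the control format
your hardware wants when it prepares each output cycle.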
(Just a thought)
Bill
on 8/8/02 12:32 AM, Bill Stewart wrote:
> on 8/7/02 11:10 PM, Nathan Aschbacher wrote:
>
> > Well I can think of one quick way to boost performance of the default
> > mixOutputSamples function. You're doing 4 scalar single precision floating
> > point additions in order. This seems like a perfect candidate for loading
> > the four values from the mixBuf into a 128-bit vector and loading the four
> > values from sourceBuf into another 128-bit vector and then just doing a
> > vector addition operation on them and storing the result back to the mixBuf.
> > At the very least you'd be moving some computation from a general purpose
> > processing unit off to a more specialized unit that sits there idle plenty
> > of the time. I may try changing the code to function like that and
> > recompile the base system audio drivers from CVS and see how it works.
>
> There are some situations that Altivec won't help us with - i.e. a small
> enough chunk of data, data whose length isn't a multiple of 4 floats, etc -
> so we haven't done enough work with this to know when this will help and
> when it will hinder - but sure, this is definitely something worth doing
> (but not as clear cut as it would first seem)
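For what it's worth, the four-wide add would look something like this (a
sketch only - it assumes 16-byte-aligned buffers, a sample count that's a
multiple of 4, and a compiler with AltiVec enabled, e.g. -faltivec; a real
version needs a scalar fallback for the remainder and for non-AltiVec CPUs):

#include <stddef.h>

// Mix sourceBuf into mixBuf four floats at a time with AltiVec.
static void MixBuffersAltivec(float *mixBuf, const float *sourceBuf, size_t numSamples)
{
    for (size_t i = 0; i < numSamples; i += 4) {
        vector float mix = vec_ld(0, &mixBuf[i]);     // load 4 floats from the mix buffer
        vector float src = vec_ld(0, &sourceBuf[i]);  // load 4 floats from the source buffer
        vec_st(vec_add(mix, src), 0, &mixBuf[i]);     // add and store back into the mix buffer
    }
}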
>
> > Anyhow that's beside the point. Matt is right. What I'm looking to do is
> > be able to take load off the CPU.
>
> Sure, understand that.
>
> > Whether the sound processor is faster or
> > not isn't as important to the process as taking the load off the CPU.
>
> In this particular situation, sure
>
> > What I'm trying to build here is what's called "hardware accelerated audio"
> > in the Windows world, where the audio hardware can reach into the system's
> > audio kernel buffers very early in the process and perform much of the
> > required work so the CPU doesn't have to. It has a measurable performance
> > difference on Intel PCs even in non-gaming scenarios simply because the CPU
> > is freed up during the playing of audio. So what I've been trying to
> > determine is 1) is this even possible? 2) If not, why not? What's the
> > limiter in MacOS X that prevents this kind of functionality? 3) If so, then
> > where ought I to be looking to hook into MacOS X's audio processing stages
> > to have the most success using an external audio processor to do some work
> > and free up the main CPU?
>
> > So if IOAudioEngine::mixOutputSamples is the last stage of the CPU handling
> > the audio buffers, then I'm going to want to attack this problem from higher
> > up the chain. My question then becomes, where? I'm to understand that this
> > has never been done before on the Mac, and myself and an eager group of
> > other developers are interested in seeing this happen. However, where the
> > Windows driver development documentation for a DirectSound driver makes this
> > process and its purpose very clear, the lack of an obvious parallel in the
> > MacOS X sound system is making things complicated.
>
> This really is a difference because Windows has always shipped on hardware
> that has external sound cards, and most of these are SoundBlaster-type
> cards with these kinds of capabilities - Apple's never shipped hardware that
> can do this (and most of our CPUs don't have PCI slots to add one) - not
> excusing this, just trying to explain why...
>
> > It also sounds like Jaguar may provide me with some better tools to work
> > with, however. The native format capability ought to be very handy. My
> > concern was that the CPU's burden of running the float -> int conversions so
> > that the card (which only works on 32-bit integers) can do some of the work
> > would be more overhead added than would be saved by doing the mixing on the
> > card.
>
> The native format (like with ANY usage of this feature) is there for apps to
> CHOOSE to use - but for those apps that don't use this, they will incur the
> overhead of the float to int - but presumably they can do this because the
> overhead of this doesn't kill their capability to do what they need to.
>
> The native format, then, is a nice optimisation for those that need it. Your
> driver doesn't really care whether the client is dealing with floats or
> ints, because the hardware engine will only see the int data (and of course
> inherits the loss of headroom that dealing with int data entails)
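(To make the headroom point concrete, a typical output conversion has to clip
anything the float mix pushed past full scale before it can scale to the
integer range - a rough sketch:)

#include <stddef.h>
#include <stdint.h>

// Sketch of a Float32 -> SInt32 output conversion: samples outside +/-1.0
// have to be clipped before scaling, which is the headroom you give up once
// the engine only sees integer data.
static void Float32ToSInt32(const float *in, int32_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        float s = in[i];
        if (s >  1.0f) s =  1.0f;
        if (s < -1.0f) s = -1.0f;
        out[i] = (int32_t)((double)s * 2147483647.0);
    }
}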
>
> So, what is it that your card does? We should concentrate the discussion on
> this - the float->int is really the LEAST of your problems...
>
> Let's presume that your card can take, say, 32 channels of input, mix them
> and apply some kind of 3D sound to them - then it outputs 4 channels...
>
> Let's also presume that your card is doing this work and outputting the data
> - i.e. there is NO need or capability to have the card process the samples
> and then re-present them back to the CPU for further processing...
>
> My first approach would be to publish those 32 channels as the format of the
> device - you could also publish the 4 real channels of the device as an
> alternative output format if you can turn off these hardware based processes
> and just go straight to the DACs... (You should do this if you can)
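On the driver side, publishing that 32 channel format would go through
IOAudioStream's available formats - roughly like this (a sketch against
IOAudioFamily; the sample rates and bit depth are placeholders for whatever
the hardware really supports):

#include <IOKit/audio/IOAudioStream.h>
#include <IOKit/audio/IOAudioTypes.h>

// Sketch: advertise a 32 channel, 32-bit signed integer PCM format on an
// output IOAudioStream. Sample rates and bit depth are placeholders.
static void PublishCardFormat(IOAudioStream *outputStream)
{
    IOAudioStreamFormat format;
    IOAudioSampleRate   minRate, maxRate;

    format.fNumChannels           = 32;
    format.fSampleFormat          = kIOAudioStreamSampleFormatLinearPCM;
    format.fNumericRepresentation = kIOAudioStreamNumericRepresentationSignedInt;
    format.fBitDepth              = 32;
    format.fBitWidth              = 32;
    format.fAlignment             = kIOAudioStreamAlignmentHighByte;
    format.fByteOrder             = kIOAudioStreamByteOrderBigEndian;
    format.fIsMixable             = true;
    format.fDriverTag             = 0;

    minRate.whole = 44100;  minRate.fraction = 0;
    maxRate.whole = 48000;  maxRate.fraction = 0;

    outputStream->addAvailableFormat(&format, &minRate, &maxRate);
}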
>
> The trick is to get the control information or parameters to the card...
>
> One potential way to do this is to publish another non-mixable stream (with
> your 32 channel version) from the card that is your control stream. It
> should be a CBR stream (i.e. the same number of bytes every I/O cycle),
> sized to the maximum number of bytes needed to represent the maximum number
> of commands that can be accepted by the device for the maximum number of
> sample frames you can process - it's arguable here that this format is ONLY
> usable when the app has your device in Hog mode - but I'm not sure that we
> actually support that semantic (and maybe we should)...
>
> The PCM streams should be mono buffers - with Jaguar a client can turn
> streams on and off on a device (for instance, if it wants to use only 8
> channels for output, it can turn off the streams of the unused ones)...
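(The on/off switching is done per IOProc through the HAL's stream usage
property - something along these lines, if I have the property name right;
error handling is mostly omitted:)

#include <CoreAudio/CoreAudio.h>
#include <stdlib.h>

// Sketch: keep only the first 8 output streams switched on for a given IOProc.
static OSStatus UseOnlyFirstEightStreams(AudioDeviceID device, AudioDeviceIOProc ioProc)
{
    UInt32 size = 0;
    OSStatus err = AudioDeviceGetPropertyInfo(device, 0, false,
                       kAudioDevicePropertyIOProcStreamUsage, &size, NULL);
    if (err != noErr) return err;

    AudioHardwareIOProcStreamUsage *usage =
        (AudioHardwareIOProcStreamUsage *)malloc(size);
    usage->mIOProc = (void *)ioProc;            // ask about this particular IOProc

    err = AudioDeviceGetProperty(device, 0, false,
              kAudioDevicePropertyIOProcStreamUsage, &size, usage);
    if (err == noErr) {
        for (UInt32 i = 8; i < usage->mNumberStreams; i++)
            usage->mStreamIsOn[i] = 0;          // leave streams 0-7 on, switch off the rest
        err = AudioDeviceSetProperty(device, NULL, 0, false,
                  kAudioDevicePropertyIOProcStreamUsage, size, usage);
    }
    free(usage);
    return err;
}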
>
> Then - the data contents of that stream could be an array of structs:
>
> struct {
>     short channel;
>     short frameNumber;    // offset into the current buffer
>     short controlNumber;
>     short controlValue;
> };
>
> The HAL will always zero out the IOProc's data before it passes it to a
> client, so as long as you don't use a channel number of zero, you can
> determine when the valid list ends by seeing a channel == 0 - you're done!
> (This would actually also tie in nicely with the HAL's concept of numbering
> channels - it reserves channel == 0 to be the device, and channel 1 is the
> first real channel in the device's streams...)
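So on the driver/hardware side, consuming that control data for an I/O cycle
is just a walk of the array until the zero terminator - a sketch, with a
named version of the struct above and a made-up hook for the hardware:

// Named version of the control-event struct above.
typedef struct {
    short channel;        // 1-based; 0 terminates the list (the HAL zeroes the buffer)
    short frameNumber;    // offset into the current buffer
    short controlNumber;
    short controlValue;
} ControlEvent;

// Hypothetical driver hook - stands in for whatever actually pokes the card.
extern void ApplyControlToHardware(short channel, short frame,
                                   short control, short value);

// Hand each scheduled control event for this I/O cycle to the hardware.
static void ProcessControlStream(const ControlEvent *events, unsigned int maxEvents)
{
    for (unsigned int i = 0; i < maxEvents && events[i].channel != 0; i++) {
        ApplyControlToHardware(events[i].channel, events[i].frameNumber,
                               events[i].controlNumber, events[i].controlValue);
    }
}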
>
> (I'd also prefer this to publishing separate streams of control data on a
> per-channel basis, as the density of the control data is in most cases very
> sparse and there'd be too much data going over the bus if you did it that
> way)
>
> You also know (and the client knows) that the data it supplies and the
> control data are paired to that I/O cycle... So there's a nice locality of
> the PCM data with the control data.
>
> Make any sense?
>
> Bill
>
> > Anyhow I'm still trying to piece together a clear picture of how what I
> > desire to do fits into the MacOS X CoreAudio API's stages and capabilities.
> > Though I VERY much appreciated the thoughtful responses.
>
> > Thank You,
>
> > Nathan
--
mailto:email@hidden
tel: +1 408 974 4056
__________________________________________________________________________
"...Been havin' some trouble lately in the sausage business," C.M.O.T.
Dibbler replied.
"What, having trouble making both ends meat?"
__________________________________________________________________________
_______________________________________________
coreaudio-api mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/coreaudio-api
Do not post admin requests to the list. They will be ignored.