
Re: IOAudioMixerEngine.cpp


  • Subject: Re: IOAudioMixerEngine.cpp
  • From: Jeff Moore <email@hidden>
  • Date: Thu, 8 Aug 2002 12:45:02 -0700

Bill's recommendations for a general approach to implementing hardware mixing are pretty reasonable.

Another way of presenting the controls of each channel would be to implement a HAL plug-in that provides your custom controls as properties. That way your device's controls are presented in a way consistent with the rest of the HAL's controls.

A useful feature that could be added is dynamic channel allocation. This would spare all the processes trying to use the card at the same time from having to agree on a fixed channel allocation (which could be difficult and confusing for the user), and would also keep the amount of data moving around to a minimum when channels aren't in use.

I'm not sure how I would go about implementing dynamic channel allocation though. There may need to be some changes in both the HAL and the IOAudio family to facilitate this sort of thing. I haven't thought too deeply about it yet.

It would be worthwhile to file a Radar bug describing exactly what your needs and desires are in this area.

On Thursday, August 8, 2002, at 12:32 AM, Bill Stewart wrote:

on 8/7/02 11:10 PM, Nathan Aschbacher wrote:
Well, I can think of one quick way to boost performance of the default
mixOutputSamples function. You're doing 4 scalar single-precision
floating-point additions in order. This seems like a perfect candidate for
loading the four values from mixBuf into a 128-bit vector, loading the four
values from sourceBuf into another 128-bit vector, doing a single vector
add, and storing the result back to mixBuf. At the very least you'd be
moving some computation from a general-purpose processing unit off to a more
specialized unit that sits idle plenty of the time. I may try changing the
code to work like that, recompile the base system audio drivers from CVS,
and see how it works.
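
(For illustration only: a minimal sketch of the kind of vectorized mix loop being described, using AltiVec intrinsics. The function name and signature are made up, not the actual IOAudioEngine::mixOutputSamples; it assumes 16-byte-aligned buffers.)

#include <altivec.h>

/* Hypothetical vectorized mix: add source samples into the mix buffer
   four floats at a time. Assumes both buffers are 16-byte aligned. */
static void mixFloatSamplesAltiVec(const float *sourceBuf, float *mixBuf,
                                   unsigned long numSamples)
{
    unsigned long i = 0;
    for (; i + 4 <= numSamples; i += 4) {
        vector float src = vec_ld(0, &sourceBuf[i]);   /* load 4 source samples */
        vector float mix = vec_ld(0, &mixBuf[i]);      /* load 4 mixed samples  */
        vec_st(vec_add(src, mix), 0, &mixBuf[i]);      /* add, store back       */
    }
    for (; i < numSamples; i++)                        /* scalar leftover       */
        mixBuf[i] += sourceBuf[i];
}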

There are some situations where AltiVec won't help us - e.g. a small enough
chunk of data, or data whose length isn't a multiple of 4 floats - so we
haven't done enough work with this to know when it will help and when it
will hinder. But sure, this is definitely something worth doing (though not
as clear cut as it would first seem).

Anyhow that's beside the point. Matt is right. What I'm looking to do is
be able to take load off the CPU.

Sure, understand that.

Whether the sound processor is faster or
not isn't as important to the process as taking the load off the CPU.

In this particular situation, sure

What I'm trying to build here is what's called "hardware accelerated audio" in
the Windows world, where the audio hardware can reach into the system's
audio kernel buffers very early in the process and perform much of the
required work so the CPU doesn't have to. It makes a measurable performance
difference on Intel PCs, even in non-gaming scenarios, simply because the CPU
is freed up while audio is playing. So what I've been trying to
determine is: 1) Is this even possible? 2) If not, why not? What in MacOS X
prevents this kind of functionality? 3) If so, where ought I to be looking
to hook into MacOS X's audio processing stages to have the most success
using an external audio processor to do some of the work and free up the
main CPU?

So if IOAudioEngine::mixOutputSamples is the last stage of the CPU handling
the audio buffers, then I'm going to want to attack this problem from higher
up the chain. My question then becomes: where? I'm given to understand that
this has never been done before on the Mac, and I and an eager group of
other developers are interested in seeing it happen. However, where the
Windows driver development documentation for a DirectSound driver makes this
process and its purpose very clear, the lack of an obvious parallel in the
MacOS X sound system is making things complicated.

This really is a difference because Windows has always shipped on hardware
that has external sound cards, and most of these are SoundBlaster-type
cards with these kinds of capabilities - Apple's never shipped hardware that
can do this (and most of our machines don't have PCI slots to add it) - not
excusing this, just trying to explain why...

It also sounds like Jaguar may provide me with some better tools to work
with, however. The native format capability ought to be very handy. My
concern was that the CPU's burden of running the float -> int conversions,
so that the card (which only works on 32-bit integers) can do some of the
work, would add more overhead than would be saved by doing the mixing on the
card.

The native format (like ANY usage of this feature) is there for apps to
CHOOSE to use. Apps that don't use it will incur the overhead of the float
to int conversion - but presumably they can live with that, because the
overhead doesn't kill their ability to do what they need to do.

The native format, then, is a nice optimisation for those that need it. Your
driver doesn't really care whether the client is dealing with floats or
ints, because the hardware engine will only see the int data (and of course
inherits the loss of headroom that dealing with int data entails).
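
(As an aside, here is a hedged sketch of the kind of float -> 32-bit int conversion being discussed; the clamping step is where the float format's headroom is lost. This is illustrative C, not the HAL's actual conversion code.)

#include <stdint.h>

/* Illustrative float -> SInt32 conversion with clipping. Values outside
   [-1.0, 1.0], which the float path would have preserved as headroom,
   are clamped to the integer range. */
static void floatToSInt32(const float *in, int32_t *out, unsigned long count)
{
    for (unsigned long i = 0; i < count; i++) {
        double scaled = (double)in[i] * 2147483647.0;
        if (scaled > 2147483647.0)  scaled = 2147483647.0;
        if (scaled < -2147483648.0) scaled = -2147483648.0;
        out[i] = (int32_t)scaled;
    }
}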


So, what is it that your card does? We should concentrate the discussion on
this - the float->int is really the LEAST of your problems...

Let's presume that your card can take, say, 32 channels of input, mix them,
and apply some kind of 3D sound to them - then it outputs 4 channels...

Let's also presume that your card is doing this work and outputting the data
- i.e. there is NO need or capability to have the card process the samples
and then re-present them to the CPU for further processing...

My first approach would be to publish those 32 channels as the format of the
device - you could also publish the 4 real channels of the device as an
alternative output format if you can turn off these hardware-based processes
and just go straight to the DACs... (You should do this if you can.)

The trick is to get the control information or parameters to the card...

One potential way to do this is to publish another, non-mixable stream from
the card (alongside your 32-channel version) that is your control stream. It
should be a CBR stream (i.e. the same number of bytes every I/O cycle),
sized to the maximum number of bytes needed to represent the maximum number
of commands the device can accept for the maximum number of sample frames
you can process. It's arguable here that this format is ONLY usable when the
app has your device in hog mode - but I'm not sure that we actually support
that semantic (and maybe we should)...

The PCM streams should be mono buffers - with Jaguar a client can turn
streams on and off on a device (for instance, if it wants to use only 8
channels for output, it can turn off the streams of the unused ones)...
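
(For illustration, a rough sketch of how a driver might publish one such mono, mixable output stream in IOAudioFamily terms. The helper name, the fixed 44100 Hz rate, and the buffer handling are assumptions for the example, not the family's required pattern; a real driver needs more error checking.)

#include <IOKit/audio/IOAudioEngine.h>
#include <IOKit/audio/IOAudioStream.h>
#include <IOKit/audio/IOAudioTypes.h>

// Hypothetical helper: publish one mono, mixable, 32-bit signed integer PCM
// output stream on the given engine. Channel IDs start at 1; 0 is the device.
static bool publishMonoOutputStream(IOAudioEngine *engine, UInt32 channelID,
                                    void *sampleBuffer, UInt32 sampleBufferSize)
{
    IOAudioStream *stream = new IOAudioStream;
    if (!stream)
        return false;
    if (!stream->initWithAudioEngine(engine, kIOAudioStreamDirectionOutput, channelID)) {
        stream->release();
        return false;
    }

    IOAudioStreamFormat format = {
        1,                                              // fNumChannels: mono
        kIOAudioStreamSampleFormatLinearPCM,            // fSampleFormat
        kIOAudioStreamNumericRepresentationSignedInt,   // fNumericRepresentation
        32,                                             // fBitDepth
        32,                                             // fBitWidth
        kIOAudioStreamAlignmentHighByte,                // fAlignment
        kIOAudioStreamByteOrderBigEndian,               // fByteOrder
        true,                                           // fIsMixable
        0                                               // fDriverTag (card-specific)
    };
    IOAudioSampleRate rate = { 44100, 0 };              // assumed fixed rate

    stream->setSampleBuffer(sampleBuffer, sampleBufferSize);
    stream->addAvailableFormat(&format, &rate, &rate);
    stream->setFormat(&format);

    bool ok = (engine->addAudioStream(stream) == kIOReturnSuccess);
    stream->release();                                  // the engine retains it
    return ok;
}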

Then - the data contents of that stream could be an array of structs:

struct ControlCommand {      // illustrative tag name
    short channel;           // 0 marks the end of the valid commands
    short frameNumber;       // an offset into the current buffer
    short controlNumber;
    short controlValue;
};

The HAL will always zero out the IOProc's data before it passes it to a
client, so as long as you don't use a channel number of zero, you can
determine where the valid list ends by seeing a channel == 0 - you're done!
(This would also tie in nicely with the HAL's concept of numbering
channels - it reserves channel == 0 for the device, and channel 1 is the
first real channel in the device's streams...)
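
(A hedged sketch of what a client filling that control stream might look like, using the ControlCommand struct above. kMyPanControl and the values are made up; the only real point is that unused entries stay zeroed, so channel == 0 terminates the list.)

enum { kMyPanControl = 1 };          // made-up control ID for the example

/* Fill the zeroed control buffer handed to the IOProc with one command;
   everything after it stays zero, which the driver reads as "done". */
static void fillControlStream(struct ControlCommand *cmds, unsigned int maxCommands)
{
    if (maxCommands < 1)
        return;
    cmds[0].channel = 5;             // channels are 1-based; 0 is reserved
    cmds[0].frameNumber = 128;       // apply at frame 128 of this buffer
    cmds[0].controlNumber = kMyPanControl;
    cmds[0].controlValue = 64;       // e.g. pan position, device-defined units
}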

(I'd also prefer this to publishing separate streams of control data on a
per-channel basis, as the control data is in most cases very sparse and
there'd be too much data going over the bus if you did it that way.)

You also know (and the client knows) that the data it supplies and the
control data are paired to that I/O cycle... So there's a nice locality of
the PCM data with the control data.

Make any sense?

Bill

Anyhow, I'm still trying to piece together a clear picture of how what I
desire to do fits into the MacOS X CoreAudio API's stages and capabilities.
I VERY much appreciate the thoughtful responses, though.

Thank You,

Nathan




--

Jeff Moore
Core Audio
Apple

References:
  • Re: IOAudioMixerEngine.cpp (From: Bill Stewart <email@hidden>)
