Re: Realtime AEC + VAD


  • Subject: Re: Realtime AEC + VAD
  • From: Tamás Zahola via Coreaudio-api <email@hidden>
  • Date: Wed, 16 Oct 2024 12:56:33 +0200

I don't think that's going to work. AudioDevice operates at a lower level than
AudioUnit, so in principle the AudioDevice has no way to access the filtered
audio stream produced by the voice processing AudioUnit. Why VAD is part of the
low-level AudioDevice API is a good question that only Apple can answer.

You'll have to use your own VAD algorithm and feed it the echo-cancelled input
audio that the voice processing unit delivers.
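
As a rough illustration of what "your own VAD" could look like, here is a
minimal sketch of an energy-threshold detector with a hangover counter. This
is a deliberately simple stand-in technique, not Apple's VAD and not a
production-quality detector. All names in it (SimpleVad, simple_vad_init,
simple_vad_process, the threshold and hangover values) are hypothetical, and
it assumes the input callback hands you mono float samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event type mirroring speech-started / speech-stopped. */
typedef enum { VAD_NO_CHANGE, VAD_SPEECH_STARTED, VAD_SPEECH_STOPPED } VadEvent;

typedef struct {
    float threshold_power; /* mean-square level above which a frame is "loud" */
    int   hangover_frames; /* quiet frames tolerated before speech "stops"   */
    int   quiet_count;     /* consecutive quiet frames seen while speaking   */
    bool  speaking;
} SimpleVad;

/* threshold_rms is an assumed tuning value; pick it by measuring your input. */
static void simple_vad_init(SimpleVad *vad, float threshold_rms, int hangover_frames) {
    assert(vad != NULL && threshold_rms > 0.0f && hangover_frames >= 0);
    vad->threshold_power = threshold_rms * threshold_rms;
    vad->hangover_frames = hangover_frames;
    vad->quiet_count = 0;
    vad->speaking = false;
}

/* Mean of the squared samples (the square of the RMS level). */
static float frame_mean_square(const float *samples, size_t count) {
    if (count == 0) return 0.0f;
    double sum = 0.0;
    for (size_t i = 0; i < count; ++i) {
        sum += (double)samples[i] * (double)samples[i];
    }
    return (float)(sum / (double)count);
}

/* Call once per buffer of post-AEC samples; returns a state-change event. */
static VadEvent simple_vad_process(SimpleVad *vad, const float *samples, size_t count) {
    bool loud = frame_mean_square(samples, count) > vad->threshold_power;
    if (loud) {
        vad->quiet_count = 0;
        if (!vad->speaking) {
            vad->speaking = true;
            return VAD_SPEECH_STARTED;
        }
        return VAD_NO_CHANGE;
    }
    if (vad->speaking && ++vad->quiet_count > vad->hangover_frames) {
        vad->speaking = false;
        vad->quiet_count = 0;
        return VAD_SPEECH_STOPPED;
    }
    return VAD_NO_CHANGE;
}
```

The processing function does no allocation or locking, so it could be called
per buffer from the input callback after AudioUnitRender has filled the
buffer. A plain energy threshold will also trigger on loud non-speech noise,
so treat it only as a starting point.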

Regards,
Tamás Zahola

> On 15 Oct 2024, at 18:08, π via Coreaudio-api <email@hidden> wrote:
>
> Dear Audio Engineers,
>
> I'm writing an app to interact with OpenAI's 'realtime' API (bidirectional
> realtime audio over websocket with AI serverside).
>
> To do this, I need to be careful that the AI's speech doesn't make its way
> out of the speakers, back in through the mic, and back to their server (else
> it starts to talk to itself, and gets very confused).
>
> So I need AEC, which I've actually got working, using
> kAudioUnitSubType_VoiceProcessingIO and
> AudioUnitSetProperty(kAUVoiceIOProperty_BypassVoiceProcessing, setting to
> False).
>
> Now I also wish to detect when the speaker (me) is speaking or not speaking,
> which I've also managed to do via
> kAudioDevicePropertyVoiceActivityDetectionEnable.
>
> But getting them to play together is another matter, and I'm struggling hard
> here.
>
> I've rigged up a simple test
> (https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a
> 440Hz sinewave is generated in the render-callback, and mic-input is recorded
> to file in the input-callback.
>
> So the AEC works delightfully, subtracting the sinewave and recording my
> voice. And if I turn the sine-wave amplitude down to 0, the VAD correctly
> triggers the speech-started and speech-stopped events.
>
> But if I turn up the sine-wave, it messes up the VAD.
>
> Presumably the VAD is working over the pre-EchoCancelled audio, which is most
> undesirable.
>
> How can I progress here?
>
> My thought was to create an audio pipeline, using AUGraph, but my efforts
> have thus far been unsuccessful, and I lack confidence that I'm even pushing
> in the right direction.
>
> My thought was to have an IO unit that interfaces with the hardware
> (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit.
>
> But I can't see how to set this up.
>
> On iOS there's a RemoteIO unit to deal with the hardware, but I can't see any
> such unit on macOS. It seems the VoiceProcessing unit wants to do that itself.
>
> And then I wonder: Could I make a second VoiceProcessing unit, and have
> vp1_aec split send its bus[1(mic)].outputScope to vp2_vad.bus[1].inputScope?
>
> Can I do this kind of work by routing audio, or do I need to get my hands
> dirty with input/render callbacks?
>
> It feels like I'm going hard against the grain if I am faffing with these
> callbacks.
>
> If there's anyone out there that would care to offer me some guidance here, I
> am most grateful!
>
> π
>
> PS Is it not a serious problem that VAD can't operate on post-AEC input?