Re: Realtime AEC + VAD
- Subject: Re: Realtime AEC + VAD
- From: Arshia Cont via Coreaudio-api <email@hidden>
- Date: Wed, 16 Oct 2024 20:34:55 +0200
Hi π,
From my experience that’s not possible. VPIO is an option on the lower-level
IO device, and so is VAD. You don’t have much control over their internals,
routing, and wiring! Also, from our experience, VPIO behaves differently on
different devices: on some iPads we saw “gating” instead of actual echo
removal (be aware of that!). In the end, for a similar use case, we ended up
doing our own AEC and activity detection.
Cheers,
Arshia Cont
metronautapp.com
> On 15 Oct 2024, at 18:08, π via Coreaudio-api <email@hidden>
> wrote:
>
> Dear Audio Engineers,
>
> I'm writing an app to interact with OpenAI's 'realtime' API (bidirectional
> realtime audio over websocket with AI serverside).
>
> To do this, I need to be careful that the AI's speech doesn't make its way
> out of the speakers, back in through the mic, and back to their server
> (otherwise it starts to talk to itself, and gets very confused).
>
> So I need AEC, which I've actually got working using
> kAudioUnitSubType_VoiceProcessingIO and setting
> kAUVoiceIOProperty_BypassVoiceProcessing to false via AudioUnitSetProperty.
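In CoreAudio terms, that setup is roughly the following (a macOS-only sketch that needs the AudioToolbox framework; error handling and teardown omitted):

```c
#include <AudioToolbox/AudioToolbox.h>

/* Sketch: create a VoiceProcessingIO unit and ensure voice processing
 * (including AEC) is NOT bypassed. */
static AudioUnit make_vpio_unit(void) {
    AudioComponentDescription desc = {
        .componentType = kAudioUnitType_Output,
        .componentSubType = kAudioUnitSubType_VoiceProcessingIO,
        .componentManufacturer = kAudioUnitManufacturer_Apple,
    };
    AudioComponent comp = AudioComponentFindNext(NULL, &desc);
    AudioUnit unit = NULL;
    AudioComponentInstanceNew(comp, &unit);

    /* 0 = voice processing (AEC etc.) stays enabled. */
    UInt32 bypass = 0;
    AudioUnitSetProperty(unit, kAUVoiceIOProperty_BypassVoiceProcessing,
                         kAudioUnitScope_Global, 0, &bypass, sizeof(bypass));

    /* VPIO convention: bus 0 = speaker output, bus 1 = mic input.
       Enable the input side explicitly. */
    UInt32 enable = 1;
    AudioUnitSetProperty(unit, kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input, 1, &enable, sizeof(enable));
    return unit;
}
```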
>
> Now I also wish to detect when the speaker (me) is speaking or not speaking,
> which I've also managed to do via
> kAudioDevicePropertyVoiceActivityDetectionEnable.
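For reference, that device-level VAD (available on recent macOS) is driven through the AudioObject property API; a sketch, assuming `device` is the input device's AudioObjectID and that a listener has been registered with AudioObjectAddPropertyListener:

```c
#include <CoreAudio/CoreAudio.h>

/* Sketch: enable device-level voice activity detection on an input device. */
static OSStatus enable_vad(AudioObjectID device) {
    AudioObjectPropertyAddress enableAddr = {
        kAudioDevicePropertyVoiceActivityDetectionEnable,
        kAudioDevicePropertyScopeInput,
        kAudioObjectPropertyElementMain
    };
    UInt32 on = 1;
    return AudioObjectSetPropertyData(device, &enableAddr, 0, NULL,
                                      sizeof(on), &on);
}

/* Listener for kAudioDevicePropertyVoiceActivityDetectionState changes:
 * poll the state to distinguish speech-started from speech-stopped. */
static OSStatus vad_listener(AudioObjectID device, UInt32 nAddrs,
                             const AudioObjectPropertyAddress *addrs,
                             void *client) {
    AudioObjectPropertyAddress stateAddr = {
        kAudioDevicePropertyVoiceActivityDetectionState,
        kAudioDevicePropertyScopeInput,
        kAudioObjectPropertyElementMain
    };
    UInt32 speaking = 0, size = sizeof(speaking);
    AudioObjectGetPropertyData(device, &stateAddr, 0, NULL, &size, &speaking);
    /* speaking == 1 -> speech started, speaking == 0 -> speech stopped */
    return noErr;
}
```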
>
> But getting them to play together is another matter, and I'm struggling hard
> here.
>
> I've rigged up a simple test
> (https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a
> 440Hz sinewave is generated in the render-callback, and mic-input is recorded
> to file in the input-callback.
>
> So the AEC works delightfully, subtracting the sinewave and recording my
> voice.
> And if I turn the sine-wave amplitude down to 0, the VAD correctly triggers
> the speech-started and speech-stopped events.
>
> But if I turn up the sine-wave, it messes up the VAD.
>
> Presumably the VAD is operating on the pre-echo-cancellation audio, which is
> most undesirable.
>
> How can I progress here?
>
> My thought was to create an audio pipeline, using AUGraph, but my efforts
> have thus far been unsuccessful, and I lack confidence that I'm even pushing
> in the right direction.
>
> My thought was to have an IO unit that interfaces with the hardware
> (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit.
>
> But I can't see how to set this up.
>
> On iOS there's a RemoteIO unit to deal with the hardware, but I can't see any
> such unit on macOS. It seems the VoiceProcessing unit wants to do that itself.
>
> And then I wonder: could I create a second VoiceProcessing unit, and route
> vp1_aec's bus[1 (mic)] outputScope into vp2_vad's bus[1] inputScope?
>
> Can I do this kind of work by routing audio, or do I need to get my hands
> dirty with input/render callbacks?
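For what it's worth, the callback route is the conventional way to get at VPIO's post-processing mic signal: pull it yourself with AudioUnitRender from an input callback. A macOS-only sketch, assuming `g_vpio` is a VPIO unit set up elsewhere and registered via kAudioOutputUnitProperty_SetInputCallback:

```c
#include <AudioToolbox/AudioToolbox.h>

static AudioUnit g_vpio; /* assumed: configured VPIO unit */

/* Input callback: render the echo-cancelled mic audio from bus 1 and
 * hand it to whatever consumes it (a hand-rolled VAD, a file writer,
 * a websocket sender). */
static OSStatus input_cb(void *refCon, AudioUnitRenderActionFlags *flags,
                         const AudioTimeStamp *ts, UInt32 bus,
                         UInt32 frames, AudioBufferList *ioData /* NULL */) {
    static float samples[4096];           /* sized for typical buffers */
    if (frames > 4096) return kAudio_ParamError;

    AudioBufferList abl;
    abl.mNumberBuffers = 1;
    abl.mBuffers[0].mNumberChannels = 1;
    abl.mBuffers[0].mDataByteSize = (UInt32)(frames * sizeof(float));
    abl.mBuffers[0].mData = samples;

    OSStatus err = AudioUnitRender(g_vpio, flags, ts, 1 /* mic bus */,
                                   frames, &abl);
    if (err == noErr) {
        /* abl now holds post-AEC mic audio: run activity detection here. */
    }
    return err;
}
```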
>
> It feels like I'm going hard against the grain if I am faffing with these
> callbacks.
>
> If there's anyone out there that would care to offer me some guidance here, I
> am most grateful!
>
> π
>
> PS Is it not a serious problem that VAD can't operate on post-AEC input?
> _______________________________________________
> Do not post admin requests to the list. They will be ignored.
> Coreaudio-api mailing list (email@hidden)
> Help/Unsubscribe/Update your Subscription:
>
> This email sent to email@hidden