Re: Realtime AEC + VAD


  • Subject: Re: Realtime AEC + VAD
  • From: Tamás Zahola via Coreaudio-api <email@hidden>
  • Date: Thu, 17 Oct 2024 11:22:53 +0200

You can extract the VAD algorithm from WebRTC by starting at this file:
https://chromium.googlesource.com/external/webrtc/stable/src/+/master/common_audio/vad/vad_core.h

You'll also need a few support routines from the common_audio/signal_processing
folder, but otherwise it's self-contained.
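
Roughly, a minimal sketch of driving the extracted VAD from Swift, assuming
you expose the C API (webrtc_vad.h) through a bridging header. The function
names below are the upstream ones, but the Create/Init signatures have varied
between WebRTC revisions, so check the copy you vendor:

import Foundation

// Thin wrapper around the (assumed bridged) WebRTC VAD C API.
final class WebRTCVad {
    private let handle: OpaquePointer

    // Aggressiveness 0 (least) ... 3 (most aggressive).
    init?(aggressiveness: Int32 = 2) {
        guard let h = WebRtcVad_Create() else { return nil }
        if WebRtcVad_Init(h) != 0 || WebRtcVad_set_mode(h, aggressiveness) != 0 {
            WebRtcVad_Free(h)
            return nil
        }
        handle = h
    }

    // One 10/20/30 ms frame of 16-bit mono PCM at 8/16/32/48 kHz.
    func isVoice(_ frame: [Int16], sampleRate: Int32 = 16_000) -> Bool {
        WebRtcVad_Process(handle, sampleRate, frame, frame.count) == 1
    }

    deinit { WebRtcVad_Free(handle) }
}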

> It's easy for me to get the audio-output-stream for MY app (it just comes in
> over the websocket), but I may wish to toggle whether I want my AEC to be
> cancelling out any output-audio generated by other processes on my Mac.

From macOS Ventura onwards it is possible to capture system audio with the
ScreenCaptureKit framework, although your app will need the Screen Recording
privacy permission.
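
For reference, a rough sketch of the audio-only capture path (macOS 13+,
Screen Recording permission already granted; error handling trimmed):

import ScreenCaptureKit
import CoreMedia

final class SystemAudioTap: NSObject, SCStreamOutput {
    private var stream: SCStream?

    func start() async throws {
        let content = try await SCShareableContent.excludingDesktopWindows(
            false, onScreenWindowsOnly: true)
        guard let display = content.displays.first else { return }

        // A content filter is mandatory even when we only want audio.
        let filter = SCContentFilter(display: display, excludingWindows: [])

        let config = SCStreamConfiguration()
        config.capturesAudio = true
        config.excludesCurrentProcessAudio = true  // don't re-capture our own output

        let stream = SCStream(filter: filter, configuration: config, delegate: nil)
        try stream.addStreamOutput(self, type: .audio,
                                   sampleHandlerQueue: DispatchQueue(label: "audio-tap"))
        try await stream.startCapture()
        self.stream = stream
    }

    func stream(_ stream: SCStream, didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
                of type: SCStreamOutputType) {
        guard type == .audio else { return }
        // sampleBuffer carries the mixed system audio; feed it to the AEC here.
    }
}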

> It must be possible on macOS, as apps like SoundFlower or BlackHole are able
> to do it.

BlackHole and SoundFlower use an older technique: they install a virtual
loopback audio device on the system (listed in Audio MIDI Setup as e.g.
"BlackHole 2ch"), change the system's default output device to it, and then
capture from the input port of this loopback device. But this requires
installing the virtual device in /Library/Audio/Plug-Ins/HAL, which requires
admin privileges.
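
For completeness, a sketch of how an app could then target such a device:
translate its UID into an AudioDeviceID and hand that to an AUHAL unit. The
UID string below is a made-up placeholder; the real one is visible in Audio
MIDI Setup.

import CoreAudio

// Translate a device UID (e.g. BlackHole's) into an AudioDeviceID.
func deviceID(forUID uid: String) -> AudioDeviceID? {
    var address = AudioObjectPropertyAddress(
        mSelector: kAudioHardwarePropertyTranslateUIDToDevice,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMain)
    let cfUID = uid as CFString
    var deviceID: AudioDeviceID = kAudioObjectUnknown
    var size = UInt32(MemoryLayout<AudioDeviceID>.size)
    let status = withUnsafePointer(to: cfUID) { uidPtr in
        AudioObjectGetPropertyData(
            AudioObjectID(kAudioObjectSystemObject), &address,
            UInt32(MemoryLayout<CFString>.size), uidPtr,
            &size, &deviceID)
    }
    return status == noErr ? deviceID : nil
}

// let dev = deviceID(forUID: "BlackHole2ch_UID")  // placeholder UID
// ...then set it as kAudioOutputUnitProperty_CurrentDevice on an AUHAL unit.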

> But mobile, I'm not so sure. My memory of iPhone audio dev (~2008) is that it
> was impossible to access this. But there's now some mention of v3 audio-units
> being able to process inter-app audio.

On iOS you must use the voice-processing I/O unit. Normal apps cannot capture
the system audio output. Technically there is a way to do it with the ReplayKit
framework, but it's a pain in the ass to use, and the primary purpose of that
framework is capturing screen content, not audio. If you try e.g. Facebook
Messenger on iOS, and initiate screen-sharing in a video call, that's going to
use ReplayKit.
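
If AVAudioEngine is acceptable, a minimal sketch of the sanctioned iOS path:
since iOS 13 the voice-processing unit can be switched on directly on the
engine's input node (session setup is the usual .playAndRecord dance):

import AVFoundation

// Echo-cancelled mic capture through the voice-processing I/O unit.
func startEchoCancelledCapture() throws -> AVAudioEngine {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .voiceChat)
    try session.setActive(true)

    let engine = AVAudioEngine()
    // Inserts the VPIO unit, so device playback is cancelled out of the mic.
    try engine.inputNode.setVoiceProcessingEnabled(true)

    let format = engine.inputNode.outputFormat(forBus: 0)
    engine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        // buffer is post-AEC mic audio; feed it to the VAD / websocket here.
    }
    try engine.start()
    return engine
}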

Regards,
Tamás Zahola

> On 17 Oct 2024, at 08:04, π via Coreaudio-api <email@hidden>
> wrote:
>
> Thank you for the replies. I am glad to see that this mailing list is still
> alive, despite the dwindling traffic these last few years.
>
> Can I not encapsulate a VPIO unit, and control the input/output audio-streams
> by implementing input/render callbacks, or making connections?
>
> I'm veering towards a manual implementation: just use a HALOutput unit
> (misnamed, as it's really I/O) on macOS or a RemoteIO unit on the mobile
> platforms to access the raw I/O buffers, and write my own pipeline.
>
> Would it be a good idea to use https://github.com/apple/AudioUnitSDK to wrap
> this? My hunch is to minimize the layers/complexity and NOT use this
> framework.
>
> And for the AEC/VAD, can anyone offer a perspective? Arshia? The two obvious
> candidates I see are WebRTC and Speex. GPT-4o reckons WebRTC will be the most
> advanced and best-performing solution, with the downside that it's a big
> project (and maybe a more complicated build process), while Speex is more
> lightweight and will probably do the job well enough for my purposes.
>
> And as both are open-source, I may have the option of pulling out the
> minimal-dependency files and building just those.
>
> The last question is regarding system-wide audio output. It's easy for me to
> get the audio-output-stream for MY app (it just comes in over the websocket),
> but I may wish to toggle whether I want my AEC to be cancelling out any
> output-audio generated by other processes on my Mac. e.g. if I am watching a
> YouTube video, maybe I want my AI to listen to that, and maybe I want it
> subtracted. So do I have the option to listen to SYSTEM-level audio output
> (so as to feed it into my AEC impl)? It must be possible on macOS, as apps
> like SoundFlower or BlackHole are able to do it. But mobile, I'm not so sure.
> My memory of iPhone audio dev (~2008) is that it was impossible to access
> this. But there's now some mention of v3 audio-units being able to process
> inter-app audio.
>
> π
>
> On Wed, 16 Oct 2024 at 19:35, Arshia Cont via Coreaudio-api
> <email@hidden <mailto:email@hidden>> wrote:
>> Hi π,
>>
>> From my experience that's not possible. VPIO is an option on the low-level
>> I/O device, and so is VAD. You don't have much control over their internals,
>> routing and wiring! Also, from our experience, VPIO behaves differently on
>> different devices. On some iPads we saw "gating" instead of actual echo
>> removal (be aware of that!). In the end, for a similar use-case we ended up
>> doing our own AEC and activity detection.
>>
>> Cheers,
>>
>> Arshia Cont
>> metronautapp.com <http://metronautapp.com/>
>>
>>
>>
>>> On 15 Oct 2024, at 18:08, π via Coreaudio-api
>>> <email@hidden <mailto:email@hidden>>
>>> wrote:
>>>
>>> Dear Audio Engineers,
>>>
>>> I'm writing an app to interact with OpenAI's 'realtime' API (bidirectional
>>> realtime audio over websocket, with the AI on the server side).
>>>
>>> To do this, I need to be careful that the AI-speak doesn't make its way out
>>> of the speakers, back in through the mic, and back to their server (else it
>>> starts to talk to itself, and gets very confused).
>>>
>>> So I need AEC, which I've actually got working, using
>>> kAudioUnitSubType_VoiceProcessingIO and setting
>>> kAUVoiceIOProperty_BypassVoiceProcessing to false via AudioUnitSetProperty.
>>>
>>> Now I also wish to detect when the speaker (me) is speaking or not
>>> speaking, which I've also managed to do via
>>> kAudioDevicePropertyVoiceActivityDetectionEnable.
>>>
>>> But getting them to play together is another matter, and I'm struggling
>>> hard here.
>>>
>>> I've rigged up a simple test
>>> (https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a
>>> 440Hz sinewave is generated in the render-callback, and mic-input is
>>> recorded to file in the input-callback.
>>>
>>> So the AEC works delightfully, subtracting the sinewave and recording my
>>> voice.
>>> And if I turn the sine-wave amplitude down to 0, the VAD correctly triggers
>>> the speech-started and speech-stopped events.
>>>
>>> But if I turn up the sine-wave, it messes up the VAD.
>>>
>>> Presumably the VAD is working on the pre-echo-cancellation audio, which is
>>> most undesirable.
>>>
>>> How can I progress here?
>>>
>>> My thought was to create an audio pipeline, using AUGraph, but my efforts
>>> have thus far been unsuccessful, and I lack confidence that I'm even
>>> pushing in the right direction.
>>>
>>> My thought was to have an IO unit that interfaces with the hardware
>>> (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit.
>>>
>>> But I can't see how to set this up.
>>>
>>> On iOS there's a RemoteIO unit to deal with the hardware, but I can't see
>>> any such unit on macOS. It seems the VoiceProcessing unit wants to do that
>>> itself.
>>>
>>> And then I wonder: could I make a second VoiceProcessing unit, and have
>>> vp1_aec send its bus[1 (mic)] output scope to vp2_vad's bus[1] input scope?
>>>
>>> Can I do this kind of work by routing audio, or do I need to get my hands
>>> dirty with input/render callbacks?
>>>
>>> It feels like I'm going hard against the grain if I am faffing with these
>>> callbacks.
>>>
>>> If there's anyone out there who would care to offer some guidance here,
>>> I would be most grateful!
>>>
>>> π
>>>
>>> PS Is it not a serious problem that VAD can't operate on post-AEC input?