Re: Realtime AEC + VAD

Subject: Re: Realtime AEC + VAD
From: Jonatan Liljedahl via Coreaudio-api <email@hidden>
Date: Thu, 17 Oct 2024 09:18:42 +0200

It’s not possible to access the system level audio output on iOS,
unfortunately.

/Jonatan, developer of AUM
http://kymatica.com


On Thu, 17 Oct 2024 at 08:04, π via Coreaudio-api <
email@hidden> wrote:

> Thank you for the replies. I am glad to see that this mailing list is still
> alive, despite the dwindling traffic these last few years.
>
> Can I not encapsulate a VPIO unit, and control the input/output
> audio-streams by implementing input/render callbacks, or making connections?
>
> I'm veering towards this approach of manual implementation: Just to use a
> (misnamed as it's I/O) HALInput unit on macOS or a RemoteIO unit on the
> mobile platforms to access the raw I/O buffers, and write my own pipeline.
>
> Would it be a good idea to use https://github.com/apple/AudioUnitSDK to
> wrap this? My hunch is to minimize the layers/complexity and NOT use this
> framework.
>
> And for the AEC/VAD, can anyone offer a perspective? Arshia? The two
> obvious candidates I see are WebRTC and Speex. GPT-4o reckons WebRTC will be
> the most advanced and best-performing solution, with the downside that it's a
> big project (and maybe a more complicated build process), while Speex is
> more lightweight and will probably do the job well enough for my purposes.
>
> And as both are open-source, I may have the option of pulling out the
> minimal-dependency files and building just those.
>
> The last question is regarding system-wide audio output. It's easy for me
> to get the audio-output-stream for MY app (it just comes in over the
> websocket), but I may wish to toggle whether I want my AEC to be cancelling
> out any output-audio generated by other processes on my mac. e.g. if I am
> watching a YouTube video, maybe I want my AI to listen to that, and maybe I
> want it subtracted. So do I have the option to listen to SYSTEM-level audio
> output (so as to feed it into my AEC impl)? It must be possible on macOS,
> as apps like Soundflower or BlackHole are able to do it. But on mobile, I'm
> not so sure. My memory of iPhone audio dev (~2008) is that it was
> impossible to access this. But there's now some mention of v3 audio-units
> being able to process inter-app audio.
>
> π
>
> On Wed, 16 Oct 2024 at 19:35, Arshia Cont via Coreaudio-api <
> email@hidden> wrote:
>
>> Hi π,
>>
>> From my experience that’s not possible. VPIO is an option for the lower
>> level IO device; so is VAD. You don’t have much control over their
>> internals, routing and wirings! Also, from our experience, VPIO has
>> different behaviour on different devices. On some iPads we saw “gating”
>> instead of actually removing echo (be aware of that!). In the end for a
>> similar use-case we ended up doing our own AEC and Activity Detection.
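
For readers wondering what "doing our own AEC" might involve at its simplest, here is a minimal sketch of a normalized LMS (NLMS) adaptive filter, a common textbook building block for echo cancellation. It is not the implementation used by VPIO, WebRTC, Speex, or anyone in this thread; the class name, tap count, and step size are all hypothetical, and a production canceller would add delay estimation, double-talk detection, and residual suppression.

```python
class NLMSEchoCanceller:
    """Sample-by-sample NLMS filter (illustrative only, names hypothetical).

    far_sample: what is being sent to the speaker (the reference signal).
    mic_sample: what the microphone captured (near-end speech plus echo).
    """

    def __init__(self, num_taps=32, step_size=0.5, eps=1e-6):
        self.weights = [0.0] * num_taps   # estimated echo path
        self.history = [0.0] * num_taps   # recent far-end samples, newest first
        self.step_size = step_size
        self.eps = eps                    # avoids division by zero on silence

    def process(self, far_sample, mic_sample):
        # Shift the newest far-end sample into the reference history.
        self.history.insert(0, far_sample)
        self.history.pop()

        # Predict the echo and subtract it from the microphone signal.
        echo_estimate = sum(w * x for w, x in zip(self.weights, self.history))
        error = mic_sample - echo_estimate

        # Normalized LMS update: step scaled by the reference signal energy.
        norm = sum(x * x for x in self.history) + self.eps
        gain = self.step_size * error / norm
        self.weights = [w + gain * x for w, x in zip(self.weights, self.history)]

        # The error is the echo-reduced signal to pass downstream.
        return error
```

The returned value is the echo-reduced microphone sample, which is the signal a home-grown activity detector would then consume.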
>>
>> Cheers,
>>
>> Arshia Cont
>> metronautapp.com
>>
>>
>>
>> On 15 Oct 2024, at 18:08, π via Coreaudio-api <
>> email@hidden> wrote:
>>
>> Dear Audio Engineers,
>>
>> I'm writing an app to interact with OpenAI's 'realtime' API
>> (bidirectional realtime audio over websocket with AI serverside).
>>
>> To do this, I need to be careful that the AI-speak doesn't make its way
>> out of the speakers, back in thru the mic, and back to their server (else
>> it starts to talk to itself, and gets very confused).
>>
>> So I need AEC, which I've actually got working,
>> using kAudioUnitSubType_VoiceProcessingIO
>> and AudioUnitSetProperty(kAUVoiceIOProperty_BypassVoiceProcessing, setting
>> to False).
>>
>> Now I also wish to detect when the speaker (me) is speaking or not
>> speaking, which I've also managed to do
>> via kAudioDevicePropertyVoiceActivityDetectionEnable.
>>
>> But getting them to play together is another matter, and I'm struggling
>> hard here.
>>
>> I've rigged up a simple test (
>> https://gist.github.com/p-i-/d262e492073d20338e8fcf9273a355b4), where a
>> 440Hz sinewave is generated in the render-callback, and mic-input is
>> recorded to file in the input-callback.
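
The test-tone part of such a setup can be sketched independently of Core Audio. The following is a minimal, hypothetical phase-continuous sine source (it is not the code from the gist above): a render callback is invoked once per buffer, so the phase has to be carried across calls or the tone clicks at every buffer boundary. The class name, default sample rate, and amplitude are assumptions.

```python
import math


class SineGenerator:
    """Phase-continuous sine source, filled one buffer at a time (illustrative)."""

    def __init__(self, frequency_hz=440.0, sample_rate=48000.0, amplitude=0.5):
        self.amplitude = amplitude
        self.phase = 0.0
        # Phase advance per sample, in radians.
        self.phase_step = 2.0 * math.pi * frequency_hz / sample_rate

    def render(self, num_frames):
        """Return the next num_frames samples, continuing from the last call."""
        out = []
        for _ in range(num_frames):
            out.append(self.amplitude * math.sin(self.phase))
            self.phase += self.phase_step
            # Wrap the phase so it does not grow without bound.
            if self.phase >= 2.0 * math.pi:
                self.phase -= 2.0 * math.pi
        return out
```

Setting `amplitude` to 0 yields silence, which corresponds to the "sine-wave amplitude down to 0" case described below.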
>>
>> So the AEC works delightfully, subtracting the sinewave and recording my
>> voice.
>> And if I turn the sine-wave amplitude down to 0, the VAD correctly
>> triggers the speech-started and speech-stopped events.
>>
>> But if I turn up the sine-wave, it messes up the VAD.
>>
>> Presumably the VAD is working over the pre-EchoCancelled audio, which is
>> most undesirable.
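
To illustrate why the position of the VAD in the chain matters, here is a minimal sketch of an energy-based detector with hysteresis. This is not how Apple's VAD works (its internals are not documented in this thread); the class name, thresholds, hangover count, and event strings are hypothetical. The point is only that any level-based detector fed the pre-cancellation signal will treat a loud playback tone as speech, whereas the same detector fed the echo-cancelled signal would not.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class EnergyVAD:
    """Level-based voice activity detector (illustrative, names hypothetical)."""

    def __init__(self, start_threshold=0.05, stop_threshold=0.02, hangover_frames=3):
        self.start_threshold = start_threshold   # level that starts speech
        self.stop_threshold = stop_threshold     # level below which speech may end
        self.hangover_frames = hangover_frames   # quiet frames needed to stop
        self.speaking = False
        self._quiet = 0

    def process(self, frame):
        """Return "speech-started", "speech-stopped", or None for this frame."""
        level = rms(frame)
        if not self.speaking:
            if level >= self.start_threshold:
                self.speaking = True
                self._quiet = 0
                return "speech-started"
            return None
        if level < self.stop_threshold:
            self._quiet += 1
            if self._quiet >= self.hangover_frames:
                self.speaking = False
                self._quiet = 0
                return "speech-stopped"
        else:
            self._quiet = 0
        return None
```

Feeding this detector frames taken after echo cancellation, rather than raw microphone frames, is the ordering the question above is asking for.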
>>
>> How can I progress here?
>>
>> My thought was to create an audio pipeline, using AUGraph, but my efforts
>> have thus far been unsuccessful, and I lack confidence that I'm even
>> pushing in the right direction.
>>
>> My thought was to have an IO unit that interfaces with the hardware
>> (mic/spkr), which plugs into an AEC unit, which plugs into a VAD unit.
>>
>> But I can't see how to set this up.
>>
>> On iOS there's a RemoteIO unit to deal with the hardware, but I can't see
>> any such unit on macOS. It seems the VoiceProcessing unit wants to do that
>> itself.
>>
>> And then I wonder: Could I make a second VoiceProcessing unit, and have
>> vp1_aec split send its bus[1(mic)].outputScope to vp2_vad.bus[1].inputScope?
>>
>> Can I do this kind of work by routing audio, or do I need to get my hands
>> dirty with input/render callbacks?
>>
>> It feels like I'm going hard against the grain if I am faffing with these
>> callbacks.
>>
>> If there's anyone out there who would care to offer me some guidance
>> here, I would be most grateful!
>>
>> π
>>
>> PS Is it not a serious problem that VAD can't operate on post-AEC input?
>> _______________________________________________
>> Do not post admin requests to the list. They will be ignored.
>> Coreaudio-api mailing list      (email@hidden)
>> Help/Unsubscribe/Update your Subscription:
>>
>>
>> This email sent to email@hidden
