Lessons from testing GPT and Gemini native audio models for voice agents

(deepsense.ai)

1 point | by Applied_AI 3 days ago

2 comments

ipotapov 2 days ago

Interesting that you went with a voice-to-voice realtime pipeline to reduce latency. speech-swift (which I maintain) could complement this by adding on-device speaker diarization, so your voice agent can distinguish between speakers without a cloud dependency. https://soniqo.audio/guides/diarize
Applied_AI 3 days ago

[flagged]