1 comments

  • PranayKumarJain an hour ago ago

    Nice—this is a very pragmatic “works with just TwiML” approach.

    A couple questions / thoughts from building voice agents in production:

    - How do you handle barge‑in / interruptions? With <Gather input="speech"> + polling, it’s hard to do true full‑duplex + partial ASR. Have you considered a hybrid mode where you keep the TwiML simplicity for setup, but optionally switch to <Stream> (Media Streams) when people want sub‑second turn-taking? - Twilio’s built-in speech recog is convenient, but in my experience it can be the first thing teams outgrow (accuracy, language coverage, costs, and lack of token-level partials). Do you expose an interface so people can swap STT later without reworking the call control? - For long agent responses: do you chunk <Say> / keep call alive with <Pause>? Any gotchas around Twilio timeouts while the agent is “thinking”?

    We’ve run into the same infra-vs-latency tradeoff at eboo.ai (real-time voice agents / telephony + WebRTC). If you ever want a sanity check on the lowest-latency Twilio path (Media Streams + incremental STT + barge-in), happy to compare notes.