Approach
Voice agents live or die on round-trip latency. I chose Cerebras for sub-100ms LLM inference, Cartesia for neural TTS that streams the first audio chunk in under 200ms, and LiveKit for WebRTC so audio hops server→client without TCP head-of-line blocking. Each layer is individually replaceable without touching the others.
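The component choices imply a rough latency budget. A back-of-envelope check, using only the upper-bound figures quoted above (these are targets from the text, not measurements):

```python
# Latency budget implied by the component targets stated above.
LLM_FIRST_TOKEN_MS = 100   # Cerebras inference target (sub-100ms)
TTS_FIRST_CHUNK_MS = 200   # Cartesia first-audio-chunk target (under 200ms)
TARGET_E2E_MS = 500        # end-to-end goal

# Whatever is left over must absorb streaming transcription plus
# WebRTC transit in both directions.
remaining_ms = TARGET_E2E_MS - (LLM_FIRST_TOKEN_MS + TTS_FIRST_CHUNK_MS)
print(remaining_ms)
```

That leaves roughly 200ms for speech-to-text and network hops combined, which is why avoiding TCP head-of-line blocking on the audio path matters.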
Problem
Consumer voice agents feel robotic because the gap between speech-end and response-start often exceeds a second. The goal was sub-500ms end-to-end latency for a natural, interruption-tolerant conversation.
How I built it
- Piped live mic audio over LiveKit WebRTC into a streaming transcription worker.
- Routed transcripts to Cerebras LLM inference with early-exit streaming.
- Streamed generated tokens into Cartesia TTS and piped audio back over the same LiveKit channel.
- Implemented barge-in detection so the user can interrupt mid-sentence.
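The relay and barge-in logic above can be sketched with plain asyncio. This is a minimal illustration, not the actual implementation: the stub coroutines stand in for the LiveKit/Cerebras/Cartesia SDK calls (all names here are hypothetical), but the control flow, streaming tokens into TTS and cancelling the in-flight response when a new transcript arrives, is the technique described.

```python
import asyncio

async def stub_llm_tokens(transcript: str):
    # Stands in for streaming Cerebras inference: yields tokens as they arrive.
    for token in transcript.split():
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield token

async def stub_tts(token: str) -> bytes:
    # Stands in for a Cartesia TTS call returning an audio chunk.
    return token.encode()

async def respond(transcript: str, audio_out: asyncio.Queue) -> None:
    """Stream LLM tokens into TTS and push audio chunks back to the client."""
    async for token in stub_llm_tokens(transcript):
        await audio_out.put(await stub_tts(token))

async def agent(transcripts: asyncio.Queue, audio_out: asyncio.Queue) -> None:
    """Barge-in: each completed transcript cancels any in-flight response."""
    current = None
    while True:
        transcript = await transcripts.get()
        if transcript is None:          # sentinel: drain and shut down
            if current:
                await current
            return
        if current and not current.done():
            current.cancel()            # user interrupted mid-sentence
            try:
                await current
            except asyncio.CancelledError:
                pass
        current = asyncio.create_task(respond(transcript, audio_out))

async def main() -> list:
    transcripts, audio_out = asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(agent(transcripts, audio_out))
    await transcripts.put("first reply")      # gets barged in on
    await transcripts.put("actual question")  # interrupts the first
    await transcripts.put(None)
    await task
    chunks = []
    while not audio_out.empty():
        chunks.append(audio_out.get_nowait())
    return chunks

chunks = asyncio.run(main())
print(chunks)
```

Because the second transcript arrives before the first response starts streaming, the first task is cancelled and only the second reply's audio reaches the output queue. In the real system the cancel call would also flush any TTS audio already buffered client-side.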
Outcome
- Sub-500ms measured round-trip latency in early testing.
- Clean, swappable stack: each component is a separate module.
- Foundation for press-to-talk integration on the Sentry portfolio itself.
Stack
Cerebras, Cartesia, LiveKit, Python, FastAPI