
Dalle

Premium voice agent — Cerebras · Cartesia · LiveKit

2025 · Independent · Sole Engineer · In development

Inference

Cerebras

TTS

Cartesia

Transport

LiveKit

Approach

Voice agents live or die on round-trip latency. I chose Cerebras for sub-100ms LLM inference, Cartesia for neural TTS that streams the first audio chunk in under 200ms, and LiveKit for WebRTC so audio hops server→client without TCP head-of-line blocking. Each layer is individually replaceable without touching the others.
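The "individually replaceable" claim can be sketched with structural interfaces. This is a minimal illustration, not the project's actual module layout; the class and function names here are hypothetical stand-ins for the real Cerebras/Cartesia clients.

```python
from typing import Iterator, Protocol


class Inference(Protocol):
    """Any LLM backend that streams tokens (e.g. a Cerebras client)."""
    def stream(self, prompt: str) -> Iterator[str]: ...


class TTS(Protocol):
    """Any TTS backend that streams audio chunks (e.g. a Cartesia client)."""
    def synthesize(self, text: str) -> Iterator[bytes]: ...


# Stub implementations standing in for real vendor clients.
class StubLLM:
    def stream(self, prompt: str) -> Iterator[str]:
        yield from prompt.split()


class StubTTS:
    def synthesize(self, text: str) -> Iterator[bytes]:
        yield text.encode()


def respond(llm: Inference, tts: TTS, prompt: str) -> list[bytes]:
    # Tokens stream straight into TTS; swapping vendors means swapping
    # one constructor, not rewiring the pipeline.
    return [chunk for tok in llm.stream(prompt) for chunk in tts.synthesize(tok)]


print(respond(StubLLM(), StubTTS(), "hello world"))  # [b'hello', b'world']
```

Because the pipeline depends only on the `Protocol` shapes, each layer is replaceable without touching the others, as the section claims.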

Problem

Consumer voice agents feel robotic because the latency between speech-end and response-start often exceeds a second. The goal was sub-500ms end-to-end for a natural, interruption-tolerant conversation.
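Working backwards from the targets above, the 500ms budget decomposes roughly as follows (the two component figures come from the Approach section; the remainder is simple arithmetic, not a measured value):

```python
# Rough end-to-end latency budget for one conversational turn.
llm_ttft_ms = 100        # Cerebras time-to-first-token target (sub-100ms)
tts_first_chunk_ms = 200  # Cartesia first audio chunk target (<200ms)
budget_ms = 500           # end-to-end goal

# Whatever is left must cover streaming transcription + WebRTC transport.
remaining_ms = budget_ms - llm_ttft_ms - tts_first_chunk_ms
print(remaining_ms)  # → 200
```

That leaves only ~200ms for transcription and both network hops, which is why every stage in the pipeline streams rather than waiting for complete inputs.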

How I built it

  • Piped live mic audio over LiveKit WebRTC into a streaming transcription worker.
  • Routed transcripts to Cerebras LLM inference with early-exit streaming.
  • Streamed generated tokens into Cartesia TTS and piped audio back over the same LiveKit channel.
  • Implemented barge-in detection so the user can interrupt mid-sentence.
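The barge-in step above can be sketched as task cancellation in asyncio: playback runs as a task, and a voice-activity signal cancels it mid-stream. This is a self-contained simulation with stub playback, assuming the cancellation-based design; it is not the project's actual LiveKit integration.

```python
import asyncio


async def speak(chunks, play):
    """Stream TTS audio chunks until finished, or until cancelled (barge-in)."""
    for chunk in chunks:
        await play(chunk)  # each await is a cancellation point


async def main() -> list:
    played = []

    async def play(chunk):
        played.append(chunk)
        await asyncio.sleep(0.01)  # simulate audio playback time

    # Agent starts speaking a long response.
    task = asyncio.create_task(speak([f"chunk{i}" for i in range(100)], play))

    # Simulated VAD: the user starts talking after ~30ms, so we barge in.
    await asyncio.sleep(0.03)
    task.cancel()  # interrupt playback mid-sentence
    try:
        await task
    except asyncio.CancelledError:
        pass

    return played


played = asyncio.run(main())
print(len(played))  # far fewer than 100 chunks: playback was interrupted
```

The same shape applies with real audio: the interrupt handler cancels the playback task and flushes any queued TTS audio so the agent stops speaking immediately.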

Outcome

  • Sub-500ms measured round-trip latency in early testing.
  • Clean, swappable stack — each component is a separate module.
  • Foundation for press-to-talk integration on the Sentry portfolio itself.

Stack

Cerebras · Cartesia · LiveKit · Python · FastAPI