
Phantm

Real-time AI avatar & digital-twin platform

2025 · SourceXAI · Sole AI Engineer · live

API endpoints: 47
Inference: GPU / RunPod
Services: 6+

Approach

The core architectural choice was decoupling inference from session state: each model (MuseTalk, CosyVoice, faster-whisper, Deep-Live-Cam) runs as an independent GPU-backed microservice behind a thin FastAPI gateway. This isolates latency-critical paths, lets each model scale independently on RunPod, and makes the platform resilient: if one avatar service degrades, the rest keep serving.
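The routing idea can be sketched in a few lines. This is a minimal illustration, not the production gateway; the service names, ports, and path prefixes below are hypothetical:

```python
# Hypothetical service registry: each model runs as its own GPU-backed
# microservice, and the gateway only maps request paths to upstream base URLs.
SERVICES = {
    "avatar":  "http://musetalk:8001",       # lip sync / avatar rendering
    "voice":   "http://cosyvoice:8002",      # voice cloning / TTS
    "stt":     "http://whisper:8003",        # streaming speech-to-text
    "livecam": "http://deep-live-cam:8004",  # live camera sessions
}

def resolve_upstream(path: str) -> str:
    """Map an incoming API path like '/stt/stream' to its upstream service.

    Session state lives at the gateway; the model services stay stateless,
    so one of them degrading cannot stall the others.
    """
    prefix, _, rest = path.lstrip("/").partition("/")
    base = SERVICES.get(prefix)
    if base is None:
        raise KeyError(f"unknown service prefix: {prefix!r}")
    return f"{base}/{rest}" if rest else base
```

Because the mapping is pure data, adding a new model service is a one-line registry change rather than a gateway redeploy with new routing logic.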

Problem

The client needed a production real-time avatar and digital-twin platform that could do voice cloning, lip sync, live camera deepfakes, and streaming speech-to-text — all behind a clean API consumable by web and mobile clients. Existing open-source stacks didn't compose, weren't documented, and had no production-grade orchestration.

How I built it

  • Designed and documented a 47-endpoint API surface covering avatars, voice, STT, live-cam sessions, and asset management.
  • Deployed each model as an independent GPU microservice on RunPod, with pre-warmed instances and autoscaling rules.
  • Built the Next.js 14 frontend against the API with streaming SSE for STT and WebSocket channels for live sessions.
  • Integrated Cloudflare R2 for cost-efficient asset storage with CDN-backed delivery.
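The SSE path for streaming STT comes down to framing each partial transcript as a Server-Sent Events message that the browser's `EventSource` can reassemble. A minimal sketch of that wire format (the event names and payload fields here are illustrative, not the actual API):

```python
import json

def sse_event(data: dict, event: str = "transcript") -> str:
    """Frame one partial-transcript update as an SSE message.

    Each message is an 'event:' line plus a 'data:' line, terminated by a
    blank line; the client-side EventSource splits the stream on that.
    """
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_partials(partials):
    """Yield one framed event per partial result, then a terminal event."""
    for p in partials:
        yield sse_event({"text": p, "final": False})
    yield sse_event({"text": partials[-1], "final": True}, event="done")
```

A FastAPI endpoint would wrap a generator like `stream_partials` in a streaming response with the `text/event-stream` content type; the framing itself is framework-independent.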

Outcome

  • A production-ready avatar platform with full API docs, deployable on demand.
  • GPU inference orchestration tuned for cold-start minimization.
  • End-to-end session flows proven on real client use cases.
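Cold-start minimization reduces to keeping a pool of pre-warmed workers so no session pays the model-load cost on the request path. A toy model of that policy (illustrative only; the production version relied on RunPod's pre-warmed instances and autoscaling rules, and `boot` here stands in for spinning up a GPU worker):

```python
from collections import deque

class WarmPool:
    """Keep `size` pre-booted workers so checkout is always instant.

    Cold starts are paid up front at pool creation, and the pool refills
    itself on every checkout (a real system would refill asynchronously).
    """
    def __init__(self, size: int, boot):
        self.boot = boot
        self.ready = deque(boot() for _ in range(size))  # pay cold starts up front

    def acquire(self):
        worker = self.ready.popleft()   # instant: this worker is already warm
        self.ready.append(self.boot())  # refill; done inline in this sketch
        return worker
```

The trade-off is idle GPU spend versus tail latency: pool size is chosen so that the steady-state request rate rarely outpaces the refill rate.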

Stack

FastAPI · RunPod · MuseTalk 1.5 · CosyVoice 2 · faster-whisper · Deep-Live-Cam · Cloudflare R2 · Next.js 14