The same bare-metal QUIC substrate that runs realtime voice, your robots, your browser, your game, your phone call.
A single unit of your data, on the wire.
That packet is one of thousands on a single connection.
Every core runs its own connections, full tilt.
messages/second on one 8-core box — and improving. 36,732 concurrent connections at 0.4% CPU.
Every box, every modality, on the same transport.
A scrap of sound, captured the instant it's spoken.
Words on the wire, in real time.
Caller, model, agent — one pipe, both directions.
The line tells the caller from the TV, a coworker, or background noise — and listens to the right one.
On every provider, with first-class barge-in.
A single piece of the answer, fresh off the model.
Tokens leave the GPU and start arriving immediately.
Point your OpenAI-compatible client at the line — nothing in your app changes.
Same SSE wire your client already speaks — the moment a token exists, your app has it.
Explore in detail →
Four modalities, in one place.
A stall in one never blocks another — no head-of-line blocking.
One line, nothing to glue together.
A carrier hands you an inbound call.
Media rides QUIC direct to the agent — no extra hops in the path.
Listen, whisper, or take over — over the same connection, without rerouting the call.
Fan it out to every supervisor and dashboard, live.
Onboard autonomy stays on its own network.
A live mirror of everything it senses.
Reaches the cloud over QUIC — observe from anywhere.
Input on its own fast lane.
Fast input and rich voice, each tuned to what it needs.
Voice and lip-sync arrive together — and it can be interrupted.
Every branch flows back into the same transport line.
payload types, one transport. Add a modality, not a stack.
Idiomatic clients for the languages your stack already speaks.
pip install telequick — async client, drop-in for the realtime API.
Browser + Node, with a helper for browser media over your transport.
Bare-metal bindings to the core. Zero-copy frame handoff.
# realtime voice in 6 lines
from telequick import Session
s = Session(provider="openai-realtime")
async for ev in s.connect("+1..."):
    if ev.bargein: s.cancel()  # instant
    play(ev.audio)
Coming from another transport? The migration guide maps your existing calls onto the substrate one site at a time.
Read the migration guide →
Usage-based on egress (per GB / month), with volume tiers and committed-use discounts. We quote against the actual shape of your traffic, not a list-price grid.
New engagements start with a free production trial: ship real traffic onto the substrate, see your own histograms, then we size a contract to it. No procurement gate to start measuring.
No. The substrate exposes the realtime + telephony interfaces your code already speaks. You point your existing client at our endpoint and keep the rest. There's a drop-in shim for OpenAI Realtime, a SIP listener for carrier traffic, and idiomatic SDKs in 9 languages when you want native types.
Interruption detection lives in the transport, not in each provider's SDK. The moment caller audio crosses the speech-gate threshold we hold the agent's output stream and emit a cancel — same code path whether the model behind it is OpenAI Realtime, Gemini Live, or a self-hosted llama.cpp. The tail you hear after you start talking is one packet, not a sentence.
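The gate-then-cancel behaviour described above can be sketched as follows. This is a minimal illustration, not the substrate's actual implementation: the `BargeInGate` class, the RMS energy measure, and the `0.05` threshold are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


def rms(frame):
    """Root-mean-square level of one frame of PCM samples (floats in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


@dataclass
class BargeInGate:
    """Hypothetical transport-level gate: provider-agnostic by construction."""
    threshold: float = 0.05          # assumed speech-gate level, not a real default
    agent_speaking: bool = False
    events: list = field(default_factory=list)

    def start_agent_output(self):
        self.agent_speaking = True

    def on_caller_frame(self, frame):
        """Return True if this caller frame triggered a cancel."""
        if self.agent_speaking and rms(frame) >= self.threshold:
            self.agent_speaking = False   # hold the agent's output stream
            self.events.append("cancel")  # emit one cancel, whatever the provider
            return True
        return False


gate = BargeInGate(threshold=0.05)
gate.start_agent_output()
quiet = [0.001] * 160            # background noise, below the gate
speech = [0.2, -0.2] * 80        # caller starts talking
```

Because the gate only sees caller audio and the state of the output stream, nothing in it depends on which model produced the output, which is the point the answer makes.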
WebSocket sits on TCP, so a single dropped packet head-of-line-blocks every subsequent frame on the same socket: your agent's audio stalls behind your text. QUIC streams are independent, so a lost packet delays only the streams whose data it carried. On a 3% loss link in our DevTools benchmarks, the inter-arrival p95 widens by ~6× on WS and ~1.4× on QUIC. The histograms are on the demo page if you want to look. (For a deeper dive on QUIC's loss-recovery story, see the Google QUIC team's writeups.)
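A toy model makes the difference concrete. The sketch below is an assumption-laden simplification (one frame per packet, integer ticks, a single loss, a made-up 5-tick retransmission delay); it is not the benchmark behind the numbers above. It compares connection-wide ordered delivery with per-stream ordered delivery.

```python
def delivery_times(packets, arrival, per_stream):
    """Tick at which each packet's frame can be handed to the application.

    packets: stream name of each packet, in send order.
    arrival: tick at which each packet (or its retransmission) arrives.
    per_stream: False models one ordered byte stream (TCP-like);
                True models independently ordered streams (QUIC-like).
    """
    delivered = []
    latest = {}  # ordering domain -> latest arrival among packets so far
    for stream, tick in zip(packets, arrival):
        key = stream if per_stream else "connection"
        latest[key] = max(latest.get(key, 0), tick)
        delivered.append(latest[key])
    return delivered


packets = ["audio", "text"] * 4   # audio and text interleaved on one connection
arrival = list(range(8))          # one packet per tick when nothing is lost
arrival[1] = 1 + 5                # packet 1 (text) is lost; retransmission lands 5 ticks late

tcp_like = delivery_times(packets, arrival, per_stream=False)
quic_like = delivery_times(packets, arrival, per_stream=True)

audio_tcp = [t for s, t in zip(packets, tcp_like) if s == "audio"]
audio_quic = [t for s, t in zip(packets, quic_like) if s == "audio"]
```

In this model the lost text packet holds every later audio frame until tick 6 on the ordered connection, while the per-stream case delivers audio at its original arrival ticks and only the text stream waits.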
WebTransport ships in Chrome and Edge today. Safari and Firefox still trail, and some corporate proxies block UDP. The SDK transparently falls back to WebSocket-over-HTTPS using the same agent and the same fast-cancel barge-in path; that one client just loses QUIC's protection from head-of-line blocking. No code change.
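The fallback order can be pictured as a simple try-in-sequence loop. The sketch below is illustrative only: `connect_with_fallback`, `TransportUnavailable`, and the two fake connectors are hypothetical names invented for the example, not the SDK's API.

```python
import asyncio


class TransportUnavailable(Exception):
    """Raised by a connector when its transport cannot be established."""


async def connect_with_fallback(connectors):
    """Try each (name, connect) pair in order; return the first that succeeds."""
    errors = []
    for name, connect in connectors:
        try:
            return name, await connect()
        except TransportUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise TransportUnavailable("; ".join(errors))


async def fake_webtransport():
    # Stand-in for a QUIC/WebTransport dial on a network that blocks UDP.
    raise TransportUnavailable("UDP blocked by proxy")


async def fake_websocket():
    # Stand-in for the WebSocket-over-HTTPS fallback.
    return "ws-connection"


chosen, conn = asyncio.run(connect_with_fallback([
    ("webtransport", fake_webtransport),
    ("websocket", fake_websocket),
]))
```

The caller receives a connection either way, which is why application code does not need to change when the fallback is taken.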
Usage-based on egress (per-GB / month), quoted against your actual workload — not a public list-price grid. Volume tiers and committed-use discounts apply at scale. The infra floor sets the math: one 8-core box carries ~5,000 concurrent sessions at ~$200/month bare-metal, and pricing scales above that with the value added per minute. New engagements start with a free production trial so you size the contract to real numbers, not a forecast.
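As a rough worked example of the infra floor, using only the two approximate figures quoted above (about $200/month per box, about 5,000 concurrent sessions per box). The helper names are made up for illustration, and the result is a hardware floor, not a price.

```python
import math

BOX_COST_PER_MONTH_USD = 200   # from the text: ~$200/month bare-metal
SESSIONS_PER_BOX = 5_000       # from the text: ~5,000 concurrent sessions per 8-core box


def infra_floor_per_session(box_cost=BOX_COST_PER_MONTH_USD,
                            sessions=SESSIONS_PER_BOX):
    """Monthly hardware cost attributable to one concurrent session slot."""
    return box_cost / sessions


def boxes_needed(concurrent_sessions, sessions_per_box=SESSIONS_PER_BOX):
    """Whole boxes required to carry a given peak concurrency."""
    return math.ceil(concurrent_sessions / sessions_per_box)


floor = infra_floor_per_session()       # 200 / 5000 = 0.04 USD per slot per month
peak_boxes = boxes_needed(12_000)       # hypothetical peak of 12,000 sessions -> 3 boxes
```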
Yes. The core is a single statically-linked binary plus a Redis instance. It runs in your own racks, including air-gapped deployments where the model worker is also local, and there's a hard-enforced licensing path for that. Ask us about it during your pilot; we keep our hosted tenants and self-host tenants on the exact same wire so behaviour matches.
LiveKit, Daily, and Chime are SFUs built for many-party video. Their per-participant-minute pricing assumes a meeting room. Twilio Media Streams is PCMU over WebSocket, which gives you carrier reach but not codec choice or sub-200ms barge-in. We're a single-tenant agent transport: one caller talking to one (or many) AI providers, with codec/transport/perception all owned end-to-end. Different shape, different price.
Anything that speaks a streaming-audio or streaming-text API. We ship verified adapters for OpenAI Realtime, Gemini Live, ElevenLabs Conversational AI, and a local llama.cpp worker over QUIC. For STT-only or TTS-only pipelines you can mix providers — Deepgram on input, Cartesia on output, GPT-4o in between — and the substrate handles the joins.
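One way to picture a mixed-provider pipeline is as chained async streams, where each stage consumes the previous stage's output as it arrives. The sketch below uses stub stages only; `stt`, `llm`, `tts`, and `mic` are placeholders invented for the example and do not call any real provider or the substrate's adapters.

```python
import asyncio


async def mic():
    # Stand-in for inbound caller audio frames.
    for frame in ("f0", "f1"):
        yield frame


async def stt(audio_frames):
    # Stand-in for a speech-to-text provider on the input side.
    async for frame in audio_frames:
        yield f"text({frame})"


async def llm(transcripts):
    # Stand-in for the model in the middle.
    async for transcript in transcripts:
        yield f"reply({transcript})"


async def tts(replies):
    # Stand-in for a text-to-speech provider on the output side.
    async for reply in replies:
        yield f"audio({reply})"


async def run_pipeline():
    # Each stage is a stream, so stages can be swapped without touching the others.
    return [chunk async for chunk in tts(llm(stt(mic())))]


output = asyncio.run(run_pipeline())
```

Each stage only depends on the shape of the stream it reads, so replacing one provider means replacing one generator.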
Python, TypeScript, Go, Rust, Java, C#, Swift, Kotlin, and C++ — same API shape, idiomatic types per language. They all wrap one C++ core via FFI, so a wire-protocol fix in core ships to every SDK by re-building. The Python and TypeScript clients see most of the dogfooding; the others are tested against a polyglot smoke matrix on every release.
On the public internet to OpenAI Realtime via our pool, first-token-after-barge-in lands in ~120-180 ms p50 from the moment the caller's first speech-frame arrives, with the long tail dominated by the upstream model, not the transport. On localhost the transport itself adds <2 ms over raw UDP. Detailed histograms are exposed in the demo page's post-call summary.
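For readers checking figures like these against their own traffic, here is a minimal nearest-rank percentile helper. The sample latencies are made-up placeholder values chosen only to show a median sitting in range while a few slow responses stretch the tail; they are not measurements from the substrate.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Placeholder latencies in milliseconds (illustrative, NOT measured data).
latencies_ms = [118, 125, 131, 140, 152, 160, 171, 185, 240, 410]

p50 = percentile(latencies_ms, 50)   # median of the placeholder set
p95 = percentile(latencies_ms, 95)   # tail of the placeholder set
```

Reporting p50 alongside a tail percentile is what separates typical transport behaviour from occasional slow upstream responses.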
SOC 2 Type II is in audit. HIPAA-eligible BAAs are available on enterprise contracts. Regional residency is supported via single-region deployments (us-east, eu-west, ap-south today) and on-prem covers everything else. We never train on customer audio. Recording is opt-in per tenant, encrypted at rest, with caller-side disclosure macros built into the dialplan.
Self-serve: minutes. Sign up, install the SDK, point it at a phone number we provision, run a call. White-glove for production traffic: a 30-minute call to walk through your existing stack, then a single-region pilot with your real numbers usually inside 48 hours. Migration off your current vendor (Twilio, Vapi, LiveKit) is typically one PR per call site.