Just Hired AI · AI agency · 2026

Digital Workforce Platform — Voice AI, RAG, and real-time telephony

Multi-tenant SaaS that lets any business deploy a Voice AI receptionist on their own number in minutes.

Xano · Vertex AI · Twilio · Gemini Live · RAG
  • Tenants supported: multi-tenant
  • Concurrent calls: scalable
  • Knowledge ingestion: self-serve

The problem

The agency had a strong agent product, but every new client was a custom integration:

  • 2–3 weeks of telephony plumbing per onboard
  • Knowledge base hand-built per business
  • No way for clients to update their own agent

They wanted a product, not a service line: the same agent quality, but with tenant self-serve and one-day onboarding.

The approach

The architecture has three layers:

[Diagram: three-layer architecture]

Why Gemini Live (instead of OpenAI Realtime)

For this client, two reasons:

  • Multimodal pricing. Vertex AI billing rolled into the client's existing GCP credits.
  • Latency from the EU and US. Audio round-trip stayed under 300 ms in our tests.

I'd reach for OpenAI Realtime instead when:

  • The client already lives in the OpenAI ecosystem
  • They want voice cloning via parallel ElevenLabs hookups
  • They need OpenAI's stronger function-calling reliability for complex tool chains

The RAG layer

Each tenant gets:

  1. An ingestion pipeline (PDF / website / docs) that chunks and embeds content into pgvector, with every row tagged by tenant_id so retrieval can filter on it.
  2. A retrieval prompt that runs before every model turn, scoped strictly to that tenant.
  3. A tenant_id-scoped tool API the agent can call (look up an order, book an appointment).
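
To make the first two pieces concrete, here is a minimal in-memory sketch of chunking, embedding, and tenant-scoped retrieval. It is an illustration under assumptions, not the production pipeline: `chunk_text`, `embed`, `Row`, and `TenantStore` are hypothetical names, the hashed bag-of-words `embed` stands in for a real embedding model, and a Python list stands in for the pgvector table.

```python
import math
import zlib
from dataclasses import dataclass, field

DIM = 64  # toy embedding width; real models use hundreds of dimensions


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows that overlap by `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class Row:
    tenant_id: str
    chunk: str
    embedding: list[float]


@dataclass
class TenantStore:
    """Stand-in for a pgvector table with a tenant_id column."""
    rows: list[Row] = field(default_factory=list)

    def ingest(self, tenant_id: str, document: str) -> int:
        chunks = chunk_text(document)
        for chunk in chunks:
            self.rows.append(Row(tenant_id, chunk, embed(chunk)))
        return len(chunks)

    def retrieve(self, tenant_id: str, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # Filter by tenant first, then rank by similarity (dot product of
        # unit vectors), so another tenant's rows are never candidates.
        candidates = [r for r in self.rows if r.tenant_id == tenant_id]
        candidates.sort(
            key=lambda r: sum(a * b for a, b in zip(q, r.embedding)),
            reverse=True,
        )
        return [r.chunk for r in candidates[:k]]
```

The point of the shape is that `retrieve` has no code path that ranks across tenants: the tenant filter runs before similarity, so a strong match in another tenant's data cannot surface.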

The strict scoping rule

Every retrieval and every tool call is bounded by tenant_id. We treat tenant data isolation the way we'd treat row-level security in Postgres: as defense in depth, not just a query filter.
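
As a rough sketch of what "defense in depth" can mean for tool calls, the example below layers three checks: the tool filters by tenant, the dispatcher refuses a tenant_id supplied in the model's arguments, and the result is re-checked before it goes back to the model. All names here (`CallSession`, `call_tool`, `lookup_order`, `TenantScopeError`, the `ORDERS` data) are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class TenantScopeError(Exception):
    """Raised when a tool call or its result crosses a tenant boundary."""


@dataclass(frozen=True)
class CallSession:
    tenant_id: str  # resolved server-side, never taken from model output


# Hypothetical backing data; a real tool would query the tenant's systems.
ORDERS = {
    "A-100": {"tenant_id": "t1", "status": "shipped"},
    "B-200": {"tenant_id": "t2", "status": "pending"},
}


def lookup_order(tenant_id: str, order_id: str) -> Optional[dict[str, Any]]:
    # Layer 1: the lookup itself filters by tenant.
    order = ORDERS.get(order_id)
    if order is None or order["tenant_id"] != tenant_id:
        return None
    return order


def call_tool(
    session: CallSession,
    tool: Callable[..., Optional[dict[str, Any]]],
    args: dict[str, Any],
) -> Optional[dict[str, Any]]:
    # Layer 2: the model never gets to choose the tenant.
    if "tenant_id" in args:
        raise TenantScopeError("tenant_id must not come from model arguments")
    result = tool(tenant_id=session.tenant_id, **args)
    # Layer 3: re-check the result, in case a tool forgot its own filter.
    if result is not None and result.get("tenant_id") != session.tenant_id:
        raise TenantScopeError("tool returned another tenant's record")
    return result
```

With this shape, a tool that forgets its own filter still cannot leak a record, because `call_tool` rejects any result tagged with a different tenant.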

Tech decisions in detail

Why Xano as the control plane

A NestJS service would have given more flexibility, but the agency's existing team was Xano-fluent. Putting tenant config + routing in Xano meant non-engineers could ship tenant-specific tweaks without a deploy.

Where Xano was the wrong tool: the call runtime itself. Streaming audio + tool orchestration belongs in a proper service. So we kept Xano on the slow path (config, webhooks, billing) and put a small NestJS service in front of Twilio for the fast path.
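
One way the split could look from the fast-path side: the call runtime reads tenant config from the control plane once, then serves it from a short-lived cache so no audio turn waits on the slow path. The write-up doesn't say how the real service reads its config, so treat this as an assumption-laden sketch; `TenantConfigCache`, the `fetch` callback, and the 60-second TTL are all hypothetical, and the code is Python for illustration rather than the NestJS service itself.

```python
import time
from typing import Any, Callable


class TenantConfigCache:
    """Per-tenant config cache with a time-to-live, for the fast path."""

    def __init__(
        self,
        fetch: Callable[[str], dict[str, Any]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # slow-path call, e.g. an HTTP request
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries: dict[str, tuple[float, dict[str, Any]]] = {}

    def get(self, tenant_id: str) -> dict[str, Any]:
        now = self._clock()
        hit = self._entries.get(tenant_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # fresh enough: no slow-path round trip
        config = self._fetch(tenant_id)
        self._entries[tenant_id] = (now, config)
        return config
```

The trade-off is staleness: a tenant-specific tweak made in the control plane takes up to one TTL to reach live calls, which is usually acceptable for config and never sits on the audio path.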

Twilio number provisioning

Each tenant provisions their own number through the console. Twilio's number search → purchase → webhook bind is a three-call flow; I wrapped it in a Xano function stack so the UI just calls provision_number(tenant_id, area_code).
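
The real wrapper lives in a Xano function stack, so the following is only a Python sketch of the same search → purchase → bind sequence. `TelephonyClient` and its method names are a hypothetical interface, not Twilio's SDK; the webhook URL pattern and the release-on-failure step are my assumptions too.

```python
from dataclasses import dataclass
from typing import Protocol


class TelephonyClient(Protocol):
    """Hypothetical minimal interface over the telephony provider."""

    def search_available(self, area_code: str) -> list[str]: ...
    def purchase(self, phone_number: str) -> str: ...  # returns a number id
    def bind_voice_webhook(self, number_id: str, url: str) -> None: ...
    def release(self, number_id: str) -> None: ...


@dataclass
class ProvisionedNumber:
    tenant_id: str
    phone_number: str
    number_id: str
    webhook_url: str


def provision_number(
    client: TelephonyClient,
    tenant_id: str,
    area_code: str,
    webhook_base: str = "https://voice.example.com",  # placeholder host
) -> ProvisionedNumber:
    # Call 1: search for available numbers in the requested area code.
    candidates = client.search_available(area_code)
    if not candidates:
        raise LookupError(f"no numbers available in area code {area_code}")
    phone_number = candidates[0]

    # Call 2: purchase the first candidate.
    number_id = client.purchase(phone_number)

    # Call 3: point the number's voice webhook at this tenant's route.
    webhook_url = f"{webhook_base}/tenants/{tenant_id}/voice"
    try:
        client.bind_voice_webhook(number_id, webhook_url)
    except Exception:
        client.release(number_id)  # don't keep paying for an unrouted number
        raise
    return ProvisionedNumber(tenant_id, phone_number, number_id, webhook_url)
```

Keeping the three calls behind one function means the UI never sees a half-provisioned state: it gets either a bound number or an error.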

Outcomes

  • Time to MVP: 3 wks (from contract to first live tenant)
  • Tenant onboarding: 1 day (down from 2–3 weeks per integration)
  • RAG + tools: tenant-isolated (strict per-tenant scoping at every layer)

Lessons

  • Two backends are sometimes the right call. Don't force one tool to do both slow-path config and real-time orchestration.
  • Debugging streaming audio needs first-class tooling. I built a Loom-style transcript replay early on, which saved a week of round-tripping with the client.
  • Tenant isolation is policy, not query syntax. Every retrieval, every tool, every webhook needs to assert it.
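
For a sense of what a transcript replay tool involves, here is a stripped-down sketch: record timestamped call events as JSON lines, then render them back in order with the gap between events. This is not the tool I built, just the general shape under assumptions; `CallEvent`, `CallRecorder`, `load_events`, `render_replay`, and the event kinds are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CallEvent:
    t_ms: int   # milliseconds since call start
    kind: str   # e.g. "caller", "agent", "tool_call", "tool_result"
    text: str


class CallRecorder:
    """Collects events during a call and serialises them as JSON lines."""

    def __init__(self) -> None:
        self.events: list[CallEvent] = []

    def record(self, t_ms: int, kind: str, text: str) -> None:
        self.events.append(CallEvent(t_ms, kind, text))

    def dump(self) -> str:
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


def load_events(jsonl: str) -> list[CallEvent]:
    """Parse JSON lines and sort by timestamp, since streams can interleave."""
    events = [CallEvent(**json.loads(line))
              for line in jsonl.splitlines() if line.strip()]
    return sorted(events, key=lambda e: e.t_ms)


def render_replay(events: list[CallEvent]) -> list[str]:
    """One line per event: absolute time, gap since previous event, content."""
    lines, prev = [], None
    for e in events:
        gap = 0 if prev is None else e.t_ms - prev
        lines.append(f"[{e.t_ms / 1000:7.3f}s +{gap:>5}ms] {e.kind:<11} {e.text}")
        prev = e.t_ms
    return lines
```

The gap column is the useful part when debugging: a long pause before an agent turn or a tool result shows up immediately, without anyone having to listen to the recording.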
