
Voice-First AI Sales Training App

Category

BETA · VOICE-FIRST AI

Role

Solo builder — greenfield

Timeline

Aug 2025 → Present

Status

BETA · Moving to Production

BETA · Current Status
Gemini 2.5 Flash · LLM
Sarvam AI · TTS + STT
Voice-First · Interaction Model
01 SIGNAL

Field sales representatives at a health-tech startup had no scalable way to answer product questions in real time — training was slow, inconsistent, and dependent on manager availability. A voice-first AI tool had never been attempted at the organisation.

02 ARCHITECTURE
  • Designed and built the full product solo: voice pipeline architecture (Sarvam STT → Gemini 2.5 Flash → Sarvam TTS), safety response layer, privacy-preserving session design, and deployment on Vercel.
  • Started from zero — no prior AI product at the organisation.
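
A minimal sketch of the three-stage flow named above (STT → LLM → TTS), written as plain function composition. The stage signatures and the `VoicePipeline`, `fake_stt`, `fake_llm`, and `fake_tts` names are hypothetical; the source does not show the actual Sarvam or Gemini client calls, so stand-in functions are used here in their place.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures. In a real build these would wrap the
# Sarvam STT/TTS and Gemini clients, which are not shown in the source.
SpeechToText = Callable[[bytes], str]
Generate = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass(frozen=True)
class VoicePipeline:
    """Chains the three stages: audio in -> text -> answer -> audio out."""

    transcribe: SpeechToText
    generate: Generate
    synthesize: TextToSpeech

    def respond(self, audio_in: bytes) -> bytes:
        question = self.transcribe(audio_in)   # speech-to-text
        answer = self.generate(question)       # LLM response
        return self.synthesize(answer)         # text-to-speech


# Stand-in stages so the sketch runs without network access.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(question: str) -> str:
    return f"Answer to: {question}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
reply_audio = pipeline.respond(b"What does the product do?")
```

Keeping each stage behind a simple callable is one way to make a vendor or model swap a one-line change; whether the real system is structured this way is an assumption.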
03 OUTCOME

Moved from initial demo at company all-hands (Feb 2026) to active BETA with expanded capabilities: privacy architecture, content safety response, and vernacular language support via Sarvam AI. Currently hardening for production release.

04 EDGE

Chose Sarvam AI for STT and TTS over alternatives such as AWS Polly and Google Cloud TTS, for its better cost profile and Indian-language fidelity. Chose Gemini 2.5 Flash over 2.0 for improved reasoning depth without a latency penalty. Accepted BETA constraints (limited concurrent users) in order to ship privacy and safety features before full production.

The demo ran end-to-end at the all-hands without a slide deck. That was the validation we needed to move to the next phase.

Saathealth Team

Internal Review · February 2026

BUILD STAGES

1 · Voice UX + prompt architecture
2 · Sarvam STT/TTS integration
3 · Safety + privacy layer
4 · All-hands demo (Feb '26)
5 · BETA → Production hardening

TECH STACK

Gemini 2.5 Flash

Core LLM — reasoning + response generation

Upgraded from 2.0 for better contextual accuracy

Sarvam AI

Speech-to-Text + Text-to-Speech

Selected for quality, cost, and Indian language support

Safety Response Layer

Content moderation + boundary enforcement

Prevents off-topic, harmful, or misleading outputs
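
A rough sketch of what a boundary check on model output could look like. The source does not describe how the safety layer is implemented; the keyword rules, the `ALLOWED_TOPICS` and `BLOCKED_TERMS` lists, and the fallback message below are all hypothetical and only illustrate the "check, then replace with a safe fallback" pattern.

```python
from dataclasses import dataclass

# Hypothetical rules for illustration only; not taken from the source.
ALLOWED_TOPICS = ("product", "pricing", "onboarding")
BLOCKED_TERMS = ("diagnosis", "dosage")
FALLBACK = "I can only help with product questions. Please check with your manager."


@dataclass(frozen=True)
class SafetyVerdict:
    allowed: bool
    reason: str


def check_reply(question: str, reply: str) -> SafetyVerdict:
    """Reject off-topic questions and replies containing blocked terms."""
    q, r = question.lower(), reply.lower()
    if not any(topic in q for topic in ALLOWED_TOPICS):
        return SafetyVerdict(False, "off_topic")
    if any(term in r for term in BLOCKED_TERMS):
        return SafetyVerdict(False, "blocked_term")
    return SafetyVerdict(True, "ok")


def safe_reply(question: str, reply: str) -> str:
    """Return the model reply only if it passes the check, else a fallback."""
    return reply if check_reply(question, reply).allowed else FALLBACK
```

In this sketch the check runs after generation and before speech synthesis, so a rejected reply is never spoken. Simple substring matching is a stand-in; a production layer would likely need something more robust.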

Privacy Architecture

Data handling + session isolation

No persistent storage of user voice or conversation data
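
One possible way to express "session isolation with no persistent storage" in code is an in-memory session that is wiped when the conversation ends. This is a sketch under that assumption; the `Session` class and `ephemeral_session` helper are hypothetical names, and the source does not say how the real system holds or discards session state.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Session:
    """Holds one user's conversation in memory only; nothing is written to disk."""

    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add_turn(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))


@contextmanager
def ephemeral_session() -> Iterator[Session]:
    """Yield a fresh session and clear its contents on exit, even on error."""
    session = Session()
    try:
        yield session
    finally:
        session.turns.clear()


with ephemeral_session() as session:
    session.add_turn("What does the product do?", "It helps with ...")
    turns_during = len(session.turns)

turns_after = len(session.turns)
```

Each call creates a separate `Session` with its own random ID, so two users never share conversation state in this sketch.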

METHODOLOGY

This product was built and iterated using a constraint-first problem-solving framework developed through repeated deployments in real-world, resource-limited environments. The eight steps below are how I approach every AI product build.

01

Define the pinpoint

Not the surface complaint — identify the root failure. What decision is actually broken?

02

Map hard constraints

Budget, timeline, deployment feasibility. Constraints first — not as afterthoughts.

03

Stakeholder loop

Speak to every affected person. Map their actual workflow, not their stated preference.

04

Build a working prototype

Not a plan, not a deck. Something you can put in front of a real user within days.

05

Iterate after every conversation

Don't batch feedback. Rebuild incrementally — each conversation informs the next build.

06

Internal validation

Loop until the team says "this solves the problem." Consensus before external exposure.

07

External validation

Deploy to real users. Observe actual behaviour — not just survey responses.

08

Feedback spiral

Continuous improvement loop post-launch. The product is never finished — it compounds.

Where It Stands

The product moved past the demo stage into active BETA — with a voice pipeline (Sarvam STT → Gemini 2.5 Flash → Sarvam TTS), a privacy architecture that keeps voice and session data non-persistent, and a safety response layer that enforces content boundaries at inference time. Production hardening is ongoing. This is the first voice-native AI product built at the organisation, and it is setting the architecture standard for what comes next.

“If this system reaches deployment at scale, the use cases extend well beyond the current application — healthcare training, HR onboarding in organisations, compliance and skilling programmes. The architecture is flexible enough to serve each of those contexts.”