Voice-First AI Sales Training App
BETA · VOICE-FIRST AI
Solo builder — greenfield
Aug 2025 → Present
BETA · Moving to Production
Field sales representatives at a health-tech startup had no scalable way to answer product questions in real time: training was slow, inconsistent, and dependent on manager availability. The organisation had never attempted a voice-first AI tool.
- Designed and built the full product solo: voice pipeline architecture (Sarvam STT → Gemini 2.5 Flash → Sarvam TTS), safety response layer, privacy-preserving session design, and deployment on Vercel.
- Started from zero — no prior AI product at the organisation.
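The pipeline order named above (Sarvam STT → Gemini 2.5 Flash → Sarvam TTS) can be sketched as three swappable stages. This is a minimal illustration, not the app's actual code: the stage signatures, the `VoicePipeline` class, the `hi-IN` default language code, and the stub functions are all assumptions, and no real Sarvam or Gemini client calls are shown.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures. In the real app these would wrap the
# Sarvam and Gemini clients, whose calls are not shown in the source.
SpeechToText = Callable[[bytes, str], str]   # (audio, language) -> transcript
Generate = Callable[[str], str]              # transcript -> reply text
TextToSpeech = Callable[[str, str], bytes]   # (text, language) -> audio

RETRY_PROMPT = "Sorry, I did not catch that. Please repeat."


@dataclass(frozen=True)
class VoicePipeline:
    transcribe: SpeechToText
    generate: Generate
    synthesize: TextToSpeech

    def respond(self, audio: bytes, language: str = "hi-IN") -> bytes:
        """Run one voice turn: speech in, speech out."""
        transcript = self.transcribe(audio, language)
        if not transcript.strip():
            # Nothing usable was heard, so skip the LLM call entirely.
            return self.synthesize(RETRY_PROMPT, language)
        reply = self.generate(transcript)
        return self.synthesize(reply, language)


# Stub stages so the sketch runs without any network access.
def fake_stt(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"Answer to: {prompt}"


def fake_tts(text: str, language: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
audio_out = pipeline.respond(b"What is the warranty period?")
```

Keeping each stage behind a plain callable is one way to make a vendor or model swap (for example, moving between Gemini versions) a one-line change at construction time.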
Moved from an initial demo at the company all-hands (Feb 2026) to an active BETA with expanded capabilities: privacy architecture, content safety response, and vernacular language support via Sarvam AI. Currently hardening for production release.
Chose Sarvam AI for both STT and TTS over Amazon Polly and Google Cloud TTS, for its better cost profile and Indian language fidelity. Chose Gemini 2.5 Flash over 2.0 for improved reasoning depth without a latency penalty. Accepted BETA constraints (limited concurrent users) in order to ship privacy and safety features before full production.
“The demo ran end-to-end at the all-hands without a slide deck. That was the validation we needed to move to the next phase.”
TECH STACK
Gemini 2.5 Flash
Core LLM — reasoning + response generation
Upgraded from 2.0 for better contextual accuracy
Sarvam AI
Speech-to-Text + Text-to-Speech
Selected for quality, cost, and Indian language support
Safety Response Layer
Content moderation + boundary enforcement
Prevents off-topic, harmful, or misleading outputs
Privacy Architecture
Data handling + session isolation
No persistent storage of user voice or conversation data
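A minimal sketch of what session isolation without persistent storage could look like: conversation turns live only in process memory, are keyed by a random session ID, and disappear on expiry or when the session ends. The `EphemeralSessions` class, its method names, and the 15-minute default lifetime are assumptions for illustration; the source does not describe the actual mechanism.

```python
import secrets
import time
from typing import Callable, Dict, List, Optional, Tuple


class EphemeralSessions:
    """Hypothetical in-memory session store: nothing is written to disk."""

    def __init__(self, ttl_seconds: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # session_id -> (expiry time, text turns). Raw audio is never stored.
        self._sessions: Dict[str, Tuple[float, List[str]]] = {}

    def start(self) -> str:
        """Create an isolated session and return its unguessable ID."""
        session_id = secrets.token_urlsafe(16)
        self._sessions[session_id] = (self._clock() + self._ttl, [])
        return session_id

    def _live_turns(self, session_id: str) -> Optional[List[str]]:
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, turns = entry
        if self._clock() >= expires_at:
            # Expired sessions are deleted on first access after expiry.
            del self._sessions[session_id]
            return None
        return turns

    def add_turn(self, session_id: str, text: str) -> bool:
        """Append a turn; returns False if the session is gone."""
        turns = self._live_turns(session_id)
        if turns is None:
            return False
        turns.append(text)
        return True

    def history(self, session_id: str) -> List[str]:
        """Return a copy of this session's turns only."""
        turns = self._live_turns(session_id)
        return list(turns) if turns is not None else []

    def end(self, session_id: str) -> None:
        """Drop the session and everything it held."""
        self._sessions.pop(session_id, None)
```

One caveat with this sketch: on serverless platforms such as Vercel, process memory is not shared between function instances, so a real deployment would need to account for that rather than rely on a single in-process dictionary.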
METHODOLOGY
This product was built and iterated using a constrained problem-solving framework developed through repeated deployment in real-world, resource-limited environments. The 8 steps below are how I approach every AI product build.
1. Define the pinpoint
Not the surface complaint: identify the root failure. What decision is actually broken?
2. Map hard constraints
Budget, timeline, deployment feasibility. Constraints come first, not as afterthoughts.
3. Stakeholder loop
Speak to every affected person. Map their actual workflow, not their stated preference.
4. Build a working prototype
Not a plan, not a deck. Something you can put in front of a real user within days.
5. Iterate after every conversation
Don't batch feedback. Rebuild incrementally, so that each conversation informs the next build.
6. Internal validation
Loop until the team says "this solves the problem." Consensus before external exposure.
7. External validation
Deploy to real users. Observe actual behaviour, not just survey responses.
8. Feedback spiral
Continuous improvement loop post-launch. The product is never finished; it compounds.
Where It Stands
The product moved past the demo stage into active BETA — with a voice pipeline (Sarvam STT → Gemini 2.5 Flash → Sarvam TTS), a privacy architecture that keeps voice and session data non-persistent, and a safety response layer that enforces content boundaries at inference time. Production hardening is ongoing. This is the first voice-native AI product built at the organisation, and it is setting the architecture standard for what comes next.
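One way a safety layer can enforce content boundaries at inference time is to check both the user's question and the model's draft before anything is spoken. The sketch below is an assumption-heavy illustration: the keyword patterns, the allowed-topic list, the fallback wording, and the function names are all hypothetical stand-ins, and the source does not say how the real layer classifies content.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical rules. A production layer might use a classifier instead.
BLOCKED_PATTERNS = (
    re.compile(r"\b(?:diagnos\w*|prescri\w*)\b", re.IGNORECASE),
)
ALLOWED_TOPICS = ("product", "pricing", "device", "warranty")

FALLBACK = "I can only help with product questions."


@dataclass(frozen=True)
class SafetyVerdict:
    allowed: bool
    reason: str


def check_text(text: str) -> SafetyVerdict:
    """Classify text as allowed, blocked, or off-topic."""
    if any(pattern.search(text) for pattern in BLOCKED_PATTERNS):
        return SafetyVerdict(False, "blocked_term")
    lowered = text.lower()
    if not any(topic in lowered for topic in ALLOWED_TOPICS):
        return SafetyVerdict(False, "off_topic")
    return SafetyVerdict(True, "ok")


def safe_reply(question: str, generate: Callable[[str], str]) -> str:
    """Gate the question, call the model, then gate the draft answer."""
    if not check_text(question).allowed:
        return FALLBACK
    draft = generate(question)
    # The output is only screened for blocked terms; a topic check on the
    # answer would reject replies that do not repeat a topic keyword.
    if any(pattern.search(draft) for pattern in BLOCKED_PATTERNS):
        return FALLBACK
    return draft
```

Checking on both sides of the model call means an off-topic question never reaches the LLM, and a harmful draft never reaches the text-to-speech stage.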
“If this system reaches deployment at scale, the use cases extend well beyond the current application — healthcare training, HR onboarding in organisations, compliance and skilling programmes. The architecture is flexible enough to serve each of those contexts.”