Voice-First AI Sales Training App

BUILD STAGES

1 · Voice UX + prompt architecture

2 · Sarvam STT/TTS integration

3 · Safety + privacy layer

4 · All-hands demo (Feb '26)

5 · BETA → Production hardening

TECH STACK

Gemini 2.5 Flash

Core LLM, reasoning + response generation

Upgraded from 2.0 for better contextual accuracy

Sarvam AI

Speech-to-Text + Text-to-Speech

Selected for quality, cost, and Indian language support

Safety Response Layer

Content moderation + boundary enforcement

Prevents off-topic, harmful, or misleading outputs

Privacy Architecture

Data handling + session isolation

No persistent storage of user voice or conversation data
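As a rough illustration of the "session isolation, no persistent storage" idea above, here is a minimal Python sketch. It is not the app's actual code: `EphemeralSession` and `ephemeral_session` are hypothetical names, and the sketch simply assumes each conversation lives in process memory and is wiped when the session ends.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
import uuid


@dataclass
class EphemeralSession:
    """Hypothetical in-memory holder for one user's conversation turns."""
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    turns: list = field(default_factory=list)

    def add_turn(self, role: str, text: str) -> None:
        # Turns are kept only in this object, never written to disk or a database.
        self.turns.append((role, text))


@contextmanager
def ephemeral_session():
    """Create an isolated session and clear its data when the block exits."""
    session = EphemeralSession()
    try:
        yield session
    finally:
        # Wipe the conversation even if the block raised an exception.
        session.turns.clear()
```

Each `with ephemeral_session() as s:` block gets its own random session ID, so two users never share state, and nothing from the conversation survives once the block exits.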

METHODOLOGY

This product was built and iterated on using a constraint-first problem-solving framework developed through repeated deployment in real-world, resource-limited environments. The eight steps below are how I approach every AI product build.

Define the pinpoint

Look past the surface complaint and identify the root failure. What decision is actually broken?

Map hard constraints

Budget, timeline, deployment feasibility. Constraints first, not as afterthoughts.

Stakeholder loop

Speak to every affected person. Map their actual workflow, not their stated preference.

Build a working prototype

Not a plan, not a deck. Something you can put in front of a real user within days.

Iterate after every conversation

Don't batch feedback. Rebuild incrementally; each conversation informs the next build.

Internal validation

Loop until the team says "this solves the problem." Consensus before external exposure.

External validation

Deploy to real users. Observe actual behaviour, not just survey responses.

Feedback spiral

Continuous improvement loop post-launch. The product is never finished; it compounds.

Where It Stands

The product moved past the demo stage into active BETA, with a voice pipeline (Sarvam STT → Gemini 2.5 Flash → Sarvam TTS), a privacy architecture that keeps voice and session data non-persistent, and a safety response layer that enforces content boundaries at inference time. Production hardening is ongoing. This is the first voice-native AI product built at the organisation, and it is setting the architecture standard for what comes next.
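The pipeline described above (STT → LLM → safety check → TTS) can be sketched in Python as a chain of pluggable stages. This is a minimal sketch under stated assumptions, not the production implementation: `VoicePipeline`, its field names, and the fallback message are hypothetical, and the stages are plain callables so that no Sarvam or Gemini API call is invented here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the four stages; each stage is injected as a callable."""
    transcribe: Callable[[bytes], str]   # speech-to-text stage (e.g. a Sarvam STT wrapper)
    generate: Callable[[str], str]       # LLM stage (e.g. a Gemini 2.5 Flash wrapper)
    is_allowed: Callable[[str], bool]    # safety check applied to the LLM reply
    synthesize: Callable[[str], bytes]   # text-to-speech stage (e.g. a Sarvam TTS wrapper)
    fallback_text: str = "Let's get back to the sales scenario."  # assumed wording

    def respond(self, audio_in: bytes) -> bytes:
        """Run one conversational turn: audio in, audio out."""
        transcript = self.transcribe(audio_in)
        reply = self.generate(transcript)
        # Enforce content boundaries at inference time, before any audio is produced.
        if not self.is_allowed(reply):
            reply = self.fallback_text
        return self.synthesize(reply)
```

Keeping the stages as injected callables means the safety check always sits between generation and speech output, and any single vendor can be swapped or stubbed in tests without touching the rest of the chain.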

“If this system reaches deployment at scale, the use cases extend well beyond the current application: healthcare training, HR onboarding in organisations, compliance and skilling programmes. The architecture is flexible enough to serve each of those contexts.”
Dr. Suvarna Patil
Project Supervisor · D.Y. Patil IMER