Sarvam AI Beats ChatGPT & Gemini: How India’s Sarvam AI Is Redefining Global Benchmarks
In February 2026, something unprecedented happened in artificial intelligence. A Bengaluru-based startup, little known outside India, published benchmark results showing its AI model outperforming Google Gemini and OpenAI’s ChatGPT on document-reading tasks. That startup is Sarvam AI, and it’s redefining what “made-in-India” artificial intelligence means globally.
This isn’t hype. This is a strategic inflection point in India’s technological independence.
What is Sarvam AI? The Sovereign AI Platform Outperforming Global Giants
Sarvam AI is a Bengaluru-based AI startup founded in 2023 by Pratyush Kumar and Vivek Raghavan with a singular mission: build foundational AI systems designed for India from the start, rather than global systems adapted to India.
The company operates on a radical principle. Rather than retrofitting global AI models like ChatGPT or Gemini to handle Indian languages, Sarvam AI engineered proprietary vision-language and speech models from the ground up, specifically optimized for India’s unique challenges.
The result? Sarvam Vision now reports higher scores than models from Google and OpenAI on olmOCR-Bench, a benchmark for reading text from documents.
The Benchmark Victory That Shocked the AI World
On February 9, 2026, co-founder Pratyush Kumar announced Sarvam Vision’s benchmark results:
olmOCR-Bench: Sarvam Vision achieved 84.3% accuracy, surpassing:
- Google Gemini 3 Pro
- DeepSeek OCR v2
- OpenAI’s ChatGPT
OmniDocBench v1.5: Sarvam Vision scored 93.28% accuracy, among the highest documented scores for document understanding globally.
For context, Optical Character Recognition (OCR) is one of the most demanding AI tasks: reading text from images with varied quality, multiple languages, degraded scans, and complex layouts. The fact that an Indian startup founded only in 2023 outperformed far larger, better-funded companies on this task was groundbreaking.
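To make "accuracy" on an OCR task more concrete, here is a minimal Python sketch of character error rate (CER), a common way to compare OCR output against a reference transcription. This is only an illustration: it is not necessarily how olmOCR-Bench or OmniDocBench compute their scores, and the function names are invented for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from `a`
                curr[j - 1] + 1,     # insert a character into `a`
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edits needed to turn the OCR output into the reference, per reference character.

    Note: Python strings are sequences of Unicode code points, so a vowel sign
    in an Indic script counts as its own character here.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical OCR output that dropped the final vowel sign.
    print(character_error_rate("नमस्ते", "नमस्त"))
```

A lower CER means the OCR output is closer to the ground truth; degraded scans and mixed scripts typically push it up.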
Tech commentator Deedy Das, who was initially skeptical of Sarvam’s approach, publicly reversed his position: “I was wrong about Sarvam. They have the best text-to-speech, speech-to-text, and OCR models for regional languages, and that’s actually really valuable.”
Meet the Founders: Pratyush Kumar & Vivek Raghavan
Understanding Sarvam AI requires understanding its founders. They’re not Silicon Valley transplants chasing trends. They’re India-first technologists with global credentials and local conviction.
Pratyush Kumar: The Architect of India-First AI
Pratyush Kumar is CEO and co-founder of Sarvam AI. His credentials are formidable:
- PhD from ETH Zurich (Swiss Federal Institute of Technology)
- B.Tech from IIT Bombay (India’s premier technology institute)
- Research experience at Microsoft Research and IBM Research
- Adjunct faculty at IIT Madras
But his most important credential is this: Kumar co-founded AI4Bharat and PadhAI, platforms dedicated to advancing language AI and affordable education across India.
Kumar’s Vision: Global AI models are structurally misaligned with India’s reality. They fail because:
- They weren’t trained on Indian document formats
- They don’t understand the chaos of degraded government forms
- They can’t parse handwritten text in regional scripts
- They ignore 22 official Indian languages as “niche markets”
Under Kumar’s technical leadership, Sarvam Vision was engineered to solve these exact problems. A 3-billion-parameter state-space vision-language model, it excels at:
- Image captioning in regional languages across South Asia
- Scene text recognition across multiple writing systems (Tamil, Telugu, Kannada, Bengali, and others)
- Chart and table interpretation from degraded, water-stained, and low-quality scans
- Complex document parsing combining mixed languages in single documents
Kumar’s public disclosure of benchmark results was bold. Publishing 84.3% accuracy directly against Gemini and ChatGPT invited industry scrutiny. But the results held. This transformed Sarvam from an experimental lab into a credible foundational AI builder on the global stage.
Vivek Raghavan: The Strategist Bridging Research to Real Impact
If Kumar is the technical architect, Vivek Raghavan is the systems thinker translating breakthrough research into national-scale utility.
Raghavan brings deep expertise in:
- AI systems architecture
- Data infrastructure
- Public sector technology
- Enterprise deployment
His influence at Sarvam AI is visible in a relentless focus on usability, pricing, and deployment readiness. While many AI startups chase abstract benchmarks for academic papers, Sarvam prioritizes the places where AI actually meets citizens: banks, government offices, courts, schools, enterprises.
This philosophy manifests in products like Bulbul V3, Sarvam’s text-to-speech model. It’s not built for demos. It’s built for adoption. With 35 voices across 22 regional languages, Bulbul V3 works for:
- Voice-enabled government services
- Regional banking applications
- Educational technology platforms in regional languages
- Accessibility tools for people with visual impairments
The Shared Conviction: AI is National Infrastructure
What binds Kumar and Raghavan is a core belief: AI systems are becoming national infrastructure. Language models, vision systems, and speech engines will shape how governments operate, how citizens access services, and how entire economies scale.
If India doesn’t control these foundational components, India outsources its digital future. Sarvam AI exists to prevent that.
Sarvam AI’s Products: Sovereign Technology for India
1. Sarvam Vision: The OCR That Beats Global Giants
What it does: Sarvam Vision is a vision-language model capable of understanding and interpreting images with text, charts, tables, and handwritten content.
Key capabilities:
- Multilingual OCR: Recognizes text in all 22 official Indian languages
- Degraded scan handling: Works with faded documents, water-stained papers, low-quality scans
- Script diversity: Handles the scripts used for Tamil, Telugu, Kannada, Bengali, Odia, Punjabi, Gujarati, and Marathi, along with other regional writing systems
- Complex layouts: Parses government forms, bank statements, legal documents, court filings
- Chart and table interpretation: Understands visual data representations
- Historical document processing: Works with old, poorly preserved documents
Why it matters: Global OCR models like those in Google Lens or Azure Computer Vision were trained predominantly on English and European languages. Indian documents are fundamentally different—they feature water damage, handwriting, mixed scripts, and unusual layouts that global models struggle with.
Sarvam Vision achieved 84.3% accuracy on olmOCR-Bench—outperforming Gemini—because it was trained on Indian data from the start.
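Mixed scripts are one concrete reason these documents are hard: a single form can switch between English, Hindi, and Tamil from one field to the next. As a rough illustration (not Sarvam's method; the names `script_of`, `word_script`, and `script_runs` are invented for this sketch), the Python below tags each word by the Unicode block of its first recognizable character and groups consecutive words that share a script.

```python
from itertools import groupby

# Unicode block ranges for several Indic scripts (taken from the Unicode code charts).
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Odia": (0x0B00, 0x0B7F),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def script_of(ch: str) -> str:
    """Return the script name for one character, or 'Latin' / 'Other'."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Other"


def word_script(word: str) -> str:
    """Label a word by the first character that belongs to a known script."""
    for ch in word:
        script = script_of(ch)
        if script != "Other":
            return script
    return "Other"


def script_runs(text: str) -> list[tuple[str, str]]:
    """Group consecutive whitespace-separated words that share a script."""
    words = text.split()
    return [(script, " ".join(group)) for script, group in groupby(words, key=word_script)]


if __name__ == "__main__":
    for script, run in script_runs("Name नाम பெயர்"):
        print(script, run)
```

A real pipeline has to do this on pixels rather than on clean text, which is where training on Indian documents matters; the sketch only shows why one line can need several scripts handled at once.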
Real-world applications:
- Banking: Digitizing customer KYC documents, loan applications, check deposits
- Government: Digitizing land records, court documents, voter registrations
- Insurance: Processing claim documents, policy records
- Education: Digitizing school records, exam papers, historical archives
- Legal: Converting court filings, contracts, property deed documents to searchable formats
Pricing: Affordable per-page pricing with a clean, intuitive API. According to Deedy Das, Sarvam’s pricing is “very reasonable” compared to enterprise solutions.
2. Bulbul V3: Text-to-Speech for Regional Languages
What it does: Bulbul V3 converts text to natural-sounding speech in regional languages across South Asia.
Key specifications:
- 35 voices across 22 regional languages
- Linguistic authenticity: Voices range from historical dialects to contemporary language variations
- Multiple quality tiers: Options for different use cases and bandwidth requirements
- Natural prosody: Understands language-specific stress patterns, intonation, and rhythm
Why it matters: Global TTS (text-to-speech) engines like Google’s or Microsoft’s are optimized for English, Spanish, and Mandarin. Regional language TTS is deprioritized because the market isn’t profitable by Silicon Valley standards. Yet 900+ million people in South Asia primarily speak regional languages.
Sarvam’s Bulbul V3 fills this gap. It’s not a nice-to-have feature. It’s essential infrastructure for accessibility and inclusion.
Real-world applications:
- Voice-enabled government services: Citizens can access government portals by voice in their own regional language
- Banking apps: Regional language support for financial services
- Educational technology platforms: Learning content in students’ native languages
- Accessibility: Visually impaired users accessing digital services
- Customer service: Banks and insurance companies serving customers in regional languages
Deedy Das’s assessment: “The pricing is very reasonable. The website is not only beautifully designed but very easy to use.”
Why Sarvam AI Matters: The Sovereign AI Inflection
Sarvam AI’s rise signals something larger than one startup’s success. It represents India’s transition from AI consumer to AI creator.
The Geopolitical Significance
When foundational AI models are controlled by US and Chinese companies, those companies also control:
- How information is processed
- What languages and cultures are prioritized
- Which problems are deemed important
- How governments and enterprises depend on foreign technology
Sovereign AI means India building, controlling, and evolving its own foundational AI systems. This isn’t isolationism—it’s agency.
Sarvam AI demonstrates that India can:
- Compete globally on technical merit
- Solve locally with deep market understanding
- Scale independently without relying on OpenAI or Google APIs
- Control critical infrastructure (language, vision, speech) that affects billions
The Strategic Advantage
Sarvam AI’s approach reveals a counterintuitive insight: By solving India’s hardest AI problems, you build globally competitive models.
Why? Because:
- Regional languages are structurally complex: Handling Tamil or Bengali’s linguistic richness teaches AI systems robust language processing.
- Indian documents are challenging: Water-stained government forms, handwritten ledgers, and mixed-script content are harder than pristine English PDFs. Solving for this generalizes to any difficult OCR problem globally.
- India’s scale is massive: Training on India’s data means training on material produced by a population of more than 1.4 billion people. Models become more robust, diverse, and universally applicable.
Sarvam Vision outperforms ChatGPT on OCR precisely because Sarvam is not building for 300 million Americans. It is building for 1.4 billion Indians, whose documents pose vastly harder technical problems.
Sarvam AI’s Impact: Who Benefits?
Government Agencies
Digitizing legacy records—land deeds, court documents, voter registrations—requires OCR that understands regional languages and degraded scans. Sarvam Vision is purpose-built for this task.
Banks & Financial Institutions
Processing KYC documents, loan applications, and checks requires multilingual OCR and regional language support. HDFC, ICICI, and smaller regional banks are already evaluating Sarvam’s solutions.
Insurance Companies
Claim processing, policy document extraction, and fraud detection require fast, accurate OCR. Sarvam AI reduces manual processing time and operational costs significantly.
Education & EdTech
Digitizing school records and creating regional language learning content requires both OCR (for historical materials) and TTS (for accessible learning). Sarvam’s technology stack handles both effectively.
Legal & Compliance
Court documents, contracts, and regulatory filings often feature regional languages and unusual formatting. Sarvam Vision’s ability to parse complex layouts is transformative.
Healthcare & Diagnostics
Patient records, medical reports, and prescriptions often feature handwriting and regional text. Sarvam AI enables digital health infrastructure.
The Technology: What Makes Sarvam AI Different?
State-Space Models vs. Transformers
Sarvam Vision uses a 3-billion-parameter state-space vision-language model, not a standard transformer architecture. This matters because:
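- State-space models are more efficient: compute grows linearly with sequence length, rather than quadratically as in standard self-attention
- They handle long sequences better, because the model carries a fixed-size state instead of attending over every earlier token
- They can be smaller and faster to deploy
- They require less computational overhead at inference time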
For a sovereign AI company serving India’s government and enterprises, efficiency and cost matter. Sarvam’s architecture choice reflects this pragmatism.
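The efficiency argument is easier to see in code. Below is a minimal sketch of a linear state-space recurrence in plain Python. It is a textbook toy, not Sarvam Vision's architecture: the diagonal parameters `a`, `b`, and `c` and the function name `ssm_scan` are assumptions made for illustration.

```python
def ssm_scan(xs, a, b, c):
    """Run a diagonal linear state-space recurrence over a scalar input sequence.

    For each input x_t (all parameters here are illustrative, not Sarvam's):
        h_t = a * h_{t-1} + b * x_t    (elementwise, one entry per state dimension)
        y_t = sum(c * h_t)

    The state `h` has a fixed size, so each step costs the same amount of work
    no matter how many inputs came before it.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length")
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        # Update every state dimension from the previous state and the new input.
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        # Read out one output value from the current state.
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


if __name__ == "__main__":
    # One state dimension that decays by half at each step.
    print(ssm_scan([1.0, 1.0, 1.0], a=[0.5], b=[1.0], c=[1.0]))
```

The loop touches each input once, so total work grows linearly with sequence length. Standard self-attention instead compares every position with every other position, which is where the quadratic cost comes from.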
Training Data: India-First
Sarvam Vision was trained on:
- Real Indian government documents
- Banking and financial documents from Indian institutions
- Court documents from Indian courts
- Educational records from Indian schools
- Regional language text and handwriting samples
This India-first training approach is why Sarvam outperforms models trained on primarily English data.
Continuous Improvement
Sarvam AI isn’t a static model. The company continuously:
- Collects user feedback
- Improves performance on emerging document types
- Adds support for edge cases
- Optimizes for new languages and scripts
This feedback loop ensures Sarvam’s models keep improving as they’re deployed, rather than remaining fixed after release.
Sarvam AI’s Business Model: Sustainable & Scalable
Sarvam AI operates on a SaaS API model with:
- Per-page pricing for vision services (OCR, document parsing)
- Per-character pricing for speech services (TTS, STT)
- Volume discounts for enterprises
- On-premise deployment options for sensitive government and financial data
This pricing model is sustainable because:
- Indian enterprises can afford it (unlike $50/1M tokens for ChatGPT)
- Government agencies have budget for per-use pricing
- Banks and insurance companies see ROI quickly
- Startups can build on Sarvam’s APIs affordably
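To show how per-page pricing, per-character pricing, and volume discounts combine, here is a small Python estimator. Every number in it is hypothetical: the rates and discount tiers are placeholders chosen for the example, not Sarvam's published prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    per_page: float       # hypothetical price per OCR page
    per_character: float  # hypothetical price per speech (TTS/STT) character


# Hypothetical volume-discount tiers: (minimum monthly spend, discount fraction).
DISCOUNT_TIERS = [(0.0, 0.0), (1_000.0, 0.10), (10_000.0, 0.20)]


def monthly_cost(pages: int, characters: int, rates: Rates, tiers=DISCOUNT_TIERS) -> float:
    """Estimate a monthly bill: usage-based charges minus the best applicable discount."""
    if pages < 0 or characters < 0:
        raise ValueError("usage cannot be negative")
    base = pages * rates.per_page + characters * rates.per_character
    # Pick the largest discount among the tiers whose spend floor has been reached.
    discount = max(fraction for floor, fraction in tiers if base >= floor)
    return base * (1.0 - discount)


if __name__ == "__main__":
    example = Rates(per_page=0.5, per_character=0.001)  # placeholder rates
    print(monthly_cost(pages=2_000, characters=0, rates=example))
```

Because the bill scales with pages and characters actually processed, a small team pays little while a bank digitizing millions of pages moves into the discounted tiers.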
The Competitive Advantage: Why Sarvam Wins
| Aspect | Sarvam AI | ChatGPT | Gemini |
|---|---|---|---|
| Regional Language OCR | 84.3% (olmOCR-Bench) | Lower on regional scripts | Lower on regional scripts |
| Regional Languages Supported | 22 official languages native | 100+ but lower quality for regional | 100+ but deprioritized regional |
| TTS for Regional Languages | 35 voices, authentic | Limited regional voices | Limited regional voices |
| Cost for India | 10-20x cheaper | Enterprise pricing | Enterprise pricing |
| Sovereignty | Built in India | US-controlled | US-controlled |
| Data Privacy | On-premise option | Cloud-only | Cloud-only |
Recognition & Validation
Tech Industry Validation:
- Deedy Das (respected tech commentator): “I was wrong about Sarvam,” citing its text-to-speech, speech-to-text, and OCR models for regional languages.
- Industry observers: Recognition across NDTV, Times of India, Forbes India
- Global media: Coverage in international technology publications
Benchmark Validation:
- olmOCR-Bench: 84.3% (beat Gemini 3 Pro, DeepSeek OCR v2, ChatGPT)
- OmniDocBench v1.5: 93.28% (among highest globally)
Government & Enterprise Interest:
Multiple Indian banks, government agencies, and enterprises are in pilot and adoption phases.
The Future: Where Sarvam AI is Headed
Based on public statements and roadmap signals, Sarvam AI is expanding into:
- Larger Language Models: Building India-native LLMs rivaling ChatGPT’s reasoning capability
- Multimodal reasoning: Combining vision, language, and speech for complex decision-making
- Domain-specific models: Specialized models for healthcare, legal, financial services
- Enterprise deployment: On-premise solutions for governments and large institutions
- Global expansion: Offering Sarvam’s regional language models globally for diaspora and international users
Why Sarvam AI Matters to You
For Founders: Sarvam AI proves that solving India’s problems creates globally competitive technology. You don’t need Silicon Valley’s approval to build something world-class.
For Investors: Sovereign AI is a multi-billion-dollar category. Sarvam AI is a first-mover in India’s AI independence.
For Enterprises: Sarvam AI offers cost-effective, sovereign alternatives to ChatGPT and Gemini, with better performance on regional language tasks.
For Developers: Sarvam’s APIs are clean, affordable, and optimized for South Asian use cases. Building on Sarvam means supporting regional AI infrastructure.
The Bigger Picture: India’s AI Independence
Sarvam AI is not alone. India is witnessing a broader wave of AI startups building sovereign, India-first alternatives to global models. This movement will define the next decade of technology in the region.
When AI is built with local context, linguistic depth, and real-world constraints in mind, it doesn’t just compete globally—it wins.