What AI Means for India: How Done Right It Could Amplify Hundreds of Languages Instead of Erasing Them

About Will

I run a multi-site content operation on Claude and Notion with autonomous agents — and I write about what we do, including what breaks.

Last fact-check: May 25, 2026

The previous article in this curriculum walked through what AI is doing to non-native English speakers in U.S. higher education — penalizing them for the formal English they learned in school, flagging their writing as machine-generated, leaving them to navigate policies that assume native-speaker defaults. That article ended with the suggestion that there are other relationships between AI and language than the one U.S. institutions have chosen.

This article is about the largest possible counterexample: India. A country with twenty-two officially recognized languages, hundreds of others actively spoken, more than a billion people, and a relationship to English that is simultaneously historical, hierarchical, and changing fast. India is where the question “what does AI look like when designed to amplify linguistic diversity rather than penalize it” gets answered, in real time, by a billion people testing the answer.

This is a thesis piece more than a walkthrough. The CSU rollout is one story about AI and language. India is a different story. The contrast is what’s instructive. This article does not claim to be written from inside India’s lived experience — I’m an American writing from Tacoma — and the closing sections name what I can and cannot see from here.

It’s part of Tygart Media’s free AI Literacy curriculum at tygartmedia.com/category/ai-literacy. The pillar is here.


The setup, briefly

India has twenty-two languages listed in the Eighth Schedule of its Constitution. Hindi and English are the official languages of the central government, with state governments operating in additional regional languages. Beyond the constitutional list, the 2011 census recorded 121 languages with at least 10,000 speakers each and more than 1,300 classified mother tongues. Some are spoken by hundreds of millions; some by a few thousand. Many have rich written traditions going back centuries. Many are primarily oral.

For decades, the language of upward mobility in India has been English. Knowledge of English correlates with access to higher education, urban employment, international opportunities, and economic class. This is not unique to India — many post-colonial countries have similar dynamics — but in India it produces a specific friction. A young person in a Bhojpuri-speaking village in eastern Uttar Pradesh might be cognitively brilliant and economically excluded, because the on-ramps to opportunity require English, and English is something acquired through schooling that village families often cannot afford.

AI changes this picture, potentially significantly. Whether it changes it for better or worse depends on which version of AI gets built, who builds it, and who it’s built for.

The version of AI that hurts Indian linguistic diversity

Start with what happens by default. Large language models are trained on text. The text on the internet is overwhelmingly in English. The text in other languages is heavily concentrated in major European and East Asian languages — French, Spanish, German, Mandarin, Japanese — with significantly less coverage of South Asian languages, and much less of regional Indian languages.

The result is that the AI tools most people use, including in India, perform best in English, second-best in Hindi, less well in the other constitutional languages, and badly or not at all in regional languages and dialects. Bhojpuri, Marathi, Gujarati, Telugu, Tamil, Bengali — these have hundreds of millions of speakers collectively. Most do better than the smaller languages. None do as well as English.

If AI tools become the default interface to information, education, government services, healthcare guidance, and economic opportunity — and they are, fast — and those tools work best in English, then the existing linguistic hierarchy in India gets reinforced and accelerated. The Bhojpuri-speaking villager can still access the AI tool. The tool will work, badly, in their language. It will work much better if they switch to Hindi, and best of all if they switch to English. Over time, this nudges everyone toward English, in a kind of soft linguistic gravity.

None of this happens by anyone’s deliberate plan. It’s what happens by default if AI development continues to be driven by the availability of training data and by commercial market size in English-speaking markets. The default outcome is the erasure version. The amplification version requires deliberate choices.

The version of AI that amplifies Indian linguistic diversity

The amplification version has several pieces that are technically possible right now, some of which are already being built.

Models trained intentionally on Indian language corpora. Several efforts are underway. AI4Bharat, an academic initiative associated with IIT Madras, has been releasing models and datasets for multiple Indian languages for years. Reliance Jio’s BharatGPT effort is similar in motivation, different in execution. Sarvam AI is doing related work. The Indian government’s BHASHINI mission is investing in language technology infrastructure. None of these are at the scale of OpenAI or Google. All of them are working on the right problem.

When AI is trained intentionally on a language — its idioms, its registers, its literary tradition, its everyday speech — the resulting model can serve that language with the same fluency that English models serve English. The capability gap between, say, Tamil and English in an AI system is not a law of physics. It’s a consequence of training data and intent. Both can be changed.

Voice-first interfaces for primarily oral languages. Many Indian languages have strong oral traditions and less developed written corpora. The dominant AI interface — text in, text out — is a poor fit. Voice-first AI is a better fit. A speaker of a regional language can talk to the model in their native register, hear back in the same, and never have to confront the difficulty that their language is written less often than it’s spoken.

The technology for this exists. Speech recognition and synthesis in Indian languages have improved dramatically in the last five years. The interfaces are still primarily designed for English and Hindi, but the underlying capability is there. The question is whether the products built on top of the capability will reach the people who would benefit most.

Translation as a bridge, not a replacement. A well-designed AI translation layer lets a Marathi speaker access information originally written in English, a Telugu farmer read agricultural research from a Tamil university, a Bengali student engage with Hindi cinema scholarship. The translation isn’t pushing them toward English. It’s giving them access to the rest of the world’s information in their own language. The direction of flow matters. Translation that pulls information into local languages is amplification. Translation that pushes local-language speakers to consume English-default content is something else.

Educational tools in the medium of instruction. A student learning physics in a Kannada-medium school should be able to ask an AI tutor about Newton’s laws in Kannada, get a response in Kannada that uses Kannada-language scientific vocabulary, and be able to discuss the answer with their parents in Kannada. The current default (they ask in Kannada, the AI responds in either bad Kannada or fluent English, and the parents can engage with neither) fragments the household’s intellectual life. The amplification version keeps the conversation in the language the household lives in.

Preservation work at scale. India has languages that are not endangered but are under pressure — fewer young speakers, less media in the language, narrower domains of use. AI can be part of the response. Recording, transcribing, and modeling these languages preserves them in a form that future speakers can access. This is happening for some languages. It could happen for many more.

What India has that the U.S. doesn’t, in this conversation

One thing worth saying clearly: India is not approaching this question from behind. India is approaching it from a different starting position that has some real advantages.

The U.S. is wrestling with a question that fundamentally is “how do we integrate AI into a system designed without it.” Universities, classrooms, assessment models, hiring pipelines — all of these were designed in a pre-AI era and now have to accommodate something they weren’t built for. The CSU literacy gap is one symptom of this. The detector false-positive problem affecting non-native speakers is another. The question is essentially: how do we retrofit AI into a system whose defaults are English-monoglot, native-speaker-normed, and built around the assumption that writing is the primary medium of intellectual work?

India is approaching AI from a position where many of these defaults were never settled to begin with. Multilingualism is not a problem to be retrofitted — it’s the lived condition. Voice as a primary medium of communication is not a deviation from the norm — it’s how a substantial portion of intellectual life has always been conducted. The pluralism the U.S. has to graft on, India already has.

This doesn’t mean India will get AI right. There are real challenges, including some that mirror the U.S. failures (linguistic hierarchy, urban-rural divide, caste and class access). But the starting position is different, and the people working on AI for India are working on a different question than the people working on AI for U.S. universities. The question is closer to: “how do we use this technology to honor the linguistic richness that already exists?” That’s a more interesting question than the one CSU has been asked.

The specific case of education

India’s National Education Policy 2020 explicitly endorses mother-tongue instruction in primary education and pushes the medium of instruction toward Indian languages at higher levels. Implementation has been uneven. Many private schools still teach primarily in English. Many parents prefer English-medium instruction because they read it as the path to economic opportunity. The policy direction and the lived reality are not yet aligned.

This is the space where AI could matter most. A student at an Odia-medium school who needs to read English scientific literature for a college course has, historically, had to either become fluent enough in English to do the reading directly, or accept that the literature is inaccessible. AI translation could close that gap. The student can read in Odia, take notes in Odia, ask questions in Odia, and engage with the original literature without abandoning their mother tongue as the medium of thought.

This is the opposite of what’s happening to Priya in the previous article. Priya is being penalized because her English doesn’t look casual enough. The Odia student is being given access to global scholarship without having to abandon Odia. Same technology. Different relationship.

For this to actually work at scale, the AI tools have to be built for it. Translation has to be good enough that Odia scientific vocabulary doesn’t collapse into approximate Hindi when the model can’t find the right Odia term. The interface has to be designed for students who may not have grown up with English-language computing conventions. The training data has to include enough Odia academic and scientific text to be useful. None of this is automatic. All of it is technically possible.
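One small piece of that requirement can be checked mechanically. The sketch below is an illustration, not a description of how any real translation product works: it measures what share of a response is written in the Odia script rather than Devanagari or Latin, using the Unicode block ranges for each script. The function names and the 0.9 threshold are assumptions made up for this example. The check is also deliberately crude. It can catch a model that drifts into Hindi or English text, but it cannot tell whether the Odia vocabulary is correct, and it would miss Hindi words written in Odia script.

```python
import unicodedata

# Unicode block ranges for the scripts of interest.
# The set of scripts tracked here is an assumption for the example.
SCRIPT_RANGES = {
    "odia": (0x0B00, 0x0B7F),
    "devanagari": (0x0900, 0x097F),
    "latin": (0x0041, 0x024F),
}


def script_shares(text):
    """Return the fraction of letters and combining marks in each script."""
    counts = {name: 0 for name in SCRIPT_RANGES}
    total = 0
    for ch in text:
        # Skip spaces, digits, and punctuation; keep letters (L*) and marks (M*),
        # since Indic vowel signs are encoded as combining marks.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        total += 1
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= ord(ch) <= high:
                counts[name] += 1
                break
    if total == 0:
        return {name: 0.0 for name in SCRIPT_RANGES}
    return {name: n / total for name, n in counts.items()}


def stays_in_script(text, expected="odia", threshold=0.9):
    """Hypothetical helper: True if most of the text is in the expected script."""
    return script_shares(text)[expected] >= threshold


if __name__ == "__main__":
    odia_word = "\u0b13\u0b21\u0b3c\u0b3f\u0b06"          # the word "Odia" in Odia script
    hindi_word = "\u0939\u093f\u0928\u094d\u0926\u0940"   # the word "Hindi" in Devanagari
    print(stays_in_script(odia_word))
    print(stays_in_script(hindi_word))
    print(script_shares(odia_word + " physics"))
```

A check like this would sit at the very bottom of an evaluation stack. Judging whether the scientific terms themselves are the right Odia terms still needs Odia-speaking reviewers and Odia reference text.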

The economic stakes

A significant part of India’s economic development story over the last thirty years has been built on English-language services — IT, business process outsourcing, content moderation, customer service. The people in these jobs are disproportionately from English-medium schooling. The people not in these jobs are disproportionately from regional-language backgrounds, regardless of their underlying capability.

AI changes both sides of this. On one side, many English-language service jobs are being directly automated, which compresses the economic premium of English fluency. On the other side, AI tools that work well in regional languages could open white-collar work to populations who were previously excluded by language alone. A capable young person in a Marathi-speaking small town who could not previously work as, say, a paralegal because the work required English fluency may, with sufficiently good AI translation and assistance, be able to do the work in Marathi while the AI handles the English interface.

Whether this potential is realized depends on whether the tools get built for it. The tools built for the U.S. enterprise market won’t do this work. The tools that would do this work have to be built specifically for the Indian linguistic context, by people who understand that context, with sufficient resources to compete with the well-funded English-default alternatives.

This is one of the more genuinely consequential questions about AI in the 2020s. It’s not getting the same attention as the questions about AI in U.S. universities. It probably matters more.

What I can’t see from here

This article needs to admit what it can and can’t speak to. I’m an American who has worked in tech, read the relevant research, and followed the Indian AI conversation from a distance. I have not lived in India. I have not been a parent trying to decide whether to send my child to English-medium or regional-language school. I have not been a student trying to navigate the gap between my home language and the language of my coursework. I have not been a builder of Indian-language AI systems facing the actual constraints of doing that work.

Several things I’d want to know that I don’t:

  • How are the existing Indian-language AI efforts actually being used, by whom, in what contexts?
  • What’s the gap between the technical capability of these tools and their actual adoption?
  • What are the failure modes of well-intentioned Indian-language AI projects — where have they fallen short, who has been excluded?
  • How is the caste-class-language nexus playing out in access to AI tools? The amplification potential I described above assumes equitable access, which may not be the actual condition.
  • How do families and communities feel about AI as a presence in their linguistic lives? Are there cultural concerns that the U.S.-default discussion doesn’t capture?
  • What’s the state of indigenous language preservation work supported by AI, and what are practitioners saying about its strengths and limitations?

These questions need to be answered by people who can answer them. This article is one outsider’s framing of the contrast between the CSU story and the India story. The actual story of AI and Indian languages will be told by Indian writers, builders, teachers, students, and communities. This article is meant to point at the contrast, not to occupy the conversation.

The instructive contrast

The closing thought is also the connection back to the rest of this curriculum.

The CSU rollout is one possible relationship between AI and language: the institutional default treats one language as standard, treats deviation from that standard as suspect, and ends up penalizing the students whose linguistic backgrounds make them most vulnerable to false suspicion. The technology amplifies an existing inequity.

The Indian-language AI work, in its best version, points toward a different relationship: the technology treats linguistic diversity as the condition to be served, builds tools that work in many languages with comparable quality, and ends up giving access to populations who were previously excluded by language alone. The technology amplifies what was already there but underutilized.

Same technology, in some sense. Profoundly different effects, because the implementations are guided by different questions. The U.S. universities are asking “how do we keep our existing system intact in the presence of AI.” The Indian-language AI efforts are asking “how do we use AI to do something our existing systems couldn’t.” The first question produces detector false-positives on Priya’s writing. The second question produces educational tools that work in Odia.

This is not a claim that India will get AI right and the U.S. will get it wrong. Both are large, contested, unfinished projects with real failure modes. The point is that the relationship between AI and language is a choice. There is no neutral default. The version of AI that gets built reflects the values and questions of the people building it. If the values are gatekeeping and the questions are about detection, the result is what CSU has. If the values are amplification and the questions are about access, the result could be something quite different.

The CSU students filling out their AI surveys, the adjuncts redesigning their courses without compensation, the non-native speakers managing the false-positive risk — they are all paying a cost for a version of AI implementation that didn’t have to be this way. India is, in real time, demonstrating that other versions are possible. The lesson is for the U.S., not the other way around.

What this article cannot solve

This article cannot tell you what to do about any of this if you’re in U.S. higher education. The contrast between the two situations is useful for understanding, but the local situation is what it is: your students are not in Mumbai, your institution is not India’s Ministry of Education, and your context is the CSU context whether you like it or not.

This article cannot speak for India or Indians. It points at work being done by Indian researchers, builders, and institutions, but it does not represent that work or speak with the authority of people doing it.

This article cannot resolve whether AI will, in the end, amplify or erase linguistic diversity in any given context. That depends on choices that have not yet been made, by builders who have not yet built, in communities that have not yet adopted what gets built. The framing offered here is hopeful about what’s possible. It is not predictive about what will happen.

What this article can do is open a conversation that the CSU-centric framing of this curriculum has so far mostly closed: the question of what AI looks like when it’s not designed for the institutional contexts of U.S. higher education. The answer to that question is currently being built, mostly outside the institutional centers of AI development, mostly by people whose work is not getting the funding or attention of the OpenAI deals and the university partnerships. That work matters more than this article can convey. The least this article can do is point at it.


About this knowledge node: This is a cluster article in Tygart Media’s AI Literacy content sprint. It’s licensed for use in any classroom, training program, custom GPT, or Claude Project as long as attribution is maintained. The pillar article that introduces the sprint is here.
