AI and Indigenous Language Access. Magic cure or arrogance?

Episode 338 October 07, 2025 00:13:17
Localization Today

Hosted By

Eddie Arrieta

Show Notes

By Jace Norton

The author discusses the role of AI in Indigenous language access, arguing that AI has little practical value for Indigenous language interpretation largely because Indigenous and other low-resource languages lack the massive datasets needed for AI to function effectively for interpretation and translation.

Episode Transcript

[00:00:00] AI and Indigenous Language Access: Magic Cure or Arrogance? By Jace Norton. The main topic of conversation at nearly every language industry event these days is artificial intelligence (AI). Understandably so. It is a hot topic, especially for those working in translation who are seeing fewer work opportunities as clients increasingly turn to AI for their localization needs. Strangely, the topic of AI, the uncertainty and excitement, the buzz and bluster, are all relatively irrelevant to me. As an indigenous language interpreter and owner of a company focused primarily on indigenous language access, I find that AI has almost no bearing whatsoever on the amount of work that I and my company perform, and it currently has little to no immediate practical value to add to my company's ability to perform most of the work we do. Unlike many language companies, Maya Bridge is not particularly active in trying to implement AI into our workflows. [00:01:00] Part of the difference is that in an industry predominantly driven by written translation and localization, we focus primarily on interpretation, which is under somewhat less threat than translation. Another reason is that AI has almost no presence and no practical use cases for most of the languages with which we work. Before I continue, let me first disclose a couple of things. First, I fully admit that I am an AI skeptic and cynic. Not once did I use any AI program to help me write this article. Frankly, I tend to turn my nose up at AI-generated content, as I can typically detect it. I almost never use AI in my personal or professional life, and I believe that in many ways, if not used with caution, AI has the ability to diminish actual intelligence. Call me old-fashioned, but I'm much more of a believer in organic intelligence.
Second, let me be clear in stating that in this article I am discussing language access specifically, that is, providing interpretation and translation services for critical or essential needs, for instance in hospitals and courts. I am not implying that AI has no practical use in language preservation, which can be defined as efforts to digitize, analyze, and otherwise preserve languages. That being said, let's begin. [00:02:22] Recently I attended a conference where an individual debating the practical uses of AI in interpretation and translation argued that someday soon AI will solve indigenous language access issues. [00:02:36] He argued that rural hospitals encountering speakers of lower diffusion languages would be able to utilize AI-based interpretation models to handle the needs of those demographics. Many people seem to be under the impression that AI will be a kind of magic cure for all issues relating to indigenous and lower diffusion language access. The truth is that it will not. This is because of a variety of factors, not least of which is that the use of AI for language access in the United States without human intervention is not legal for federally funded institutions and organizations. But let's pretend for the sake of argument that the law did not prohibit use of AI interpretation and translation for language access, and that we were allowed to use these tools in hospitals and courtrooms. In this scenario, even if AI is developed enough to do a decent job interpreting and translating for other, more mainstream languages, indigenous languages will almost certainly be left off an AI interpreter's list of languages.
[00:03:36] The main issue preventing AI from acting as an interpreter for indigenous languages is that indigenous and other low-resource languages lack the massive amount of data that AI needs to function in a way that could practically be utilized for interpretation and translation, and very few entities are putting in the human-based work necessary to synthetically create such data. The truth is, there is a major difference between the demand for language access and commercial language demand. [00:04:05] Commercial language demand is what drives companies to market their products, materials, and services in new areas, localize movies and entertainment, and offer multilingual customer support. [00:04:17] Commercial language demand indicates where more profits could be made if other linguistic markets were targeted. Commercial language demand pays for itself and is for non-essential services and goods. Think of companies that sell products, provide entertainment, etc. Language access refers to ensuring individuals have equitable access to essential services, such as in courts, hospitals, and schools, or in other federally funded institutions and programs, through meaningful language services. [00:04:47] Although language access is protected by law, its growth is not necessarily incentivized by commercial interests. [00:04:55] In some limited cases, typically in healthcare, this is not entirely true. A hospital that provides meaningful language access for its demographics, for example, will presumably attract more patients to its facilities, but certainly not to the same extent as commercial sector entities. [00:05:13] And realistically, commercial entities are not incentivized to go after the indigenous language markets because they are, from a profitability perspective, insignificant. Case in point: you can watch most of your favorite Netflix shows in Spanish, but not in Keke or Chuquis.
In addition, local governments are typically not all that interested in translating documents, producing news, and so on into indigenous languages, in large part because of a lack of indigenous representation in those governments. Many people, especially those in decision-making positions, take the mistaken viewpoint that individuals who speak indigenous languages are fluent in more dominant languages or, if they aren't, that they eventually will be. [00:05:58] Unfortunately, they aren't completely wrong. Each generation, fewer and fewer individuals are passing on their indigenous languages. In any case, there is a demand issue that correlates with the lack of data and the lack of drive to generate such data. Thus, you have an almost complete void of data when it comes to most indigenous languages, especially those with smaller communities of speakers, [00:06:22] because relatively few people are generating any kind of content, written or oral, that an AI model could access. Essentially, you need an almost entirely altruistic approach to really be incentivized to work on producing the kind of data needed to fuel an AI model. [00:06:39] Typically, the only entities very active in translating and publishing materials into many indigenous languages, such as Q'eqchi', are religious organizations like the Church of Jesus Christ of Latter-day Saints (LDS). [00:06:53] While significant, even these efforts are a drop in the bucket compared with the amount of data needed to create language models for AI that would be useful for language access. [00:07:03] Although emerging small language models are creating working models for translation in lower diffusion languages, these models are extremely limited in function. Take, for example, Google's models for some indigenous languages.
While it is notable that Google has been able to create arguably decent SLMs for its Google Translate function in languages like Q'eqchi' and Quechua, almost certainly from scraping public data from websites like the LDS Church's, the translations are extremely literal. In languages that are particularly idiomatic and metaphoric, the data that would be needed to refine and improve the model simply isn't there. [00:07:43] And without massive undertakings by humans, that data refinement won't magically happen on its own. And aside from linguist nerds like myself wanting to look up how to say different things in a language like Quechua, there is essentially no practical use for those models in language access. [00:08:00] It seems that many people have this mental image of courts or hospitals using something like Google Translate to handle the immediate needs of a patient or a respondent, where individuals pass a phone back and forth, typing sentences one at a time. This mindset betrays a type of linguistic ignorance of which we speakers of dominant languages are often guilty. Sociopolitical issues that almost universally plague minority indigenous communities, often because of colonial entities' intentional efforts, both historical and ongoing, such as extreme poverty, lack of educational resources, political and economic marginalization, discrimination, and exploitation, result in indigenous communities typically having extremely low literacy rates in their native languages. [00:08:49] So even if we did get a perfectly working AI model that could translate written materials with 100% accuracy, and that would also be able to convey meaning based on cultural context with 100% accuracy, which is currently not happening even for dominant languages like Spanish, without both speech-to-text and text-to-speech functionality it would still be almost useless for the people who would need it.
Indigenous people who need access to language services because of limited proficiency in English or in another more dominant language, like Spanish or French, will almost universally not be able to read or write in their native language. Incidentally, this same kind of linguistic ignorance occurs frequently with organizations that send out requests for written translation of documents into indigenous languages, and with language companies that then outsource those requests to companies like Maya Bridge, blissfully unaware of how useless that written translation on its own would be. [00:09:50] In reality, for most document translation requests we get, producing a written translation into an indigenous language will yield very nearly the same result as just leaving the source material in English or Spanish for the populations who need the information. [00:10:06] For that reason, at Maya Bridge, we don't offer translation solutions that don't also include an audio accompaniment, because we know that for the target audience to truly access it, they will need to hear it. You may be asking, but couldn't we just develop an AI model that could produce text-to-speech and vice versa in indigenous languages? [00:10:27] Sure, it's definitely doable if someone is willing to pay for it, if there is a competent and trained organization to lead the effort, if this organization can get enough data, and if it can then refine and improve upon the models. That's a lot of ifs. While SLMs for written translation theoretically can be relatively easily created with source text like the Bible, which is by far the most translated book in the world, the data essential for these models to have any real value to indigenous communities simply does not exist. [00:11:00] A model that could produce text-to-speech and speech-to-text in indigenous languages would require a massive amount of data that doesn't currently exist.
And no one is all that interested in creating the synthetic data that would be needed, because again, there is no demand. [00:11:16] I'm no data scientist, but if that capability still doesn't even really exist for incredibly prominent languages like Spanish, which have infinitely more data compared with indigenous languages, then I'm skeptical that speech-to-text and text-to-speech models for indigenous languages will be coming anytime soon. Again, the reality is that AI multilingualism is mainly driven by commercial opportunity. In other words, money. And if there's no money, there will be much more limited organic growth of AI into indigenous languages. [00:11:50] This may differ for larger indigenous groups, like those in Africa, who have more robust numbers, but for many indigenous languages, unless some billionaire lingo-philanthropist emerges and altruistically invests in indigenous language development for AI models, it's not only unlikely; in many cases, it would be nearly impossible for those models to emerge. And we haven't even mentioned the fact that oftentimes indigenous languages have anywhere from two to 50 regional variants that are mutually unintelligible. In short, as it currently stands, we are a long, long way off from AI having any direct impact on indigenous and low-resource language access. AI may, however, certainly be utilized as a support to augment human-based efforts working on language conservation, teaching, or other efforts. But if you are thinking that AI will make it so that you don't have to find a human interpreter for an indigenous language, you will be waiting a long, long, long time. [00:12:51] This article was written by Jace Norton. He is a Q'eqchi' interpreter, polyglot, and the CEO/founder of Maya Bridge Language Services, a unique mission-driven agency focused on increasing language access for lower diffusion and non-dominant languages like the Mayan languages of Guatemala.
Originally published in MultiLingual magazine, issue 244, September 2025.
