Reframing Quality in Video Game Localization

[00:00:00] Reframing Quality in Video Game Localization: What 50 Years of Game History Teaches Us About Quality. By Marina Ilari.

Ask 10 localization professionals what quality means and you will get 10 different answers. [00:00:14] Ask 10 gamers and you will get 10 more. This gap is not a flaw in our industry. [00:00:20] It is, I would argue, the most interesting problem we have. And it is one that is becoming impossible to ignore now that artificial intelligence (AI) can produce fluent translations at a scale and speed that would have been unthinkable a few years ago. Fluent, however, is not the same as good. [00:00:37] That distinction is at the heart of everything I will explore in this article.

[00:00:41] A Brief and Humbling History

Video game localization is younger than most people realize. [00:00:48] Japanese developers first started thinking seriously about it in the 1970s, not as a creative or cultural endeavor, but as a commercial one. [00:00:56] They wanted to reach the American market. The story of Pac-Man is a perfect capsule of that moment. The original Japanese name, Puck Man (Figure 1), was changed for the US market specifically to prevent an obvious and unfortunate substitution. Even the ghosts got new names: Blinky, Pinky, Inky, and Clyde replaced the originals, which translated roughly as Fickle, Chaser, Ambusher, and Stupid. That is localization: not just translation, but cultural judgment in service of a product.

[00:01:27] The 1980s gave us the first steps and the first spectacular failures. [00:01:31] Super Mario Bros. (Figure 2) had its packaging and documentation translated into French, German, Spanish, Italian, and Dutch, though the in-game text stayed in English. [00:01:43] That compromise coined the acronym EFIGS, which the industry still uses today. [00:01:48] But the era is perhaps best remembered for its gloriously bad translations.
[00:01:53] "All Your Base Are Belong to Us" from Zero Wing and "A Winner Is You" from Pro Wrestling became enduring memes precisely because they were so wrong. There were no processes, no quality assurance (QA) frameworks, no involvement of professional linguists. [00:02:11] Localization was an afterthought, and it showed.

The 1990s changed everything in scale, if not always in quality. [00:02:18] Games began shipping with fully translated on-screen text, localized user interfaces (UIs), and dubbed cinematics. [00:02:27] Baldur's Gate (Figure 3) was among the first role-playing games to be fully dubbed into other languages. [00:02:34] By the decade's end, gaming revenues had doubled, and more than half of that growth was attributable to localization. [00:02:40] The business case was no longer theoretical.

Then came the 2000s and what I think of as the industrialization of localization. [00:02:47] Sim-ship, the simultaneous release of a game in multiple languages, became standard practice, which meant parallel workflows, compressed timelines, and enormous pressure on quality at scale. [00:02:59] This era gave us specialized vendors, trained native linguists, clearer professional roles, and the beginnings of structured workflows.

Today, localization is embedded in the development process itself. [00:03:11] Many studios have entire in-house teams dedicated to it. The tools are more sophisticated than ever. [00:03:17] And yet tools alone do not define quality.

What Quality Actually Means in Game Localization

When I ask localization professionals what quality means to them, I always get thoughtful answers. [00:03:30] Linguistic accuracy, terminology consistency, grammar and fluency, style guide adherence. These are all real and important, but they are also incomplete. Here is the question I would push further. [00:03:45] Why does any of that matter? It matters for the game. It matters for the players. [00:03:50] It matters for the business behind it.
Players do not evaluate localization; they experience it. When quality is high, the player forgets entirely that they are playing a localized game. When it is not, immersion breaks instantly. A poorly translated piece of dialogue can pull the player out of the world the developers spent years building, and a truncated UI string can make the game unplayable.

The business impact is equally concrete. Good localization improves reviews and ratings, increases player retention, strengthens brand trust, reduces support costs, and expands market reach. Poor localization does the opposite.

A 2025 study published in the L10N Journal surveyed 209 Persian-speaking players about localization quality. Nearly 60% opposed modifications to female dance scenes in games, and over 40% opposed changes to violent content. The finding that stayed with me was this: localization quality is judged by how well it aligns with player expectations, not just linguistic accuracy. Players are not passive recipients of our work. They have clear expectations, and they notice when those expectations are not met.

Defining Quality for Your Context: The DEM Framework

If quality is contextual, then we need a process for defining it on a project-by-project basis. [00:05:08] At the AI Localization Think Tank conference, Marta Nieto Cayuela, Senior Localization Quality Manager at Preply, shared a simple framework she calls DEM: Define, Evaluate, Measure. I have borrowed the framework here because it captures the idea in a way that is both simple and powerful. Define means asking what good looks like for this specific content, for this specific audience, at this level of risk. [00:05:33] Not in theory, in practice. Evaluate means choosing the right methodology to assess whether you have achieved it. Measure means creating benchmarks and tracking whether you are actually meeting expectations over time. The framework sounds obvious until you notice how many organizations skip the first step.
They go straight to evaluation, running linguistic quality assessment (LQA) checks and producing Multidimensional Quality Metrics (MQM) scores, without ever having agreed on what they were trying to achieve. [00:06:03] The result is data that does not drive decisions, because no one defined what the target was.

In video game localization, DEM looks quite different depending on the content type.

For narrative content (cinematics, story-driven games), quality means emotional impact: dialogue must feel natural and immersive, character voices must be consistent, and cultural references must resonate with the target audience. You evaluate through in-country review, full-scene context checks (never isolated segments), and MQM weighted toward style and fluency. You measure through player feedback and reviews that mention story or immersion.

For live ops and UI strings, quality means clarity and functionality: players need to understand in-game text instantly, and everything must work correctly on the interface. You evaluate through LQA focused on functional errors, automated checks for placeholder handling and string length, and spot checks in UI context. [00:07:00] You measure through bug counts, error rates per thousand words, and time to fix.

For marketing content (trailers, social media, launch campaigns), quality means engagement: brand voice must be consistent, cultural relevance must land, and emotional impact must survive the translation. You evaluate through transcreation review and A/B testing where possible. You measure through click-through rate, conversion, and sentiment.

Not all content is equal. [00:07:29] Not all quality decisions should be made the same way.

The Three Pillars of a Modern Quality Framework

Once you have defined what quality means for your context, you need a framework robust enough to evaluate it. In practice, this means three complementary pillars working together. LQA is the foundation for human translation.
[00:07:49] This is where structured error categorization, vendor scorecards, and quarterly business reviews come in. The goal is to actively manage quality over time by identifying patterns, providing actionable feedback to vendors, and building a proactive quality culture before content reaches the testing phase. A perfect translation, it is worth saying, can still be a bad player experience. [00:08:12] LQA helps us understand what matters most, not just what is wrong.

AI quality estimation is increasingly essential for machine translation and post-edited outputs. Automated metrics can identify performance patterns at scale, flag underperforming segments, and help determine which content requires human review and which can flow through automatically. The critical caveat is that these metrics are directional, not definitive. They tell you something is off, not what exactly or why, and they are only as reliable as your underlying data. [00:08:44] Without clean terminology, well-maintained translation memories, and consistent training sets, your automated metrics will mislead you.

International user experience is the pillar most often missing from quality programs, and arguably the most important. [00:08:58] This means collaborating with user research and community teams to understand how players actually experience localized content in real-world conditions: adding language and culture questions to player surveys, correlating bug spikes with community feedback, and running focus groups with LQA testers and player communities. [00:09:15] The goal is to connect localization quality to player behavior: monthly active users, revenue per region, retention per locale. This is the data that speaks the language of the business.

[00:09:27] Quality in the Age of AI: The Mindset Shift

Here is the challenge that changes everything. AI can now produce content that scores well on every traditional quality metric and still fails the player. [00:09:40] Fluent content can be wrong.
Grammatically correct output can carry inconsistencies and hallucinations. Linguistically accurate translations can be culturally tone-deaf. And the errors that AI introduces, such as subtle inaccuracies, out-of-character dialogue, and IP inconsistencies, are often harder to catch than the errors a tired human translator might make, precisely because they do not look like errors.

This forces a fundamental shift in how we think about quality. [00:10:08] For a long time we equated quality with linguistic correctness. In the AI era, that equation is broken. A new one is needed: quality equals content performance. [00:10:19] Does it engage the player? [00:10:21] Does it drive retention? Does it resonate in this market?

The implications extend to content strategy. Not every piece of content needs the same level of human oversight. High-impact, customer-facing content such as narrative dialogue, UI strings, and marketing copy benefits enormously from human expertise, cultural judgment, and in-context review. [00:10:43] Large-scale, lower-risk content can be handled efficiently with AI-assisted workflows. The conversation should not be "Human translation is too expensive, [00:10:53] so let's use AI." [00:10:55] It should be "How do we combine AI efficiency with human expertise to preserve quality where it matters most?"

[00:11:01] Hyper-personalization adds another layer of complexity. Fortnite's AI-driven Darth Vader experience (Figure 4) is a compelling example: a character that adapted dynamically to each player's behavior, creating something genuinely personalized. As an avid Fortnite player myself, I can attest to how fun this interaction was. But that experience was available only in English. [00:11:24] International players received no subtitles and no localized version: one quality standard that did not fit all players.

Speaking the Language of the Business

I want to close with something that I think matters as much as any framework or methodology:
[00:11:39] The way we talk about quality to the people who fund our work. Most business leaders do not think much about localization. [00:11:46] They think about revenue, customer acquisition, market share, and risk. If we want localization quality to be taken seriously, we need to stop talking in localization jargon and start talking in the language of business outcomes.

Miguel Sepulveda, industry leader and author of the blog Yolocalizo, offers a reframe that every localization professional should have in their back pocket. Stop talking about "translation accuracy" and start talking about "business risk." Replace "language tone" with "customer trust" and "cost center" with "growth driver."

The formula is straightforward. [00:12:20] Connect your operational metric to its effect, map that effect to a business key performance indicator, and be explicit about the decision it should support. For example: if LQA bugs decrease, quality improves and users experience less friction, which in turn leads to higher conversion and retention. Thus, we should invest more in high-impact content quality. [00:12:39] That chain of reasoning is available to every localization team, and yet too few are applying it.

[00:12:45] Quality is about impact: on the player who forgets they are playing a localized game, on the business that grows because international audiences feel genuinely seen, and on the industry that proves, decade after decade, that taking a game global is as much a creative act as making it.

This article was written by Marina Ilari. She is an ATA-certified translator and CEO of Terra Localizations. She has two decades of expertise in the translation industry, with a focus on video game localization. [00:13:14] She also serves as an adjunct professor at New York University, where she teaches audiovisual translation. Originally published in MultiLingual magazine, issue 252.
