Towards Meaningful Localization Quality Metrics

[00:00:00] Towards Meaningful Localization Quality Metrics: How to Move Beyond Error Rates and Focus on Value. By Cody Connell.

At 7:45 in the morning, my phone rang. It was an emergency Slack Huddle invitation. [00:00:16] This is just fantastic, I thought. Not exactly how I wanted to start my day. A French field marketing lead was on the other end of the call, and she was furious. [00:00:27] Our French social media campaign for Asana, a project management software company, had gone live, and the ads were incoherent. "We need to take them down immediately," she said, with an urgency I could feel through my iPhone from the other side of the planet. I smiled with my face and also through my voice, then promised I'd fix it and sprinted into my next call, all while clutching an espresso with a white-knuckled grip.

[00:00:54] The name of this video call was almost mocking: "Monthly Localization Quality Report." Something inside me tried to revolt as I hit "Start Meeting." By the time my vendor's quality manager began sharing slides, whatever had tried to revolt had already left my body entirely, along with my soul. The espresso wasn't helping as much as I wanted it to. Charts appeared: EPT (errors per thousand), OTD (on-time delivery), and linguist engagement. All perfectly formatted, all perfectly useless for answering the only question that mattered to Asana that morning: [00:01:32] Why were our French ads nonsense? The cherry on top: French was on target in every metric. [00:01:39] We were told that most clients dream of getting EPT to such a level so early in the partnership.

[00:01:45] I was anything but thrilled. Whether it was anxiety, the urge to people-please, or the caffeine, something clicked. A quiet, angry epiphany: [00:01:56] these metrics didn't explain anything about real-world impact. We were managing quality for our vendor's comfort, not our stakeholders' experience.

[00:02:05] When the Spreadsheet Cracked

I spent the rest of that day replaying the irony.
[00:02:11] Our localization team was supposed to be the voice of nuance, yet our reports had none. Our French field marketing lead was angry because she cared. She wanted content that sounded bespoke, not repurposed. And while bespoke wasn't scalable, her frustration was valuable data if I could learn how to measure it. So I opened FigJam, a digital whiteboard that connects with Asana software. [00:02:38] In true Asana fashion, I decorated the board with sparkling unicorns and heart-clutching otters, then began sketching boxes and arrows. With trepidation, I started to build a scientific framework to dismantle complaints, one that would prove what actually mattered in localization quality.

From Doodles to Data

The first problem was figuring out what to quantify. Grammar wasn't the villain. Indifference was. We were constantly taking down French campaigns, each one eroding trust with long-standing allies. [00:03:13] Our new French field marketing lead was the first tier-one stakeholder who truly cared about localization quality. [00:03:21] Her complaints, though exasperating, pointed to something deeper: a gap between linguistic accuracy and cultural credibility.

[00:03:30] I started mapping the things she valued (persuasion, resonance, and tone) against what the business tracked (engagement, conversions, and brand metrics). The two columns barely overlapped. That disconnect became the outline for what I would later call the Language Health Score.

Building the Language Health Score

I broke quality into three measurable pillars, each representing a way language could succeed or fail in the real world.

[00:03:59] Linguistic (50%): This element was still vital, but not everything. [00:04:06] We tracked EPT while adding a coaching layer. Reviewers were scored on whether their feedback actually made the linguists feel like they were able to improve. Instead of merely documenting and tallying mistakes, we were rating the quality of the lead linguist's leadership.
It seems obvious saying that out loud, doesn't it?

[00:04:26] User experience (25%): I added this because a flawless translation, faithful to the source in every way, can still make a terrible button label. We began surveying real users on navigation ease, cultural fit of imagery, and clarity of calls to action. [00:04:44] "Does this make sense?" turned out to be a stronger metric than "Is this grammatically correct?"

[00:04:51] Stakeholder confidence (25%): This is the most unpredictable pillar. We converted vague comments like "it feels off" into a 1-to-10 rating across three dimensions: [00:05:04] contextual understanding, implied meaning loss, and cultural nuance. [00:05:09] Suddenly, feedback became data instead of drama.

Proof Through Tokyo and Paris

A few months later, we premiered a value proposition video at the Work Innovation Summit in Tokyo. The Japanese version used custom B-roll: Tokyo skylines instead of New York, and Asian actors instead of the diverse American cast. I didn't get an invitation, but I imagine the audience stood and drowned out the world with thunderous applause. I say this because engagement soared quarter over quarter, metrics jumped, and stakeholder surveys reflected high satisfaction.

France, by contrast, had received the same generic United States footage. No Eiffel Tower, no local color, and engagement was abysmal. There wasn't even a trend line to plot. Apparently, during rehearsal, the presenter translated the phrase "Are you excited?" [00:06:06] literally into French, blurting it out on stage. [00:06:10] So much for fostering cross-functional trust.

When I compared the two markets side by side, the through line was obvious: higher stakeholder confidence predicted better business performance. [00:06:22] The Language Health Score wasn't just a pretty diagram. It mirrored reality. That insight transformed localization from a service desk into a strategic lever. For the first time,
I could walk into a meeting and say, "Quality scores correlate with revenue," and no one would blink. OK, maybe a few blinks, but fewer than before.

[00:06:46] Making the Invisible Visible

I formalized the framework as both a shared document and an Asana project because, of course, it had to live where collaboration already happened. Each quarter, I presented findings to the head of marketing operations and to our vendor's quality team.

[00:07:05] At first, the vendor was wary. They were used to pass/fail audits, not philosophical metrics about meaning. I remember pitching the new program. [00:07:15] When I asked whether there were any questions, the vendor's quality manager looked at me the way you'd look at a dentist approaching with a chainsaw. But as the data stabilized and complaints faded, skepticism softened. The French team stopped requesting emergency takedowns. The Japanese market kept outperforming. The conversation shifted from "Who made this error?" to "How do we replicate success?"

[00:07:41] That was the moment localization earned a seat at the grown-ups' table.

[00:07:46] Lessons from the Unicorn Board

In hindsight, the Language Health Score wasn't revolutionary. [00:07:54] It was common sense with good branding. It said what everyone already knew but couldn't prove: linguistic quality alone doesn't create trust.

[00:08:04] We learned that counting typos is easy; connecting words to outcomes is hard. Vendors respond better to partnership than punishment. Executives don't care about your metrics until your metrics care about their goals.

[00:08:18] Most importantly, the framework gave localization a credible voice in company strategy. Instead of showing up with slides no one read, we started showing up with insight that shaped go-to-market plans.
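
For readers who want to see how the three pillars might combine into one number, here is a minimal sketch. The weights (50% linguistic, 25% user experience, 25% stakeholder confidence) and the three 1-to-10 stakeholder dimensions come from the article. Everything else is an assumption made for illustration: the article does not give a formula, so the 0-100 pillar scale, the rescaling of the 1-to-10 ratings, the reading of "implied meaning loss" as higher-is-better, and all names (`StakeholderRating`, `language_health_score`) are hypothetical. The sample numbers are invented, not Asana data.

```python
from dataclasses import dataclass
from statistics import mean

# Pillar weights as stated in the article.
WEIGHTS = {
    "linguistic": 0.50,
    "user_experience": 0.25,
    "stakeholder_confidence": 0.25,
}


@dataclass
class StakeholderRating:
    """One stakeholder's 1-10 ratings on the three dimensions named in the article."""
    contextual_understanding: int
    implied_meaning_loss: int  # assumption: 10 means no implied meaning was lost
    cultural_nuance: int

    def __post_init__(self) -> None:
        for value in (self.contextual_understanding,
                      self.implied_meaning_loss,
                      self.cultural_nuance):
            if not 1 <= value <= 10:
                raise ValueError(f"ratings must be between 1 and 10, got {value}")

    def normalized(self) -> float:
        """Average the three ratings and rescale 1-10 onto 0-100 (assumed scaling)."""
        avg = mean([self.contextual_understanding,
                    self.implied_meaning_loss,
                    self.cultural_nuance])
        return (avg - 1) / 9 * 100


def language_health_score(linguistic: float,
                          user_experience: float,
                          ratings: list[StakeholderRating]) -> float:
    """Weighted sum of the three pillars, each assumed to be on a 0-100 scale."""
    if not ratings:
        raise ValueError("at least one stakeholder rating is required")
    pillars = {
        "linguistic": linguistic,
        "user_experience": user_experience,
        "stakeholder_confidence": mean(r.normalized() for r in ratings),
    }
    for name, value in pillars.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return sum(WEIGHTS[name] * value for name, value in pillars.items())


# Invented example: strong linguistic score, middling UX, an unhappy stakeholder.
score = language_health_score(
    linguistic=90,
    user_experience=60,
    ratings=[StakeholderRating(4, 4, 4)],
)
print(round(score, 1))  # prints 68.3
```

Under these assumptions, a market can look healthy on error counts alone and still land well below the top of the scale once stakeholder confidence is weighed in, which is the pattern the French campaign illustrated.
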
[00:08:32] Looking Forward

When I eventually started managing localization for Anthropic, I brought the same philosophy: that quality isn't about policing language, but rather understanding what language makes people do.

[00:08:47] The Language Health Score taught me that localization is invisible until it fails, and that my job is to make its impact visible through metrics that measure meaning, not minutiae. I still think about that morning call from France and the frustration that the ads didn't make sense for the client.

[00:09:05] That's what quality really is: the moment when language stops being just technically correct and starts making people feel what we want them to feel.

[00:09:15] This article was written by Cody Connell. He is a Senior Technical Program Manager, Localization, at Anthropic. He specializes in driving international growth, crafting high-impact localization strategies, and ensuring that brands resonate across markets. Originally published in Multilingual Magazine, Issue 250, March 2026.
