Guarding Quality at Scale

Localization Today

August 11, 2025 | 00:47:42

Hosted By

Eddie Arrieta

Show Notes

Amir Kamran and Amir Soleimani from Taus unpack Quality Estimation (QE) as the missing link between machine translation at scale and human-quality outcomes. They frame QE as an automated “second opinion” that flags what can ship, what needs a quick AI touch-up, and what should go to a linguist—shifting human effort to the tricky, high-risk bits.

In this conversation, we explore where this matters most (high-volume content, real-time chat, regulated use cases), how QE reshapes workflows after MT, and why language access still demands human oversight in sensitive domains. They close with what’s next: lighter, task-focused models, more agent-style automation, and a broader definition of quality that favors fitness-for-purpose over perfection.


Episode Transcript

[00:00:03] Speaker A: Hello and welcome to Localization Today, where we explore the cutting edge technologies shaping the translation and localization industry. I'm Eddie Arrieta, and today we're diving into the secret behind quality estimation: the models, the engineering and the innovation that power automated assessments of translation quality. Joining us are two experts from Taus: Amir Soleimani, senior NLP engineer, and Amir Kamran, solutions architect. Welcome and thank you for being here. [00:00:38] Speaker B: Thank you. Thank you for inviting us. Nice to meet you. [00:00:41] Speaker C: Thank you for having us. [00:00:42] Speaker A: And I have to say that this is a very interesting topic. It's the first time I'm actually discussing it with experts, other than conversations at conferences and side conversations. So I think it's very interesting for our audience, who are multilingual, because it dives into a very particular element where humans are protective, where we believe we are still needed. And we want to believe it, I think it's true, but we also want to deeply believe it. We deeply believe that we are the guardians of quality, that we are the guardians of the essence of what language should be, of what knowledge should be, and that it comes from us. Right. So I think this touches upon that, and I'm very excited to get to talk to you from a more technical perspective to see what the challenges are there, how close we can get to understanding. From my perspective, of course, technology is a very human thing. So I see it as part of our legacy, just like I see art and music and other things. But with all that said, I would love for you both to help us frame this conversation and let us know a little bit about your definition of quality estimation. So how would you define QE, or quality estimation? And why should every translation professional care about it? Once again, Amir and Amir, welcome, and we'd love to hear your opinions on this. [00:02:08] Speaker B: Sure. Maybe I can start. So, yeah, basically quality itself is a very subjective thing. So for different companies who are in localization, and also who are the consumers of translations, the meaning of quality is very different. So what kind of things they allow, what kind of things they don't allow, like errors. So in the current workflow there are always humans involved who are, as you rightly said, safeguarding what quality is and trying to produce the right translation. But you know, things are changing with the generation of social media, the content is growing. There is now so much content that it's not possible that humans alone can do that work. So in that context, when we say quality estimation, we are talking about models, AI models, that can actually take the source side of the content and the output that is coming out of machine translation. And then based on the two, it can judge whether this translation that is coming out of a machine is good or not. So it can provide you different signals that can tell you whether there are mistakes in the translation, whether it's ready to go, ready to ship, and give you some clues about the quality. So when we say quality estimation, we are actually talking about an AI model that can do it on your behalf. So not replacing the humans, but maybe helping the humans do the work more efficiently and easing the pressure on humans to actually do all the work. [00:03:45] Speaker A: We will edit out the long pause. 
I was waiting for Amir to maybe chip in if he had a version of the definition of Q quality estimation. But we'll leave it at that. As these systems are evolving, is there going to be. What does the future of it look like is in terms of human involvement? Just to get that kind of out of the question, how will humans continue being involved in these systems? [00:04:13] Speaker B: I would say in the current translation process. So MT is used, but not to the fullest capacity because there are some reservations of using mt. So the majority of the content is still verified by humans even when you use machine translation. Basically every sentence that is coming out of mt, a human is reviewing that because of this limitation, the localization is a time consuming task because of course humans are involved. So it's slow. But also it's the fact that only fewer amount of volume can actually be go through this, this pipeline. So there is still a lot of content that is untranslated, a lot of, you know, Internet content, there's a lot of documentation, you know, manuals, there is a lot of content that is untranslated just because of the reason that humans are involved. And somebody has to verify what is coming out of mt. So with a technology like this, where machine can actually judge this, you are actually enabling humans to do more work. So rather than doing the tedious work of whether this is right or wrong and where to focus, a machine generated model, machine trained model, can actually give you a clue whether to focus on this sentence or not. And then that sentence can go to the production right away. And whenever the machine will say, okay, there is a fault, looks like there is some fault here, only then humans will get involved. So it's actually helping humans. And we are not saying that this will replace humans, it will just make sure that the energy of the human workers are actually going into the right place. So the linguists, the people who are doing post editing, the are focused on the Right work, rather than just sitting on a computer and saying this is good, this is bad. They will just focus on the errors and then they will do the more challenging linguistically oriented work. [00:06:06] Speaker C: Yes, I can also add to that that looking into future, I can imagine human can also focus more and more on edge cases and specific domains that machine translation are not good at it right now or maybe in the future. And also the sensitive domains, for example medical or finance, that you want to be 100% sure that everything is okay, then for sure you need human. But if you make them busy with all the content, it is not actually possible. And QE actually helps that you filter some good quality translation and send the rest to human to expert for reviewing. [00:06:47] Speaker A: And that's really great. It sounds like a future where there will be more enjoyment and less of those. I don't think there is any farmer who is like, oh, I wish I was like plowing the by hand. I really wish there were no tractors. And I think for all of us, a lot of like things that are very repetitive, very boring, usually the tedious elements of it separate us from the fulfillment in life. And it's very interesting how this is done through, through engineering. Could we, like we call it internally, we look under the hood. 
Can you walk us through the QE systems that tells you when a translation is good, like what's the data, what's the architecture, what are the signals that make all of this possible, which is probably very, very human at its heart. [00:07:42] Speaker C: So basically we could have two types of quality estimation model. So I go with the simplest one. The easiest one is classification. You can assume that you have a model that say the quality of the translation is good or bad. And as input we could have several inputs. But the most practical case in our scenario is to have the source and the machine translation sentence or phrase. And then the model say it is true or it is false, but it is just a classification. But what we offer at TAOS is a regression model. So instead of just saying true or false, the model gives a score between 0 to 1 and the higher the score, the better the quality. So you can decide or the user can decide if, for example, passing by threshold of 0.9, we can assume the quality is perfect or very good. And this is the way that you can use and of course for making that it is more complicated. Definitely we need some labeling and also some inputs that we can train our model and also a benchmark that we can see how good our model is. [00:08:56] Speaker B: Yeah, if I, if I can add maybe some Little bit more mathematical. So it's a very interesting field. And the knowledge that is in language, which is like words and how the words connect with each other, when we are training a model, even when we are doing machine translation, or basically the basis of large language models are also the same that how you can represent words and the connection of words in mathematical notation. So you can actually create vectors. And those vectors based on the words and based on the sentences and based on the text can be represented. Their meanings can be represented with sequence of numbers. So this is a very unique idea. And if you can do that in a way that you can really express the meaning in these vectors, in these numbers, and you can provide enough data that all the different connections of different words can be, you know, represent by these numbers, then you can go in different languages and you can, you can represent the same ideas with the same kind of vectors, same kind of numbers. And if that is possible, then basically you can say that okay, if two things are meaning the same, that means you can connect those two things in two different languages. And that's the whole idea of machine translation, that's the whole idea of quality estimation and the basis of what LLMs can, can do now. So this is very, very exciting and interesting that you can convert a linguistic, linguistical problem into a mathematical problem and then apply simple mathematics on that. And that could give you interesting observations or signals that you can then use back to judge the quality. One interesting part is, or maybe you can also thinking about that, that if the models of machine translation and quality estimation are the same, then why machine translation cannot judge whether it's producing a good output or not. And the answer to that is basically the data. So when you are trying to train mt model, you are just training a model how to take one language side and how to produce the other side. So it's a generation model that can write stuff in a different language. So you are actually providing the data that is based on translations and then you are teaching the model how to translate. 
But when you are doing quality estimation, you are providing more information. You are actually providing that this is a good translation, this is a bad translation, and these are the labels of these. And then the model is not learning how to translate, but learning, when you give it a translation, how to judge that translation, whether it's the same meaning represented by the source and the target. So that's how it's different. And basically it's based on what objective you are trying to train the model for and what kind of data you're providing. [00:11:50] Speaker A: Thank you for the context. I'm sure the listeners and some of the viewers will have to go back a few times on the recording just to kind of ingrain it into our system. Right. These are the type of things that come very naturally. It's like when you've seen a car, right? Like, you know, I could probably describe the elements of the car. This is probably the type of thing that humans are not yet used to describing. So when it comes to this type of model, it's very interesting that we are getting more used to that. And I think that could be a very interesting exercise for high school, even for primary school probably. But as we start understanding this, a lot of us had not heard about artificial intelligence or machine translation or any of this before ChatGPT. And then as ChatGPT came, then it became popular. But many of you have been working on this for so many years, and that's what equips you really well to kind of get onto the wave. I think the industry is in a really great position because it's right in the middle of it. It's not very common, I've said in other interviews, for a sector to be at the center of every revolution in the past. You know, if you were in industrial work, the industrial revolution was great because you were in the center of it, but other industries were not, other sectors were not. So I think this is like a once in a century thing, I don't know, maybe two centuries. I will have to check history. But things are changing. This whole awareness of artificial intelligence is changing the conversations. Both QE and MT rely on neural models. Just referring to this history that we were just talking about, how can a QE engine reliably detect failures in an MT system that's also AI driven? Just in this context of the new era, I guess. [00:13:43] Speaker C: I guess Amir already explained that a bit, but I can elaborate more on that. So for a machine translation model, we can call it a generative model because it generates something for you. You give it, for example, an English sentence, my name is Amir, and then you expect the model to give you this sentence, for example, in German or Dutch or any other language. So it generates another language. And how we train this model: we give it a lot of samples, thousands or millions of samples, and the samples are in pairs. So we give the model an English sentence and its corresponding Dutch sentence, and lots of other pairs like this. And the model, through the training process, learns how to translate and generate the Dutch translation of your English input. And this is the way machine translation works. But quality estimation is actually somehow different. First, it is not a generative model. It doesn't generate you something, it just gives you a score. 
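[Editor's note: to make the regression idea described above concrete, here is a minimal sketch of how a downstream system might act on a 0-to-1 QE score. The 0.9 threshold comes from the example given earlier in the conversation; the function names and the toy scorer are illustrative assumptions, not Taus's actual API.]

```python
from typing import Callable

# Illustrative only: a regression QE model returns a score between 0 and 1
# for a (source, machine translation) pair; the caller picks a threshold.
SHIP_THRESHOLD = 0.9  # example cut-off; in practice each client tunes this

def route_segment(source: str,
                  mt_output: str,
                  qe_score: Callable[[str, str], float]) -> str:
    """Decide what happens to one MT segment based on its QE score."""
    score = qe_score(source, mt_output)
    if score >= SHIP_THRESHOLD:
        return "ship"          # good enough to publish as-is
    return "needs_review"      # flag for post-editing or a linguist

# Toy stand-in scorer so the sketch runs end to end; a real QE model
# (a trained transformer regressor) would replace this.
def toy_scorer(source: str, target: str) -> float:
    return 0.95 if target else 0.0

print(route_segment("My name is Amir.", "Mijn naam is Amir.", toy_scorer))  # -> ship
```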
So it is much simpler compared to a translation model. As Amir also mentioned, the translation model is trained only on good samples. And the quality estimation needs to detect the failures of the machine translation output. So we train the quality estimation model with bad samples as well. So it already knows what is bad and what is good. And it's also good to mention that different machine translation models can generate different bad translations. So the style of bad translation is quite different. So for example, one can make grammatical errors, the other one is about fluency. And a good quality estimation model is actually a model that can catch all types of errors, all types of mistakes, both fluency and all other categories. [00:15:53] Speaker B: Yeah, I think the interesting part here is that the main algorithms of all these models are almost the same. So you must have heard the term transformers. So these are based on transformer models, where you are actually trying to represent the context and the data, the text, in some form in, as I said, mathematical notation. And then based on that, you are trying to generate something. For example, in the MT case you are trying to generate a translation. In the large language model case, you are just trying to generate maybe a chatbot, where you are actually giving a question and it's giving you an answer. So in the case of QE, it's a simpler kind of output, where you just want to categorize something or you want a regression model where you want a 0 to 1 score. But the actual difference comes in the data engineering part. So in all different kinds of tasks where you are trying to teach a model something, you need to see what kind of data you have to provide the model to learn from. And I think it's the same for humans as well. So when humans are trying to learn something, they need that kind of data to actually train themselves. So just like a post editor, when they are looking at translations, they need to know exactly what kind of terminology is there, they need to check what kind of domain they're working in. They need to understand that this is the meaning of this word in this context. And once they know all of these, they can easily detect what is wrong in the translation and how to correct that. So on the same principle, if you can provide enough data to a machine to see these kinds of signals, that, okay, when this word appears on the source side but does not appear in the translation, then this is a mistake, or maybe there is a wrong spelling used, or in a particular context this word is wrong, or the grammar is not correct, the fluency is not correct. So all of these signals can come from the amount of data that you are trying to provide to the model. And once it learns, then it can easily detect whether there is an error in unseen data or a new kind of translation. So it can really help in a lot of different use cases. You can train models that are generic, that you can use in different scenarios. But of course, generic models can only be reliable to a certain extent. The moment you go into domain specific translations, you need to fine tune the model again to make sure that it understands the nuances of that domain. Or some companies have a specific style of writing, they have a specific way of expressing themselves. So you have to fine tune the model based on how they translate something or how they express themselves in their content. 
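[Editor's note: as a toy illustration of the data-engineering point above: MT training data contains only good translation pairs, while QE training data pairs each source and MT output with a quality label. The rows below are invented examples, not Taus training data.]

```python
# Invented examples of QE training rows: each pairs a source segment and an
# MT output with a human-assigned quality score (here on a 0-1 scale).
# Real datasets are far larger and labeling schemes differ per vendor.
qe_training_rows = [
    {"source": "Press the red button to stop the machine.",
     "mt": "Druk op de rode knop om de machine te stoppen.",
     "score": 0.97},   # faithful and fluent
    {"source": "Press the red button to stop the machine.",
     "mt": "Druk op de groene knop om de machine te starten.",
     "score": 0.20},   # wrong colour, wrong verb: meaning errors
    {"source": "Store below 25 degrees Celsius.",
     "mt": "Bewaren onder 25 graden Celsius",
     "score": 0.85},   # meaning preserved, minor punctuation issue
]

# An MT model trains only on the good pairs and learns to generate;
# a QE model trains on (source, mt, score) triples and learns to judge.
```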
So the model can learn all the different clues from the data. [00:18:51] Speaker A: And you've mentioned it, right? It's something that can be presented across different sectors, and it probably has very different use cases, very different implementations. It seems like when you have a bad result, where you have something that's low quality, that's flagged in that binary way, like this is good or this is bad, what happens next is going to be determined by the use cases that are presented. How is that remediation evolving right now? Right? We have now more resources to just basically automate some of those. Before, probably humans had to do everything. Like it's flagged that it's bad. It's like, okay, let me even put it back into the system. Now probably there are so many other ways to deal with that. How do you see this being applied in the different use cases, and how do you think it's going to evolve in the future? [00:19:48] Speaker B: I think one of the major questions that popped up in the last maybe year or two, since the introduction of ChatGPT and large language models, is whether large language models are going to replace machine translation and maybe the whole workflow or pipeline of the translation. So what we can see is that it's not that easy to just replace the whole workflow that is very efficient right now. So you cannot just take machine translation out of the scene and then say, okay, now use large language models. So there are a lot of legacy things, legacy tools, legacy setups that you have to change. So it will take a lot of time before you can fully rely on a completely new system like large language models. So for the time being, you can only add things that can work with the current workflow. So the current workflow is that you have translation memories. You are still relying on that, which is already giving you a lot of benefits from whatever is already translated. If it's coming back, then you don't have to translate that again. You can just take it out of the memory and use it. Then you can use machine translation. Machine translation is capable now. It's doing a good job for many different domains and many language pairs. But there is a problem of trusting every output. So you are actually asking humans to verify that. And in that capacity you can add more AI there, which is much easier than just replacing the whole workflow. So right after machine translation, we are now saying that there is a post-MT production pipeline. So once MT is done, you can actually add more tools. Like you can use large language models to do something, or you can use a QE model to verify it. And once you have these tools, you can have different clues about the quality and what to do next. And based on that, you can decide on different workflows. So a very easy workflow could be that if a QE system is saying that this is a good translation already, you don't have to do anything, just send it directly to production. And whatever is marked as erroneous, only that can go to the humans. And what we have seen in our experience and experiments is that most of the time, 50% of the data is actually already good out of MT. So only the 50% that is left should go to the human. So that's already a 50% saving and a 50% reduction of work. So that's very good, very positive. But then do we really have to send everything to humans, or can we do something more there? So actually LLMs are a good source to rewrite something. 
So let's say if the QE model is identifying that there is an error in a translation, then rather than directly sending it to the humans, we can send it to LLMs to do automatic post editing, so they can see whether it's a very simple mistake that can be corrected, and then it can rewrite it. So now the human part of doing post editing can be done by LLMs, and then we can apply the QE again and see whether the output of the LLM is good or bad. Because we cannot currently fully trust whether the LLM is doing a good job. So we really have to do the quality estimation part more than once to make sure that whatever is coming out of all this post-MT production pipeline is good enough. And once we can say, okay, this is very good and now we have reached the targeted quality score that we needed, we can send it to production, and then whatever is left again can go to humans. So naturally you will ask, does that mean we are replacing humans? But we see it from a different perspective, as I mentioned in the beginning as well, that there is a lot of content that is currently not translated just because of the current workflow where humans are involved in everything. So if we can automate some part of the pipeline and introduce AI there, that means we are easing the current pressure on humans. That means you can actually do more translation. So they will still be working the same kind of hours, the same kind of work, but there will be more volume that is passing through the whole workflow, and the humans will only be managing that part where there are serious errors. And whatever even the LLM cannot handle, that will go to humans. So that's, I think, a win-win for everyone. [00:24:19] Speaker C: I can add something to the last part. So right now, if you don't use QE, you send everything to the human, and the human must decide whether it is correct or not, and if it is not correct, then try to improve it. So it is two types of decision making, and it is not fun. But if you filter out, let's say, 50% of the translations and you just send the rest to the human, then the human already knows that it is probably a very bad sample, and then he or she can just improve it, work on that. [00:25:00] Speaker A: And this is very interesting also from a perspective, from the conversations that we've had. So one of the things that we're also trying to dig into within Multilingual is the different ways in which companies globalize their content and localize their content, for that matter. And some of course use one standalone solution, you know, one entire app or internal solution where they do everything and they don't use any external help, they just use whatever is within that suite. Then you have the orchestration approach, where you bring in different tools. How does QE change in that process? Architecturally speaking, are we in a moment where integrations are so perfect that it doesn't matter whether you orchestrate or you use a standalone? What differences can we expect to see there in QE? [00:25:54] Speaker B: So I would say not just QE but all this new technology, like AI based technology, is now very useful in different parts of the workflow. So that's why companies are already moving away from this one solution, one black box kind of thing. So they are already introducing, let's say, AI in identifying source problems, for example. So there is a whole new dimension where companies are looking into how they can improve their source content.
So most of the time, or actually a majority of the errors in translation are actually coming from the source mistakes. So you can already introduce some kind of AI there to verify whether the source is good enough or some changes are needed. Then as I said that there is a whole new dimension of post MT production where automatic post editing is one and quality estimation is one. But at Taos, what we believe is that all of this is now available. You can use whatever mechanism you want. If one company is providing all the solutions, maybe you can use that. But internally they are also using different pieces together and then providing you the output. We are using machine translation neural models most of the time. But you don't know whether the machine translation companies are already actually shifting to LLMs. So you are just sending a request, you are getting an output, you don't know what they're using. So it's possible that they are already using LLMs. But then for different kind of tasks you have to use different tools, different AI in your workflow. So this new approach is already becoming more interesting and more, more and more important that how you are going to use different tools and what those tools can give you. So you know, quality is one aspect. You, you, you have to improve the quality and this is the goal of everyone. But, but time is another constraint. So you have to improve the time of the localization. Currently we are relying too much on humans and that is restricting us to do more translation. Then cost is another factor. You have to reduce the cost. When you involve a lot of humans, that means your cost is going up. So within the same cost, if you can do more, that's always better or you can save more money, that's always better. So there is a lot of different areas where you can actually use these different tools and different technologies and then that can help you. Quality estimation specifically is interesting in a way that it can be the, the guide rails of the whole workflow. So when to decide to send something to automatic post setting, when to decide something to send to humans, when to decide something good enough, even at the source side, if you can add a quality estimation model that can Decide whether to do something with the source or not. So quality estimation models can be the guide rails which can actually give you decision making power and different steps, what to do. And the way you actually, the way we are actually training quality estimation models, we are keeping it light. So we want to train models that are very fast and just doing one specific job. So because this is another valid question and a lot of people are asking whether, whether we can use just LLMs directly, why we need a quality estimation model, why we cannot just directly use LLM. I think the answer is very simple. Because LLMs are these very huge computationally expensive, cost wise, expensive, time wise, expensive models. You cannot just put it, you know, everywhere. So you have to use them smartly. You have to introduce smaller models that can take work off the bigger models. And then if needed, and only when it is required, for example, for automatic post editing, only certain part of the content is going to LLMs where you can get the most benefits out of that. So it's very similar that if I have, let's say a tank, I will not take my tank to go for grocery shopping. So I have to do smaller models that can do a specific job. [00:30:02] Speaker A: Thank you, Amir. 
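[Editor's note: pulling the pieces above together, here is a condensed sketch of the post-MT pipeline described in this conversation: segments that pass QE ship directly, failing segments get one automatic post-editing pass from an LLM, and anything still below the bar goes to a human. The function names and the threshold are placeholders, not actual Taus or LLM APIs.]

```python
from typing import Callable

SHIP_THRESHOLD = 0.9  # illustrative quality bar; tuned per client in practice

def post_mt_pipeline(source: str,
                     mt_output: str,
                     qe_score: Callable[[str, str], float],
                     llm_post_edit: Callable[[str, str], str]) -> tuple[str, str]:
    """Return (destination, text) for one segment: 'ship' or 'human'."""
    if qe_score(source, mt_output) >= SHIP_THRESHOLD:
        return "ship", mt_output                # raw MT is good enough

    edited = llm_post_edit(source, mt_output)   # one automatic post-editing pass
    if qe_score(source, edited) >= SHIP_THRESHOLD:
        return "ship", edited                   # re-check the LLM's rewrite before shipping

    return "human", edited                      # still below the bar: hand to a linguist
```

In this framing the small, fast QE model acts as the guardrail at each step, deciding when the larger, more expensive LLM or a human is actually needed, which is the tank-versus-grocery-run point made above.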
And before we move on with some of our questions, I'd like to pause for a second and just let everyone at Multilingual who is listening to this know that this content is available in all different formats. If you're watching this on YouTube, you can listen to it on Spotify, you can listen to it on Apple Podcasts. We also have other shows: Lang Talent, which covers the future of work in our industry, and Echoes of Meaning, which covers the essence of our industry. We also do the C Suite hot seat. So if you're familiar with that, please check it out. And remember that we are releasing an issue every month. Last month we did game localization, and every month we are covering different types of topics. We also cover the Canadian ecosystem, and we're going to be covering many other aspects. We're very happy to have with us today, from Taus, Amir Kamran, Solutions Architect, and Amir Soleimani, Senior NLP Engineer, both from Taus. And my other question on QE is something that I've asked others, which is, okay, how is the nuance on quality estimation changing? How is quality estimation evolving? What is it including? And give us a little history, right. I assume it was very primitive, we could say very basic in what it did. And then it's evolving over time with the pairs, with the phrasal pairs probably, and all the contextual elements that we can add, and all of the systems that you are talking about right now. So what does that look like right now? [00:31:40] Speaker C: Yeah, if we look at history, maybe not very old, but machine translation, or generally machine learning, used to be very statistical. So I can imagine that in past years, a couple of years ago, all the models were depending on very classical machine learning, and it's statistical. But right now everything is neural based, and almost all models are transformer based, which allows you to train the models on very, very large data and make your model very, very big. And we have seen that this improves the performance not only in machine translation and quality estimation, but in all other fields. And this is also true for our case, quality estimation. And I can also explain that these models are GPU friendly, so it is good for hardware. Then you can train them on more data, and then performance will go higher and higher. [00:32:47] Speaker B: As you mentioned, what kind of things can we now include in this? So definitely all these different kinds of nuances. As I mentioned earlier as well, different companies have different styles of writing, so they provide style guides to humans as well, that okay, we want this kind of content and it should be written in this way, and branding should be highlighted in this way. And also of course, when we do localization and it's going across different cultures, you have to add all those nuances as well. And the good thing is that we can learn all this from the data and then introduce this into the model, and the model can actually learn from the data that, okay, I have to use this kind of nuance, or maybe for a specific company, I have to use the branding in this way. So this is all possible. And it's because of the power of, as I explained in the beginning, the power of the mathematics, how you can convert a linguistics problem into a mathematical model. 
And then if you can represent all the knowledge that you want to represent properly, if you train that part of the model properly, then it can, you know, give you all sort of these signals. It can, it can pick from the data all the nuances. And of course you can the way you generate the data, that is very important, how you are generating the data or how you are collecting the data, how you are introducing different errors. So if, for example, if we are talking about nuances, cultural nuances, then I can introduce in the data that this is a correct way to write a thing in, let's say Europe. And if you go to America, then it will be like this. And then based on where the model will be used, I can change the labels. So I will say, okay, here you cannot say this, so let's mark this as an error. And here you can say this in this style, so mark this as correct. And then the model can learn from this. And the more data you can provide to the model, the better it can become in understanding different kind of clues on top of that. Now, one of the biggest advantage of using transformer models and all the new models like, based on the technology of large language models, is that you can, you know, increase your context. So previously you were looking for a very small context. So sometimes it's very difficult to teach the model that, you know, you have to look not just in this sentence, but beyond this sentence and see beyond that. And now you can actually provide larger context to the model. So based on that, you can actually teach the model much more like the connection of different words and different themes and the domains and how the words are being used in a specific domain. So you can actually generate the data in a way that model can learn from that. [00:35:29] Speaker A: And like I've stolen the phrase from engineers as well, they always say already, it always comes back to the training data. So I assume for quality estimation that there is an element to be considered about the evolution of language. And even if machines at some point just dictate everything we consume, well, the human is so unpredictable the way it's going to start interacting with whatever it's given to us and manipulated so that we are manipulated, still we'll have a human output that's unpredictable. How do QE models today, like make terms with that to make sure that there is constant improvement. [00:36:11] Speaker B: I think the first thing is, and the most important part is that because any AI model is not 100% accurate, so there is always a risk involved when you are using AI. So whoever is going to use quality estimation or even machine translation, they understand this, this, this bit that we are doing things faster, we are doing things more efficiently, but it comes with some risk. So, you know, you can minimize that risk by providing domain specific data or by providing the data from a specific client and then train the model according to their vocabulary, the kind of glossary they're using. But things are changing every day, especially with with nouns. There are new nouns added, you know, every day in the vocabulary. And if the model does not know this data, it cannot really take a decision on that, whether it's correctly translated or not. Especially with this whole AI, you know, social media generation, every day there is a new meme coming out and the meaning of the meme is so different that just words cannot explain that. So that means there is a constant retraining needed. 
So we have to see in different contexts when is the time to actually retrain the model. Or maybe with newer models, can we add outer context directly into a trained model? Just like in LLMs, you can use rag based models where you can actually provide external information in the prompt. Then the model can actually take that into account and you know, change the generation capabilities. So with QE also now at Taos, we are also experimenting with a lot of different ideas, a lot of different new kind of technology and we are looking into how we can introduce different kind of data, different kind of training techniques and different kind of nuances. The results are very encouraging. So we are I think in the in the right direction. [00:38:09] Speaker A: And I'm assuming we're like weeks away from an end to end quality translation without human intervention, right? Is that what we should expect? How far are we to have or is it realistic to expect machines to manage end to end translations? And we are we mentioned of the recording that there must be an answer that relates somewhat to agents and how agents behave. It just seems like this is the right place where a quality estimation agent with a very specific task could really thrive and evolve over time. [00:38:47] Speaker C: So maybe in the future models would be much more, much stronger and then you can rely with more sensitivity on them. But even in the future, all these models learn from something and some information. And if they don't access to that information, if it is a new information, then the performance would drop. If there is a field or there is an edge case or a domain that they haven't seen their data, let's say medicine and new medication or something, then you, you cannot expect they perform very well then human, a human that have that knowledge can intervene and improve the quality. Yes, in the future maybe this happens less and the quality would go up and yes, we will see. [00:39:38] Speaker B: Yeah, I would extend to that. Basically it really depends on the use case and it also depends on the content. So the first thing and most important thing is how sensitive the content is. If it's very sensitive, then you will always rely more on humans. Although humans also make a lot of mistakes. But still there is a more they are more trustworthy and other humans trust more on humans. So that's the thing. I think a good example is for example aeroplanes. Most of the time they are autopilot, they're just flying themselves. But there is always a human sitting there. If it's not there, maybe a lot of people will be very worried that something is going to happen to the plane. So in the same way, any technology when it's very sensitive and you're producing something that is very sensitive, you always need someone to supervise that. Maybe all these tools or quality estimation, automatic post editing that can help the process go faster. So currently humans are doing something, maybe we can provide them more signals that can ease their work. On top of that, if there are quality estimation models, then you can do much better in LQA process. So you know when in any localization framework in workflow there is a part of LQA where you ask linguist to analyze whatever has happened in the past and see if there is any improvement. Can we improve the MT models? Can we, you know, do something about the content? If there is a quality estimation model, then that LQA process will become much efficient because then you will be looking at the error part. 
You will only see that, okay, quality estimation was saying that these are the errors that came out. Now let's just analyze all of these and see if there is a pattern. Can we improve the MT based on that? So that's another interesting part, that you can improve, let's say, machine translation models or other models based on the scores that are coming from quality estimation. So I don't see that in the near future there will be a magical model that will just do everything and we will fully trust that and humans will be completely replaced. But what I can see is that there are cases and scenarios where the content is just not sensitive enough, and then you can just use end to end machine based translation, where machine translation, quality estimation, automatic post editing, LLMs, all are doing their role and then no humans are involved in that. And there are tons of examples of such kind of content. When you are just scrolling a social media platform, you don't really need every sentence to be properly translated. You can just read something in a different language, and if you can get the gist of it, then it's good enough. So you don't need a human to actually intervene there. [00:42:36] Speaker A: Yeah, and probably it will come down to accountability. It does come down to accountability. For example, in Multilingual magazine there is content that is, you could say, semi autopilot. There are press releases that go out there, but you do need a human to make sure that this is not a fake story, that the story is actually about a real company and not just something that a crawling system picked up somewhere. Because usually there will be tons of mistakes, and you're right, humans also make a lot of mistakes. I definitely feel a change in the conversation in the industry. People now are more excited than they were perhaps a year ago, where there was much more fear about replacement, where now people are just kind of looking at the tools and being like, why are they not moving? It's like, oh, I'm okay. Got it. So I see a lot of translators, interpreters opening up to the possibilities. I see a lot of companies also jumping on the conversation and having at least an opinion and then proposing solutions and services. Before we go, I'd love to give you a chance to let us know what excites you particularly about this conversation. What are you looking forward to in our industry? And you know, both of you, it'd be great to hear your thoughts on that. [00:43:59] Speaker C: Yeah, I guess I can add that so far maybe we just imagined the offline use case of quality estimation. I mean that you have an English source sentence and you send it, let's say, to Google Translate, you have the translation, and then you can decide to rely on that or just send it to a human to verify that. And then you can decide if you add quality estimation in between to save costs or just improve the quality. So let's call it an offline process. But there are online applications where, to me, quality estimation is very, very necessary. For example, you are chatting online with someone in another language, and let's say you have an automatic translation and then you want to check online, with a very fast model, to see if the translation is accurate or not. And QE, our QE, can do that right now. And then if the quality is bad, then you can add a third person to verify. And this is an application that would be very, very useful for many sensitive cases like banking systems. 
So I would say we would have many new applications that requires quality estimation. [00:45:17] Speaker B: Yeah, for me, couple of things are very, very exciting. So the first one is that now LLMs are being almost accepted now that, yeah, this is a interesting technology, we have to use it, but they are these big giant models. So I think one exciting part is that how we can train smaller models for specific tasks. So this is actually very exciting field where I think quality estimation can also grow that how we can, you know, train models that are sufficient enough, fast enough, reliable enough, and then they can be deployed on a very simple kind of architecture. They don't need, you know, large GPUs and all that. So that's one. And the second part is how the industry is changing. So this is a very provocative thought that whether we even need translation in the future. So maybe the models will be generating content in every language directly. So then the notion of quality will be very different. So if a company is generating content in, let's say, English and Spanish directly from a model, then what is quality? And then can we add a notion of quality? There can be trained models that can judge whether the true spirit of the content was there or not and how to do that. And I think that's a very exciting area as well. And I can see that that's already happening now that some companies are actually generating monolingual content in many different languages directly based on their instructions. And a quality. There could be that how good a model or a light language model has generated this content based on the instruction that is provided. So the quality, the meaning of quality will change there. [00:47:01] Speaker A: Thank you both for your time. And is there anything else you'd like to mention before we go and say goodbye to our dear audience? [00:47:10] Speaker B: If you would like to know more about quality estimation, automatic post editing or Taos, just ping us. We are here. We would love to have a conversation about that. [00:47:22] Speaker A: Great. So thank you both of you, Amir Kamran and Amir Soleimani. And thank you to our wonderful audience for being here with us today. That wraps up our deep dive into quality estimation. Like I said, with Amir Soleimani and Amir Kamran from Taos. Thanks for listening to Localization today. Be sure to subscribe, rate us on your favorite platform. And until the next time, goodbye.
