Neural Machine Translation Versus Large Language Models

Episode 181 June 05, 2024 00:28:20
Localization Today

Hosted By

Eddie Arrieta

Show Notes

By Jourik Ciesielski

Prior to the introduction of LLMs, NMT defined the computer-assisted translator’s toolset. And to some degree, it still does. But many in the industry have recently taken steps to promote LLMs to the new default. In this article, leaders from six of the world’s most influential language companies share their perspectives on the best approach to automated translation.

Episode Transcript

[00:00:00] Speaker A: Neural machine translation versus large language models: which technology will drive the future of automated translation?
[00:00:08] Speaker B: By Jourik Ciesielski. A new age of language technology has dawned, but in the half-light, linguists and business leaders alike are still forming a vision of what exactly that age will look like. Prior to the introduction of large language models (LLMs), neural machine translation (NMT) defined the computer-assisted translator's toolset, and to some degree it still does.
[00:00:33] Speaker A: That's why the advent of LLMs hit the language industry with the force of a nuclear blast. Everyone knew the world was changed forever, but no one could say exactly how. One could make an educated guess about who might define the conversation, though. And sure enough, eyes turned toward the world's largest and most influential language companies for clues. When considering the question of best practices vis-à-vis NMT and LLMs, MultiLingual reached out to several of these major players and received responses from Bureau Works, Lilt, Lionbridge, memoQ, Pangeanic, and Translated. Their combined responses led to some fascinating insights.
[00:01:12] Speaker B: Generally speaking, the whole language services industry agrees that adaptive machine translation (MT) is the quickest and easiest way to implement customized MT solutions. Since it adapts in real time to legacy data, it gives the quality of highly customized models without the need for training and maintenance. Despite those strengths, the adaptive MT market is small, but recent changes to the market have been dynamic. Lilt is an established player, RWS has revamped Language Weaver, Systran's fuzzy match adaptation is integrated into memoQ and XTM products, and ModernMT's evolution toward maturity includes a human-in-the-loop feature widely hailed, even by linguists, for its efficacy.
[00:01:56] Speaker B: However, MT model training has proven cumbersome. Training on bilingual data is expensive, time-consuming, and difficult to control. What do you do if your model still performs poorly after an extensive training round? And MT glossaries can do more harm than good if implemented recklessly. That's why many companies were eager to leverage LLMs. Translation management system (TMS) providers rushed to add LLM-driven translation features, from Bureau Works and Crowdin to Smartling and Transifex, while memoQ is gradually rolling out adaptive generative translation. Google partnered with Welocalize and a few other companies to evaluate its adaptive LLM translation solution. Systran was acquired by ChapsVision, claiming that in this new AI era, it is more difficult for small players to keep the pace, and so the best option is often to aggregate with other actors to get bigger and thus stronger. And Unbabel announced the release of Tower, a multilingual LLM based on Meta's Llama 2 for translation-specific tasks. Pangeanic followed suit with its ECO LLM. What's more, IBM announced the deprecation of Watson Language Translator, its NMT service, encouraging users to migrate to, guess what, watsonx LLMs. This move establishes IBM as one of the first tech giants to sunset its NMT efforts and focus on LLMs for automated translation purposes.
[00:03:25] Speaker B: Clearly, our industry has taken a big step to promote LLMs and their flexible adaptation techniques to the new default for automated translation. While one can draw their own insights from the state of language technology, it's worth listening to the leaders in the industry for their perspectives, and happily they have no shortage of thoughts on the matter.
The localization industry has seen many types of customized MT: models trained on bilingual corpora, glossaries, adaptive MT, and now prompting-based MT through LLMs. What do you think is the best approach to customized MT?
[00:04:00] Speaker B: Bureau Works founder and CEO Gabriel Fairman: The best approach is what we call context sensitivity, which uses LLMs' analytical and predictive capabilities. We work with a retrieval-augmented generation (RAG) framework that examines the text and looks for relevant context in the translation memories (TMs), glossaries, MT repositories, work unit, and preferences. After we retrieve the context, we have a dynamic system that ranks this context according to relevance, using a wide range of metadata, including author, creation date, times confirmed in the past, and semantic plausibility. We then feed this context to a cluster of LLMs that work as arbitrators to suggest the most likely outcome of all of this context. This suggestion then goes through a formatting filter and is returned to the translation editor. This is the approach that is most likely to create a translator digital twin and is therefore most dynamic and effective.
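The retrieve-then-rank step described in this answer can be illustrated with a minimal Python sketch. It is not Bureau Works' implementation: the `TMEntry` fields, the scoring weights, and the use of `difflib` string similarity as a stand-in for semantic plausibility are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class TMEntry:
    """A hypothetical translation-memory record with ranking metadata."""
    source: str
    target: str
    created: date
    times_confirmed: int


def relevance(entry: TMEntry, query: str, today: date) -> float:
    """Blend similarity, recency, and confirmations (weights are assumed)."""
    similarity = SequenceMatcher(None, query.lower(), entry.source.lower()).ratio()
    age_years = (today - entry.created).days / 365.0
    recency = 1.0 / (1.0 + age_years)          # newer entries score higher
    confirmations = min(entry.times_confirmed, 10) / 10.0
    return 0.6 * similarity + 0.2 * recency + 0.2 * confirmations


def rank_context(query: str, memory: list[TMEntry], today: date) -> list[TMEntry]:
    """Return TM entries ordered from most to least relevant."""
    return sorted(memory, key=lambda e: relevance(e, query, today), reverse=True)


def build_prompt(query: str, memory: list[TMEntry], today: date, top_k: int = 2) -> str:
    """Put the top-ranked entries into a prompt as approved examples."""
    best = rank_context(query, memory, today)[:top_k]
    examples = "\n".join(f"- {e.source} => {e.target}" for e in best)
    return (
        "Translate the source text, staying consistent with these approved examples:\n"
        f"{examples}\n"
        f"Source: {query}"
    )


memory = [
    TMEntry("Save the file", "Guardar el archivo", date(2023, 5, 1), 8),
    TMEntry("Save your changes", "Guarde sus cambios", date(2015, 1, 1), 1),
    TMEntry("Print the report", "Imprimir el informe", date(2024, 1, 1), 3),
]
prompt = build_prompt("Save the file now", memory, today=date(2024, 6, 5))
print(prompt)
```

In a real system the resulting prompt would be sent to one or more LLMs; that call is left out here because the article does not say which models or APIs are involved.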
It's also easy to scale and manage, as all knowledge is stored in TMs and glossaries and does not require fine-tuning instances.
[00:05:08] Speaker B: Translated tech evangelist Kirti Vashee: The business objective of using translation technology that enables an enterprise to be multilingual at scale is to improve the global customer experience and drive international market success. The technology used needs to be scalable, responsive, reliable, and cost-effective, all while producing high-quality output across a large number of language combinations. Adaptive MT technology has shown itself to be the most capable enabling technology to date. More recently, we've seen evidence that LLMs, if properly implemented, can improve fluency and raise the overall quality for a subset of languages. However, we have yet to see this scale to the other production needs mentioned above. We anticipate that soon LLMs will become a viable enterprise solution for translation. This will likely come when we move towards task-specific LLMs trained specifically for translation. These models will be smaller and more practical to deploy and maintain than today's massive foundational models.
[00:06:12] Speaker B: In the interim, both LLMs and classical MT approaches may be useful in parallel. Still, most enterprises would likely prefer a single integrated solution unless there are significant advantages for key languages by using two different production pipelines. In general, the choice of technology will always be secondary to the positive, measurable impact on global customers. MT quality differences must be balanced with latency, throughput, and cost realities. The preferred solution will probably be the technology that provides a reliable, consistent, high-quality, and cost-effective deployment in production scenarios.
[00:06:49] Speaker B: Lilt VP of growth Allison Yarborough: Lilt combines all of these, and we believe this approach is best. We train on bilingual corpora for adaptive MT, both on TMs and online while the translators are working; we utilize glossaries in the translation algorithm; and we integrate translation samples, akin to LLM prompting, into the MT system. Each method has its advantages and disadvantages, but we found that the combination provides the best results.
[00:07:18] Speaker B: Lionbridge CTO Marcus Casal: While there is no one-size-fits-all approach to customized MT, new methods exist to improve its results. Traditionally, customization involved training a base model for specific brands, domains, or other use cases, but there was limited demand for this level of specificity. With the rise of LLMs, we're identifying a new approach: using the LLM to improve the output of a base MT engine rather than customizing the engine itself. Essentially, through a well-tuned strategic prompt flow, we can prompt the LLM to check the quality of the translation and refine it based on specific requirements like glossaries and audience. We have found a lot of value in a two-step process that combines baseline MT engines with highly targeted LLM prompting strategies to achieve both accuracy and fluency in customized translation. And of course, this is a prompt flow with iterative prompting ranging across personas and source, target, and bilingual language to achieve the desired outcome.
memoQ chief evangelist Florian Sachse: Prompting-based MT through LLMs combines a strong language encoding (structure, grammar, tone of voice) with a high level of relevant domain information, which can be provided in a prompt. LLMs will still need to improve for certain languages by providing more data, but for many first-tier languages, LLMs can generate fluid and grammatically correct content. Increasing the correctness of the generated content will depend on prompt engineering and the context information, which in our case typically comes from TMs and terminology. Improving the translation quality will not work through retraining the LLM, but by improving the prompt, which is much more predictable and controllable. If predictability and repeatability in continuous workflows are key, this is the most efficient approach.
[00:09:16] Speaker B: Pangeanic founder and CEO Manuel Herranz: In 2024, the best approach to customized MT continues to be NMT. We have achieved a level of parallel corpora availability that allows for the creation of MT engines at very economic costs. It scales well, and adaptation can happen in several ways. At Pangeanic, we provide the ability to inject data into a baseline model with three levels of aggressivity, which customizes models in minutes. Other companies do it on the fly: a very attractive concept, but also a way to accumulate and propagate on-the-fly errors. Serious and professional workflows always require a human verification of the TMX file before it is injected into the adaptive NMT engine for retraining. NMT is much cheaper to run than LLM-based translation as well. It is more controllable for specific objectives such as e-commerce, subtitling with a lot of conversational expressions, software, and healthcare.
Prompt-based translation is proving very popular, and it has advantages and disadvantages. The largest disadvantage is the lack of control in the output.
[00:10:26] Speaker A: Let's not forget that LLMs are generative AI (GenAI). In science and engineering, we are used to having the same results if we apply the same formula. Well, we all know that asking the same question to an LLM does not necessarily guarantee the same translation result. That's not bad if you have occasional translation needs, like translating an email. But try to incorporate LLM-based translation consistently at scale while fully respecting terminology and styles, and the LLM seems to have a mind of its own. All independent MT companies, as well as TMS companies, are working to incorporate GenAI into their workflows, but with no guarantee or customization, prompting is not enough. There is a temptation to assume that after getting ten results right, all translations are going to be fine and that LLM-based translation will work just like NMT translation does. It doesn't. We have tested pure prompt-based LLM translation. Unless you have a specific model trained for the translation task, clever and tried prompting, and an established workflow, it will generate free versions and not accurate translations.
In short, models trained on bilingual corpora and glossaries are very effective and relevant, and sufficient data is widely available, at least in major languages. Adaptive MT can further enhance the quality if there is sufficient and regularly updated training data. However, prompting-based MT using LLMs offers more natural and contextually relevant translations, especially when domain-specific training data is limited or nonexistent.
LLM translation is great for off-the-cuff Japanese <> Spanish or Polish <> Mandarin. I do see the value there.
[00:12:12] Speaker B: So how long will we hold on to NMT? Not long, I dare say. I envisage GenAI systems that, at a similar or higher cost, offer a lot more automation from a single application programming interface (API) connection, benefiting from GenAI's fluency and post-editing in context at scale.
[00:12:32] Speaker A: Even though TMs are often perceived as obsolete, they are still the primary linguistic resource in many localization programs. What will be the role of TMs in the GenAI era?
Bureau Works: TMs are great sources of context. They will continue to be relevant, but they will become easier to maintain and expand.
[00:12:53] Speaker B: Translated: TMs will continue to play an important role in adapting model output to the user's needs. TMs have been mission-critical for all data-driven approaches to MT, from statistical MT onwards. However, the quality of the data is important. The maintenance of TMs has received much less attention than matching leverage maximization. More attention is needed on data cleansing and data optimization for prompting, RAG, and other processes that are useful for LLMs and other technologies beyond. AI learns from data, and more relevant data will usually produce better results. Not all TMs are equal, and human-reviewed and quality-certified TMs will always be more useful. In the future, metadata that is not widely used or available today (source, quality, domain) may become more important, as optimizations will be based on utility and relevance to a downstream AI process rather than just a simple string match, as is the case most often today. Synthetic data creation is also likely to become more important, and this is highly dependent on the quality and categorization efficiency of the seed data.
[00:14:03] Speaker B: Lilt: TMs provide training data, and the accuracy of models is highly dependent on the quality of data used to train them. If high-quality data is used to train the model, it generates better outputs, and if poor-quality data is used to train the model, it will generate bad outputs. The same principle applies with TMs. If the TM quality is high, it is a useful data source to train a model, and if the TM quality is poor, it should not be used to train a model. Lilt fine-tunes a bespoke model per customer per language, and a customer's TMs are a data source in that fine-tuning and customization for the customer's preferences, tone, and terms. There will likely be an expansion of the data source types that are used to fine-tune models. Notably, there are already application layers and tools that capture human feedback in real time, such as Lilt's platform, to create a real-time model training cycle.
[00:15:00] Speaker A: TMs also provide value and consistency, particularly with exact matches and near-exact matches. Over time, we may see low fuzzies devalued as compared to suggestions from GenAI, assuming that the linguist can simultaneously access both. As low fuzzies are often already overvalued in perceived usefulness, we do not perceive this as a bad shift.
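For listeners less familiar with the match bands mentioned in this answer, the following Python sketch shows one way a segment could be bucketed into exact, high-fuzzy, and low-fuzzy matches against a TM source. The thresholds (85% and 60%) and the use of `difflib.SequenceMatcher` are assumptions for illustration only; commercial CAT tools compute match scores with their own, usually token-aware, algorithms.

```python
from difflib import SequenceMatcher


def match_band(segment: str, tm_source: str) -> str:
    """Classify a TM hit by character-level similarity (thresholds are assumed)."""
    score = SequenceMatcher(None, segment, tm_source).ratio()
    if score == 1.0:
        return "exact"
    if score >= 0.85:
        return "high fuzzy"
    if score >= 0.60:
        return "low fuzzy"
    return "no match"


# Compare a new segment against a few hypothetical TM sources.
new_segment = "Click the Save button."
for tm_source in (
    "Click the Save button.",
    "Click the Save button now.",
    "zzz",
):
    print(f"{tm_source!r}: {match_band(new_segment, tm_source)}")
```

Under a scheme like this, the "low fuzzy" band is where a GenAI suggestion would compete most directly with the TM hit.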
[00:15:23] Speaker B: Lionbridge: While TMs excel at reducing translation costs and maintaining consistency within specific domains, the future of localization lies in their synergy with MT and LLMs. Particularly exciting is the potential to leverage LLMs to enhance TM quality, which can boost the value proposition of TMs. GenAI can help address some of the challenges with traditional TMs, like outdated content. For example, formal modes of address might still be present in a TM even if they are no longer used. Fixing these inconsistencies manually is time-consuming and expensive. But with sophisticated language prompts and an iterative approach, GenAI can reliably and cost-effectively update that TM. However, this requires deep expertise in both the languages and the domain being translated, and I think this is just the beginning. GenAI has the potential to supercharge existing language assets, unlocking their potential and making localization more efficient and effective.
[00:16:24] Speaker B: memoQ: TMs will be the key to providing good translations. Our research shows that TM-enriched prompts can get translations out of LLMs that are on par with customized MT systems. This approach is bringing back more control and value to linguists who care about the TMs they maintain and allows any linguist to benefit from MT. They just need to properly do their homework: maintaining TMs and terminology.
Pangeanic: Prompt-based models can be a strong choice when domain-specific training data is limited. This is where I see a very good value to a technology and workflow that is otherwise becoming obsolete by the day. This may not be a very popular statement, but unless TMSs include something really revolutionary, there is little value in systems that are designed to receive files, which are then processed by TM-based systems with the added costs of project managers, to save on translation costs as a result of translation matches. The TMs we have built over the years are excellent resources, parallel corpora fit for machine learning, so we can tame, not just fine-tune, LLMs to produce the desired results.
Language data management (LDM) was anticipated to become a central and perhaps lucrative service after the big NMT push of 2016 to 2017, but it never really got off the ground in the language industry. Will LDM revive to fine-tune LLMs and/or enrich RAG mechanisms?
Bureau Works: In the past, a greater corpus was synonymous with greater quality. Now we think smaller, better-built corpora produce better results. I'm confident that LDM will continue to grow, but we don't believe it's critical to results anymore.
Translated: While RAG and prompt engineering are currently very much in vogue, how we steer LLMs to perform more effectively for specific tasks is likely to evolve further. Recently, there has been success with linking knowledge graphs to produce more relevant and contextually accurate results from LLMs. This area of expertise could grow, as it involves logically connected data concepts, better contextual relevance, and some basic semantics, all elements that are close to skills prevalent in our industry.
LDM can only grow when the platform for which these services are performed is more stable and not evolving as quickly as LLMs are today. We have already seen that early advice on prompt strategies is now outdated and less relevant.
[00:18:58] Speaker B: Lilt: While the concept of LDM was strong, it was limited by the weak tooling available to support it. Older systems are generally limited to receiving linguistic data solely in the form of a TM, and many MT-focused systems don't accept "bring your own data," so most companies ultimately defaulted to TMX files. The newer focus on LLMs makes the need for a vertically integrated platform that can seamlessly tie the production of high-quality content to linguistic data, training, and tuning tasks that much more essential. As companies begin to deploy and operationalize LLMs at scale, they will begin to understand the importance of fine-tuning for content quality, brand alignment, and use case specificity. As a result, they will likely allocate more time, resources, and consideration to LDM.
[00:19:49] Speaker A: The game changer is that LDM can focus on the domain-specific part. There is no need to also capture the general language aspects. Also, retraining of the whole MT system is not needed. This reduces training effort and increases predictability. LDM can become much more efficient, with better chances to get it off the ground.
[00:20:10] Speaker B: Pangeanic: I see a bright future for RAG translation. Buyers do not typically care how the magic is worked out in the background. That is a discussion that is left for us developers. A workflow that has in-domain NMT engines and runs RAG-based post-editing (PE) is something we are already testing in production at Pangeanic. I envisage a good future for it, and results are very promising.
It's a challenge to make a vector database behave like a TM. It's not designed to be a typical database looking at fuzzy matching, but once you master the process, the system produces amazing-quality translations at scale.
[00:20:47] Speaker B: Now, the point for LDM systems is that MT providers can run the necessary data management, including keeping a copy of the edits for the system to continually improve with more parallel data, not necessarily just TMX files. So unless companies offering LDM or TMSs move into the MT or automatic PE arena, there is little added value to pure management.
[00:21:11] Speaker A: Quality evaluation has always been a delicate discipline in the MT space. Automated quality metrics are primarily intended to measure the impact of model training, whereas human quality evaluation is time-consuming and not entirely unbiased either. Will MT quality evaluation be redesigned by GenAI?
Bureau Works: I think MT quality evaluation has already been redesigned. Semantic verification is a lot more powerful than MT quality estimation percentage scores. Not only can potential errors be flagged, but many of them can also be remediated prior to translator inspection. Our entire paradigm of quality has long been due for an overhaul, and I think we'll be able to connect quality to content performance as opposed to purely technical prowess or how it is perceived by a small number of technical people.
Translated: For certain. One thing that is not going to change is the ultimate need for human validation of model output. We're already seeing the next wave of LLM development being powered by very high-quality human annotations for supervised fine-tuning (SFT) and reinforcement learning from human feedback. There's only so far we can go by allowing one model to assess the output of another. That said, there will be an increasing role for automation with GenAI to assist and enhance the process.
Quality evaluation metrics provide a quality assessment of multiple versions of an MT system that may be used by the system developers to better understand the impact of changing development strategies. Commonly used evaluation metrics include BLEU, COMET, TER, and chrF. They all use a human reference test set to calculate a quality score of each MT system's performance and are well understood by the developer. On the other hand, quality estimation, or MTQE, scores are quality assessments made by a model without using reference translations or actively requiring humans in the loop. It is, in a sense, an assessment of quality made by a model itself on how good or bad a machine-translated output segment is. MTQE can serve as a valuable tool for risk management in high-volume translation scenarios where human intervention is limited or impractical due to the volume of translations or speed of delivery.
[00:23:32] Speaker A: LLMs have the potential to play a larger role with QE, going beyond the simple rating of quality to provide richer, more actionable data. LLMs have a massive database of reference text to determine whether a sentence or text string is linguistically correct, at least for the high-resource languages. LLMs can be trained to identify translation error types and thus could be useful to perform automated QE of machine output and increase efficiency in high-volume translation scenarios. GenAI can provide useful assistance in rapid error detection and error correction scenarios. Also, the lowest-quality portion of a large data set, such as 5,000 out of 1 million sentences, can be extracted and cleaned up with focused human efforts to improve the overall quality of the corpus.
Lilt: Yes, it is likely that in the future, QE models will be trained for specific domains, as has been done with MT systems, and there will still be a human in the loop to train the system and verify and audit system output.
Lionbridge: GenAI has the potential to revolutionize MTQE. Sophisticated prompting techniques will enable us to move beyond just measuring accuracy, offering a more nuanced assessment that considers factors like fluency and target audience resonance. However, GenAI's true potential lies in its ability to automate translation that doesn't just create replicas, but generates fluid, targeted content that truly connects with users. Traditionally, language quality evaluation has been dominated by metrics focused on accuracy: fidelity to the source text, adherence to grammatical rules, and consistency. These are all important, but they fail to capture the emotional impact or user experience. Now, with GenAI, we can create impactful content in a way that was never possible before, and improve how language connects with users at both intellectual and emotional levels. So while the core aspects of quality remain vital, the exciting part is that the conversation can now expand to include the impact and emotional value of content, going beyond just mirroring the source text. To me, this is a game changer.
[00:25:46] Speaker A: Quality evaluation will stay relevant. I think the domain adaptation through prompting still needs a systematic approach, but on the other side, there will be new opportunities for AI quality estimates like ModelFront or TAUS are offering. Either LLMs will be able to do the job as well, or the higher predictability of LLM-based translations will allow us to identify outliers with even simpler models, making AI QE more affordable. All in all, LLMs are developing fast because of funding our industry would never be able to provide, and we will see some more surprises for sure in the future. Still, I believe that though these innovations will continue to have a strong impact on the localization industry, they will not disrupt it.
Pangeanic: Yes, completely. The BLEU score was never that good or accurate, but at least it provided a kind of measure for system improvement. You could fool BLEU completely, and it could give you arguably modest improvement percentages when human evaluators appreciated better fluency, for example. Sometimes BLEU would even penalize that fluency.
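The point about BLEU penalizing fluency can be illustrated with a small Python sketch of clipped n-gram precision, the building block of BLEU. This is deliberately not full BLEU (it has no brevity penalty and no averaging over n-gram orders), and the example sentences are invented for illustration.

```python
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int) -> float:
    """Share of candidate n-grams that also occur in the reference (clipped)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's count by how often it appears in the reference.
    clipped = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return clipped / total


reference = "please close the door before you leave"
literal = "please close the door before you go"      # stays close to the reference wording
fluent = "kindly shut the door on your way out"       # natural paraphrase, different wording

for name, hypothesis in (("literal", literal), ("fluent", fluent)):
    p1 = ngram_precision(hypothesis, reference, 1)
    p2 = ngram_precision(hypothesis, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

A human evaluator might rate both hypotheses as acceptable, yet the paraphrase shares far fewer n-grams with the single reference, which is the kind of mismatch described above.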
The trouble with current evaluation systems is that we all want to use GenAI in translation because of the way it handles context. That is the added value proposition of language service providers and translators. And yet we are measuring quality by the segment or the number of edits. But in a context-aware translation world, edits can come from humanization of the context, from making it truly relevant or more current to an audience, and that is being penalized by current systems. MTQE offers some advantages because the system itself may or may not be very confident in the output it has produced, which is fine. We need some sort of confidence score, at least to route the lowest-confidence segments to humans. But we are pushing for document-level or at least chapter-level translation, which is making use of the full attention window context an LLM can offer: say, 32K tokens, or double that in some systems. If we take 32,000 tokens, we are dealing with some 10 to 12, or even 15, pages. We definitely need new metrics and to leave the segment-level mentality behind.
This article was written by Jourik Ciesielski. He is the co-founder of C-Jay International, chief technology officer at Yamagata Europe, and a Nimdzi Insights consultant and researcher.
[00:28:14] Speaker B: Originally published in MultiLingual magazine, issue 228, May 2024.
