Seventy Years of Machine Translation: The Legacy of the Georgetown–IBM Experiment

Episode 174 | May 02, 2024 | 00:07:42
Localization Today

Hosted By

Eddie Arrieta

Show Notes

By Rodrigo Fuentes Corradi.

This year marks 70 years since the first public demonstration of MT, which arguably sparked the language AI revolution that we see today. On January 7, 1954, a team from Georgetown University and IBM automatically translated 60 Russian sentences into English.


Episode Transcript

[00:00:06] Speaker A: This is Localization Today, a podcast from MultiLingual Media covering the most relevant daily news in the language industry.

[00:00:15] Speaker B: Seventy Years of Machine Translation: The Legacy of the Georgetown–IBM Experiment, by Rodrigo Fuentes Corradi.

In 2024, the possibilities of artificial intelligence in the language industry seem endless, but the AI future that we see so clearly today actually began in the middle of the last century. This year marks 70 years since the first public demonstration of machine translation, which arguably sparked the language AI revolution.

[00:00:48] Speaker B: The first undertaking to solve what became the MT challenge was a response to the Cold War, led by Léon Dostert, a Georgetown University professor and pioneering linguist who developed interpreting systems at the Nuremberg Trials, and Cuthbert Hurd, head of the Applied Science Department at IBM. The Georgetown–IBM experiment aimed at automatically translating about 60 Russian sentences into English. The carefully chosen sentences were derived from both scientific documents and general-interest sources in order to appeal to a broad audience. On January 7, 1954, the team gathered at IBM's New York headquarters to demonstrate their progress. According to John Hutchins's article on the topic, though the experiment was small in scale, with an initial vocabulary of just 250 lexical items and a set of only six rules, it was ultimately able to illustrate some grammatical and morphological problems and to give some idea of what might be feasible in the future.

[00:01:50] Speaker B: The original AI hype cycle. The event was reported on by the press both in the US and abroad, and it garnered considerable public engagement. Given both the growing interest in computers and the political backdrop, US government funding for more experimentation was soon made available, with a prediction, partly based on the excitement generated, that MT systems would be capable of translating almost everything within five years. However, the reality was that it would take a whole lot more patience than first anticipated. The subsequent years were to prove bumpy due to the complexity of the Russian language and the technological limitations. According to Hutchins, after eight years of work, the Georgetown University MT project tried to produce useful output in 1962, but they had to resort to post-editing. The post-edited translation took slightly longer to do and was more expensive than conventional human translation.

[00:02:48] Speaker B: Government funding came under increasing scrutiny, culminating in the creation of the Automatic Language Processing Advisory Committee (ALPAC) and its 1966 report, Language and Machines: Computers in Translation and Linguistics. The report highlighted slow progress, lack of quality, and high costs. It noted that research funding over the previous decade amounted to $20 million, while real government translation costs stood at only $1 million per year. More damaging were the criticisms that the methodology of early experiments, perhaps in the enthusiasm for attention and investment, was not credible. The small-scale demonstration did not robustly test the MT system, as the selected test sentences were expected to perform well. The report stated that there was no immediate or predictable prospect of useful machine translation. The initiative, born in the Cold War, was placed on ice. Most MT proponents were understandably disappointed.

[00:03:51] Speaker B: The foundation for future solutions. The ALPAC report pulled the rug out from under MT efforts for the next three decades. However, while the report effectively paused investment, it also shone a light on a potential hybrid solution, one that reintroduced humans into the equation. The report described a system of human-aided machine translation, relying on post-editors to make up for the deficiencies of the machine output, which set the stage for MT post-editing. Eventually, computer processing power increased, allowing a resurgence of MT innovations. MT quality started to improve as research shifted from recreating language rules to applying machine learning techniques through combinations of algorithms, data, and probability. This became known as statistical MT (SMT). In the mid-2010s, deep learning and artificial neural networks enabled neural MT, resulting in dramatically improved translation accuracy and fluency. NMT models have now facilitated the widespread use of MT in the language industry.

[00:05:02] Speaker B: The Georgetown–IBM legacy. The Georgetown–IBM experiment and the subsequent ALPAC report laid the foundation of MT technology. They also clarified the importance of human-led translation and MTPE, which has emerged as a credible response to global enterprise content challenges. Chiefly, these early experiments managed to push the theories of the early pioneers into the realm of practical and public demonstration, thus illuminating their value, if not their actual viability. Even if the topic was to remain dormant for a few decades to come, the Georgetown–IBM experiment played a key role in the development of MT as we know it.

[00:05:42] Speaker B: Today, 70 years on from those initial attempts to solve the language challenge, the emergence of large language models (LLMs) and generative AI (GenAI) has caused another stir. LLMs and their future productization will now drive innovation with features such as MT quality estimators, content insight capabilities, and summarization. As we enter this new era, what exactly can we learn from the Georgetown–IBM experiment? Well, to kick-start the language AI story, those early pioneers needed to engage with the public imagination, draw attention to their cause, and find support and benefactors. With current public discourse around GenAI mired in uncertainty, outreach programs will likely contribute to changing attitudes and increased usage. Moreover, persistence and patience are indispensable. Successes in early AI language experiments proved elusive, to say the least, but look where we are now. MT optimization and GenAI advances will depend on determination and growing human expertise. The challenge is not new for translators and linguists, who seem to have technology adoption and innovation in their professional DNA. Only by being proactive in the face of AI-driven changes will we achieve the next level of progress.

[00:07:05] Speaker C: This article was written by Rodrigo Fuentes Corradi. He has worked in the language industry for the past 25 years, specializing in machine translation technology and human processes and capabilities. Originally published in MultiLingual magazine, Issue 227, April 2024.

[00:07:26] Speaker A: Thank you for listening to Localization Today. To subscribe to MultiLingual magazine, go to multilingual.com/subscribe.

Other Episodes

Episode 170

May 02, 2024 00:19:06

Putting Globalization Strategy Into Practice: An Interview with Leaders at Emerson

By Karen Combe and Mimi Hills. After reading the Globalization Strategy Playbook, Inés Rubio San Martín and Satoko Yuda were inspired to utilize data...


Episode 241

November 01, 2022 00:09:25

The Stonehenge Syndrome: Writing as Pattern

How about switching between scripts? Endangered Alphabets Project founder Tim Brookes examines a series of seemingly disparate structures — Stonehenge, the Congolese Mandombé script,...


Episode 134

June 03, 2022 00:04:37

Kingdom of Characters: How linguistic innovation turned China into a modern powerhouse

Kingdom of Characters, a fascinating new book by Yale University professor and historical scholar Jing Tsu, argues that China’s most daunting challenge after its...
