The Future Of LLMs in Translation, with João Graça

Episode 197 July 09, 2024 00:39:09
Localization Today

Hosted By

Eddie Arrieta

Show Notes

In this interview, João discusses his career path and the guiding principles behind Unbabel's innovative technology, including their recent launch of Tower LLM.
 
We talked about the future of AI-powered language solutions, the unique collaboration between humans and AI at Unbabel, and the secret sauce that makes Tower LLM a multilingual translation powerhouse.

Episode Transcript

[00:00:02] Speaker A: First of all, a linguist is someone that loves language and has attention to detail. I think for them to succeed at Unbabel, they have to like AI and believe that AI is part of the future that is coming, because if you're in denial, then you won't be happy. But they also have to understand the limitations of AI, and that AI is only as good as the knowledge that we give it, and that knowledge comes from them. So it's kind of like AI as an extension. And again, it has to be someone curious. And one thing, I talk about linguists, but also the translators that work with Unbabel, not just the freelancers at Unbabel: they have to be willing to change the status quo. I remember I always had a lot of discussions with people at Unbabel where I'd say, guys, if you're telling me that the solution to get more quality is to hire better translators, then you're not doing me a favor, because everyone knows that. And yes, that is a solution. What we're trying to do is find another solution that can also work, because guess what? You can't scale the amount of translation that is needed just by relying on professional translators. That has been kind of like the mantra since the beginning.

[00:01:08] Speaker B: Following is my conversation with João Graça, CTO of Unbabel. We talk about how his individual values align with those of Unbabel and how that has helped them build a company with over 400 collaborators worldwide and incredible innovations like Tower LLM. Enjoy.

[00:01:32] Speaker A: I'm João Graça. I'm the proud father of three girls. I'm from Lisbon, Portugal. As for my background, I did my undergrad in computer science and I did my PhD in natural language processing and machine learning, working a lot with supervised learning and machine translation, almost 20 years ago. After that I was doing that in the US, in Philadelphia, at UPenn.
Then I came back to Portugal, and one of the things I did was start the Machine Learning Summer School. The idea was that me and two co-founders wanted to raise the level of machine learning expertise in Portugal; it was still a niche area back then. And that has been going very well; it's still going after 13 or 14 years. At the same time I met Vasco, the CEO and co-founder of Unbabel, and we started working together on another startup that was basically doing news recommendations. So we were still doing machine learning and NLP, but on a different topic. That ended up not working very well, so we left and joined another startup, also in the recommendation space, doing AI. These were always early startups; we were always part of the initial team. That's where we met the other co-founders of Unbabel. That didn't work out either, but we realized that we really liked working together. So we decided, OK, we want to do something together, and we had this passion for language and AI: how can we bring these two topics together? And I remember, at the same time that we were discussing ideas, we had this friend who was renting a bunch of houses on Airbnb, and he was struggling. Whenever he had people coming from Germany, he didn't know what to do. He couldn't communicate by email, and he didn't trust Google Translate to just translate and send it. We started thinking about the problem and said, well, why don't we just take machine translation, because the quality is getting better, and get some people to actually look at the translation and make sure: is this OK or not? We were not even thinking about correcting the translation, just: can I say this or not? And then it kind of evolved: OK, they can also correct it.
But one of the things that was different was: instead of relying on professional translators, because we wanted this to be very fast and it had to be cheap, low friction, could we use bilinguals, people that speak the two languages? Like, I wrote my PhD in English. I'm not a professional translator, but I could do it; I wrote a bunch of papers. And for that, what kind of tools do we need to provide to them? So we started looking a lot at research that was being done back then on option-based MT: you have a machine translation, and you can click a word and see some options. So this continuation of research and academia getting into industry was what led to Unbabel, and I've been the CTO of Unbabel ever since. My biggest professional challenge has been to grow as a CTO from a five-person startup to a 400-person organization, and it's been very, very fun.

[00:04:28] Speaker B: And that's a very, very exciting story, the one that you're mentioning, because in between, from five to 400, a lot of things happen, and I imagine there are lots of ups and downs. We also imagine that a big part of what helps you keep going is the understanding of your personal purpose, and that's why that question is really important to us in this conversation. And of course, you're talking about 20 years ago, working on things that today people are just realizing exist. Could you tell us a little bit about your personal purpose, how that aligns with the more technical aspects that you have chosen for your career path, and how that all ended up aligning with Unbabel, with the company that it is and the professional that you are today?

[00:05:20] Speaker A: Yeah, I mean, I love solving problems, hard problems, and I love AI. I love the promise of what AI can do, how we can have systems that interact with us. And this was kind of like what led me to this situation.
And during my PhD, I kind of fell in love with translation. I mean, language is one of the things that sets us apart from other animals. It's a very complex thing, because it goes from the concepts that we have to verbalizing them. So for someone who loves problems, NLP has a lot of very hard problems that need to be solved. One of the things that fascinates me at Unbabel is the ability to solve a really complex research problem, which is translation, but at the same time in an actual consumer scenario, where people are consuming it. And if we succeed, we can actually build a translation layer for the world. We can help people communicate despite their language, which is a huge problem that I would love to solve. So it's this mixture: I can still wear my researcher hat on a problem that I love, one that has a really big impact for society if we manage to solve it. It's very hard not to be engaged and want to be part of it.

[00:06:37] Speaker B: And I imagine, of course, that in the process of growth for Unbabel, you have attracted other people that love solving problems. Can you tell us a little bit about your team and how some of those values that you have also translate into the Unbabel philosophy, to put it that way?

[00:06:58] Speaker A: Yeah. I mean, I've been lucky to work with amazing people over the story of Unbabel. Some of them are still at Unbabel; some of them have left to do other things. I think both me and Vasco, as founders, were very curious. We love deep understanding of things, so we tried to gravitate toward people who come from academia or worked in academia. I remember when we were a very small team of ten, and we hired André Martins, who is the VP of research of Unbabel, an amazing researcher in his own right who has a bunch of ERC grants. We got Paulo Dimas, who is also an amazing person, always thinking about the future.
So we started getting this group of people, super curious, thinking: how can we solve this problem from a different angle? What are the first principles? I guess that part of our naivety about the industry, together with our background and our curiosity, led us to solutions different from the ones that existed in the industry. Not all of them were fully successful, but they brought fresh new air to the industry.

[00:08:08] Speaker B: And this answers a little bit of our next question, of course, and I can read between the lines what it is that you really like about Unbabel. We're digging a little bit deeper into what the organization is about. Can you tell us how it is inside? You tell us there is curiosity, there is naivety, there is an openness to solve problems, there are no preconceptions, there is a lot of originality added to this. I assume this is what you like about it, but are there any other things that you like about it? And, more interestingly for me, is there something about this whole process that you discovered that surprised you as you were doing it? Like, oh, wow, I did not expect this was going to happen. Whether it's a good or bad surprise is up to you at this point.

[00:09:01] Speaker A: Yeah. So to answer the first part, there are a lot of values that I like a lot. One of them is no politics. I would not be able to work at a company that has a lot of politics; it's completely against my principles. The other thing is that we care about each other, so we go out of our way to help each other inside the company. And, you know, at times that might not be the best thing to do just in terms of company performance, but I think it's what we as founders are, me and Vasco, and we push it into the company. I think it's something that makes me very proud, the way we're very human.
In terms of learnings, I mean, I haven't stopped learning. In the beginning, we were very naive. We kind of thought, well, you have machine translation and you put a human to look at the errors, and now you're going to have quality. That was obviously not true. The other thing is, the reality of translation is not what you learn in academia, when you're doing research where you basically have a dataset with sentences that you translate. Every different type of translation is a different problem. Marketing content has a very different type of challenge, and translating an email has very different challenges from translating subtitles. One of our learnings in the beginning was, we started saying, well, we have an API, a translate API, and now we're going to do everything. And we realized, well, if we go down this path, it's impossible to have a product that solves the problem. You end up being a service company, because there's a lot of manual work and scaffolding; it's literally very different each time. That led to a significant change at Unbabel, which was: OK, let's pick one vertical and focus. We started focusing on customer service emails. That narrowed the problem, but it allowed us to have a product that was super efficient. So not only could we be super efficient in terms of cost, quality, and speed compared with everyone else, but even the operational work, like the project management work that we had, was much smaller, because we deeply understood the integrations and everything that was going through. And then after that: OK, let's go to the next one. Is it marketing? Is it subtitles? This approach allowed us to be very efficient in the way that we do things. So that was also a big learning. I mean, there were several difficult learnings; working with markup was a very hard one.
But the most important one, I think, is: what is quality? No one really knows what quality is. It's an industry where everybody claims they have the best quality in the world, but nobody measures it properly. That took us down this long route of adopting MQM as a framework in the early days of Unbabel, hiring professional linguists to do MQM annotation, and working on a way to predict MQM with quality estimation, so we can do this at scale, to basically have a way to talk about quality. Because it's very difficult in the industry: everybody claims that they have the best quality in the world, nobody shows it, and that allows for all kinds of marketing where everybody can say whatever they want. And because of the origins of Unbabel, where we started with freelancers, not professional translators, and have ever since had different crowds of different levels, people would look at them and say: oh, they're just post-editing, hence post-editing cannot deliver quality. Which is not true. There's this assumption that if you want quality, you have three humans and one is a specialist, versus if you only have one human, you don't have quality. But you actually don't measure the quality. So there are these templates, and with the advance of AI, that's not the case: for a lot of use cases the machine already has quality, and for the other ones you add a specialist, case by case. So this learning about quality has been a long process over the years.

[00:12:46] Speaker B: Thank you for that. That's a really good nugget for us to talk about quality and briefly touch upon how humans are involved in this conversation. I'm interested in understanding a little bit more about that, especially the topic of how humans are integrated into the tech solutions that you have at Unbabel. And you have 400 people, so humans are involved everywhere. Could you tell us a little bit about what the approach was?
And you mentioned there was a bit of naivety at the beginning, so I assume that over the years, the way in which you have approached or seen how humans are involved has evolved. Could you tell us a little bit about how it works right now at Unbabel?

[00:13:27] Speaker A: Yeah, definitely. So, an interesting thing is that really early on at Unbabel, one of our first employees was Helena Moniz, who is a linguist by education. She did a PhD with me. She's now the head of the European Association for Machine Translation. And we started having a bunch of internships with linguists. So we had linguists at Unbabel from the get-go, because we were using them to help us understand where the AI was failing, what needed to be improved, and to generate data. And one of the things that really annoyed us was when we would go talk with the customer. This is what happened in customer service: we did some translations, and the person did not understand the language, so they went to someone on the customer service desk who kind of spoke the language. They didn't have to be native, they just kind of spoke the language, and they would be shown the translation and say: it doesn't look that good, or it doesn't look that native. Wait, but what is the problem? It was a super ambiguous discussion. And this can have a lot of drawbacks: if the person was afraid of losing their job because the company was starting to use Unbabel, obviously they were going to say there's no quality, and there was nothing we could do about it or improve. So we set out on this road: OK, we need to have a proper way to discuss quality. It was interesting because, at the same time, we had developed this kind of error annotation framework, which was still not MQM, with a different typology, that we were using to train our AI algorithm to detect errors.
We were already working on a sort of QE back then, and we said, well, if we change the typology to the MQM typology and we actually add the numbers, we get a score, and we can go to the customer and say: listen, this is the score of the document, these are all the errors. Feel free to point out other errors. But now we can start rationalizing. And soon that became part of our culture. We talk with customers and show them reports of what we translated and say: OK, this is how we measure quality. Do you agree? But then we realized: OK, we need to scale this, and this is not something that the translator can do. Translators might not be equipped to do this kind of deeper linguistic annotation. So since the get-go we have had these different communities. We had one that did the translation, the freelancers; we had one with the more professional translators that we use for some content types; and then we had one with the linguists, who originally were mostly PhD students, or people who already had a PhD, who came through our personal networks and would work as freelancers as well, doing annotations for us. And this is still the case; this is still how we're developing the different communities at Unbabel. I don't think that anytime soon we're going to stop needing them, because they're the ones generating the data that allows us to train the models. They're the ones creating the data, actually, and they're the ones giving us a lot of ideas on how to improve the processes. And interestingly enough, and I know we're probably going to talk about this later in our conversation, with the advent of LLMs you stop needing a research engineer to build a model, because building a new model might be writing a really good prompt, with a lot of domain knowledge, that does a task. And guess who's building a lot of interesting models at Unbabel? The linguists.
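[Editor's note: the idea of a "model" that is really just an expert-written prompt can be sketched as follows. The prompt wording, function names, and the stub backend are illustrative assumptions, not Unbabel's actual implementation.]

```python
# A "model" defined purely as a prompt: a linguist encodes domain knowledge
# as instructions, and any LLM backend executes the task. The template text
# and helper names here are hypothetical.

SOURCE_CORRECTION_PROMPT = """\
You are a careful copy editor for customer-service emails.
Correct grammar and spelling in the text below, but preserve meaning,
tone, and any product names exactly as written.

Text: {text}
Corrected text:"""

def build_task(template: str):
    """Turn a prompt template into a callable 'model'."""
    def task(llm_complete, **fields) -> str:
        return llm_complete(template.format(**fields))
    return task

correct_source = build_task(SOURCE_CORRECTION_PROMPT)

# Any completion function can serve as the backend; a stub stands in here.
def fake_llm(prompt: str) -> str:
    return "Thanks for reaching out. We have refunded your order."

print(correct_source(fake_llm, text="thanks for reching out, we refund your order"))
```

Swapping the template changes the task (translation, grammar correction, annotation) without touching any model weights, which is why domain experts rather than research engineers can build these.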
So, you know, they're kind of the new cool kids on the block somehow, which is super interesting. So yeah, that's definitely a big part of Unbabel. We have a lot of linguists on our in-house team, but we rely on freelance linguists to actually do the annotations. So there's a difference between the people who do the tasks, translation or annotation or glossary creation, who are always freelancers, and the linguists we have in house to basically do our own quality assurance, generate test sets, and help us understand the problem better.

[00:17:05] Speaker B: Yeah, and I think I can read between the lines. You're saying a good linguist is someone who can help understand the problem better. What are some of the other traits? And you're a linguist yourself. I mean, you've studied as a linguist. Or I wouldn't say a linguist, you tell me, actually. But what makes a good linguist for Unbabel?

[00:17:27] Speaker A: I mean, a couple of things. I studied machine learning, but because I was in NLP, I was exposed to linguistics. Where I did my PhD, at UPenn and at IST, in both places there was this computational linguistics side that mixed, at a lot of events, linguists with a more technological background and machine learning people from NLP. So we worked a lot together and we learned from each other, and I think that is super powerful. So, first of all, a linguist is someone that loves language and has attention to detail. I think for them to succeed at Unbabel, they have to like AI and believe that AI is part of the future that is coming, because if you're in denial, then you won't be happy. But they also have to understand the limitations of AI, and that AI is only as good as the knowledge that we give it, and that knowledge comes from them. So it's kind of like AI as an extension. And again, it has to be someone curious.
And one thing, I talk about linguists, but also the translators that work with Unbabel, not just the freelancers at Unbabel: they have to be willing to change the status quo. I remember I always had a lot of discussions with people at Unbabel where I'd say, guys, if you're telling me that the solution to get more quality is to hire better translators, then you're not doing me a favor, because everyone knows that. And yes, that is a solution. What we're trying to do is find another solution that can also work. Because guess what? You can't scale the amount of translation that is needed just by relying on professional translators. That has been kind of like the mantra since the beginning.

[00:19:04] Speaker B: Excellent. Thank you so much. And this is a perfect segue to talk about where Unbabel is today. We have been discussing lately how Unbabel has released Tower LLM. And I say it so slowly because for me, as a non-native English speaker, it's so hard to say: LLM, Tower LLM, recently released. And beyond the marketing jargon, I want you to tell me what makes this MT the best in the market. That is the claim that, of course, we are hearing from Unbabel, and I would love to hear more about the details that you see there that make this happen.

[00:19:43] Speaker A: Yes. So the reason why we say it's the best in the market right now, because this market is evolving very fast and we're all going after each other, is that we compared it on a huge test set, and it's the one that performs the best across domains and across language pairs. We did a huge benchmark with the public version of Tower. Tower has two versions: the public one that we open-sourced, which is on Hugging Face, where you can go and play with it; and an internal version that has more data and a slightly different training regime, which is the one that we use internally and has better performance. And it also got really good results.
With the internal model, basically, if you look at the results in zero-shot, it's the best in a lot of cases, second best in others, and the best on average. And then when you add few-shot examples or RAG, it gets the best performance. Now, why do I believe it became so good? First, LLMs are super strong. LLMs have a lot of advantages, and I'm going to talk about them. The first one is that if you just look at zero-shot, before all the other capabilities I'm going to talk about, it's already very good. And I think it's very good because it's able to build these really powerful abstractions of words and translations and sentences. So if you look at ChatGPT, LLaMA 2, GPT-4, they're already very good at translation. Now, what we did was intentionally train it to be more multilingual. So we provided a lot of data from different languages, languages that, for instance, LLaMA 2 doesn't have. And then we had a highly curated instruction set that basically makes it behave much better. So we're specializing: we're giving an overwhelming percentage of super-high-quality instructions for translation-related tasks. The other thing we noticed was that because we trained it for several tasks, like grammar correction, source correction, named entity recognition, translation, QE, it actually became better on all of them than if you just train on each one individually. That, for me, is the biggest boost. The reason we have this highly curated dataset is partly because of the linguists that have been working with us for ages, all the MQM data, but also because we have QE. Even when we did the continued pre-training, we could take a lot of data and just say: OK, keep this one, throw this one away, because the quality is not good.
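[Editor's note: the "keep this one, throw this one away" idea, filtering training pairs by an estimated quality score, can be sketched as below. The scorer here is a toy stand-in; in practice a learned, reference-free QE model would produce the scores. All names and the threshold are illustrative.]

```python
# Sketch of QE-based filtering for training data: keep only sentence pairs
# whose estimated quality clears a threshold.

from typing import Callable

def filter_parallel_data(
    pairs: list[tuple[str, str]],           # (source, translation) pairs
    qe_score: Callable[[str, str], float],  # reference-free quality estimate in [0, 1]
    threshold: float = 0.5,
) -> list[tuple[str, str]]:
    """Drop pairs whose estimated translation quality is too low."""
    return [(src, tgt) for src, tgt in pairs if qe_score(src, tgt) >= threshold]

# Toy scorer: pretend the length ratio is a quality proxy (real QE is a
# trained model, not a heuristic like this).
def toy_qe(src: str, tgt: str) -> float:
    return min(len(src), len(tgt)) / max(len(src), len(tgt))

pairs = [
    ("Hello, how are you?", "Bonjour, comment allez-vous ?"),
    ("Hello, how are you?", "Oui."),  # truncated translation, low score
]
clean = filter_parallel_data(pairs, toy_qe)
print(len(clean))  # the truncated pair is filtered out
```

The same gate can sit in front of continued pre-training, instruction-set curation, or TM cleanup; only the scorer changes.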
I think that was the secret sauce to get Tower as good as it is, because Tower right now is a 7-billion-parameter model that is beating GPT-4, a vastly larger model, on these tasks. Not on the other ones, but on these tasks it's actually beating all the other models. I think that's a recipe that we're going to keep using. And I think the advantage for anyone working in localization is that we have the data and we have the know-how, meaning we have the people who evaluate quality and predict quality, and we have the know-how to think about how we can query the model in the best way possible. And then after that, and this is what I think is coming next, a lot of times the discussion of LLMs versus neural MT focuses just on zero-shot. What people are not discussing is that LLMs have one characteristic that I think is amazing, which is that you can pass them information in a declarative way. Meaning, I can just say: translate for customer A; the style guides of customer A are a, b, c, and d; and their terminology is this and that. You can actually just write this, and the model follows it, and it basically becomes customized to the customer in a way that it couldn't before. Because with neural MT, what you could do was fine-tune on customer data, which is a much more indirect way and much more expensive, and there was no way to pass style guides. You just couldn't do it. With LLMs you can do that very simply. Now, we're also learning what the best way to prompt is; there are still improvements going on, but as it is, it's amazing. It opens the door to being much better than the models were before. And this is just for translation, because I think there's much more than that.

[00:23:44] Speaker B: It's excellent that you're mentioning all these processes.
[00:23:59] Speaker B: It's very interesting that you're mentioning these processes, because it also shows that it's ongoing, it's always moving forward. You didn't mention the multilingual aspect of it. Could you tell us a little bit more about the difficulties that you have encountered? And of course, I assume this is what sets you apart from other solutions. In fact, yesterday at the conference where we were presenting, they were mentioning how most models are distinctly not multilingual, and they are really clear to let you know very quickly that they are not multilingual, because it probably poses certain difficulties that others are not willing to tackle. You've said earlier that you love to solve complex problems. Could you tell us a little bit more about this multilingual element of Tower LLM?

[00:24:57] Speaker A: I mean, I think that for neural MT, it was just hard to make it multilingual. There are some multilingual models, but it was hard. So a lot of times, what we used to do at Unbabel was have bilingual models per customer, like transformer models; that was kind of the best performing approach. LLMs have the ability to be multilingual because they have much more capacity. The reason I think that they're not very multilingual is not because they have a limitation; it's because the people who built them didn't care about multilinguality that much. So the dataset that they were gathering was mostly English, like 95% English in the training.
So basically, the models got to be multilingual kind of as a side effect, because in the training data there was some parallel data and they learned some representations. Which is different from saying: no, we actually want to pass information to the model in a way that it knows these sentences are translations of each other, and then train a lot on that. A lot of the continued pre-training is: take a model that already exists, a large language model like LLaMA 2 or Mistral, provide a huge amount of data, and start training on that data again. But in this data you have data from different languages, and you're explicitly saying: this is a translation of this. Which is something that the other models haven't seen, unless they saw it by accident. Imagine that you have a huge parallel corpus. It's one thing to say: OK, here's the whole corpus in English, and here's the one in French. It's another thing, for each sentence, to say: OK, this one is the translation of this one. That way the model learns that particular task better. So it was about building a model that was multilingual intentionally. From the get-go, from the data preparation, how we feed the model, the order in which we give the data, everything was made intentionally to be multilingual. And this is described for the public Tower: there's actually a research paper, which is going to be presented in September, that is on arXiv, so people can go there and see. Unbabel has always been very committed to advancing the state of the art, so most of our research is open source and published, and you can see there what we did to make it multilingual and all those things. It was really more about the intention; the models themselves are the same, they're all kind of transformer-based models.

[00:27:09] Speaker B: Excellent.
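[Editor's note: the distinction between dumping two monolingual corpora into the training mix and explicitly pairing sentences as translations of each other can be sketched as below. The instruction wording and function names are illustrative, not the paper's actual data format.]

```python
# Sketch of making parallelism explicit in training data: instead of two
# independent monolingual streams, each example tells the model "this is a
# translation of this".

def to_translation_examples(en_lines: list[str], fr_lines: list[str]) -> list[str]:
    """Pair aligned corpora sentence by sentence into explicit
    translation examples for continued pre-training."""
    assert len(en_lines) == len(fr_lines), "corpora must be sentence-aligned"
    examples = []
    for en, fr in zip(en_lines, fr_lines):
        examples.append(
            f"Translate English to French.\nEnglish: {en}\nFrench: {fr}"
        )
    return examples

en = ["The cat sleeps.", "It is raining."]
fr = ["Le chat dort.", "Il pleut."]
for ex in to_translation_examples(en, fr):
    print(ex, end="\n\n")
```

Feeding the model examples shaped like this, rather than the raw monolingual streams, is what "intentionally multilingual" means in practice: the pairing itself becomes part of the training signal.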
And we should be able to get some visibility on that paper once it is presented in September. Tell me a little bit more about the relationship with clients. How has that relationship helped Tower LLM be where it is right now?

[00:27:29] Speaker A: Yeah, I mean, that's a good question. We started with a lot of customer service clients, and we've now been expanding to all the other verticals. We're launching Tower LLM and making it available to them to start working with, but we're not making it available as a large language model that they can interact with, like chat with. Mostly we're building applications on top of Tower LLM, so they can use MT that is supported by Tower LLM. We have their data; we have their TMs to use as few-shot examples, so it can query the TMs. So basically, instead of having a customized model for them, which is what we used to do, where we used their data to fine-tune the model, we can actually just do that via prompting, and put in their glossaries, and they can use the model like that. Pretty soon they'll be able to use the model for automatic post-editing, grammar correction, and so on. And I'm excited to work with the customers in two ways. First of all, to see what other kinds of information we can use to make MT better. But also, one of the things I think is that we can use Tower to do other tasks. The question is: what are the other tasks? One example we're working on with customers is automatic transcreation, which is a really cool challenge, because it's something that was impossible to do before LLMs. To give an example, we still work a lot on customer service. If you're answering customer service for a Japanese customer, there are a lot of cultural guidelines that you have to adhere to that are completely different from if you're an English-speaking person.
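[Editor's note: the "declarative" customization described above, passing style guides, glossary terms, and TM matches in the prompt instead of fine-tuning a per-customer model, might look roughly like this. The prompt wording, field names, and example data are all hypothetical.]

```python
# Sketch of declarative per-customer customization via prompting: style
# rules, terminology, and TM few-shot matches are injected as text.

def build_customer_prompt(source: str,
                          style_rules: list[str],
                          glossary: dict[str, str],
                          tm_matches: list[tuple[str, str]]) -> str:
    rules = "\n".join(f"- {r}" for r in style_rules)
    terms = "\n".join(f"- {s} -> {t}" for s, t in glossary.items())
    shots = "\n".join(f"EN: {s}\nFR: {t}" for s, t in tm_matches)
    return (
        "Translate from English to French for this customer.\n"
        f"Style guide:\n{rules}\n"
        f"Terminology (always use these translations):\n{terms}\n"
        f"Similar past translations (from the customer's TM):\n{shots}\n"
        f"EN: {source}\nFR:"
    )

prompt = build_customer_prompt(
    source="Your ticket has been escalated.",
    style_rules=["Use formal address (vous).", "Keep sentences short."],
    glossary={"ticket": "dossier"},
    tm_matches=[("Your ticket is open.", "Votre dossier est ouvert.")],
)
print(prompt)
```

Changing the customer means swapping the rules, glossary, and TM matches at request time, rather than maintaining and retraining a fine-tuned model per customer.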
So if you ask an agent to write in English and then translate it and send it to the Japanese audience, it's going to be really incorrect, and it's not going to make them happy. Now, what we used to do before was have these really complex style guides that we used to train the customer service agents who were replying in English: OK, if you're replying to a Japanese person, you have to write like this. They had to read it, understand it, and abide by it. So it was not the perfect solution. Right now, what we can do is just use our LLM: you write in English normally, and then we have a button that says rewrite, and it provides the same information, but now in a way that's culturally aware for Japanese, and then it translates. So this is a new use case that our customers can get from the model, and we're working with the customers on: OK, what are other problems that you have that LLMs can start solving for you?

[00:29:55] Speaker B: I think my next question switches gears a little bit, and it's more related to the history of Unbabel and Tower LLM. Of course, we all know about the Tower of Babel, and then we have Tower LLM, and then we have Unbabel. Could you tell us a little bit about the play on words that we have in there? Where do these words come from, and what's the intention behind all of that?

[00:30:24] Speaker A: Yeah, it's actually an interesting story. When we started Unbabel, again, we were a small group, and we were working in an office with another group of people working for a different company. And there was an afternoon that we spent entirely trying to come up with names for what's now Unbabel. The best we could come up with was Vykrit, like the Norse goddess of language, which, in retrospect, sucks as a name, for us at least; we didn't like it that much. And we ended the day without a name.
[00:30:24] Speaker A: Yeah, it's actually an interesting story. When we started Unbabel, we were a small group, sharing an office with another group of people working for a different company. One afternoon we spent the entire afternoon trying to come up with names for what's now Unbabel, and the best we could come up with was Vykrit, after the Nordic goddess of language, which, in retrospect, is not a great name for a company; at least we didn't like it that much. So we went home without a name. And the next morning we got there and it was written on the board: Unbabel. We were like, whoa, this is amazing. Because if Unbabel wants to build a translation layer, this is basically reverting the effect of the Tower of Babel. At the Tower of Babel, God made people speak different languages so they couldn't cooperate, and Unbabel was undoing that. And that guy was Ukuma Cel, who turned out to work with us for six or seven years as our VP of Marketing, and is still a very good friend. Unfortunately, we cannot claim credit for the name; it was all his. And when we were thinking about the name to give to Unbabel's LLM, Tower made sense, because we talk a lot about our offices as a tower; one of our principles is "respect the tower." If you actually go to our office in Lisbon, it's in a building called the Tower of Unbabel, and that's where the name of the model comes from. [00:31:48] Speaker B: We'll definitely have to go to the Tower of Unbabel in Lisbon. I'm sure our team will be happy to do that, and I'll be happy to do that next time I'm around. No one can tell the future, and I understand these questions are tricky, but the industry is changing very rapidly. We see a lot of fear, especially from the side of the translators, the interpreters, the linguists. In fact, yesterday I met a company that does instant translation and instant interpretation, and they have no idea that our industry as a whole exists: that there are conferences for the industry, that the industry is moving toward AI, that there are many advancements in technology. Ignorance is bliss for them. But there are many who do know what's going on, and they are very afraid of what's going to happen. Most of us, of course, believe that humans who use technology and companies that use technology are going to have an edge.
And that doesn't mean that linguists or translators or interpreters are going to disappear from the ecosystem; that's just very naive to think, and your example is very clear. You mentioned that your team at Unbabel has linguists who are involved in the translation tasks as freelancers, but also others who are involved in more complex tasks. Given all this, how do you see the future evolving, for this whole tech-human interaction and for the industry as a whole? [00:33:32] Speaker A: First of all, there's a lot of content that hasn't been translated yet, so I don't think we run the risk that there won't be work. I think the industry is going to become more AI-first. I think the models are good enough that some content types don't require human involvement to translate, and by "to translate" I mean in production, on the fly, which doesn't mean there isn't other work for humans. And I think the amount of content that is machine-translatable is going to keep increasing. And when I say machine translation, I actually prefer to say AI-translatable, because I think what we're going to see is much more AI than just machine translation. So one of the things we're playing with is: you have machine translation, you have QE, then you can do automatic post-editing. But instead of doing blind automatic post-editing, why not tell the model, these are the errors that QE found, now fix them, and iterate on this a little bit? You can also correct the source text before you send it for translation. So you're going to start seeing these pipelines that have no human involvement, that are AI end to end, and that solve the problem. I think there's more to AI than that.
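The MT → QE → targeted post-editing loop described above can be sketched in a few lines. This is a minimal illustration of the control flow, not Unbabel's actual pipeline: `call_llm` and `run_qe` are toy stand-ins for a real translation LLM and a real quality-estimation model, with errors marked by a hypothetical `<err>` tag so the iteration is visible.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for an LLM call; a real system would query a model like Tower LLM.

    Toy behavior: 'fixes' the translation by removing the marked error spans."""
    return prompt.split("TEXT:")[-1].replace("<err>", "").strip()


def run_qe(translation: str) -> list[str]:
    """Stand-in for quality estimation; returns the error spans it detects."""
    return ["<err>"] if "<err>" in translation else []


def qe_guided_post_edit(translation: str, max_rounds: int = 3) -> str:
    """Repair a translation iteratively using QE feedback, instead of blind
    automatic post-editing: each round tells the model exactly which errors
    QE found and asks it to fix only those."""
    for _ in range(max_rounds):
        errors = run_qe(translation)
        if not errors:
            break  # QE is satisfied; stop iterating
        prompt = (
            "QE found these errors: " + ", ".join(errors)
            + ". Fix only these errors. TEXT: " + translation
        )
        translation = call_llm(prompt)
    return translation
```

With the toy stand-ins, `qe_guided_post_edit("Olá <err>mundo")` runs one correction round and stops once QE reports no errors. The same loop shape works with real models plugged in for `call_llm` and `run_qe`.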
You'll start seeing a lot of AI generating the linguistic resources, for instance gathering and updating terminology by reading what's new from that customer. So you have the website, everything, and you keep updating it; once the terminology changes, it automatically triggers the curation of all the resources. You can have AI that basically understands what the connectors and integrations are, and for each content type decides: for this one, I think you need this pipeline; for that one, you need that pipeline. So a lot of that work is going to be automated, and you're starting to see the beginning of that. And I think there's going to be another trend, not just because of this, which is the centralization of content. You're seeing some companies, like Unbabel with the language operations platform, but also Luke, talking about this: one place where all the content of a customer comes together. Because right now, if you think about a company, they have some content on HubSpot for marketing, for instance, some of it on Zendesk, some of it elsewhere, but there's no centralized way to translate it, or even just to guarantee it all has the same style. This is a big problem, and having a centralized content platform powered by AI is going to facilitate a lot of this. I think humans are still going to be needed for the training and the development. But one thing you're going to start seeing, and this is a new trend, is basically a lot of agents. So instead of thinking about one monolithic model, you think about agents that interact with each other. There's an agent that does translation, which can be super specialized for something. There's an agent that does quality estimation for linguistic quality. There's another agent that does a quality check against regulatory style guides, for instance for the medical domain.
And then there are all these different agents. And something that I think is really cool is to think about the professional translator who no longer does the translation, but manages a group of agents, a stack of agents. Basically, what he does is look at the output the agents are producing and train them, and by training I mean he gives feedback, he refines the prompts, he relaunches the agents. It's kind of like Pokémon, if you've ever seen it, where trainers have these different creatures that they train; you can think of the translator as having a set of them. So I think that for the ones who want to adapt to the new reality, the world is going to be amazing. The ones who are still arguing that machine translation cannot do translation, for all the different reasons we've been hearing, will be cornered into very specific niche places, which will still exist, but they won't be part of this next wave, because you need to use this technology; it is too good not to be used. And I started by saying there are a lot of words to be translated because the volume of words handled by human translators hasn't decreased over the last three or four years, although the machines have become much better. What's happening is something that we always said at Unbabel: the amount of words that gets translated is going to grow exponentially, and the need for humans is going to grow linearly. And that is the only way you can get everything that should be translated actually translated, to have an actual translation layer. So yeah, I think there's some fear, but I think it's more that people need to be willing to adapt to the new reality. And if they're not?
Well, then they're going to be overtaken by people who just joined the industry right now. The good thing about coming from outside the industry, and that's what happened to us ten years ago, is that you're not constrained by the self-imposed constraints that we put on the industry, which is really annoying. So that's a good thing; you'll see more good ideas coming. [00:38:16] Speaker B: That's fantastic. Thank you so much, João, for sharing your insights with us. I think we'll need to have another conversation in the future; that's just a fact. Is there anything you think we're missing in our conversation about Tower LLM, or about the other technology solutions Unbabel has, that might be worth mentioning before we go? [00:38:38] Speaker A: No, I think we covered most of it. I think QE is the really important thing for this transition, along with the ability to understand data, because most datasets are very polluted. I would love to have another conversation and see how the world evolves, because I do believe we'll see significant changes over the next year. But I think we covered a lot of material. [00:39:04] Speaker B: And that was our conversation with João Graça, CTO of Unbabel. I want to thank João for his time and insight. My name is Eddie Arrieta, CEO of MultiLingual magazine, and this was Localization Today. Thanks for listening. Until next time, goodbye.
