Translated’s DVPS: Context Meets Cognition

[00:00:02] Speaker A: All right.

[00:00:03] Speaker B: And we are here in Rome, Italy, with the one and only Marco Trombetti, CEO of Translated. It seems like we're on vacation. Marco, can you tell us something about this amazing place?

[00:00:22] Speaker A: You know, in a certain way, work for me is vacation. I like it. But here you are at the PI Campus, the headquarters of Translated. It is made of eight villas in the center of Rome that we converted into a nice office space, with pools, massages, a personal trainer and everything. Young talent needs to be able to express its potential.

[00:00:48] Speaker B: To express their potential and have no excuses not to do amazing work. And I've heard some great stories about this campus. How did it happen? How did we get to be here?

[00:01:00] Speaker A: Well, the campus has been designed. We used design and architecture to solve one problem. The problem I wanted to face in Europe, and in Italy too, is that people here tend to have more fear of failure than in Silicon Valley. So I wanted to create a new culture. I said, you know, I'll design a place, and I called it PI Campus, that looks more like a university than an office. It also looks more like a home than an office. Through architecture, these two things really influence the way people think. So this removed the fear a little bit.

[00:01:43] Speaker B: And of course the concept of a campus is very interesting. In America, they use it very often. What was the thinking behind having a campus? That it's a learning place, a place to incubate as well?

[00:01:57] Speaker A: Sure. The idea is that it looks more like a research center, a place where you can try things and fail, and you will not be stressed by the failures, because failures are always well accepted.
And it is also part of our culture. You know, in the US, in Silicon Valley, the right place to do a startup is a garage. We're Italian; we have higher ambitions for quality of life. So we do it in luxury villas.

[00:02:31] Speaker B: And the great thing about being here with you is seeing some of those researchers actually meeting today, July 4th. A big hello to our American friends who are celebrating their Independence Day. But right now we're in the middle of a conversation about DVPS here at Translated in Rome. Yesterday we had the kickoff of the DVPS project. Those who read it will read "DVPS", but we are pronouncing it "dupes". Could you tell us a little bit about it?

[00:03:07] Speaker A: Sure. In Latin, the V becomes a U, in the Latin alphabet. There are 70 researchers here in Rome today, I'd say some of the best researchers in artificial intelligence in Europe, from Oxford, ETH Zurich, EPFL, FBK in Trento and many more universities. We very recently raised 29 million euros in research grants to start this new project on the next generation of foundational models, trying to attack some of the limitations that LLMs have today. The project name, DVPS, stands for "Diversibus vis plurima solvo", which in Latin (we wanted to give credit to Latin, because we are in Rome) means "in different ways I solve different problems". But the funny thing behind the name is that we added a little glitch to it. "Diversibus" is in fact not the correct Latin; it should be "diversis". But there was one person in history who made that mistake, and we celebrate mistakes here: Dante Alighieri. He used "diversibus" all the time. So, you know, we embrace the error. We love Dante, and we wanted to add that little glitch to our name.

[00:04:43] Speaker B: And scholars around the world love the great work of Dante Alighieri.
And it's a great little reminder of what you were just mentioning: mistakes will get you there. In fact, yesterday we were having a conversation with Sebastian, your director of AI here at Translated, and one of the things that came out of that conversation was what this project means for language. Could you tell us a little bit about what you think it is actually going to do for language, not only in Europe but for the world as a whole?

[00:05:17] Speaker A: Yeah. Let me remind everyone that human language is the most profound manifestation of human intelligence. It's the most human thing we have, and the hardest thing for machines to learn. So by solving the language problem, we approach artificial general intelligence faster. Language exposes you to the real problem. You cannot make mistakes, especially in translation. The model needs to truly, deeply understand the reality of the world. The representation of the world needs to be perfect in order to generate a translation, and that forces you to dig, dig, dig into the details. So here, with many years at Translated, we have had a very big competitive advantage. We see problems before others do, because our linguists interact with AI so much, are so professional and cannot tolerate errors, and that allows them to see things that other researchers in the world typically don't see yet. So we identified certain problems that LLMs have today, and we want to solve those problems. Lara, our previous project, is the current one: great technology, where we set out to create the best possible outcome on top of LLMs. But we already see that if we want to make progress again in three years, we will need to solve other things, and there are several of them, if you want.

[00:06:47] Speaker B: We can definitely get into that. This is a great example of cooperation and collaboration.
You talked a little bit about the organizations involved in this amazing project: 70 researchers. Can you tell us a little bit about the institutions and the people who are coming together to make sure it is a success?

[00:07:07] Speaker A: Yes. It's the same thing we did with the MateCat project and the ModernMT project. They both started as research programs. The idea we have at Translated is very simple. At the time we started MateCat, there were a thousand researchers working on machine translation. So I said, even if we create a team of 100 people or 10 people, what's the probability that we will solve the problem? It's 1% to 10%. So let's partner with research institutions instead and create a very collaborative environment, so that we can work with many people, collect the ideas and then implement the ideas of the entire community. The best way to do it was to raise money for research and give that money back to researchers, with very precise roles: Translated worked super hard to define the problems very well, and the researchers worked hard to find solutions to those problems. The opposite would be a disaster, but cooperating in this way was very, very effective. And this is what we did with DVPS. We looked around Europe to see who had been publishing the best research papers in artificial intelligence, in the field of language, on what we needed. We went to them one by one and asked if they wanted to join one of the largest research projects in Europe on this topic. And obviously, when you team up with great people, more great people want to join. And if you have the best people, very often things happen.

[00:08:46] Speaker B: And that snowball effect is something we can definitely see. I've had the opportunity to meet amazing people from all around Europe who are part of this project.
And you rightly mentioned that a big part of it was properly defining the problem, which saves you a lot of time. If you don't define the problem well, you have many other issues. Could you tell us a little bit about the problem definition, then, and how it translates into the use cases where you're looking to have an impact?

[00:09:15] Speaker A: Okay. Let me say that the biggest use case for DVPS is obviously language translation. DVPS will also work in cardiology, space observation and robotics, just because if we solve those problems in language, you can also do these other things. But the main focus for us is language translation and the problems we see today that we want to solve. And it is basic research, so it's not that we have a 100% chance of success, but there are a few problems we see and would like to try to solve. Number one: we are still very limited in context. Yes, LLMs went to 1-million-token contexts. But we humans don't rely on text as context; we rely on all our senses as context. Take, you know, a friend of mine who is a great voice actor, the best dubber here in Italy. When he's dubbing movies, he needs to look at the scene: the face of the actor, the voice of the actor, the background noise, the context in which everything is happening, and also everything that happened in the movie before the scene, in order to interpret the scene the right way. We cannot expect machines to give you that output from just a short string of a sentence if we don't provide the proper context. And to manage the context, we need to invent new techniques. We need to invent universal tokenizers. We need to invent machines that are able to capture information at the byte level, meaning from the stream of data as it comes in, without having to preprocess the stream of data, just as we humans do.
It's not possible that I have to teach a machine how to process an image, a video, a sound and a sensor signal. The machine needs to learn these things independently. So we need to provide more context, and we need to invent the technology behind it. We also need to invent a new way of reasoning. Today, reasoning in LLMs works in a sequential way. The best-performing models are called chain-of-thought models: they interact with themselves, output something, then output again and again, and you hope they come to a conclusion. This is very inefficient. Let me give you an example, the one we always use with Renato. Renato says: if I say in Italian "tre parole: non sei solo" and you want to translate it into English, you will say "three words: you are not alone", but these are four words. If you ask an LLM to do this translation, it will always make the mistake. If you use chain of thought, it will try to correct it, but it will get to the right result only by chance. It is not able to really use multiple models at the same time to converge, as we do. As I'm translating for you, I'm counting the words and adapting my translation. I will say "four important words: you are not alone". I can do that because my brain is able to do two things at the same time, and today we don't have that. The most similar thing we have is something called mixture of experts, which is just routing: the model decides to use certain parts of the brain of the artificial intelligence for one thing and certain other parts for another. But these parts don't interact while decoding. So reasoning needs to go from sequential reasoning, agentic reasoning if you want, to truly parallel reasoning inside the same brain. That's one thing we need to solve. And the last thing is probably the most important one.
And I think it is why we were successful with Lara: 15 years ago, we understood which data would become important 15 years later. That was the interaction between professional translators and AI: collecting their corrections, the time spent, what they changed. We knew that rich information, at document level rather than sentence level, was going to become important, and that's what we did: we collected the data. In DVPS today, we know that past data is no longer the most important data to train on. We needed it; it was important to reach a certain level. But if we want to get to the next level, we need to implement a totally new form of learning for machines: not supervised, not unsupervised, not reinforcement learning. We now need learning by doing. As machines approach the capability of performing tasks, they need to learn while doing the task. They try something, they fail, and they learn what not to do. They try something, they succeed, and they learn what to do next. Strangely enough, that is mostly how we humans learn. I mean, we read books, and that's an important part of our education, but we learn through interactions with other people and with the physical world. We need to give that capability to machines. This is the last and most important point for DVPS, and it is so important because it allows us to generate the data that can improve AI over the long term, potentially surpassing human capabilities. If we teach a machine only what we did in the past, it can become as good as the collective knowledge of humans, but you can never ask it to solve something that collective intelligence is not able to solve. If you want to discover more about the origin of the universe, or solve a math problem that no one in the world is able to solve, we need a level of reasoning and intelligence greater than our collective intelligence.
And that is the only path for machines to achieve that. The past will not help. It's machines working by themselves, making mistakes and improving on them.

[00:15:52] Speaker B: Marco, surpassing humans sounds very scary to a lot of people, not only in our industry but in many different sectors around the world. But as we can all see from what you're saying, and from what we are learning from all the coverage we are doing on DVPS, humans are at the center of this conversation. Could you tell us a little bit about the opportunities that humans will find as the DVPS project continues to evolve?

[00:16:23] Speaker A: First, I have to say that in the translation industry, we're probably one of the luckiest players in this revolution. We are in a market with an incredibly large latent demand. Latent demand means that if you make translation faster, cheaper and better quality, people will buy more. There is way, way more content to be translated in the world than what we are translating right now. So as we improve the system, as we help professional translators be more accurate and faster, people want to translate more. And the more they translate, the more they interact with other people, the more they want to do. And because language is so human, human intermediation is still there for many kinds of content, just because it's a human-to-human interaction. So it's evolving with us, and we represent the gold standard. There are a lot of opportunities in our industry because that big latent demand means big opportunity for the future. Obviously the jobs will change a little bit, but I think the translators and translation companies that truly embrace artificial intelligence will have a very interesting future. I also think that every change like this one generates fears and doubts in people.
But I think the people who don't give up will actually succeed.

[00:18:01] Speaker B: And we definitely see, from what we're covering at MultiLingual, that there are companies that, instead of panicking and bleeding, as we call it internally, dare to be great and dare to take on the opportunity. This is a massive initiative, led of course by Translated. And in its conception, what we are seeing is the element you mentioned of learning by doing, and multimodality. Could you tell us a little bit about how human evolution is going to not only help us understand how these systems should learn to learn, but also how it is going to adapt in the future? We understand that humans are dynamic, and we will learn from the machines' learning process what we could do better. How do you see those dynamics playing out?

[00:18:55] Speaker A: Yeah. First, I think we need to be more humble as humans. In history, you know, Earth was the center of the universe. It took us a while to understand that it was not, and that not even the sun is. Then for many years we saw everything around us as revolving around humans. The centrality of humans was pushed as an idea; it's a great idea and it makes us feel good, but we need to be more humble. We need to understand that we are part of a big ecosystem, the universe: a very, very important part, but a small part of the universe. If we understand and see ourselves in this context, then I think we can interact with everything around us more effectively, and we can be happier about what makes us human: our genuineness as humans. That is truly what we should care about. It's not the amount of information we know about something, or our capacity for reasoning. The best characteristics of humans are way, way more than that. We're beautiful, we're evolving, we're never the same, and that's the beauty of humans.
And that creates an incredible opportunity, and also a market opportunity. Let me give you an example. Think about Usain Bolt, the fastest man on the planet ever. The reason Bolt and what he does are so interesting is that he's a human trying to prove that by working super hard you can surpass what other people thought was impossible. It's about moving the barrier of what is possible one step further. And that is creating emotions in all of us. Obviously we can build a car that goes faster than Bolt, even a skateboard, a bicycle, whatever. But what Bolt is doing in running the 100 meters is proving that we can do something other people thought was impossible. That creates a market. It's not only giving us emotions; sports entertainment is a market. People really mistake artificial intelligence for something that is trying to remove the pleasure from our work and our jobs. What we really want is for AI to remove the fatigue of our work. We want to keep the sense of purpose, the emotions and the sense of accomplishment that come with great work. There are many, many things that will become more important, and bigger markets, the more AI makes progress. Usain Bolt, the Olympic Games: as we make progress with artificial intelligence, they will become bigger and more interesting. That's the evolution of human beings I see in front of us. And I think there is too much hype around artificial intelligence. People think it is going to replace everything. It's only going to replace tasks that have economic value. Everyone is working on that because it's about the economy: you will only invest in artificial intelligence that can actually solve a problem with economic value.
And that is only a small, small portion of the economy, and an incredibly small portion of what makes us happy as humans.

[00:23:11] Speaker B: It's really great to hear you talk about it from a human perspective. It takes away some of the, let's say, boundaries on our thinking, and there is definitely something to be said about the evolution and maturity of such humility. DVPS is a European project, and there's been a huge conversation along the lines of "oh no, China is so far ahead", or "the US is catching up", or "Europe is doing this and that". Could you give us some words on that more human conversation and that evolution of continents?

[00:23:49] Speaker A: DVPS started as a European project for a few reasons. One reason is that there was no global institution we could go to in order to raise the money. If there had been, I think that's what we would have done. Language is a universal problem. It's a problem of humanity; it's not related to borders. If I had the opportunity to create a consortium with Europeans, Americans, Chinese and everyone else in the world, that would probably be my choice. But, you know, the world is not organized like that, so we have to start somewhere. And Europe is the place where language is most perceived as a problem. Also, Europe is behind in artificial intelligence. We don't have foundational models; we have only a few companies creating foundational models, and those models are not really among the top three in the world. So there is a need to keep up. And in terms of language technology, and translation especially, this is where we have sensitivity to the problem of language diversity. We have access to linguists in a single time zone. So there is a natural, unfair competitive advantage in starting this in Europe. This is the right place to start a project like this.
But I hope that over time we'll bring more people into the project, opening it up to Asia and the US to participate in the research.

[00:25:23] Speaker B: We're coming to the end of our conversation, and I'm sure we could talk for hours, but language and linguists are a very important part of all these conversations; we're very clear on that. We also understand that education around language is going to change. What do you think projects like this do for language in particular? We understand the importance of language for these projects, but is this project going to help institutions teach language? Is it going to help linguists understand their role within it? I know we talked a little bit about that already. What can we say to linguists and those who care about words, the meaning of words and where words come from?

[00:26:05] Speaker A: Well, as I move forward with the project, with the research, and I see how the best translators and linguists in the world think about language, I always realize how deep the problem is and how much work there is to do. So what I recommend to all translators is to keep doing great work: go deep, deliver incredible quality and don't give up. Don't become lazy and think that AI can help you so much that you don't need to go as deep. I think the future for linguists will be great if they instead leverage AI a lot, become power users of it, and use it to step up, go even deeper into meaning and keep improving quality. The more they do it, the more valuable what they do becomes, both for society and for training artificial intelligence. And obviously people will recognize that economic value at some point. So I think there is still a lot to do in that direction. Also, I think everyone is talking about AI as if every language were the same.
Instead, for tier-one languages, the top 10 languages, we have an incredible amount of data, a lot of information, and we can reach a certain level. If I look at Swahili, I calculated it would take 500 years to reach the amount of data we have in French. So AI is not progressing at the same speed for every language, and that requires a very different approach to managing different languages. At different stages, translators need to cooperate and interact with AI in different ways. But what I'm sure of is that they should always go deeper and try to use AI to create and deliver better quality.

[00:28:17] Speaker B: Marco, as a final question: for those of us covering the story, those involved in the DVPS project and those who want to get involved in it, what would be your comment, your recommendation?

[00:28:34] Speaker A: DVPS is a research project that lasts four years. I think by the end of next year, so one year from now, let's say June or July next year, we will ship the first models: the small, super-small-size models. Our hope is that they will be the best models in the world, and depending on how competitive they are, if they are the best, we will probably invest 100 million to create the mid-sized model in maybe only six months. And if we're again leading that category of mid-sized models, then I think we'll raise a billion and try to join the game of the very big models. As we move forward, we're discovering new problems. So every research institution in the world that has ideas on how to solve the key problems we're attacking in DVPS is welcome to partner with us. We want to partner with them to raise more funds for research, and we want to contribute to their research with the small fund we have for external partnerships.
We actually already have a million dollars just for collaboration with other institutions on this. But I think if you want to solve this problem, you need hundreds of millions. So probably the best approach is to work together to raise more money to support their research. Everyone whose work is in line with the three main problems we want to solve is very welcome to partner. And for companies and customers in the translation industry, I think our role at Translated is to define those problems super well, as I said. And yes, I think we have very good visibility of what the problems are. But if you know about a problem that is very painful, very strongly felt, that you would like to solve, and you think the current architecture of artificial intelligence is not able to do it, this is what we want to hear, because those problems inspire us to find solutions.

[00:31:03] Speaker B: Excellent. Thank you so much, Marco, for your time.

[00:31:06] Speaker A: Thank you.

[00:31:07] Speaker B: All right, this was our conversation with Marco Trombetti, CEO of Translated. We just talked about DVPS and the kickoff of this artificial intelligence initiative here in Europe. You will hear more about it in the coming years.