Bridging the Language Gap: Oxford Languages' Mission to Preserve and Digitalize

Episode 242 January 14, 2025 00:29:48
Localization Today

Hosted By

Eddie Arrieta

Show Notes

In this conversation, Alexandra Feeley, Director of Market Development at Oxford Languages, shares her take on the company's mission to digitalize under-resourced languages and how they are paving the way for better localization through language data.

She discusses the importance of authoritative language resources, the challenges of working with underrepresented languages, and the role of AI in enhancing localization. Alexandra also highlights Oxford Languages' efforts to preserve cultural heritage and make digital experiences more accessible for non-native English speakers.

Episode Transcript

[00:00:03] Speaker A: In countries where English isn't a main language but a lot of people speak English, they're forced to use English for technology, right? Because the features aren't there the way they are for a native English speaker.

[00:00:21] Speaker B: Hello, welcome to Localization Today. My name is Eddie Arrieta. I'm the CEO here at Multilingual Magazine. Today I'm joined by Alexandra Feeley, who is leading the market development team at Oxford Languages, solving complex NLP problems for its customers. She is right now the Director of Market Development at Oxford University Press. Thank you so much for joining us today, and why don't you tell us a little bit more about your current transition? That's why we're here. Off the mic, I was mentioning to you that it's been really impressive how Oxford University Press has engaged the localization industry. We see them at the events, we see you at the events, we see you starting conversations, proposing topics, and that's really refreshing. So tell us all about it.

[00:01:23] Speaker A: Yeah, well, Oxford Languages is a division of Oxford University Press. So something I learned joining OUP was that anybody can just slap Oxford onto the names of things and call it Oxford, but we are actually part of Oxford University Press, the languages division. And I would say the main objective of Oxford Languages is to digitalize under-resourced languages. So with that in mind, localization is at the forefront of facilitating that goal. So it's been really interesting for me coming into a market development role and understanding how you make a mission-driven institution commercially viable, so that it all works hand in hand and doesn't separate out the research and the... I don't want to just say good work, because it doesn't do it justice. But yeah, everything that's mission-driven feeding back to how we support commercial partnerships as well, and having it intertwined.
But yeah, I mean, starting this role was... It's interesting, because at TransPerfect I was technology director, and that was more of a technical role. And now, moving into business development, the answer to "How's it going?" is really that I've been able to see some ways that we can easily support localization better. And I think that being part of TransPerfect made it really obvious to me that people aren't thinking about language data in an authoritative sense, because they think they have everything they need. Most language service providers wouldn't invest in authoritative data because they have so much data from their own translation work, especially the large players.

[00:03:15] Speaker B: I was just going to say it makes a lot of sense to think about it in terms of language data. And then, when you start digitalizing under-resourced languages, it is because at OUP, not necessarily only Oxford Languages, you experience what it is like to be a repository for categorized language terms, because historically you've created a dictionary and many other forms of printed press. But of course, once that's digitalized, there is so much power to it. So you're already in language data. You've been there for a long time. Now that there is a need for it, there is so much you can source from it. And it got me really interested when you mentioned digitalizing under-resourced languages. So those under-resourced languages, probably many of them are endangered. We interviewed Tim Brooks yesterday and we were talking about that. And then we were thinking, well, if we digitalize them, wouldn't that be at least a safety net for making sure certain languages and cultures are preserved? So it's very interesting what you're mentioning in terms of language data and how you are approaching it.

[00:04:30] Speaker A: Yeah. And talking about Oxford Languages' mission.
So there is an element of preserving language, like you said, from a historical point of view, a cultural point of view. It's also to think about countries where English isn't a main language but a lot of people speak English: they're forced to use English for technology, right? Because the features aren't there the way they are for a native English speaker. So when I open my phone or my email or anything nowadays, I expect predictive text. I expect it to kind of fill in the blanks for what I want to say. I expect things to spell check. I can't imagine a world where my children have spelling tests. I mean, I know it will happen in school, but I just can't imagine it being taken as seriously as I took it when I was really young, because you didn't have automated spell check and everything. But when you look at Indian languages, for example, or African languages, there isn't that same level of native digitalization. So people cannot just... I take it for granted, but I can just have that extension of myself in everything digital that I use. And I think that's something where we've created resources to allow technologists to develop the tools for those under-resourced languages. And I see that for a lot of the businesses we work with, their strategy is to creep into these markets, because if you can experience something in your native language, it becomes an extension of you, and it's a lot easier to relate to products and to expand your usage of things. And companies want that. They want you to be assimilated into their brand and not want to leave.

[00:06:23] Speaker B: Yeah, yeah. Thank you. And thank you for sharing that.
I think something else that I found really interesting in what you were mentioning is that because of your experience at TransPerfect, you've come to OUP and to Oxford Languages and then you're like, okay, there are some pretty obvious ways in which this is useful, in which the philosophy and the ethos, and then of course the actions of actually going to these under-resourced languages and documenting them, make a difference. So what were some of those obvious things that immediately stood out to you, where it's like, okay, yeah, definitely here we can help?

[00:07:07] Speaker A: Well, to kind of recap my quick career of 10 years at TransPerfect: I started out as a project manager there, and my main responsibilities were placing translation work with vendors for PDF medical translations. So imagine pre-neural machine translation and pre-project management software, where you're just emailing and trying to get everything done as quickly as you can. Then it moved on to project management software and neural machine translation. Then we saw the rise of AI interpretation. I started my career as a project manager, went into client services, and then moved into technology consulting and created that team at TransPerfect. So there were two big points where the industry changed. And I feel like seeing neural machine translation adoption, and then seeing AI interpretation and synthetic voices and how they cut costs in localization and enhance what we can do, those hurdles and seeing those through makes me really think, well, whatever's next to come is only going to get better. And it's been an interesting approach to come into an organization that has foresight, though nobody can tell the future.
I definitely don't think we at Oxford Languages know what the next technology revolution will be in languages. No one does. But language data itself is future-proof. And when you have a team of hundreds of lexicographers keeping that data up to date to match how language is used today, with context from how it was used historically, I think those assets are priceless. And I guess I'm biased, working at Oxford Languages and getting to work with this team on projects that really make a difference. But seeing those things through, and now seeing what Oxford Languages offers when it comes to AI translation, there needs to be a sense of authority and truth in some things. You can call it grounding, you can call it benchmarking. There are a lot of AI techniques, and not to get too into the weeds of that, but it's using a data strategy that incorporates a lot of different pathways and making sure that you're measuring the outcomes and doing what your clients want you to do, right? And none of that is mind-blowing. But if you took an approach of only using your own translation memory, or only using your own data or open-source data, you can probably achieve good enough, or okay. But to get something that's truly setting a different quality standard, you need to supplement it with something else. So from my perspective, being in the transition, the question was: why isn't everybody using this? Because if you are able to check what you have, your own data, and validate it, then you've got this stamp of approval that it hits a certain standard. And I think that's something that's needed when using translation memory, neural machine translation, and even LLM translation, because there are so many buzzwords and there's so much efficiency fluff rolling around now. I mean, I haven't seen any LSP whose website hasn't changed to AI, AI, AI.
[00:10:48] Speaker B: And it's super interesting because there are so many elements to that conversation of AI, really LLMs, or AI-assisted translation or AI-assisted editing, and it's pretty much all assisted. So now that we're coming to terms with that reality, it becomes really important to ask, okay, how are we going to do the things that technology now allows us to do? It is so cheap to do all these things that we can probably get to so many other locales, and then everyone hits the bottleneck, which is what you're saying, right? Language data. How wide, how deep is Oxford Languages' current language data scope? How many languages does it touch and get deep into? And I believe humans are the best way to do that. There is no robot that will be able to get into these under-resourced languages.

[00:11:48] Speaker A: Yeah, well, it's funny you say that, because I will say our research team is experimenting with how AI works with what they do. There is always going to be a curation element to our data, because that's Oxford. You know, we would not put something out there that didn't meet our own standards. And there is a lot of thought and decision and debate around what words get added to the dictionary, let's say. I will say I'm not the best person to talk about it. I won't do it justice, because they do an incredible job. But our languages portfolio, I mean, it's growing, and I would say we service at least 60 languages, if not more. And we're currently working on languages like Assamese and Catalan that are either uncommon or under-digitalized. And when we do those projects, there is a certain level of coverage that we command to make sure that we're representing the language how it's spoken today. What goes into that is, you know, reviewing corpora, making sure that we have inflection coverage, complete definitions, short definitions, word metadata, and all of those things feed into a ton of different use cases. And when it comes to localization, I think the more data you have, the better your outcomes will be: faster, with less compute power and also less time to review.

[00:13:18] Speaker B: Yeah, I found it very interesting what you're mentioning around leveraging language data. You've definitely touched upon some of the trends in the industry. With these LSPs and other types of tech companies making sure they mention AI, are they thinking more about language data than they were before? Can you feel that in the conversations you're having?

[00:13:45] Speaker A: It's hard to say. I would say yes, because everybody is more aware of data. Not just localization: every company is more aware of data. I think there was a business strategy curve, let's say, where how you use your data and how it can make you money was one wave. Now it's how your data and an LLM prove profitable or gain efficiencies. And that's every enterprise organization, every small organization. But when it comes to localization, I think there is always going to be a need to make things cheaper, better, faster. Cheaper to the customer, and cheaper in environmental footprint and compute power, as I said before. Faster: I think faster is an understatement. Everything is instant nowadays, and some industries won't tolerate that. And I could talk about the pharma industry in a minute. And with quality as well, your outputs are only as good as your inputs. Sorry if I backtrack: when having things take time was acceptable, quality could follow that. Now nobody has patience for anything that takes time. And maybe patience is the wrong word, but they don't have the stamina to wait for it, because things are moving so quickly.
So you need to find a way to put out quality outputs in the fastest way. And data is the way to do that.

[00:15:26] Speaker B: And we've had conversations here with other LSPs and other technology companies, and they are also referring to this idea of, okay, how do we capture this training data? What's going to be the strategy to make sure that we go out there? It seems like you have a particular strategy to find these underrepresented languages. Do you know how the research team does it?

[00:15:55] Speaker A: We definitely do it in conjunction with partners. No mission-driven organization really should work alone. You want to work with partners that can extend your reach. And there is a whole team at Oxford Languages devoted to new ventures. That was really what pulled me in from a recruitment perspective: that they're able to devote time and not just pull people off when things hit the fan. There is a team invested in figuring out what the next innovation put out by Oxford Languages will be. And that was inspiring, you know, and made me want to come on board. So I'd say I think the next thing is African languages. I think the team is invested in that. I think there are so many options that it's hard to predict. But we experiment with things, and we allow time for those to pan out and prove themselves, or we abandon them because there is something that will better serve the marketplace or the mission.

[00:16:59] Speaker B: Yeah, I also feel that same type of energy, especially with the conversations we've been having with Oxford Languages and how vibrant it's become in the past two years. So we can definitely see that. Are there any initiatives or projects you are excited about at Oxford Languages for 2025? Of course, you've mentioned Africa, which is super exciting. We know there's an event happening in Zanzibar in February by the association of African Language Companies, which is going to be really great.
And they've had more events now than they've had before. The community really is becoming much more visible, and there are many more activities happening, at least in the language services industry.

[00:17:45] Speaker A: Yeah, I mean, there are quite a few. I'll talk about maybe some current ones and then some further down the line. We've recently optimized the only monolingual Indonesian dictionary that's available for licensing. And that's amazing because it is actually the only one. So having, as I was saying, an experience on your devices in Indonesian that has the same flexibility that English has for me, as a native English speaker, really makes a big difference. And that's super exciting. So I would love to see that more widely used. And then another project that is lifting off the ground is our sensitivity language data set. This is something that's been curated with words and word metadata that are sensitive in context. I don't want to say anything vulgar on the podcast, I know it's in a professional manner, but you could picture a word that is sensitive versus vulgar. So talking about genitals versus a slang term for them, it would be vulgar versus sensitive, right? And having that tagging makes sense in an education space, it makes sense in a professional document space, it makes sense in marketing. It's just really good information to have to make sure that your leaders can message down what they want to achieve with their translation strategy, with their marketing strategy in different countries, and validate that with data, to be able to say, can we consolidate this and see whether there are any risks associated with publishing something using this kind of language? Another application for the sensitivity data is words that might be not vulgar, not even a swear word, but can be sensitive in certain contexts.
I'll save examples, and we could always publish some collateral next. I really don't want to offend anyone, but you can imagine some words that are not swear words, that are not offensive or racist, but if they are used in an abrasive context, then they are. And without the sensitivity data, a model might not be able to pick up that context and flag that, well, in this context this is vulgar, or this is sensitive, or this is not appropriate for people under 18, but in another context it would have a fine meaning. Also related to that: does it have a political sensitivity? Does it have a violence element to it? So there's a lot to this. It's really robust, and I think there are a lot of use cases for it. And I find it particularly interesting in education. I mean, I have two daughters, they're only two, not quite in school yet, but you want some filtering on the stuff that's out there now. And I think this is a good benchmark. This is a good authoritative tool to use to make sure that your company is achieving that.

[00:20:59] Speaker B: And that's really great. It gives a lot of texture to the kind of work that's done behind the scenes. And I assume that for thousands and thousands of words, millions of words, in so many different languages, doing this process of tagging and categorizing and contextualizing, almost culturalizing, terms in their context is a lot of work. It's a lot of effort. It says a lot about the mission that you all have at OUP and Oxford Languages, right?

[00:21:35] Speaker A: Yeah. And don't get me wrong, we don't have this in every language now, but it's something that we're working on, and the priorities are to make sure that, you know, in markets or places where there are elections or more sensitive things going on, that's the stuff we want to work on. And we want to make sure that it's updated regularly too.
Because as we use language, it changes very rapidly. And off the back of that, how things are localized also changes rapidly, because the words we would use as the most common translation today are not the most common translation of 20 years ago, because the way we speak is completely different.

[00:22:13] Speaker B: But I am wondering about privacy, right? In terms of how OUP and Oxford Languages approach privacy, from the sourcing of all the language data that you provide, to the integrations with the companies that you work with, and also the privacy of their data. And whether that's a conversation we can even have, because I thought we might even have a panel in the future where we bring three of the main language data providers in the world to talk about privacy.

[00:22:52] Speaker A: I mean, privacy. So we've been going to a lot of AI events in Europe, and privacy and ethics are repeated over and over and over again. You're never going to get someone who says, "Don't worry about it." But clearly there are people who aren't that concerned with it. And a major part of my role as Director of Business Development is to ensure that we do not partner with people who do not have ethics, and to weed through the promises to make sure that we're partnering with ethical AI. So yeah, we could save this for another time.

[00:23:35] Speaker B: Yes, we'll do that. So, Mila and Matteo, this bit will be for another panel. I think it'll be a really great conversation to have at Multilingual at some point. Alexandra Feeley, thank you so much for joining us today. Is there anything that we have left unsaid, anything you'd like to mention to the audience that listens to Localization Today on Multilingual Media?

[00:24:03] Speaker A: Trying to think. I guess I could quickly talk about... and I don't know if this will fit into what we've covered, so, you know, edit as you wish, I guess. Well, I won't say that, but try to make it fit if it makes sense.
But I will say I think it was a struggle to leave the translation industry, because I say I grew up in that industry, right? I started when I was 22, left at 32. So everything professional, on the ground, I learned in that industry, mostly servicing our pharmaceutical customers at TransPerfect and understanding how to solve their problems. So taking on this new role and moving into a sales role versus a technical one was daunting. But at the same time, it made me realize that solving business problems is a skill across the board. So my advice to anyone listening is that if you're worried about switching industries, everything from localization applies, because it truly touches every piece of every business out there. And I have to say that working with regulated industries through localization, and working with people who are more savvy about marketing or revenue growth, all of that was something that I was able to apply in my role. And working for a nonprofit in business development, the necessity is that you're solving people's problems the right way. I do think I learned that at TransPerfect, and I think I've applied it in this role, and my team is benefiting from it. And I haven't lost touch with localization, because at Oxford Languages we're working to support that industry and make it faster and better quality. So if I had to add any note, it would be that, I think.

[00:25:55] Speaker B: Thank you, Alexandra. And I think we all agree that OUP and Oxford Languages are now actively involved in the localization industry. Language services is so big that I think we're not even capturing it: the share of language services that doesn't show up on our radar of events and associations is so large that we cannot even understand how deep this is.

[00:26:26] Speaker A: And another initiative, I guess, that I would say has always been on the books, and it's something that Oxford Languages has worked on.
But I validated that what is so necessary in the localization space is industry-specific dictionaries. So let's say having a medical pronunciation dictionary or data set that improves the way AI interpretation works, because the medical and pharmaceutical industries are not going to touch AI interpretation: there's just too much risk with patients. So if there's anything authoritative that can come on board and that we can help optimize, this technology will be able to step into supporting those industries better. The medical dictionary is something that is close to my heart, because I do feel like it will solve a lot of problems in localization too. So I'd love to see that widely used when ready.

[00:27:33] Speaker B: Yeah, I hope so too. And I'm pretty sure it's going to aid our conversations with artificial intelligence. I'm using Alexa right now and it doesn't activate, but it's... Thankfully it's been trained. But these types of dictionaries are really going to make a lot of things a lot easier for students, for practitioners. And then when you are interacting with whichever interface it is that we choose in the future, having the right information from reputable sources is going to make a difference. Right now you will get, like, "From schwartzisbluesky.com, the answer is..." All right, thank you so much for that.

[00:28:18] Speaker A: Also, things like World Englishes, which I think people don't often think about. So if English is not your native language, the way that you pronounce words in English is different from someone who's native. I mean, America, England, Canada, come on. My household alone: my fiance is British, I'm American. So, you know, it's about user experience as well. If Alexa can answer you in your accent, you'd feel more at home.
You'd feel like your AI agent is part of your family, which could be scary, but could actually be a better experience. So World Englishes is not something that's off the table either, to help supplement how we all interact with AI.

[00:29:09] Speaker B: Well, very, very exciting. I really look forward to our next catch-up, perhaps at an event or virtually. We'd love to continue learning more about OUP and Oxford Languages and how the processes are going, and it's very interesting what's happening in Africa. We'd like to get involved. We have been supporting some events. So I think Multilingual, OUP, and Oxford Languages could continue working together. Alexandra, thank you so much for joining us today.

[00:29:40] Speaker A: Thank you. It's been great.

[00:29:44] Speaker B: All right. And that was our conversation with Alexandra Feeley from OUP, Oxford Languages more specifically. My name is Eddie Arrieta. I'm the CEO here at Multilingual Magazine, and this was Localization Today. Until next time, goodbye.
