Episode Transcript
[00:00:00] Speaker A: Toral Cowieson: Powering the Modern World with Unicode. Interview by Cameron Rasmussen. If you carry a phone, wear a smartwatch, or use a computer at work or home, you have an unsung hero that makes it all possible. That hero's name? Unicode. The text encoding standard, managed by the Unicode Consortium, is what guarantees that the characters and scripts displayed on your digital devices function properly in your language of choice. A pivotal technology for language professionals of all varieties, Unicode is the workhorse that gets it done, and it's managed by the leanest of lean teams, supported by an army of contributors from varied professional and cultural backgrounds. Toral Cowieson, chief executive officer of the Unicode Consortium, works at the heart of the international coalition that keeps technology working for everyone, whether they be casual users, programmers, language workers, or anyone in between. Multilingual spoke to her about this essential work that makes the world go around. How did you first become involved with Unicode? What was the career path that brought you this direction?
[00:01:11] Speaker B: I have a master's degree in international business, so I thought I was going to work for IBM in a cubicle somewhere. But when I was working for a software development company, I ended up helping the owner's wife develop a documentary film about language and cultural preservation in native communities in the United States and Canada.
From there I worked in academia and for an international non-governmental organization.
That NGO was the operational home for the Internet Engineering Task Force, which is the standards body for the Internet. I also worked for a company called NewsEdge, where I ran one of the product teams. That company was ultimately acquired by Thomson Reuters. When I saw the Unicode job posted on LinkedIn, what resonated with me was that it brought together so many aspects of my professional experience and personal interests. I have a multicultural background, having been born in India and grown up in Chicago and New Orleans. I'm also very passionate about board governance and working with mission-driven organizations. During the interview process, what attracted me to the Unicode position was the opportunity to apply fundamental business principles for a greater good. When I spoke with Mark Davis, one of the co-founders, he explained how Unicode was created at the dawn of the Internet to address people's need to communicate online in languages other than English.
Having that vision over 35 years ago and seeing where we are today is truly remarkable.
[00:02:43] Speaker A: Could you briefly go over the history of Unicode?
[00:02:47] Speaker B: If we go back to 1991, which feels like eons ago, that was the year HTML and the World Wide Web were first publicly mentioned. It was also the year Unicode was incorporated and the first technical release, The Unicode Standard, Volume 1, was published.
This initial version included commonly used scripts like Arabic, Armenian, Bengali, Georgian, Greek, Gujarati, Han, Katakana, Tamil, Telugu, and Thai. Then in 1997, Microsoft Office 97, with Word, Excel, and PowerPoint, became the first major business productivity suite to support Unicode, marking a significant milestone in its adoption. Over time, the Unicode Consortium added various other projects and products. Two notable examples are the Common Locale Data Repository and the International Components for Unicode.
These are embedded in every browser and operating system. They enable critical functionalities such as supporting bidirectional text (the scenario where left-to-right languages like English are used interchangeably with right-to-left languages like Arabic or Hebrew); managing text boundaries for word and sentence positioning, line wrapping, and so on; and handling currency and date formats, which differ between countries. Another key milestone came in 2009-10, when emoji were added to Unicode. Initially, some member organizations pointed out that emoji were gaining popularity in Japan, while others dismissed them as a passing fad.
Eventually, Unicode integrated emoji into its standards, and today 92% of digital device users use emoji daily. We often talk about what Unicode produces, but what really stands out are the values that guide it. These four values form the DNA of Unicode: local solutions require global collaboration; internationalization respects and empowers users; interoperability across platforms supports you and the greater good; and transparency and open source help ensure reliability, security, and stability.
If Unicode falters, it affects everyone, which is why these values are so deeply embedded and why we can take for granted how seamlessly it works.
[00:05:15] Speaker A: How does Unicode impact the language industry?
[00:05:18] Speaker B: Unicode plays a crucial role in localization and translation by creating a unified system for handling text and data across languages, which supports efficiency and accuracy. For example, Unicode offers a comprehensive character set and cross platform compatibility, aiming to support nearly all characters, symbols, and scripts used in written languages worldwide.
This is vital for translation companies and localizers managing multilingual content. It allows them to handle diverse languages, from Arabic and Chinese to Cyrillic and Tamil, without worrying about encoding issues. Another example is globalization and scalability.
Unicode allows companies to globalize their products more easily by enabling seamless scalability across languages.
This helps language service providers expand their services to new markets without facing technical challenges. Additionally, Unicode facilitates better integration across software, websites and documents, particularly for global products that require localization in multiple regions.
[00:06:26] Speaker A: How does Unicode work to bridge language gaps and facilitate international communication?
[00:06:33] Speaker B: Mark won't say this because he's humble about his accomplishments, but the Unicode character encoding is used by almost every bit of software that uses any text, and the Unicode locale data and programming libraries are used by all major operating systems and browsers, and through them by almost all major apps. It's on tens of billions of devices, ensuring interoperability across all platforms. It's how we connect with loved ones and colleagues, and how we sort and search text across languages. Unicode enables us to display formulas and scientific text, reserve flights, shop online, ship packages across borders, check exchange rates, and adjust to different time zones. Take a company like Airbnb, for example. Someone in Germany searching for a place in Thailand can do so seamlessly because the Thai host can list in their language and the German user can search in theirs. That's just one example of how Unicode supports and facilitates global communication.
[00:07:35] Speaker A: Who all contributes to making Unicode function? Could you tell us about the operational structure?
[00:07:42] Speaker B: Unicode supports tens of billions of devices while operating with fewer than the equivalent of three full-time employees. We run very lean, and while that can be both a plus and a challenge, it's a testament to the incredible teamwork involved and the "it takes a village" mentality that I observe daily.
Software, programming libraries, and architecture for internationalization don't write themselves.
It all happens because of contributors, volunteers and institutions.
And it's not just internationalization engineers involved in Unicode's work. It's also linguists, program managers and language communities. We've started engaging more volunteers to help with community outreach, marketing and social media to raise awareness about what Unicode does.
Over the years, hundreds if not thousands of people from academia, government and tech companies have contributed to what we all use today. For example, Bloomberg recently joined and brought in use cases from the financial services industry that weren't previously part of Unicode's corpus. Companies and organizations realize it's much more efficient and effective to collaborate with Unicode rather than handling everything in house. This ensures interoperability across platforms as well as reliability and security. We continuously see new work coming in, all aligned with the mission of Unicode.
[00:09:10] Speaker A: Could you tell us about how Unicode continues to evolve and how it aims to support even more languages?
[00:09:16] Speaker B: Unicode was founded on the premise of supporting all languages.
In 2023, we formalized a longstanding work stream into a new technical working group called Digitally Disadvantaged Languages (DDL), aimed at bringing lesser-known or long-tail languages into the digital era. This work is unique in that it focuses on equipping and empowering language communities to be part of the solution, engaging them from the ground up rather than taking a top-down approach. Over the summer we collaborated with two interns from Stanford University on a project called Unicode | Begin, working with the DDL team to create a blueprint for engaging these communities. Another ongoing effort, which has been part of Unicode since the CLDR was introduced, focuses on improving how text is handled across different locales. Hundreds of contributors, often native speakers, participate in this work, bringing their expertise and understanding of language nuances. For example, discussions might involve how a particular city is referred to or how changes in language are implemented, such as Turkey's official change to Türkiye. These types of discussions happen regularly within the CLDR group.
[00:10:32] Speaker A: How does Unicode operate to allow all stakeholders to weigh in, especially considering the number of people who depend on it every day?
[00:10:40] Speaker B: Unicode is a nonprofit organization that operates as an open source, open standards joint development organization. We have four core streams of work, CLDR, ICU, ICU4X, and UTC, each with its own working groups. On the operational side, we focus on supporting the technical mission, including building and engaging communities. We are fully supported by members and donations. Our organizational members include well-known companies like Google, Apple, Microsoft, Meta, Translated, Airbnb, and Netflix.
We've also started to attract new members with specific use cases, such as Bloomberg and the Wikimedia Foundation. In addition, we receive support from individual members. We also fund our work through donations, particularly via our Adopt a Character program, where anyone can sponsor one of the more than 154,000 Unicode characters, including emoji, at various levels.
The funds help support our work with digitally disadvantaged languages. For example, we have used funding to assist with the Cherokee keyboard and support projects for languages like Rohingya. A good example of the type of work our members bring to us is person name formatting, which ties into issues of identity and culture. In some regions, like South America, individuals may have four or five names, while in other cultures people may have only one name. Our members asked us to standardize person name formatting to reflect these cultural differences. I can personally relate to this. For my high school and eighth grade graduations, I had to reduce my middle name, which consists of two words and 23 letters, to just five letters.
Instead of representing my full name, which honors both my father and grandfather, it was shortened. That's why I appreciate Unicode's work on identity and person name formatting. It helps preserve and respect individual identities.
[00:12:48] Speaker A: With all the technological shifts happening today, where do you see the future heading for Unicode?
[00:12:54] Speaker B: When discussing this with Mark and others, it's always useful to ask: has a situation like this come up before? The Internet was one such example. Generative artificial intelligence is now the focus, and metadata was a big topic a few years ago. Unicode's approach has been consistent: supporting the broader tech ecosystem by updating its work as needed. When Unicode started in the early '90s, the Internet wasn't well known, understood, or standardized. As the Internet became mainstream, Unicode's work evolved in parallel to support it. We've seen this evolution in collaboration with other standards groups like the World Wide Web Consortium and the IETF, and we'll continue to see how Unicode's role develops in emerging technologies like GenAI. Traditionally, innovation precedes standardization.
People and organizations have to agree, first, that interoperability has a high value, and second, to work together. Once standards are in place, that provides a foundation that fosters even greater innovation.
There is also the possibility that Unicode could leverage AI in our own coding and data generation. I recently attended TAUS, a multilingual AI conference in Albuquerque, where I discussed this with attendees and keynote speakers deeply involved in the field.
One of the main questions was: what role do standards bodies like Unicode play in GenAI? While we've started the conversation internally, we're still exploring and trying to understand where this might lead.
[00:14:36] Speaker A: For you personally, what keeps you engaged in this work? What keeps you curious and excited to tackle each new day?
[00:14:43] Speaker B: The quality, caliber and passion of the people I get to work with daily, whether technologists, linguists, or the chair of the Emoji Standard and Research Working Group, is truly inspiring. Everyone is deeply committed to the mission and I feel privileged to be a steward alongside them. Another important aspect is working on something so meaningful. In past roles I've been part of product teams where projects might disappear or companies are acquired and things change. But Unicode is a constant. Regardless of what happens with any of us, Unicode will always be there, supporting everything we do online. That's powerful, and as a steward, I'm excited to help take it to the next stage.
[00:15:29] Speaker A: What about outside of the office? I'm sure it's easy to see the effects of the work you do everywhere.
[00:15:36] Speaker B: My mother recently moved in with us and it's meaningful to see how Unicode impacts her. I love that Unicode is behind the Gujarati keyboard that allows her to communicate with family in India.
I also love when my daughters and their friends encounter Unicode in their studies and reach out to me. It reinforces just how deeply embedded Unicode is in everything we do. I feel like life is good. There's so much positivity in the world and Unicode is part of that. Even though I live in small-town New Hampshire, I get to witness the impact of Unicode in all its different iterations and interactions, which is pretty cool. I don't know if that answers your question, but I get energized by seeing people engage with Unicode. I'm also excited about our in-person event, the Unicode Technology Workshop. In my role, it's hard to completely disconnect. If I see something working well or notice an area for improvement, I can propose and support changes. It's also about leading by influence, which ties back to why I loved working in documentary film at the start of my career. Back then, the focus was on leading with limited resources and authority, getting people behind a shared vision.
I see the same dynamic at Unicode. It's about bringing more contributors on board and leading with influence, continuing the inspiration that Mark and others have built.
[00:17:00] Speaker A: Is there anything else you want to add?
[00:17:03] Speaker B: I just recommend that everyone take a moment to pause and recognize that when we check the time or exchange currency, we shouldn't take it for granted. It's made possible by so much work behind the scenes and it's essential for all of us, both personally and professionally. And of course, a big round of thanks to our members and contributors around the globe. Finally, if you are personally interested or your organization is, reach out to join us on our mission. Become a member and consider adopting a character. It truly takes a village.
[00:17:37] Speaker A: This article was written by Cameron Rasmussen, Senior writer and Editor for Multilingual Media. Originally published in Multilingual Magazine, Issue 233, October 2024.