Technology is advancing rapidly, with Artificial Intelligence (AI) playing a pivotal role. However, certain AI tools, such as Google Translate and Siri, currently face limitations when it comes to Kinyarwanda. These limitations exclude a significant portion of the Rwandan population who are not proficient in foreign languages.
"Instead of these technologies delivering on the promise of bridging the gap towards access to information for everyone, they end up doing the opposite, which is increasing the gap by handling information for a select few, and now taking away information access for the majority of other speakers,” said Audace Niyonkuru, the Chief Executive Officer of Digital Umuganda, a Rwanda-based AI company on a mission to enhance access to information and services in local African languages.
Recognising the challenge, Digital Umuganda started creating language-specific AI infrastructure in February 2019 to ensure that such tools were readily accessible to people, particularly innovators and researchers as well as persons with visual disabilities.
According to Niyonkuru, they specifically focused on building datasets including those for machine translation, speech recognition, and text-to-speech capabilities, all designed to cater to the Kinyarwanda language, which he said was often underserved by existing technologies.
To achieve this, they initiated digital events at universities and public spaces, involving over 1,000 contributors from across the country. They also collaborated with various institutions, including media companies like DA Media as well as the Rwanda Cultural Heritage Academy (RCHA).
"These collaborations helped us gather datasets and information, especially for words and concepts lacking Kinyarwanda equivalents,” he said, emphasising the role of RCHA in continuously providing data and working on dictionaries for various sectors.
Niyonkuru highlighted the critical role of this data in enriching their datasets, and explained how a dataset, which serves as infrastructure, is turned into a fully functional tool.
"For translation, what you’re going to feed the machine is a lot of sentences in Kinyarwanda. You’re going to feed, line by line, the translations in English that were done by the contributors. What it’s doing is learning how to translate. The more data it’s provided, the better it becomes at doing this,” he shed light on the process.
On how the infrastructure helps preserve Kinyarwanda, Niyonkuru said that if technology is not available in the language, people will have to adapt to the foreign languages in which these digital tools are designed. Thus, "integrating Kinyarwanda into technology is a way to facilitate language preservation and ensure that future digital tools are accessible in local languages.”
Niyonkuru highlighted their commitment to ensuring data collection accuracy, explaining that they paid special attention to regions where distinct accents were prevalent, such as the northern part of Rwanda.
Furthermore, he said, they ensured gender balance within the collected datasets, explaining that an imbalance in contributors’ genders could lead to bias in the data, potentially favouring one gender over the other in speech recognition systems.
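A balance check of this kind could look roughly like the Python sketch below. It is a hypothetical illustration, not the company's method: the record layout (a `gender` field on each recording), the function names, and the 10 percent tolerance are all assumptions made for the example.

```python
from collections import Counter


def gender_shares(recordings):
    """Return each gender's share of the recordings (assumed 'gender' field)."""
    counts = Counter(r["gender"] for r in recordings)
    total = sum(counts.values())
    return {gender: n / total for gender, n in counts.items()}


def is_balanced(shares, groups=("female", "male"), tolerance=0.1):
    """True when the gap between the largest and smallest share is small.

    Groups missing from the data count as a share of zero, so a dataset
    with only one gender is reported as unbalanced.
    """
    values = [shares.get(group, 0.0) for group in groups]
    return max(values) - min(values) <= tolerance


# Illustrative records (assumed, not real contributor data).
sample = [
    {"gender": "female"},
    {"gender": "male"},
    {"gender": "female"},
    {"gender": "male"},
]
print(is_balanced(gender_shares(sample)))
```

A dataset that fails such a check would signal that more recordings are needed from the under-represented group before a speech recognition model is trained on it.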
When asked about challenges, Niyonkuru pointed out data availability as a historical issue that is gradually being addressed, as well as the need for both private and public sectors to fully understand and adopt these infrastructures.
"I think what ChatGPT (popular generative AI tool) has done is that it has brought AI into the face or into the eyes of many users. And moving forward, we’ll be able to accelerate access to these services easily,” he said.
To ensure the sustainability of the datasets and ongoing projects, Niyonkuru said they are partnering with entities like the Centre for the Fourth Industrial Revolution in Rwanda on various AI tools.
Digital Umuganda is also expanding data collection to other languages, including Shona in Zimbabwe. According to Niyonkuru, they seek to publish the datasets by 2024.
Digital Umuganda’s datasets for Kinyarwanda are free to use, which is meant to facilitate innovation and access to information, according to Niyonkuru. He encouraged interested parties to reach out to the company or visit its website for download options.