Digital Umuganda, a fully Rwandan artificial intelligence (AI) firm, is building an automatic speech and voice recognition infrastructure for Kinyarwanda in partnership with the Rwanda Utilities Regulatory Authority (RURA).
The technology is part of KinyaTech, a joint initiative between RURA, Mozilla (the non-profit behind the Firefox web browser) and the German development agency GIZ, which aims to improve the preconditions for implementing and using AI.
As opposed to the common text-based interface between computers and their users, automatic speech recognition (ASR) is a technology that uses voice input to facilitate human-machine interaction.
It allows human beings to use their voices to speak with a computer interface in a way that, in its most sophisticated variations, resembles normal human conversation.
Inclusive technology
With this technology, for instance, an end-user who cannot read can listen to news originally written for an online newspaper, or say their symptoms to a health app instead of typing them. The objective of KinyaTech is to develop an open-source speech-to-text model for Kinyarwanda.
“Digital literacy and literacy itself are still low,” Audace Niyonkuru, Chief Executive Officer of Digital Umuganda, told The New Times.
“We are looking at how voice technology can be used to bridge the gap between digital solutions that reside on the internet and people who access that information.”
According to a report released in 2012 by the World Health Organisation (WHO), one out of every 100 people in Rwanda is visually impaired, so speech technology can drive inclusiveness in the digital world.
“If you look at disabled people, let’s say, blind people who cannot access information because they don’t have a way to access that information because it’s written, with speech technology, no one will be left behind,” Niyonkuru noted.
People who are unable to type, such as those without fingers, can use a speech-to-text tool to write messages and documents in Kinyarwanda on a smartphone or computer.
Other potential use cases
Patrick Nyirishema, RURA’s Director-General, said the wider community of Rwandans and Kinyarwanda speakers will benefit from the initiative through access to Kinyarwanda speech-to-text and text-to-speech applications.
“Broadly, the models will allow people to talk to computers or devices that can interpret what they are saying in order to respond to their questions or commands,” he observed.
For instance, he said, Kinyarwanda speech-to-text applications such as conference transcription software could help people with limited proficiency in spoken Kinyarwanda.
Minutes of meetings or records of court proceedings held in Kinyarwanda could also be transcribed directly from speech, without handwriting or typing, reducing the scope for human error.
AI takeover localized
The project is built on DeepSpeech, Mozilla’s open-source speech recognition engine, and is run by a fully Rwandan team with expert support from Mozilla.
The team is collecting voice datasets that are then used to train computer systems to speak and understand Kinyarwanda through machine learning, a technique that lets computer systems learn and improve from data and patterns rather than being explicitly programmed.
Speech and voice recognition is expected to keep growing globally, driven in part by increasing demand for voice authentication in biometric security systems.
As consumer electronics and smart devices proliferate, people continue to seek smoother ways of interacting with them.
This has led the world’s tech giants to develop tools such as Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana, which use voice to assist consumers with their day-to-day tasks. Unfortunately, none of these virtual assistants supports a single African language.
Localizing ASR technology will pave the way for Kinyarwanda to be integrated into similar systems. By the end of this year, RURA promises a ready, open-source infrastructure that other developers can also build on.