A patient sits in a hospital bed, a bandage covering his neck with a small opening for the tracheostomy tube that supplies him with oxygen.
Because of his recent surgery, the man featured in this marketing video can’t vocalize. So a doctor holds up a smartphone and records the patient as he mouths a short phrase. An app called SRAVI analyzes the lip movements and in about two seconds returns its interpretation: “I need suction.”
It seems like a simple interaction, and in some respects, SRAVI (Speech Recognition App for the Voice Impaired) is still fairly rudimentary. It can recognize only a few dozen phrases, and it does that with about 90 percent accuracy. But the app, which is made by the Irish startup Liopa, represents a major breakthrough in the field of visual speech recognition (VSR), which involves training AI to read lips without any audio input. It will likely be the first lip-reading AI app available for public purchase.
Researchers have been working for decades to teach computers to lip-read, but it has proven a challenging task even with the advances in deep learning that have helped crack other landmark problems. The research has been driven by a wide array of possible commercial applications, from surveillance tools to silent communication apps and improved virtual assistant performance.
Liopa is in the process of certifying SRAVI as a Class I medical device in Europe, and the company hopes to complete the certification by August, which will allow it to begin selling to healthcare providers.
Many of the tech giants are also working on lip-reading AI, though their intentions for the technology aren’t clear. Scientists affiliated with or working directly for Google, Huawei, Samsung, and Sony are all researching VSR systems and appear to be making rapid advances, according to interviews and Motherboard’s review of recently published research and patent applications. The companies either didn’t respond or declined interviews for this story.