A new peer-reviewed study explores a sophisticated approach to silent speech interface (SSI) technology, offering a robust solution for communication in high-noise environments.
The research team at Pohang University of Science and Technology integrated a wearable multiaxial strain sensor with artificial intelligence (AI) to decode and reconstruct speech without relying on sound.
The practical applications of this technology may be more widespread than expected. A significant portion of the global workforce operates in environments where communication is limited by noise and other environmental factors. Fields such as construction, military operations, and emergency response often face communication challenges due to extreme acoustic interference.
Existing SSI technologies, such as electroencephalography (EEG), surface electromyography (sEMG), and single-axis strain sensors, have more limitations than expected, including discomfort, limited speech capture, and invasiveness. However, the new system designed at Pohang University addresses these issues head-on.

At the heart of this technology is the Computer Vision-Based Optical Strain (CVOS) sensor embedded in a flexible neck choker. The sensor uses a soft silicone substrate with high-contrast micromarkers, combined with a miniature camera, lens, and LED illumination. This setup allows it to accurately track both the magnitude and direction of throat muscle movements during speech. Unlike conventional sensors, the CVOS captures two-dimensional strain maps, enabling a more comprehensive representation of complex muscle dynamics.
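The authors have not published their code, but the core idea of an optical strain sensor, tracking how a grid of markers moves between camera frames and converting those displacements into a two-dimensional strain map, can be sketched in a few lines. The function names, the regular marker grid, and the use of simple central differences below are all illustrative assumptions, not the paper's method:

```python
import numpy as np

def strain_map(ref_pts, cur_pts, spacing):
    """Estimate a 2D strain map from marker displacements.

    ref_pts, cur_pts: (rows, cols, 2) arrays of marker centres in pixels,
    in the undeformed and current frames respectively.
    spacing: nominal marker pitch in pixels in the undeformed state.
    Returns a (rows, cols, 2) array holding the normal strain along
    x and y at each marker, via finite differences of the displacement.
    """
    disp = cur_pts - ref_pts                      # displacement field u
    # du_x/dx: change of x-displacement across neighbouring columns
    exx = np.gradient(disp[..., 0], spacing, axis=1)
    # du_y/dy: change of y-displacement across neighbouring rows
    eyy = np.gradient(disp[..., 1], spacing, axis=0)
    return np.stack([exx, eyy], axis=-1)

# Toy example: a 3x3 marker grid stretched 5% horizontally
rows, cols, pitch = 3, 3, 10.0
ref = np.stack(np.meshgrid(np.arange(cols) * pitch,
                           np.arange(rows) * pitch), axis=-1)
cur = ref.copy()
cur[..., 0] *= 1.05                               # 5% stretch along x
eps = strain_map(ref, cur, pitch)
print(eps[1, 1])                                  # ~[0.05, 0.0]
```

In a real system the marker positions would come from the camera via image processing; the point of the sketch is only that direction-resolved strain (not just a single stretch value, as in a single-axis sensor) falls out naturally once marker motion is tracked in two dimensions.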
The research team reports that the sensor demonstrates exceptional performance, with a high gauge factor of 3,625, minimal hysteresis, and strong linearity. It can detect extremely small deformations and remains stable over more than 10,000 use cycles. Notably, the technology maintains accuracy in environments with noise levels up to 90 decibels, making it well suited for real-world applications.
The data is then processed through an AI-driven pipeline designed for rapid and accurate speech decoding. This approach enables the recognition of both localized muscle movements and broader speech patterns in the user.
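The article does not describe the model architecture. Purely as a loose illustration of the decoding step, mapping a window of strain readings to one word from a closed vocabulary, a nearest-centroid classifier over simple summary features might look like this. The feature choices, the truncated word list, and the classifier itself are assumptions for the sketch, not the authors' pipeline:

```python
import numpy as np

NATO = ["alfa", "bravo", "charlie", "delta"]   # truncated word list for the demo

def featurize(window):
    """Collapse a (T, 2) strain time series into a fixed-length feature vector."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           np.abs(np.diff(window, axis=0)).sum(axis=0)])

def fit_centroids(examples):
    """examples: dict word -> list of (T, 2) windows; returns word -> centroid."""
    return {w: np.mean([featurize(x) for x in xs], axis=0)
            for w, xs in examples.items()}

def decode(window, centroids):
    """Return the vocabulary word whose feature centroid is nearest."""
    f = featurize(window)
    return min(centroids, key=lambda w: np.linalg.norm(f - centroids[w]))

# Toy training data: give each word a distinct synthetic strain signature
rng = np.random.default_rng(0)
train = {w: [np.full((50, 2), i * 0.02) + rng.normal(0, 1e-4, (50, 2))
             for _ in range(5)]
         for i, w in enumerate(NATO)}
cents = fit_centroids(train)
probe = np.full((50, 2), 2 * 0.02)   # resembles "charlie"'s signature
print(decode(probe, cents))          # -> charlie
```

A closed 26-word vocabulary like the NATO alphabet keeps this classification problem small, which is one reason such a system can be fine-tuned to a new user with only minutes of data.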
One of the system’s most significant features is its ability to reconstruct a user’s unique voice using very little training data: roughly 10 minutes of recorded speech. Currently, the system focuses on recognizing the NATO phonetic alphabet, a standardized set of 26 words designed to reduce ambiguity in critical communications.
Laboratory testing has shown promising results. The system achieved 85% accuracy under controlled conditions and maintained strong performance in noisy environments, even during high-intensity scenarios such as rifle firing. It can also adapt with minimal training data through quick fine-tuning techniques.
Beyond industrial and military applications, the technology shows strong potential in healthcare. It offers a non-invasive communication method for individuals with speech impairments, including those who have undergone laryngectomy procedures.
Future work will aim to expand the system’s vocabulary, improve resistance to motion-related artifacts, and develop a more refined and wearable design.
This study was published in Cyborg and Bionic Systems.
Chrissy Newton is a PR professional and the founder of VOCAB Communications. She currently appears on The Discovery Channel and Max and hosts the Rebelliously Curious podcast, which can be found on YouTube and on all audio podcast streaming platforms. Follow her on X: @ChrissyNewton, Instagram: @BeingChrissyNewton, and chrissynewton.com. To contact Chrissy with a story, please email chrissy @ thedebrief.org.