An AI-trained interface records the brain activity of a stroke patient when she tries to say words and reproduces them in her own synthesized voice.
By Miguel Ángel Criado
Ann was 30 years old when she suffered a stroke in the brainstem, the base of the brain that connects to the spinal cord. She lost the ability to move her legs, her arms, and even the muscles that operate her vocal cords. Now, after years of training with artificial intelligence (AI), a brain-computer interface (BCI) allows her to communicate almost in real time with her own synthesized voice. To achieve this, her head must be connected to a machine that records her neural activity through a mesh of 253 electrodes placed directly on her brain. Even so, it is the first time in more than two decades that she has been able to speak, however robotic and plugged-in the result.
Ann, now in her fifties, does not think the words; she tries to say them. The region of her motor cortex dedicated to speech is undamaged, and that is where the work of this group of neuroscientists, engineers, and AI programmers begins. It is also one of the key differences from other attempts to restore communication to people who cannot speak: other BCIs read out language-specific areas while patients think of a word or imagine writing it, whereas this new system records what happens in Ann’s brain when she wants to say “hello.”
Gopala Anumanchipalli, professor of electrical engineering and computer sciences at the University of California, Berkeley, and senior co-author of the research, recently published in Nature Neuroscience, explains by email: “It is when she tries to say ‘hello,’ without thinking it. Due to Ann’s paralysis, she cannot articulate or vocalize anything. However, the neural signal of her intent is strong, making it a reliable cue for decoding.”
The decoding begins with the electrodes placed on the speech motor cortex. In a healthy person, this is where the neural signals originate that travel through the brainstem to the muscles controlling the vocal tract. With that connection severed, a team of about twenty scientists from Berkeley and the University of California, San Francisco, building on several previous studies, designed a learning system whose algorithms decode Ann’s specific brain activity when she wants to articulate a word.
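The article does not describe the signal processing in detail, but a common first step in published speech-BCI work is to turn the raw multi-electrode recording into short feature frames, for instance by measuring activity in the high-gamma band. The Python sketch below illustrates that idea; the sampling rate, frequency band, and frame length are assumptions made for the example, not values taken from the study.

```python
import numpy as np
from scipy.signal import butter, sosfilt

SAMPLE_RATE_HZ = 1000   # assumed acquisition rate; not stated in the article
NUM_ELECTRODES = 253    # size of the electrode mesh described in the article

def high_gamma_features(ecog: np.ndarray, frame_ms: int = 10) -> np.ndarray:
    """Hypothetical preprocessing: band-pass each channel in the high-gamma
    range (70-150 Hz, a band widely used in speech-BCI studies), then average
    the rectified signal over short frames, one feature vector per frame.

    ecog: array of shape (num_samples, NUM_ELECTRODES)
    returns: array of shape (num_frames, NUM_ELECTRODES)
    """
    sos = butter(4, [70, 150], btype="bandpass", fs=SAMPLE_RATE_HZ, output="sos")
    filtered = sosfilt(sos, ecog, axis=0)  # causal filter, usable in streaming
    envelope = np.abs(filtered)            # crude amplitude envelope
    frame_len = SAMPLE_RATE_HZ * frame_ms // 1000
    num_frames = envelope.shape[0] // frame_len
    trimmed = envelope[: num_frames * frame_len]
    return trimmed.reshape(num_frames, frame_len, -1).mean(axis=1)

# One second of simulated neural data becomes 100 feature frames of 253 values.
features = high_gamma_features(np.random.randn(SAMPLE_RATE_HZ, NUM_ELECTRODES))
print(features.shape)  # (100, 253)
```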
According to Cheol Jun Cho of Berkeley, co-lead author of the study, “basically, we intercept the signal where thought becomes articulation.” In a university statement, Cho adds: “What we decode happens after the idea has emerged, after deciding what to say, after deciding what words to use and how to move the muscles of the vocal tract.” For the machine and Ann to communicate, she trained with a set of 1,024 words that the system presented in the form of phrases, and with a further set of 50 pre-established sentences. As soon as a phrase appeared on the screen, Ann would begin her attempt to speak, and the system would convert the brain signal into both text and voice.
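The article only summarizes how the decoder itself works. As a rough illustration, a causal recurrent network like the toy model below could map 253-channel feature frames to per-frame phoneme scores, which downstream components would turn into text and synthesized audio; the architecture, layer sizes, and phoneme inventory are invented for this sketch and are not the authors’ actual model.

```python
import torch
import torch.nn as nn

NUM_ELECTRODES = 253
NUM_PHONEMES = 41   # e.g., 40 English phonemes plus a "blank" symbol (assumption)

class NeuralSpeechDecoder(nn.Module):
    """Toy stand-in for the study's decoder. The GRU is unidirectional, so the
    output at frame t depends only on frames up to t; that causality is what
    makes streaming decoding possible instead of waiting for a full sentence."""

    def __init__(self, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(NUM_ELECTRODES, hidden, num_layers=2, batch_first=True)
        self.to_phonemes = nn.Linear(hidden, NUM_PHONEMES)

    def forward(self, frames, state=None):
        # frames: (batch, time, NUM_ELECTRODES) feature frames
        out, state = self.rnn(frames, state)
        return self.to_phonemes(out), state  # per-frame phoneme logits

decoder = NeuralSpeechDecoder()
logits, _ = decoder(torch.randn(1, 100, NUM_ELECTRODES))  # ~1 s of frames
print(logits.shape)  # torch.Size([1, 100, 41])
```

In a real system, a network of this kind would be trained on recordings such as Ann’s attempts to read the 1,024-word phrase set, with the displayed text providing the target labels.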
Ann had kept the video of her wedding, which proved very useful: with it, the researchers could give the synthesizer her own voice, much as one chooses a voice for a GPS unit or for Siri. Ann told them that hearing her own voice helped her connect with the machine. It is becoming common practice to record people with cognitive decline, or with conditions that may later affect their ability to speak, in the hope that science can one day restore their voice.
The second major contribution of this work is speed. This BCI is not the only one to have let people who lost the ability to speak communicate again, but until now such systems were very slow. The signals produced when subjects tried to speak or write had to pass through multiple processing steps, and several seconds went by before anything intelligible, whether voice or text, appeared at the other end of the system: far too long for real, fluid communication. This new BCI reduces that latency dramatically.
“Approximately one second, measured from the moment our voice decoder detects her intent to speak in the neural signals,” says Anumanchipalli. For this neuroscientist, an expert in language processing and artificial intelligence, this new transmission method converts her brain signals into her personalized voice almost in real time. “She doesn’t need to wait for a phrase or word to finish, since the decoder operates in sync with her intent to speak, similar to how healthy people speak,” he adds.
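To make that one-second figure concrete, the self-contained sketch below replays the idea in miniature: a causal recurrent model is fed short chunks of feature frames and emits phoneme guesses as each chunk arrives, carrying its hidden state forward rather than buffering a whole sentence. The 80 ms chunk length is an assumption for illustration.

```python
import torch
import torch.nn as nn

# Streaming in miniature: feed a causal model short chunks and emit output
# immediately. 253 channels, 41 phoneme classes, 8 frames x 10 ms = 80 ms
# per chunk (the chunk length is an assumption for this sketch).
NUM_ELECTRODES, NUM_PHONEMES, CHUNK_FRAMES = 253, 41, 8

rnn = nn.GRU(NUM_ELECTRODES, 256, batch_first=True)
head = nn.Linear(256, NUM_PHONEMES)

state = None
for step in range(12):  # roughly one second of simulated feature frames
    chunk = torch.randn(1, CHUNK_FRAMES, NUM_ELECTRODES)
    with torch.no_grad():
        out, state = rnn(chunk, state)   # hidden state carries context forward
        phonemes = head(out).argmax(-1)[0].tolist()
    # In the real system, each chunk's output would update the on-screen text
    # and the personalized voice stream right away, not at sentence end.
    print(f"t = {(step + 1) * 80} ms -> {phonemes}")
```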
To rule out the possibility that Ann and the BCI had simply learned to parrot the phrases the system offered (even though there were thousands of possible combinations), in the final phase of the experiments the researchers had the screen display the 26 words that make up the NATO phonetic alphabet. This spelling code originated a century ago and was adopted by NATO in the 1950s to make radio communication easier by spelling out commands; it begins alpha, bravo, charlie, delta… Ann, who had not trained on these words, was able to say them about as well as the vocabulary she had trained on.
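Testing on words that never appeared in training is a standard check that a decoder generalizes rather than memorizes. A natural way to score such a trial is word error rate: the edit distance between the decoded and reference word sequences, divided by the reference length. The sketch below computes that metric on made-up output; the decoded words are fabricated for illustration, not results from the study.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two word sequences, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        curr = [i]
        for j, word_b in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                        # deletion
                            curr[j - 1] + 1,                    # insertion
                            prev[j - 1] + (word_a != word_b)))  # substitution
        prev = curr
    return prev[-1]

# Hypothetical held-out trial, in the spirit of the NATO-alphabet test:
reference = ["alpha", "bravo", "charlie", "delta"]
decoded = ["alpha", "bravo", "charlie", "delta"]  # made-up decoder output
wer = edit_distance(reference, decoded) / len(reference)
print(f"word error rate on held-out words: {wer:.0%}")  # 0%
```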
What has been achieved so far is only a small part of what remains to be done. Work is already underway for the AI to capture the dimensions of communication beyond the words themselves, such as tone, expressiveness, exclamations, questions… “We have ongoing work trying to see if we can decode these paralinguistic features from brain activity,” says Kaylo Littlejohn, also a co-author of the research, in a statement. “This is a problem that goes way back—even in traditional audio synthesis fields—and solving it would enable full naturalness.”
Other problems remain unsolved for now. One is having to open the skull to place 253 electrodes on the brain. Anumanchipalli acknowledges: “For now, only invasive techniques have proven effective for speech BCI in people with paralysis. If non-invasive techniques improve enough to capture the signal accurately, it would be reasonable to assume we could create a non-invasive BCI.” But for now, the expert admits, they are not there yet.
Miguel Ángel Criado is a Spanish science and technology journalist known for his work covering advancements in neuroscience, artificial intelligence, and environmental topics. He frequently contributes to leading Spanish publications such as El País, where he explores the intersection of science, society, and innovation.