Asia’s state-sponsored plan to destroy the language barrier

Japan, Thailand and other Asian nations see modern technology as the future of translation.

is limited and you speak very standard Chinese,” said Jianhua Tao, a Beijing-based researcher whose government-funded National Laboratory of Pattern Recognition contributed a vast supply of Chinese speech data to U-STAR. “But if you have a dialect or accent, in English or in Chinese, it will come out wrong.”

U-STAR is also a lousy wingman: in noisy environments, such as coffee shops and nightclubs, the application is useless. “You have to adapt to the system,” Chai said. “Talk slow. In a quiet place. Ask your friends nearby not to shout.”

Given all of these flaws, Tao is reluctant to portray U-STAR as a bulldozer ready to bear down on language barriers worldwide. “Sure, you can use it if you really need it. In a situation where no one can help you translate,” he said. “But bilingual people are still necessary.”

But for how long?

Tech companies' goal: speech-to-speech translation

Americans mindful of China’s rising clout — but too lazy to learn its tonal language — may feel buoyed by Microsoft’s latest advancement in translation technology. Last November, the tech giant’s chief research officer, Rick Rashid, flew to Tianjin, China, to debut a breakthrough in what he called “Deep Neural Networks” technology. His speech was delivered to a crowd of 2,000 young Chinese listeners in English. But as he spoke, his monologue was also rendered in real time into Chinese characters on a glowing screen above his head. (“The output was very accurate,” Tao said.)

“The results are not perfect ... but the technology is very promising,” Rashid later wrote on his blog, “and we hope that in a few years we will have the systems that can completely break down language barriers.”

Microsoft, through a spokesman, declined to comment on this technology’s future commercial availability “as this technology is still in the research phase.”

Google, which already offers a speech-to-speech translation application for Android phones, is also hinting at a looming breakthrough.

“We imagine a future where anyone in the world can consume and share any information, no matter what language it’s in,” wrote Franz Och, a senior Google Translate researcher, in a 2012 blog post. “We want to knock down the language barrier wherever it trips people up and we can’t wait to see what the next six years bring.”

Another competitor is Jibbigo, founded by a professor at Pittsburgh’s Carnegie Mellon University, home to a well-regarded communication technology lab. Unlike its counterpart U-STAR, which requires an online connection, Jibbigo sells apps that translate language pairs offline for $4.99. (The online version is free.)

Several advancements have set the stage for this rush to debut universal speech-to-speech translators. Only in recent years have well-to-do people toted pocket computers (that is, smartphones) with the processing power these applications demand. And only in somewhat recent years have non-English language labs collected enough recorded human speech to underpin digital translators with data.

“Mandarin speech recognition technology really didn’t develop much until the late 1990s, the 2000s,” Tao said. He credits Siri, Apple’s “intelligent personal assistant,” with popularizing speech recognition in China. (Yes, Siri speaks Mandarin.)

“Even with Siri, maybe 40 percent of the replies are either totally wrong or not all that correct,” Tao said. “But even if it’s not correct, people still love it and have a laugh.”

The future of digital translation

Given the complexity of human interaction, digital translators are more likely to chip away at language barriers slowly rather than kick them down all at once.

Airlines’ service hotlines, once novel but now ubiquitous, introduced speech recognition technology to consumers years before U-STAR and Siri.

“Of course, they are extremely context dependent,” said Mikael Parkvall, an author and linguistics professor at the University of Stockholm. “You can’t ask, ‘What do the stewardesses' uniforms look like? Can I steer the plane?’ It won’t get it at all.”

Today’s translators, he said, show much improvement over the previous decade’s offerings. “Depending on the language you use, they’re decent,” Parkvall said. “I am not unimpressed.”

But the road to humanizing speech-to-speech translators requires processing hours upon hours of real-life humans talking. This is why U-STAR records its users' speech. This is also why Google — which translates in one day the same volume of language human beings translate in a year, Och claims — appears well poised for future translation breakthroughs.

From helping travelers figure out how to request the bathroom key, digital translators will likely move on to a more accurate deciphering of technical discussions and perhaps the language of international trade. But real-time translation of what linguists call “spontaneous speech” is “10, 15, maybe 20 years away,” Tao said. “Any translation of real dialogue, like you and me talking right now, that will take a very long time.”

“In speech, there are layers upon layers upon layers of intonation patterns that affect meaning,” Parkvall said. “All the computer can hear is a given frequency. These terabytes of speech, to the computer, that’s just hertz and milliseconds.”

“Now consider irony, for instance, or humor,” Parkvall said. “It’s hard enough to make some humans grasp that. How do you ever make a machine grasp it?”