Asia’s state-sponsored plan to destroy the language barrier

Chinese President Hu Jintao is helped by an aide to set up a headphone for translation as Indian Prime Minister Manmohan Singh looks on. If a consortium of Asian nations has its way, machine translation will someday take the place of human translators.


BANGKOK — Attention, forward-thinking American parents forcing your children to learn Chinese: a consortium of Asian governments is pouring cash and brainpower into technology that might render your kids’ budding language prowess useless. And they want to give it away, for free, to anyone with a mobile phone signal.

With little fanfare, a league of linguists hailing from almost every sizable Asian nation has released an iPhone application called “U-STAR.” Their goal: disseminating technology that can ingest speech in any widely spoken language and regurgitate a translation on the spot.

“That’s the idea of U-STAR,” said Chai Wutiwiwatchai, head of a language lab at Thailand’s National Science and Technology Development Agency. “And it’s open source. Free of charge. No problems with intellectual property.”

The software is the culmination of seven years of research funded by more than 20 governments. Chai’s state-sponsored lab began building U-STAR with its Japanese counterpart, the National Institute of Information and Communications Technology, in 2006. Institutes from 24 countries, including European nations, have since signed on. (The US isn’t one of them.)

Its buffet of languages includes the major-league tongues — English, Spanish, Mandarin Chinese, Hindi and Arabic — as well as Mongolian, Malay and Dzongkha, the native language of Bhutan.

In the Star Trek sagas, this sort of “universal translator” device isn’t invented until 2151. But U-STAR and its competitors — which include Google, Microsoft and a young US start-up called Jibbigo — aren’t waiting for the next century.

In the globalized information age, language remains a heavy shackle. Though we now have the ability to instantly zap communiques across the globe, the way we speak still reflects ancient geography: the rivers, hills and seas that isolated groups of humans and bred distinct languages.

This is particularly troublesome in Asia, a rising engine of global growth. The Romance languages of Western Europe at least share root words and a common script. In South America, Spanish dominates. But Asia is so dense with varied languages and scripts that businessmen from neighboring countries often hammer out trade deals in their only shared tongue: choppy English.

In the future imagined by U-STAR’s designers, international travelers and traders will ditch hired translators and strained English for free smartphone software. Chai concedes that, for now, it’s highly prone to mistakes. “More research will make it more natural, more expressive,” he said. “Even at this stage, it’s usable.”

That depends on how "usable" is defined. A test drive of U-STAR, and warnings from the linguists behind the program, suggest that human translators may hold onto their jobs for a while yet.

Beyond simple situations, major flaws

Has U-STAR liberated us from the confines of language? No. In fact, that question, which I carefully read aloud into my iPhone’s U-STAR app, was interpreted as “Where is you star liberated is from the confidence of language?”

Translated into Thai — one of 28 available languages — the question comes out even weirder. The filtered response, to a native Thai speaker, sounds like the equivalent of “You come from Star Hotel? Where is your language?”

Though U-STAR knows roughly 20,000 words from many of its onboard languages, it tends to butcher small talk or college-level prose. The software is at its best when the speaker asks simple questions common among travelers. This is by design, Chai said.

“To make speech recognition accurate, you have to sculpt the vocabulary to the domain,” he said. (A “domain” is linguist-speak for a topic of conversation.) “To start off, we made sure every language could at least cover the basic domain of travel. You can’t use it to talk politics.”

Mileage also varies by language. English speech fares better with U-STAR because the language is beamed to servers in Japan, where a high-tech government language lab has gathered decades of English speech. Basic queries such as “When does the bus arrive?” or “I’ll have the pork, please” are usually rendered accurately.

So are short commands in Mandarin Chinese. I asked a native Mandarin speaker to say “I’m sick and I need to go to the hospital” as well as “This key doesn’t work. Can I have another one?” into U-STAR. After a delay of 10 seconds, it spat out both phrases accurately, complete with Chinese script.

“It does work when the domain is limited and you speak very standard Chinese,” said Jianhua Tao, a Beijing-based researcher whose government-funded National Laboratory of Pattern Recognition contributed a vast supply of Chinese speech data to U-STAR. “But if you have a dialect or accent, in English or in Chinese, it will come out wrong.”

U-STAR is also a lousy wingman: in noisy environments, such as coffee shops and nightclubs, the application is useless. “You have to adapt to the system,” Chai said. “Talk slow. In a quiet place. Ask your friends nearby not to shout.”

Given all of these flaws, Tao is reluctant to portray U-STAR as a bulldozer ready to bear down on language barriers worldwide. “Sure, you can use it if you really need it. In a situation where no one can help you translate,” he said. “But bilingual people are still necessary.”

But for how long?

Tech companies' goal: speech-to-speech translation

Americans mindful of China’s rising clout — but too lazy to learn its tonal language — may feel buoyed by Microsoft’s latest advancement in translation technology. Last November, the tech giant’s chief research officer, Rick Rashid, flew to Tianjin, China, to debut a breakthrough in what he called “Deep Neural Networks” technology. His speech was delivered to a crowd of 2,000 young Chinese listeners in English. But as he spoke, his monologue was also rendered in real time into Chinese characters on a glowing screen above his head. (“The output was very accurate,” Tao said.)

“The results are not perfect ... but the technology is very promising,” Rashid later wrote on his blog, “and we hope that in a few years we will have the systems that can completely break down language barriers.”

Microsoft, through a spokesman, declined to comment on this technology’s future commercial availability “as this technology is still in the research phase.”

Google, which already offers a speech-to-speech translation application for Android phones, is also hinting at a looming breakthrough.

“We imagine a future where anyone in the world can consume and share any information, no matter what language it’s in,” wrote Franz Och, a senior Google Translate researcher, in a 2012 blog post. “We want to knock down the language barrier wherever it trips people up and we can’t wait to see what the next six years bring.”

Another competitor is Jibbigo, founded by a professor at Pittsburgh’s Carnegie Mellon University, home to a renowned communication technology lab. Unlike its counterpart U-STAR, which requires an online connection, Jibbigo sells apps that translate language pairs offline for $4.99. (The online version is free.)

Several advancements have set the stage for this rush to debut universal speech-to-speech translators. Only in recent years have well-to-do people toted pocket computers — i.e., smartphones — capable of the processing power these applications demand. And only in somewhat recent years have non-English language labs collected enough recorded human speech to underpin digital translators with data.

“Mandarin speech recognition technology really didn’t develop much until the late 1990s, the 2000s,” Tao said. He credits Siri, Apple’s “intelligent personal assistant,” with popularizing speech recognition in China. (Yes, Siri speaks Mandarin.)

“Even with Siri, maybe 40 percent of the replies are either totally wrong or not all that correct,” Tao said. “But even if it’s not correct, people still love it and have a laugh.”

The future of digital translation

Given the complexity of human interaction, digital translators are more likely to chip away at language barriers slowly rather than kick them down all at once.

Airlines’ service hotlines, once novel but now ubiquitous, introduced speech recognition technology to consumers years before U-STAR and Siri.

“Of course, they are extremely context dependent,” said Mikael Parkvall, an author and linguistics professor at the University of Stockholm. “You can’t ask, ‘What do the stewardesses' uniforms look like? Can I steer the plane?’ It won’t get it at all.”

Today’s translators, he said, show much improvement over the previous decade’s offerings. “Depending on the language you use, they’re decent,” Parkvall said. “I am not unimpressed.”

But the road to humanizing speech-to-speech translators requires processing hours upon hours of real-life humans talking. This is why U-STAR records its users' speech. This is also why Google — which translates in one day the same volume of language human beings translate in a year, Och claims — appears well poised for future translation breakthroughs.

From helping travelers figure out how to request the bathroom key, digital translators will likely move on to a more accurate deciphering of technical discussions and perhaps the language of international trade. But real-time translation of what linguists call “spontaneous speech” is “10, 15, maybe 20 years away,” Tao said. “Any translation of real dialogue, like you and me talking right now, that will take a very long time.”

“In speech, there are layers upon layers upon layers of intonation patterns that affect meaning,” Parkvall said. “All the computer can hear is a given frequency. These terabytes of speech, to the computer, that’s just hertz and milliseconds.”

“Now consider irony, for instance, or humor,” Parkvall said. “It’s hard enough to make some humans grasp that. How do you ever make a machine grasp it?”