Recently Microsoft demoed its new speech-to-speech (S2S) translation system, which delivers a target-language version in the “voice” of the speaker. A remarkable breakthrough, but does it herald the arrival of automated interpreting, as some people would like to think?
Probably not. Conference/meeting/trial/interrogation/medical interpreting is a different animal from the scenarios usually put forward for spoken translation. Whereas an S2S app may help you solve a personal communication problem in a hotel, police station, or restaurant, it can’t yet substitute for full-scale professional interpreting.
As Mark Seligman of Spoken Translation demonstrated recently at a TAUS session on S2S translation, a key component in a professional S2S tech mix is a ‘back-translation’ channel – a method for double-checking that the meaning of a polysemic word or phrase (i.e. one with multiple possible meanings) has been properly translated. Without this back-translation, conversations can go deeply wrong very quickly. Yet real-time back-channel quality control is obviously not feasible in conference interpreting situations.
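To make the idea concrete, here is a toy sketch of such a back-translation check. The two-entry lexicon and the translate() helper are invented purely for illustration; a real system would call an MT engine and show the speaker a paraphrase of the back translation to confirm.

```python
# Toy sketch of a back-translation check: translate forward, translate back,
# and surface the result so the speaker can confirm the intended sense.
# The lexicon below is invented purely for illustration.

TOY_LEXICON = {
    ("en", "es"): {"bank": "banco"},   # "banco" can mean bank or bench
    ("es", "en"): {"banco": "bench"},  # the round trip exposes the ambiguity
}

def translate(text: str, src: str, tgt: str) -> str:
    """Stand-in for a real MT call; unknown words pass through unchanged."""
    return TOY_LEXICON[(src, tgt)].get(text, text)

def back_translation_check(utterance: str, src: str = "en", tgt: str = "es"):
    """Return the forward translation, its back translation, and a flag
    telling the speaker whether the meaning survived the round trip."""
    forward = translate(utterance, src, tgt)
    back = translate(forward, tgt, src)
    return forward, back, back.lower() == utterance.lower()

forward, back, ok = back_translation_check("bank")
print(forward, back, ok)  # -> banco bench False: flag for clarification
```

The round trip from “bank” to “banco” to “bench” is exactly the kind of polysemy mismatch the back channel is meant to catch before the conversation goes wrong.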
This means there is little point today in plotting a direct pathway from the full S2S machine translation of existing apps such as Jibbigo or SpeechTrans to the standard professional interpreting situation. But as a recent insightful post from a director of ZipDX (a technology provider to the profession) shows, interpreting should embrace rather than disparage the promise of automation. There are three specific areas where innovative technology could help: speech semantics, voice quality, and device agnosticism.
Speech tracking as a productivity tool
Rich interpretation-specific speech resources should be built up, allowing smart developers to invent apps that leverage this intrinsic intelligence to support interpreting performance. One example: a monitor could track in real time what is being talked about in a meeting or conference, search an interpretation memory cloud for previous translations, and present them to the interpreter as written on-screen terminology prompts. In other words, transposing the ‘memory’ tools recently developed for written translation to spoken translation could provide language aids that go beyond the term-lookup tools typically found in the interpretation booth.
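As a hedged illustration of what such a prompt tool might look like, the sketch below models the ‘interpretation memory’ as a plain dictionary; a real system would query a shared cloud store and consume a live speech-recognition feed rather than a fixed string.

```python
# Hedged sketch of an on-screen terminology prompter. The "interpretation
# memory" is modelled as a plain dict; a real system would query a shared
# cloud store and scan a live speech-recognition transcript.

INTERPRETATION_MEMORY = {
    "force majeure": "fuerza mayor",
    "quantitative easing": "expansión cuantitativa",
    "letter of intent": "carta de intención",
}

def terminology_prompts(transcript_chunk):
    """Scan the latest chunk of the live transcript and return any terms
    that already have validated translations, ready to show on-screen."""
    chunk = transcript_chunk.lower()
    return [(term, target) for term, target in INTERPRETATION_MEMORY.items()
            if term in chunk]

# As each chunk of the speech feed arrives, push matches to the booth display.
for term, target in terminology_prompts(
        "The contract contains a force majeure clause."):
    print(f"prompt: {term} -> {target}")
```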
There are naturally legal issues about recording meetings, but if we could develop a multi-language event stream that records all language channels in parallel in the cloud, then anonymised resources could be built up to help interpreters stand on the shoulders of their colleagues, and to stimulate language-technology developers to innovate with new, smarter systems.
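One possible shape for such a record, sketched below with invented field names: every language channel of a meeting segment is stored side by side, and an anonymisation step strips identifying metadata before the segment joins a shared resource.

```python
# Illustrative data model for a parallel multi-language event record. All
# field names are assumptions; the point is that each meeting segment keeps
# its language channels aligned and is anonymised before being shared.

from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class SegmentRecord:
    meeting_id: str
    start_ms: int
    end_ms: int
    channels: dict = field(default_factory=dict)  # language code -> transcript

    def anonymised(self):
        """Strip identifying metadata before the segment joins a shared
        resource that interpreters and developers can build on."""
        return replace(self, meeting_id="anonymous")

segment = SegmentRecord(
    meeting_id="acme-agm-2013",
    start_ms=0,
    end_ms=4200,
    channels={"en": "We approve the budget.",
              "fr": "Nous approuvons le budget."},
)
corpus_entry = segment.anonymised()
print(corpus_entry.meeting_id, list(corpus_entry.channels))
```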
Good sound quality is critical
In international meetings, interpreters usually work on the premises, using the local audio system. This is particularly costly, largely because of the price of physical presence. As an alternative, many mature telecom collaboration solutions could now help more interpreters work remotely (possibly using telepresence) under much improved audio conditions – a vital requirement for good interpreter performance over the telephone.
Recently the German research institution Fraunhofer IIS released its Full-HD Voice codec, which supercharges the communication quality of any VoIP app, providing the kind of professional quality that interpreters will need whatever their communication channel. Microphone makers such as Philips are also reducing ambient noise to boost audio quality in recordings.
Videoconferencing and the device/channel revolution
Interpreters always like to see the speakers whose words they are translating, together with any visual media used in the meeting. As smartphones, tablets and even TV sets are now part of the media mix for content sharing, there are multiple possibilities for a BYOD (bring your own device) agenda for interpreters. This would preserve the rich content of any meeting while also enabling interpreters to join the collaboration revolution driven by unified communications.
In addition to the mainstream videoconferencing and telepresence players, providers such as Skype (now a Microsoft company) are deeply integrated with the voice features of Windows 8, for example, and offer cost-effective communication platforms that could easily provide the visual capabilities needed to add value to interpreting functionality. This would enable cash-strapped SMEs to join the multilingual global conversation under good-quality conditions.
Do you have any ideas about how to make interpretation a more integrated function of digital communications?