Recently Microsoft demoed its new speech-to-speech (S2S) translation system, which delivers a target-language version in the “voice” of the speaker. A remarkable breakthrough, but does it herald the arrival of automated interpreting, as some people would like to think?
Probably not. Conference/meeting/trial/interrogation/medical interpreting is a different animal from the scenarios usually put forward for spoken translation. Whereas an S2S app may help you solve a personal communication problem in a hotel, police station, or restaurant, it can’t yet substitute for full-scale professional interpreting.
As Mark Seligman of Spoken Translation demonstrated recently at a TAUS session on S2S translation, a key component in a professional S2S tech mix is a ‘back-translation’ channel – a method for double-checking that the meaning of a polysemic word or phrase (i.e. one with multiple possible meanings) has been properly translated. Without this back-translation, conversations can go deeply wrong very quickly. Yet real-time back-channel quality control is obviously not feasible in conference interpreting situations.
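To make the idea concrete, here is a toy sketch of such a back-translation check. The two-entry lexicon and the translate() helper are invented purely for illustration; a real system would call an MT engine and show the speaker a paraphrase of the back translation to confirm.

```python
# Toy sketch of a back-translation check: translate forward, translate back,
# and surface the result so the speaker can confirm the intended sense.
# The lexicon below is invented purely for illustration.

TOY_LEXICON = {
    ("en", "es"): {"bank": "banco"},   # "banco" can mean bank or bench
    ("es", "en"): {"banco": "bench"},  # the round trip exposes the ambiguity
}

def translate(text: str, src: str, tgt: str) -> str:
    """Stand-in for a real MT call; unknown words pass through unchanged."""
    return TOY_LEXICON[(src, tgt)].get(text, text)

def back_translation_check(utterance: str, src: str = "en", tgt: str = "es"):
    """Return the forward translation, its back translation, and a flag
    telling the speaker whether the meaning survived the round trip."""
    forward = translate(utterance, src, tgt)
    back = translate(forward, tgt, src)
    return forward, back, back.lower() == utterance.lower()

forward, back, ok = back_translation_check("bank")
print(forward, back, ok)  # -> banco bench False: flag for clarification
```

The round trip from “bank” to “banco” to “bench” is exactly the kind of polysemy mismatch the back channel is meant to catch before the conversation goes wrong.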
This means there is little point today in plotting a direct pathway from the full S2S machine translation of existing apps such as Jibbigo or SpeechTrans to the standard professional interpreting situation. But as a recent insightful post from a director of ZipDX (a technology provider to the profession) shows, interpreting should embrace rather than disparage the promise of automation. There are three specific areas where innovative technology could help: speech semantics, voice quality, and device agnosticism.
Speech tracking as a productivity tool
Rich interpretation-specific speech resources should be built up, allowing smart developers to invent apps that leverage this intrinsic intelligence to support interpreting performance. One example: a monitor could track in real time what is being talked about in a meeting or conference, search an interpretation memory cloud for previous translations, and present them to the interpreter as written on-screen terminology prompts. In other words, transposing the ‘memory’ tools recently developed for written translation to spoken translation could provide language aids that go beyond the term-lookup tools typically found in the interpretation booth.
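As a hedged illustration of what such a prompt tool might look like, the sketch below models the ‘interpretation memory’ as a plain dictionary; a real system would query a shared cloud store and consume a live speech-recognition feed rather than a fixed string.

```python
# Hedged sketch of an on-screen terminology prompter. The "interpretation
# memory" is modelled as a plain dict; a real system would query a shared
# cloud store and scan a live speech-recognition transcript.

INTERPRETATION_MEMORY = {
    "force majeure": "fuerza mayor",
    "quantitative easing": "expansión cuantitativa",
    "letter of intent": "carta de intención",
}

def terminology_prompts(transcript_chunk):
    """Scan the latest chunk of the live transcript and return any terms
    that already have validated translations, ready to show on-screen."""
    chunk = transcript_chunk.lower()
    return [(term, target) for term, target in INTERPRETATION_MEMORY.items()
            if term in chunk]

# As each chunk of the speech feed arrives, push matches to the booth display.
for term, target in terminology_prompts(
        "The contract contains a force majeure clause."):
    print(f"prompt: {term} -> {target}")
```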
There are naturally legal issues about recording meetings, but if we could develop a multi-language event stream that records all language channels in parallel in the cloud, then anonymised resources could be built up to help interpreters stand on the shoulders of their colleagues, and to stimulate language-technology developers to innovate with new, smarter systems.
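One possible shape for such a record, sketched below with invented field names: every language channel of a meeting segment is stored side by side, and an anonymisation step strips identifying metadata before the segment joins a shared resource.

```python
# Illustrative data model for a parallel multi-language event record. All
# field names are assumptions; the point is that each meeting segment keeps
# its language channels aligned and is anonymised before being shared.

from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class SegmentRecord:
    meeting_id: str
    start_ms: int
    end_ms: int
    channels: dict = field(default_factory=dict)  # language code -> transcript

    def anonymised(self):
        """Strip identifying metadata before the segment joins a shared
        resource that interpreters and developers can build on."""
        return replace(self, meeting_id="anonymous")

segment = SegmentRecord(
    meeting_id="acme-agm-2013",
    start_ms=0,
    end_ms=4200,
    channels={"en": "We approve the budget.",
              "fr": "Nous approuvons le budget."},
)
corpus_entry = segment.anonymised()
print(corpus_entry.meeting_id, list(corpus_entry.channels))
```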
Good sound quality is critical
In international meetings, interpreters usually work on the premises, using the local audio system. This is particularly costly, largely because of the price of physical presence. As an alternative, many mature telecom collaboration solutions could now help more interpreters work remotely (possibly using telepresence) under much improved audio conditions – a vital requirement for good interpreter performance over the telephone.
Recently the German research institution Fraunhofer IIS released its Full-HD Voice codec, which supercharges the communication quality of any VoIP app, providing the kind of professional quality that interpreters will need whatever their communication channel. Microphone makers such as Philips are also reducing ambient noise to boost audio quality in recordings.
Videoconferencing and the device/channel revolution
Interpreters always like to see the speakers whose words they are translating, together with any visual media used in the meeting. As smartphones, tablets and even TV sets are now part of the media mix for content sharing, there are multiple possibilities for a BYOD (bring your own device) agenda for interpreters. This would preserve the rich content of any meeting while also enabling interpreters to join the collaboration revolution driven by unified communications.
In addition to the mainstream videoconferencing and telepresence players, providers such as Skype (now a Microsoft company) are deeply integrated with the voice features of Windows 8, for example, and offer cost-effective communication platforms that could easily provide the visual capabilities needed to add value to interpreting functionality. This would enable cash-strapped SMEs to join the multilingual global conversation under good-quality conditions.
Do you have any ideas about how to make interpretation a more integrated function of digital communications?