14 April 2016

The Future of Language Resources for Machine Translation (LR4MT)

In a recent brief survey of language service suppliers (LSPs), LT-Innovate attempted to find out from the translation industry how they saw the future of language data availability specifically for machine translation. The results provide food for thought when it comes to planning for the improved usability of digital text resources in the years ahead. It looks as if new developments in machine translation (MT) technology will work in parallel with a growing need for the right data.

First, statistical machine translation is clearly on most LSPs’ radar screens. Hard data on the actual size of the MT user market is impossible to obtain today, as is information on who uses which free or paid online services in their everyday work. But everyone who responded to our survey claims they will be “using” MT in the next 2 to 3 years.

Preparing for this transition is therefore vital for the nascent language data resource sector.

Overall, 30% of our LSP respondents reckon that the data they will need to prime their MT engines will come from their clients, 78% will use their in-house translation memories and similar assets, and 70% will try to find third-party sources outside their immediate business nexus.

15% of them would be prepared to buy such data and 70% will crawl the web, while 83% expect more free resources to become available.


Judging by our current findings from mapping publicly-available LR4MT in Europe, the chances of them finding the relevant resources easily look relatively small. Sharing language data is not a high-visibility phenomenon so far.

However, 39% said they did not have the necessary engineering resources in-house to transform the content they might find into viable MT data. This suggests there could be a small market for cleaning and aligning language data harvested from the web or from well-known repositories.
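To give a sense of what such cleaning can involve, here is a minimal sketch of a few common filters applied to harvested sentence pairs: dropping empty or untranslated segments, discarding pairs whose length ratio suggests misalignment, and removing exact duplicates. The function name `clean_pairs` and the `max_ratio` threshold of 2.0 are illustrative assumptions, not a description of any particular tool or of the respondents' own workflows.

```python
def clean_pairs(pairs, max_ratio=2.0):
    """Filter (source, target) sentence pairs harvested for MT training.

    Hypothetical helper: the filters and the max_ratio threshold are
    illustrative assumptions, not an established standard.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        # Drop untranslated segments (target identical to source).
        if src == tgt:
            continue
        # Drop pairs whose character-length ratio suggests misalignment.
        longer, shorter = max(len(src), len(tgt)), min(len(src), len(tgt))
        if longer / shorter > max_ratio:
            continue
        # Drop exact duplicates, keeping the first occurrence.
        key = (src, tgt)
        if key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept


pairs = [
    ("The contract ends in May.", "Der Vertrag endet im Mai."),
    ("The contract ends in May.", "Der Vertrag endet im Mai."),  # duplicate
    ("Home", "Home"),  # untranslated
    ("Yes", "Die Lieferung erfolgt innerhalb von zehn Werktagen."),  # misaligned
    ("", "Leer"),  # empty source
]
print(clean_pairs(pairs))
# prints [('The contract ends in May.', 'Der Vertrag endet im Mai.')]
```

Real pipelines typically add language identification and sentence alignment on top of simple filters like these; the sketch only shows the filtering step.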

When it comes to the desired quality criteria for usable language resources, by far the most important criterion (84%) was, unsurprisingly, domain relevance. Indeed, small customer- or domain-specific language models for MT are typically considered to outperform general models by a very large factor. This suggests that serious effort will need to go into pinpointing domain relevance in any language resource supply platform, rather than relying, say, on volume as a virtue in itself.


Appropriately, there was also considerable emphasis on leveraging the semantic characteristics of language data needed for MT. Semantically enriched data, as proposed by such EC-funded projects as LIDER (a Linked Open Data-based ecosystem of free, interlinked, and semantically interoperable language resources) and BabelNet (a multilingual dictionary underpinned by a rich semantic network), clearly has potential as a future resource. We therefore need to examine the fastest and most efficient way to turn this potential technology stack into an operational reality. We can also expect to hear much more from Coreon about multilingual knowledge management as a fundamental business tool.

So what can we expect for a more effective and efficient deployment of LR4MT? In general, respondents are looking towards new hybrid models of machine translation involving the integration of transfer/grammar and semantic modules into the plain vanilla statistical model as it exists today. This suggests that language technology and data resource quality will need to evolve closely in parallel.

They also expect deep learning to be applied to MT, together with such processes as continuous retraining during the MT post-editing phase. In other words, we are just at the beginning of a new cycle of more artificial intelligence-driven MT systems that will be able to learn as they go and leverage even more usability from relevant data resources. But as one respondent pointed out, the ultimate litmus test for the value of translation resource data is whether or not the original translation is any good. Tools to tame the elusive beast of rapid translation quality evaluation will still need to be part of the mix.

What specific needs for or constraints on MT data resources do you foresee in Europe in the near future? Tell us here or respond to our survey.

Jo Céline

25 February 2016

ESIF Funding Opportunities


We all know the story: there are these perfectly innovative ideas, a dream team available and willing to cooperate, but, alas, no money. It is the same for language technologies as for other topics. No, it is a bit worse, as there is no budget in H2020 dedicated to LT alone. Hence, it was time to look for other sources of money. LT-Observatory, a project funded by H2020 (yes :-) looked into alternative opportunities and started with ESIF. This lesser-known acronym stands for European Structural and Investment Funds, funds distributed by the EC but administered by national/regional authorities. For those who want to know more about ESIF, click here.

Member States and/or regions identify priorities, the so-called "Research & Innovation Smart Specialisation Strategies" or RIS3. Other priorities include SME competitiveness. While no priority directly mentions language technologies, many offer a window of opportunity for LT projects. The Member States' and regions' information can be found on the National Funding Opportunities web page, or you can download a PDF document.

Our tip: visit the website again in the coming months, as we will be adding more national and regional funding opportunities. If you try it out and are successful, let us know so that we can feature your "success story" online.

20 November 2015

The European Language Cloud, or How to Enable Multilingual Europe

Multilingualism is a core value of the European Union, as integral to Europe as the freedom of movement, the freedom of residence, and the freedom of expression. The European Charter of Fundamental Rights, which enshrines the foundational rights and freedoms protected in the EU, upholds a respect for cultural, religious, and also linguistic diversity as a cornerstone of European policy.

Europe’s commitment to linguistic diversity is most clearly apparent in its unwavering decision to maintain 24 official languages in the EU – no matter how large or small their speaker populations, and despite the bureaucratic hurdles in Brussels and Luxembourg – though many more regional languages are widely spoken and officially promoted across the continent.

Language barriers in the digital world

Though Europe’s multilingualism is a fundamental cultural and social value, its treasured linguistic diversity can also lead to significant communication barriers between people. The upholding of “unity in diversity” remains a difficult challenge. The effects of linguistic fragmentation can be seen most clearly in maps of language use on social media sites such as Twitter, where conversations are mostly restricted to national languages and thus limited by geographic borders.

As these fascinating maps make all too clear, language barriers can hinder the free flow of information and knowledge between nations, effectively fragmenting Europe into “language silos.” This is a major obstacle for the receiving and imparting of information and ideas across national borders – a defining aspect of the freedom of expression.

Language barriers also represent a major obstacle to the creation of the Digital Single Market, which seeks to combine the 28 national digital markets, harmonizing regulations and uniting all 500 million citizens of the EU in a single online marketplace.

According to a Eurobarometer study, more than 40% of Europeans never purchase goods or services that are not available in their native language. Language barriers therefore severely restrict European consumers' access to goods, hindering the creation of a Digital Single Market. Greater access to information in multiple languages would go a long way toward increasing cross-border sales. According to the European Commission, only 15% of European consumers currently shop online in other EU countries, and only 7% of European SMEs sell cross-border. There is much room for growth.

How language technologies can help

Fortunately, there is a technological solution to easing linguistic fragmentation online. Recent developments in language technologies, such as state-of-the-art machine translation and automated speech recognition, now enable us to overcome language barriers between people, simultaneously allowing multilingualism to thrive in the digital world.

Thanks to language technologies, people can write, read, or speak online in their own native language, while others can access the information in a language that they understand.

The heightened application of these language technologies to the online market will not only foster communication between nations. It will also help boost the European economy by enabling more cross-border trade.

Just imagine: a digital market where absolutely all online content is instantly available in all languages of the European Union; where Internet users can interact seamlessly in real time with one another regardless of the language they are speaking or writing; and where goods and information can be searched for and accessed no matter where they were posted or in which language.

European Language Cloud

Where to begin to realize this vision? Fortunately, European excellence in language technology research and the thriving language technology industry have already laid the foundations for a viable solution. These include recent breakthroughs in services from fields such as natural language processing, machine translation, text analytics, speech recognition, multilingual SEO/SEM, and semantic analysis.

But none of these services alone can meet the comprehensive needs of European industry and enable a truly multilingual Digital Single Market. To meet the complex needs of the market, language technology services must be accessed, combined, and leveraged into large-scale solutions, which can then be plugged directly into applications, making them fully multilingual.

This is where European policymakers can step in and help, by setting up a public language technology infrastructure – the European Language Cloud. This infrastructure would use the power of cloud technologies and combine the best that European industry and research have to offer.

A European Language Cloud would ensure easy access to key enabling language technologies for all EU languages, in the areas of natural language processing, automated translation, speech processing, and semantic analysis, among others. This would make these enabling services easily available to developers and integrators of commercial and public digital solutions.

The infrastructure should also include open access to multilingual language resources – the raw material for data-driven technologies and solutions – which all too often remain buried deep in corporate and government databases, instead of being used to build the solutions sorely needed by the marketplace.

Once a solid European Language Cloud infrastructure is in place, commercial players and public sector organizations could use the available language technology services as building blocks, or core components, to create innovative multilingual solutions for their high-demand applications.

The role of European research and innovation

At the same time, the European Language Cloud must be continuously replenished with new services and innovations. The driver of this innovation is the cutting-edge language technology research emerging from Europe’s universities and research centers. However, several “knowledge gaps” still exist in research, and often our research doesn’t fully evolve into commercially viable applications.

Targeted actions are critically needed to address the gaps in coverage for all EU languages, and provide novel methods to improve quality and applicability of language technologies. In a tangible display of its respect for linguistic diversity, Europe must fill the gaps in existing knowledge and ensure that all EU languages (not just larger languages) have the same degree and quality of language technology services.

Europe must also guarantee that European excellence in research keeps up with the growing demands of global industry. This will help Europe to remain globally competitive with next-generation services and solutions.

What Europe can do to implement this vision

How to implement this vision? The European Commission already has the right instruments in place – an encouraging sign. The Connecting Europe Facility programme is taking the initial steps to create an automated translation infrastructure for Europe. Public institutions across Europe have already begun to reap the first fruits of this programme, as it extends and improves its translation technologies for European languages. But the CEF programme should be significantly expanded to include other essential language technologies as well.

Europe must also reinforce its innovative language technology research, through programmes like Horizon 2020 and other instruments. Unfortunately, language technologies are missing from the latest Work Programme for 2016-17. Not only should they return to the Work Programme for 2018-19, but they should also assume a central priority to address this major challenge for Europe.

Breaking the language barrier in Europe is essential to make the EU more united in its diversity. It is crucial for not only increasing trade and commerce, but also fostering communication and understanding between the 500 million citizens of multilingual Europe. This is needed today more than ever. We should not miss this opportunity.

Andrejs Vasiļjevs and Rihards Kalniņš, Tilde

13 July 2015

Natural Language Processing (Finally) in the Spotlight


Faced with an ever-growing volume of information, Europe is delighted to discover the value of natural language processing, the cement of European integration.

At the recent LT-Innovate Summit, Alexander De Croo, Deputy Prime Minister of Belgium and Minister for the Digital Economy, and Robert Madelin, Director-General of DG CONNECT at the European Commission, sent a very clear message to the Natural Language Processing (NLP) community: "We now understand the importance of your discipline and the role it plays in Europe's economic development. We also appreciate your ability to transform and adapt your discourse to our economic and political concerns."

This message, echoed in the regular statements of political speakers, highlights the growing awareness of the fundamental role of natural language processing.

This "declared understanding" is thus supposedly linked to a change in the way our discipline addresses the authorities. I am not so convinced of that. I do not have the feeling that our discourse has changed fundamentally, either suddenly or gradually, whatever the field concerned: translation, speech recognition or semantic analysis.
To me, this sudden awareness on the part of the European authorities seems rather to be a consequence of their difficulty, or even their inability, to cope with the volume of information that overwhelms them today.

In this sense, we can remind both the community and the authorities that one of the 3 Vs of Big Data, Variety, is an intrinsic characteristic of the mass of data that has to be understood and processed.
It is indeed this child of the digital age that has raised awareness of the importance of data, their nature, their diversity and their volume, for technical, economic and political decision-making.
Whatever the reasons, however, this awareness, in the context of the "Digital Agenda for Europe", is excellent news for our community. Let us remember that this community is made up of academics but also of a large number of SMEs across Europe. It now appears that we are clearly identified and recognized for our varied expertise and for our contribution to European development and to the challenges Europe faces.

We know it, and I was able to confirm it at our annual meeting: all the committed companies in our community know well how they can contribute to this strategic development. On the other hand, too many of them end up giving up when confronted with Europe's complex administrative and statutory mechanisms. We are generally small companies and, unfortunately, not always properly equipped to deal with the European authorities on an equal footing in H2020 or other projects. We sometimes have the regrettable feeling that over the past 15 years the gap between us has kept widening inexorably, that simple and direct communication is still difficult and that, in the end, Europe does not know how to support us.

It is urgent and imperative that Europe take on and sustain a strategic support role, as the United States does, for our innovative SMEs and our sometimes fragile start-ups, in order to secure sustainable development prospects in the medium and long term. Europe must understand the strategic importance of the innovative technologies we develop to serve, among other things, the technological, economic, cultural and legal independence of our European continent.

It is fortunate that our European representatives are becoming aware of our technological existence and of the value that comes with it. It is now time for administrative Europe to rise to our level and give us active help by involving us in execution and production projects. One of the first expected benefits would certainly be to simplify and streamline its own administrative machinery...

Charles Huot is Deputy CEO and co-founder of TEMIS, an unstructured data management company. TEMIS helps companies archive, manage, analyse, find and share an ever-growing volume of information. This article was also published on EurActiv.

The LT-Innovate Summit 2015 in a Nutshell

The LT-Innovate Summit 2015 was attended by more than 140 stakeholders from industry, research, consultancy and policy making. Below are its highlights.

LT-Innovate Award Winners 2015


The 5 Winners of the LT-Innovate Award 2015 were designated at the LT-Innovate Summit on 25 June. They were selected by the jury & participants from the 17 applicants who showcased themselves during the Summit:
digm
Dolphio Technologies
Interprefy
recapp IT
Speexx
See more information on our main LTI Award page.

Launch of the LTI Cloud


Jochen Hummel - CEO, ESTeam and Chairman, LT-Innovate; Robert E. Etches - CIO, TextMinded; Luc Meertens - CEO, CrossLang; and Christoph Prinz - CEO, SailLabs called upon the Language Technology industry, on the occasion of the LT-Innovate Summit on 26 June 2015, to join forces for the Launch of the LTI Cloud, a Software-as-a-Service (SaaS) wrapper which will make it easy to discover and plug ‘n’ play language technology components.

For more information, see the letter and presentation.

Industry Challenges


Several industry executives provided an overview of their company's current and future needs from a language technology point of view:
  • Christian Dirschl, Chief Content Architect, Wolters Kluwer Deutschland GmbH
  • Florence Beaujard, Head of Linguistics and Physiology for Cockpit Design, Airbus
  • Armin Hopp, Founder, Speexx
  • Christophe Leclercq, Founder, EurActiv
Christian Dirschl wrote a blog post summarising his experience and offering Wolters Kluwer's support for the next steps.

Keynote speakers


The Summit welcomed three keynote speakers representing the three institutions involved in policy making at EU level:
  • Paul Rübig, Member of the European Parliament
  • Robert Madelin, Director General, European Commission, DG CONNECT
  • Alexander De Croo, Belgian Federal Vice Prime Minister and Minister for Development Cooperation, the Digital Agenda, Telecommunications and Post
For more information, see the blogs by Margaretha Mazura and Charles Huot (in French) and a BelgienInfo article (in German).

Board of Directors


On the occasion of the Annual General Meeting of LT-Innovate, several new members of the Board of Directors were appointed:
  • Robert Etches, CIO, TextMinded
  • Matthias Heyn, Vice President Global Solutions, SDL International
  • Charles Huot, COO, TEMIS
You will find a full list of the current Board of Directors here.

Key links


Here are additional links to find out more about the LT-Innovate Summit 2015:

Programme and presentations
Speakers
Participants
Storify (summary) drawing upon the #ltisummit Twitter stream
Picture gallery

07 July 2015

Three high-level political messages in support of a multilingual Digital Single Market

On 25-26 June 2015, experts, technicians, researchers, business people, intermediaries and policy makers got together at the LT-Innovate Summit in Brussels to discuss and explore how language technologies can make the Digital Single Market multilingual.

MEP Paul Rübig opening the LTi Summit 2015

The first keynote by Austrian MEP Paul Rübig set the scene: "Language technologies represent a substantial economic power, they are set to grow dramatically, and thus, represent a great asset for Europe!". While languages are a representation of Europe's cultural heritage, they also represent an obstacle to cross-border trade. Paul Rübig emphasized that Europe's target should be to "remove language barriers as well as leverage on language diversity in order to support intra-European commerce and foster international trade. But this requires deep integration of language technologies in business processes and the public administration services. Therefore, the LT component must be part of the Digital Single Market".

R. Madelin, Director-General DG CNECT

The second high-level speaker was the Director-General of DG CONNECT of the European Commission, Robert Madelin, who will become Special Adviser on Innovation to President Juncker as of 1 September 2015. He pointed out that the "ability for all Europeans to get what they need in the language of their choice is a requirement of the 21st century". He emphasized that LT-Innovate had contributed to clarifying the vision, which now needs to be implemented in practice, and concluded: "the European cloud has to include an answer to the challenge of languages. Multilingualism is a necessity for Europe!".


Belgian Federal Vice Prime Minister, Alexander de Croo

The final keynote speaker of the event won over the audience with a truly inspiring speech: Belgian Federal Vice Prime Minister and Minister for Development Cooperation, the Digital Agenda, Telecommunications and Post, Alexander De Croo. He described the current digital landscape with all its opportunities but also issued some warnings: "Digital is one of the strongest democratic forces in the world. The biggest opportunities are in the traditional industries, here digital can make the big difference. Digital is a driving industry; language technology is empowering it to go global. [...] At some stage everything will be connected - language technology will empower people to understand". When it comes to people, his message is clear: "Give more attention to people! Engage the European community in the creation of our digital future! We need to give more importance to eSkills, the educational system needs flexibility and innovation. I don't want an Einstein economy!" He continued with another issue that had also been raised at the LTO dialogue workshop on language resources: trust. "We need trust. Give us more trust please in Europe and for Europeans. If you let people get on with it, people will do good things. Give it a try". He ended with a hint at EU policy: "Thinking behind the single market is futile if we do not get rid of mobile roaming! Breaking up platforms is no solution! Creating the Digital Single Market will empower our own platforms."

See the Summit's full programme and presentations.

16 June 2015

Over the Hurdle of Multilingualism to Global Leadership


The Digital Single Market (DSM) has been declared a European priority by the European Commission. Rightfully so! Software eats everything, and eCommerce in particular is enjoying dramatic growth rates and therefore heavy investment. VP Andrus Ansip has nicely summarized the vision of the Digital Single Market: “Consumers need to be able to buy the best products at the best prices, wherever they are in Europe.”

Today, unfortunately, that means that the consumer is in most cases spending her/his money on a non-European site. The numbers are actually shocking: according to a recent Commission infographic, the Digital Market today is made up of 39% national online services (likely not giving you the best deal) and 57% US-based online services. EU cross-border services, however, represent only a minuscule four percent!

Given also the potential for growth and new jobs, the Commission has launched a digital strategy to pave the way towards the DSM. It lists many laudable initiatives, like affordable parcel delivery costs, tackling geo-blocking, simplifying VAT arrangements (after they had just been made unmanageable for cross-border SMEs), modernizing copyright, and strengthening European data protection rules. All this will surely help, but does it really address the core challenge of the Digital Single Market?

Commissioner Oettinger recently stated that "a Polish citizen being refused to buy products on a German website is not compatible with the idea of Europe". I am not so sure that the online business is really rejecting the customer. Why should it? More likely, it has a hard time communicating with this customer. Worse still, the Polish citizen probably never managed to find the German website. A simple search already breaks the vision of borderless shopping: enter a string in your language and the search results will trap you in your national market. But even if a product name search crossed these language silos, the Polish citizen probably would not understand what the German website is offering and under which conditions.

The main hurdle to a Digital Single Market is Europe’s many languages. It’s amazing how politics, but also business, have overlooked this so far! Or have they rather chosen to ignore it? Perhaps because they don’t know how to solve it? The big investments in technologies to overcome the language barrier have often produced only academic results. The field is dominated by research institutions and small niche players. This makes it hard to discover, purchase, and deploy language technology solutions.

Luckily, language technologies can today indeed enable the Polish citizen to find, buy, and use a German product or vice versa. By using data-driven approaches, innovative language technologies such as search, automatic translation, voice recognition, knowledge management, sentiment analysis, and many others, have achieved acceptable quality for the major languages. They are ready to be deployed in European eCommerce sites.

However, to achieve the vision of the Digital Single Market, we have to support at least all 24 of our official languages and those of our most important trading partners. This requires a basic natural language processing (NLP) infrastructure. The European Language Technology industry is therefore pushing for the European Language Cloud (ELC), a public infrastructure providing the basic functionality required to process unstructured content. Through an API, the ELC provides basic language technology services such as tokenization, named entity detection, etc. for all languages, in the same base quality, under the same favorable terms.

On top of this infrastructure, European language technology companies, mostly SMEs, will expose their offerings in the LTI Cloud. The LTI Cloud is a Software-as-a-Service (SaaS) wrapper around language technology components and functions as a marketplace. It will make it easy for start-ups, eCommerce, system integrators, and software companies to discover and plug ‘n’ play language technology.

The fourth edition of the LT-Innovate Summit, the yearly point of convergence of the Language Technology industry, will explore how to concretely launch these crucial building blocks for the DSM.

In a recent article, the Washington Post mocked Europe’s DSM efforts by stating that "Europe’s digital decline is accelerating". I would counter that. Why don’t we turn the much-lamented hurdle of Europe’s multilingualism into a unique opportunity? If we manage, in spite of our many cultures and languages, to create a Multilingual Digital Single Market and cross-border eGov, we will become the fittest for the global market.


Jochen Hummel is CEO of ESTeam & Coreon and Chairman of LT-Innovate