16 June 2015

Over the Hurdle of Multilingualism to Global Leadership

The Digital Single Market (DSM) has been declared a European priority by the European Commission. Rightfully so! Software is eating everything, and eCommerce in particular is enjoying dramatic growth rates and thus heavy investment. Vice-President Andrus Ansip has neatly summarized the vision of the Digital Single Market: "Consumers need to be able to buy the best products at the best prices, wherever they are in Europe."

Today, unfortunately, the consumer is in most cases spending her or his money on a non-European site. The numbers are actually shocking: according to a recent Commission infographic, the Digital Market today is made up of 39% national online services (likely not giving you the best deal) and 57% US-based online services. EU cross-border services, however, represent only a minuscule four percent!

Given the potential for growth and new jobs, the Commission has launched a digital strategy to pave the way towards the DSM. It lists many laudable initiatives, such as more affordable parcel delivery, tackling geo-blocking, simplifying VAT arrangements (which have only just been made unmanageable for cross-border SMEs), modernizing copyright, and strengthening European data protection rules. All this will surely help, but does it really address the core challenge of the Digital Single Market?

Commissioner Oettinger recently stated that "a Polish citizen being refused to buy products on a German website is not compatible with the idea of Europe". I am not so sure that the online business is really rejecting the customer. Why should it? More likely, it has a hard time communicating with this customer. Worse still, the Polish citizen probably never found the German website in the first place. A simple search already breaks the vision of borderless shopping: enter a query in your own language, and the search results trap you in your national market. And even if a product name search crossed these language silos, the Polish citizen probably would not understand what the German website is offering and under which conditions.

The main hurdle on the way to a Digital Single Market is Europe's many languages. It's amazing how politics, and business too, have overlooked this so far! Or have they rather chosen to ignore it, perhaps because they don't know how to solve it? The big investments in technologies to overcome the language barrier have often produced only academic results. The field is dominated by research institutions and small niche players, which makes it hard to discover, purchase, and deploy language technology solutions.

Luckily, language technologies can today enable the Polish citizen to find, buy, and use a German product, or vice versa. Thanks to data-driven approaches, innovative language technologies such as search, automatic translation, voice recognition, knowledge management, sentiment analysis, and many others have achieved acceptable quality for the major languages. They are ready to be deployed on European eCommerce sites.

However, to achieve the vision of the Digital Single Market, we have to support at least all 24 of our official languages, as well as those of our most important trading partners. This requires a basic natural language processing (NLP) infrastructure. The European Language Technology industry is therefore pushing for the European Language Cloud (ELC), a public infrastructure providing the basic functionality required to process unstructured content. Through an API, the ELC would provide basic language technology services such as tokenization and named entity detection for all languages, at the same base quality and under the same favorable terms.

On top of this infrastructure, European language technology companies, mostly SMEs, will expose their offerings in the LTI Cloud. The LTI Cloud is a Software-as-a-Service (SaaS) wrapper around language technology components and functions as a marketplace. It will make it easy for start-ups, eCommerce, system integrators, and software companies to discover and plug ‘n’ play language technology.

The fourth edition of the LT-Innovate Summit, the yearly point of convergence of the Language Technology industry, will explore how to concretely launch these crucial building blocks for the DSM.

In a recent article, the Washington Post mocked Europe's DSM efforts by stating that "Europe's digital decline is accelerating". I would dispute that. Why don't we turn the much-bemoaned hurdle of Europe's multilingualism into a unique opportunity? If we manage, in spite of our many cultures and languages, to create a Multilingual Digital Single Market and cross-border eGov, we will be the fittest competitors in the global market.

Jochen Hummel is CEO of ESTeam & Coreon and Chairman of LT-Innovate

10 June 2015

Major disruption ahead in the language industry!

The Q2 issue of GALAxy, the quarterly newsletter of our partner association GALA, is guest-edited by LT-Innovate Chairman Jochen Hummel (@JochenHummel), who contributes a thought-provoking piece on how Language Technology will leverage Big Data and transform the industry.

Several other articles were contributed or co-authored by LT-Innovate members and partners:
  • Big Data and the Translation Industry: Three Technology Challenges by Andrew Joscelyne, LT-Innovate
  • Finding New Business Segments Through Big Data by Michael Wetzel, Coreon GmbH & Matthias Heyn, SDL plc
  • How to Improve Your Relationship with Machine Translation co-authored by Heidi Depraetere, CrossLang
  • Unlocking Language Resource Assets by Christian Galinski, Infoterm
  • Riga Summit Forges a Unified Vision for Multilingual Europe by Rihards Kalniņš, Tilde

08 June 2015

Highlights of the LT-Innovate Summit - Brussels, 25-26 June 2015


The fourth edition of the LT-Innovate Summit, the yearly point of convergence of the Language Technology industry, will take place in Brussels on 25-26 June 2015. It will benefit from the presence of leading policymakers:

  • Paul Rübig, Member of the European Parliament
  • Alexander De Croo, Vice Prime Minister of Belgium, Minister for the Digital Agenda
  • Robert Madelin, Director General, European Commission, DG CONNECT

Launch of the LTI Cloud

Jochen Hummel, LT-Innovate Chairman:

"We are launching the LTI Cloud on 26 June as a major new initiative that has the potential to benefit all our members. The aim of the LTI Cloud is to create a SaaS wrapper around language technology components developed by LT-Innovate members. It will make it easy for entrepreneurs, start-ups, software developers, IT departments, system integrators, and many others to source & plug ‘n’ play language technology components, allowing them to focus on their core business and competencies. Join us to find out more and make the LTI Cloud a success!"

See the call for collaboration: Join us to launch the LTI Cloud!


LT CEO Summit and Industry Challenges

As every year, we have lined up a roster of "challengers":
  • Christian Dirschl, Chief Content Architect, Wolters Kluwer Deutschland GmbH
  • Florence Beaujard, Head of Linguistics and Physiology for Cockpit Design, Airbus
  • Armin Hopp, Founder, Speexx
  • Christophe Leclercq, Founder, EurActiv

These high-level industry executives will provide an overview of their companies' current and future needs from a language technology point of view. Do not miss the opportunity to participate in these forward-looking "challenges".

LT-Innovate Award 2015

Discover "The Best in LT", network with entrepreneurs, experts and investors... and celebrate the Winners of our prestigious industry Award.

Workshop on the future of conversational interaction technologies

We are collaborating with leading academics to prepare a Research and Innovation Roadmap for multilingual and multimodal conversational technologies. The current version of the roadmap is available at citia.eu.

The main goal of the workshop is to collect feedback and recommendations on (1) refining the research & innovation scenarios; (2) mechanisms to bridge the gap between research (including cognitive sciences) & innovation; (3) further development of the stakeholder community; and (4) how to develop a startup culture to bridge the gap between the excellent research base and commercial reality.

Workshop on language resources: foundations of the multilingual digital single market

This workshop aims to identify concrete scenarios for improving the usability of Language Resources (LRs). It is split into three interrelated panels: LR demand, LR supply, and matching LR offer to demand. Panelists from industry, research and the public sector will, in particular, discuss the following questions:
  • How can LR identification become a more streamlined, accessible and easily achieved activity?
  • Where and how can LRs be found and identified to solve a specific MT problem?
  • Who would be able to do the work as a service?
  • How can terminology of a given field and text data relevant to the same field be found online in a dependable way?
  • What are the major barriers for finding and using LRs from existing repositories?
  • What would be the best ways to overcome these barriers?

Check out the full programme and register here!

30 April 2015

Broad Language Industry Coalition launches Call for Action: The Digital Single Market Must Be Multilingual!

On the occasion of the Riga Summit on the Multilingual Digital Single Market, held on 27-29 April 2015, a broad coalition of organisations representing the language industries came together to sign a Declaration of Common Interest.

The Riga Summit also launched a Call for Action entitled "Multilingual Europe: The Crowning Touch to the Digital Single Market".

The Declaration and Call for Action emphasise the importance of language technologies as key enablers for a truly multilingual Digital Single Market.

For additional media coverage of the Riga Summit, see "Babelitious" in The Economist, as well as "Will the Digital Single Market be multilingual?", "Europe needs a language infrastructure, not just Google Translate" (interview with Jochen Hummel, ESTeam) and "Smaller languages could be lost in the Digital Single Market" (interview with Andrejs Vasiljevs, Tilde) in EurActiv.

29 March 2015

Data, architectures, and other forms of cooperation

In my last blog post, I argued that Europe can't win the game of who can field the best-known, most widely used smart personal assistant – and since these assistants are really impersonal, we wouldn't want to. We want an eco-system of companies doing what they do best: spotting a need, filling it well, and cooperating with each other.

At the ROCKIT Roadmap Conference we talked about data, infrastructure and the connections we need between them. But here I have to tread into dangerous territory by tying lots of things together into a story about practical action for CITIA to take. All the bits of the story come from someone who was there, but do they fit together to make what we need?

One of the recurring themes of the conference was that we need to share data with each other so that all of our algorithms and products improve. This needs to start with academics. Many academics feel that data collected using public resources should be public. That doesn't make releasing it easy. It's very difficult to make scientists, who are always chasing the next publishable result, stop before the end of a project so that they have enough resources left to package the data, set license terms, and release it.

Anonymization can be both necessary and expensive, especially since there isn't general agreement about what level of certainty is legally acceptable. In addition, postdoctoral researchers and students often don't have the skills to do a decent job on packaging. They have trouble thinking like someone who doesn't already understand what they've produced, so their documentation can be very poor indeed. I think this skill is an important part of any future job, academic or otherwise, so training is part of the solution. However, I also think data repositories need enough support to separate the good from the bad and to quality-check packaging early enough for data producers to correct it.

I also think it's hard to engineer data sharing among companies, but we did generally agree that the best way to make headway is to start working together to target our customer contacts, picking off each vertical separately. I actually think that if this were to happen, data sharing would come as a side effect. So the main action here is finding out how CITIA can encourage this kind of collaboration, rather than focusing on the data itself.

Another recurring theme was that we need open architectures and at least de facto standards, so that academics and businesses can each concentrate on the part they do best. That's great, but it has to become less vague very quickly. We've agreed that this will be an important part of our work in the second year of our support action, but which actions will actually achieve the results we need within the relatively small budget we have? When I get stuck on big problems like this, I try to think of the nearest successful analogue to the problem at hand, and the history of how that collaborative system emerged. What is the most similar story we can think of? Some major open source system like WordPress? Solutions from the logistics industry? I'm really not sure myself, but someone must have a better vision than I do. Let us know!

Jean Carletta, University of Edinburgh

26 March 2015

Europe's Digital Single Market must be multilingual!

Vice-President A. Ansip and R. Kalniņš
More than 3000 members and stakeholders of Europe's Language Community signed an Open Letter to the European Commission: "Europe's Digital Single Market must be multilingual". At an official lunch in Riga on 26 March 2015, Rihards Kalniņš, Marketing Communications Manager at Tilde, handed the Open Letter with more than 3000 printed signatures to Andrus Ansip, Vice-President of the European Commission. Vice-President Ansip expressed his awareness of the multilingual challenge in building the Digital Single Market, was impressed by the number of signatures, and asked for further input from Europe's Language Community.

15 March 2015

Do we really want smart personal assistants?

During the ROCKIT Roadmap Conference in February, I was designated to take notes and summarize the results of the session for scenario 2, “smart personal assistants.” That's harder than it sounds! I think the most important thing we learned was that whether European businesses will be successful in this emerging technology area is highly dependent on the business models they adopt and the culture that develops in Europe around them.

When people think about smart personal assistants, they immediately assume that the goal is to build a rival to things like Siri – an engine that can assist any user on a wide range of topics. This is actually a no-go strategy for European business for two reasons.

The first is that it's not the European way. This kind of generic personal assistant makes a great sales vehicle for global giants, but is poor at delivering what the customer actually wants. By its very nature, it's really more of a smart *im*personal agent on commission. Historically, European businesses have been based on a strong business-to-business orientation. They understand their local contexts and verticals well, and provide better end-user products and services because of it. Their offerings are niche – sometimes so niche they're just ways of getting around the limitations in some technology one of the giants has been pushing – but there's no shame in that. It may not be the way to fast growth and glitzy headlines, but I for one would rather provide genuinely useful products and services. Couple that with the fact that it's what we do well and there's really no question about the right approach.

The second reason why it's a bad strategy for our community is that even if we wanted to, we couldn't compete, for one simple reason: data. These systems work because the giants fielded them among enthusiasts when they were just "good enough", and improved them massively with larger and larger amounts of data as their use grew. Yes, we need to do more to share data among ourselves, and yes, we may well have better machine learning, but they have the first-mover advantage. The group consensus was that it would take us ten years before we were ready to start thinking about fielding a rival system, by which time the world will look completely different.

Once we recognize that this is the shape of the game that we're in, it tells us much more about what kind of community infrastructure and cooperation we need to create in order to support each other and do better all round. That will be the subject of my next blog post.

Jean Carletta, University of Edinburgh

We invite stakeholders of all kinds to comment on these views, whether or not they were at the Roadmap Conference - are they right? Please use the comment facility below.