LT-Innovate - The Association of the Language Technology Industry: March 2015

29 March 2015

Data, architectures, and other forms of cooperation

In my last blog post, I argued that Europe can't win the game of who can field the best-known, most widely used smart personal assistant – and since these assistants are really impersonal, we wouldn't want to. We want an eco-system of companies doing what they do best: spotting a need, filling it well, and cooperating with each other.

We talked at the ROCKIT Roadmap Conference about data and infrastructure and necessary connections, but here I have to tread on dangerous territory – by tying lots of things together into a story about practical action for CITIA to take. All the bits of the story come from someone who was there, but do they work together to make what we need?

One of the recurring themes of the conference is that we need to share data with each other so that all of our algorithms and products improve. This needs to start with academics. Many academics feel that data collected using public resources should be public. That doesn't make releasing it easy. It's very difficult to make scientists, who are always chasing the next publishable result, stop before the end of a project so that they have enough resource left to package the data, set license terms, and release it.

Anonymization can be both necessary, and expensive – especially since there isn't general agreement about what level of certainty is legally acceptable. In addition, postdoctoral researchers and students often don't have the skills to do a decent job on packaging. They have trouble thinking like someone who doesn't already understand what they've produced, so their documentation can be very poor indeed. I think this is an important part of any future job – academic or otherwise – so training is part of the solution. However, I also think data repositories need enough support to be able to curate the good from the bad and quality check packaging in time that data producers can correct it.

I also think it's hard to engineer data sharing among companies – but we did generally agree that the best way to make headway is to start working together to target our customer contacts, picking off each vertical separately. I actually think if this were to happen, the data sharing would come as a side effect. So the main action here is finding out how CITIA can encourage this kind of working together, rather than thinking about data itself.

Another recurring theme was that we need open architectures and at least de facto standards, so that academics and businesses can each concentrate on the part they do best. That's great, but it has to get less vague very quickly. We've agreed that will be an important part of our work in the second year of our support action, but what are the best actions that actually fit a relatively small budget we have to achieve the results we need? When I get stuck on big problems like this, I try to think about the nearest successful analogue to the problem at hand, and the history of how that collaborative system emerged. What is the most similar story we can think of – some major open source system like Wordpress? Solutions from the logistics industry? I'm really not sure, myself, but someone must have a better vision than I do. Let us know!

Jean Carletta, Edinburgh University

26 March 2015

Europe's Digital Single Market must be multilingual!


Vice-President A. Ansip and R. Kalniņš

More than 3000 members and stakeholders of Europe's Language Community signed an Open Letter to the European Commission: "Europe's Digital single market must be multilingual". On the occasion of an official lunch in Riga on 26 March 2015, Rihards Kalniņš, Marketing Communications Manager at TILDE handed the Open Letter with more than 3000 printed signatures to Andrus Ansip, Vice-President of the European Commission. The latter expressed his awareness of the multilingual challenge in building the Digital Single Market, was impressed by the number of signatures and asked for further input from Europe's Language Community.

Join the initiative and sign the Open Letter!

margaretha mazura is Director of EMF Services, a consultancy in Brussels. She specializes in business models and funding opportunities for ICT projects. On a more "esoteric" level, she specialises in the history of fans (éventails, Fächer) and has an internationally recognized collection of fans. And if some time is left, she writes about them or acts as curator for fan exhibitions.

15 March 2015

Do we really want smart personal assistants?

During the ROCKIT Roadmap Conference in February, I was designated to take notes and summarize the results of the session for scenario 2, “smart personal assistants.” That's harder than it sounds! I think the most important thing we learned was that whether European businesses will be successful in this emerging technology area is highly dependent on the business models they adopt and the culture that develops in Europe around them.

When people think about smart personal assistants, they immediately assume that the goal is to build a rival to things like Siri – an engine that can assist any user on a wide range of topics. This is actually a no-go strategy for European business for two reasons.

The first is that it's not the European way. This kind of generic personal assistant makes a great sales vehicle for global giants, but is poor at delivering what the customer actually wants. By its very nature, it's really more of a smart *im*personal agent on commission. Historically, European businesses have been based on a strong business-to-business orientation. They understand their local contexts and verticals well, and provide better end-user products and services because of it. Their offerings are niche – sometimes so niche they're just ways of getting around the limitations in some technology one of the giants has been pushing – but there's no shame in that. It may not be the way to fast growth and glitzy headlines, but I for one would rather provide genuinely useful products and services. Couple that with the fact that it's what we do well and there's really no question about the right approach.

The second reason why it's a bad strategy for our community is that even if we wanted too, we couldn't compete for one simple reason – data. These systems work because the giants fielded them among enthusiasts when they were just “good enough”, and improved them massively with larger and larger amounts of data as their use grew. Yes, we need to do more to share data among ourselves, and yes, we may well have better machine learning – but they have first starter advantage. The group consensus was that it would take us ten years before we were ready to start thinking about fielding a rival system, by which time the world will look completely different.

Once we recognize that this is the shape of the game that we're in, it tells us much more about what kind of community infrastructure and cooperation we need to create in order to support each other and do better all round. That will be the subject of my next blog post.

Jean Carletta, University of Edinburgh

We invite stakeholders of all kinds to comment on these views, whether or not they were at the Roadmap Conference - are they right? Please use the comment facility below.