30 April 2015

Broad Language Industry Coalition launches Call for Action: The Digital Single Market Must Be Multilingual!

On the occasion of the Riga Summit on the Multilingual Digital Single Market, held on 27-29 April 2015, a broad coalition of organisations representing the language industries came together to sign a Declaration of Common Interest.

The Riga Summit also launched a Call for Action entitled "Multilingual Europe: The Crowning Touch to the Digital Single Market".

The Declaration and Call for Action emphasise the importance of language technologies as key enablers for a truly multilingual Digital Single Market.

For additional media coverage of the Riga Summit, see "Babelitious" in The Economist, "Will the Digital Single Market be multilingual?", "Europe needs a language infrastructure, not just Google Translate" (interview with Jochen Hummel, ESTeam) and "Smaller languages could be lost in the Digital Single Market" (interview with Andrejs Vasiljevs, Tilde) in EurActiv.

29 March 2015

Data, architectures, and other forms of cooperation

In my last blog post, I argued that Europe can't win the game of who can field the best-known, most widely used smart personal assistant – and since these assistants are really impersonal, we wouldn't want to. We want an ecosystem of companies doing what they do best: spotting a need, filling it well, and cooperating with each other.

We talked at the ROCKIT Roadmap Conference about data and infrastructure and necessary connections, but here I have to tread on dangerous territory – by tying lots of things together into a story about practical action for CITIA to take. All the bits of the story come from someone who was there, but do they work together to make what we need?

One of the recurring themes of the conference was that we need to share data with each other so that all of our algorithms and products improve. This needs to start with academics. Many academics feel that data collected using public resources should be public. That doesn't make releasing it easy. It's very difficult to make scientists, who are always chasing the next publishable result, stop before the end of a project so that they have enough resources left to package the data, set license terms, and release it.

Anonymization can be both necessary and expensive – especially since there isn't general agreement about what level of certainty is legally acceptable. In addition, postdoctoral researchers and students often don't have the skills to do a decent job on packaging. They have trouble thinking like someone who doesn't already understand what they've produced, so their documentation can be very poor indeed. I think this is an important part of any future job – academic or otherwise – so training is part of the solution. However, I also think data repositories need enough support to be able to sort the good from the bad and to quality-check packaging in time for data producers to correct it.

I also think it's hard to engineer data sharing among companies – but we did generally agree that the best way to make headway is to start working together to target our customer contacts, picking off each vertical separately. I actually think if this were to happen, the data sharing would come as a side effect. So the main action here is finding out how CITIA can encourage this kind of working together, rather than thinking about data itself.

Another recurring theme was that we need open architectures and at least de facto standards, so that academics and businesses can each concentrate on the part they do best. That's great, but it has to get less vague very quickly. We've agreed that this will be an important part of our work in the second year of our support action, but what are the best actions that actually fit the relatively small budget we have and achieve the results we need? When I get stuck on big problems like this, I try to think about the nearest successful analogue to the problem at hand, and the history of how that collaborative system emerged. What is the most similar story we can think of – some major open source system like WordPress? Solutions from the logistics industry? I'm really not sure, myself, but someone must have a better vision than I do. Let us know!

Jean Carletta, University of Edinburgh

26 March 2015

Europe's Digital Single Market must be multilingual!

Vice-President A. Ansip and R. Kalniņš
More than 3000 members and stakeholders of Europe's Language Community signed an Open Letter to the European Commission: "Europe's Digital Single Market must be multilingual". On the occasion of an official lunch in Riga on 26 March 2015, Rihards Kalniņš, Marketing Communications Manager at Tilde, handed the Open Letter with more than 3000 printed signatures to Andrus Ansip, Vice-President of the European Commission. The latter expressed his awareness of the multilingual challenge in building the Digital Single Market, was impressed by the number of signatures and asked for further input from Europe's Language Community.

15 March 2015

Do we really want smart personal assistants?

During the ROCKIT Roadmap Conference in February, I was designated to take notes and summarize the results of the session for scenario 2, “smart personal assistants.” That's harder than it sounds! I think the most important thing we learned was that whether European businesses will be successful in this emerging technology area is highly dependent on the business models they adopt and the culture that develops in Europe around them.

When people think about smart personal assistants, they immediately assume that the goal is to build a rival to things like Siri – an engine that can assist any user on a wide range of topics. This is actually a no-go strategy for European business for two reasons.

The first is that it's not the European way. This kind of generic personal assistant makes a great sales vehicle for global giants, but is poor at delivering what the customer actually wants. By its very nature, it's really more of a smart *im*personal agent on commission. Historically, European businesses have been based on a strong business-to-business orientation. They understand their local contexts and verticals well, and provide better end-user products and services because of it. Their offerings are niche – sometimes so niche they're just ways of getting around the limitations in some technology one of the giants has been pushing – but there's no shame in that. It may not be the way to fast growth and glitzy headlines, but I for one would rather provide genuinely useful products and services. Couple that with the fact that it's what we do well and there's really no question about the right approach.

The second reason why it's a bad strategy for our community is that even if we wanted to, we couldn't compete, for one simple reason – data. These systems work because the giants fielded them among enthusiasts when they were just “good enough”, and improved them massively with larger and larger amounts of data as their use grew. Yes, we need to do more to share data among ourselves, and yes, we may well have better machine learning – but they have first-mover advantage. The group consensus was that it would take us ten years before we were ready to start thinking about fielding a rival system, by which time the world will look completely different.

Once we recognize that this is the shape of the game that we're in, it tells us much more about what kind of community infrastructure and cooperation we need to create in order to support each other and do better all round. That will be the subject of my next blog post.

Jean Carletta, University of Edinburgh

We invite stakeholders of all kinds to comment on these views, whether or not they were at the Roadmap Conference - are they right? Please use the comment facility below.

13 February 2015

A glance at the latest and most comprehensive Roadmap for Conversational Interaction Technologies

The CITIA Roadmap Conference, with an impressive line-up of speakers and panellists, is around the corner (24-25 Feb in Brussels). We are excited to announce that the first version of the ROCKIT strategic roadmap, which will drive many of the sessions of the upcoming Conference and will be used to set the priorities of CITIA, is now available to view online.

The main goal of the roadmap is to engage public and private research organisations, including SMEs, in a constructive discussion towards full exploitation of new conversational interaction technologies. The roadmap should enable our community to compare prominent use cases, products and services, science and engineering capabilities, as well as readiness, needs and timeframes for future R&D. Entrepreneurs and researchers can use it to focus their R&D, partnerships and related strategic efforts.

This first version of the roadmap is the result of an on-going (2-year) consultative process which in its first year alone involved over 100 experts who provided their input during five physical workshops organised in conjunction with major sector events. For the science/technology areas alone, some 1,000 inputs were captured during these workshops. All those inputs were clustered, filtered and linked together across several layers, including:
  • 10 societal Drivers & Constraints
  • 5 generic R&D Scenarios
  • 10 Product/Service Types, with the added value, a SWOT analysis and a 10-year timeline for each of them
  • 8 Science/Technology Areas, with clusters and 10-year timelines for each of them
  • 7 Resource Types

The graphical version and a presentation of the first version of the roadmap are both accessible via http://tinyurl.com/ROCKIT-v1

Figure 1 Initial view showing five interrelated layers.

The initial view (depicted in Figure 1) shows the main interacting layers, including: Drivers & Constraints (the “why”); Scenarios; Product/Service Types (the “what”); Science/Technology Areas (the “how”); and Resources. By hovering over any item, one may choose to see either a) a short description of that item or b) the cross-layer relationships with other items.

Clicking on any of the Product/Service Types or Science/Technology Areas allows drilling down to detailed information such as a SWOT analysis of a Product/Service Type (Figure 2), or the foreseen 10-year timeline for a particular Science/Technology Area (Figure 3).

Figure 2 The SWOT analysis of Generic Personal Assistants, under Product/Service Types.

Figure 3 The 10-year timeline of Natural Language Interpretation & Generation, under Science/Technology Areas.

We now very much encourage discussions around the roadmap’s contents before, during or after the Conference. We are particularly interested in:
  • Verifying relationships between items
  • Establishing their readiness levels as well as
  • Measuring their expected social and economic impact.

If you want to be involved, just create a free account and visit the roadmap to add your comments and cast your votes.

Article contributed by Costis Kompis, Vodera

Costis Kompis is the managing partner of Vodera, a company that helps private and public organisations align their R&D activities, develop innovation strategies for emerging technologies and design new business models to capture market opportunities.

09 December 2014

LT-Accelerate: A Major Text Analytics-Meets-Multilingual Talkfest in Brussels

LT-Innovate and Alta Plana, headed by text analytics community builder Seth Grimes, combined forces last week (4 - 5 Dec) to launch the first LT-Accelerate conference in Brussels. This attracted a broad range of analytics technologists and user companies to an in-depth conversation about the LT contribution to business opportunities in text and speech analytics, with a discreet emphasis on the multilingual European context.

For those who missed it, there’s a handy summary at Storify. The presentations and pictures of the event are on the event's website. You may also want to check out the @LTAccelerate Twitter channel and hashtag #LTA14.

Basic text analytics is now maturing, with a growing stable of tech companies offering APIs to their NLP solutions or dashboards that help user companies make sense of their “unstructured” data. At the same time, binary sentiment analysis models are starting to reach the limits of their usefulness for many users, who now need more insight into how human emotions and intentions are expressed linguistically in the decision-making process. And multilingual text data modelling continues to raise barriers for global players, either due to the inherent structure of languages or to a lack of reach.

Here are four takeaways that we shall explore further in future blogs:

The future of market research: One of the biggest users of text analytics is the Market Research (MR) industry (worth $61.45B globally), which is currently morphing into a digital player by adopting new technologies for automated listening, mining and engaging. For MR, the future will involve, among other phenomena, a billion new Chinese tourists (think language, travel tech, tourist infrastructures, and communication generally) – an extraordinary opportunity for almost any business in Europe if they know how to address the challenge.

Getting Down to Semantics: The market opportunity for text analytics covers at least two very different families of data: business-generated text, such as that provided by publishers and every other customer-facing enterprise, and user-generated data, often sourced from social media and customer reviews.
Havas Media showed how they can now classify customer-generated data into one of the four stages of the “customer decision journey” on the basis of linguistic cues, with a success rate of some 74%. This allows them to automate the classification of short consumer messages and thereby inform retailers and others about the crucial decision process those customers go through.
On the business content production side, Elsevier demonstrated how they use proprietary semantic technology – known as a Fingerprint Engine – to enrich existing text from authors, patents, and increasingly foreign-language data so that specialised STM searches can apply concepts rather than words alone. This can enable a science author, for example, to find exactly the right journal to match their research specialty.
We shall come back in a later blog to other semantic solutions in this space.

Generation A to Z: The most unexpected data point in the whole event might well have been the claim by Robert Dale (Arria) that “by 2020, more texts in the world will be produced by machine than by humans.” Three European content generation tech suppliers (Data2Content, Yseop, and Arria) addressed the apparently massive market for automatically generating content from data, rather than about data. The challenges here are to understand data as information (which is where semantics comes in) and then to turn that information into a narrative that tells a story. In a sense, therefore, what natural language generation will be able to do is take the results of data analytics – i.e. data – and use language technology solutions to turn it into content that humans (and also machines presumably) want to read. Watch this space!

Relevant Data is not always Big: Although we were treated to some large numerical data points during the conference - IBM recorded 53 million social media posts during the 64 games of the Brazil World Cup this year and a 50-agent speech contact centre can generate about 11Tb of voice recordings a year – the oil company Total told a story about small data. It highlighted the extremely practical virtues of smart search, analysis and presentation of smallish sets of highly relevant data from a corpus on oil-well safety-standard issues. This showed how you can mine value from text data to optimise knowledge sharing within a business. And it demonstrated that in many cases business clients will want to tailor the solution to their own needs. A useful lesson in how to market certain kinds of text analytics solutions!

21 November 2014

Quick Q&A: On the Earned Media Value of a Brand’s Social Activities

Earned, paid, and owned media are distinct species. If you haven’t laid out cash for a mention of your brand, product, or personnel in a media outlet, whether online or social, you’re deemed to have earned the coverage. (Take “earned” with a grain of salt. You may have laid out big bucks for a publicist or efforts to build your brand’s visibility.) If you’ve bought the coverage — advertising, for instance — that’s paid. And if it’s your outlet, then that media is owned.

Whether media is earned, paid, or owned, you want to measure the extent of attention and the effectiveness of your message. The effort can get quite involved when multiple channels and multiple exposures are in the mix. To get a precise picture, you have to engage in attribution modeling. When social platforms come into play, the effort can be substantial.

General social business challenges, and technical responses, are central topics at LT-Accelerate, a unique European conference taking place December 4-5, 2014 in Brussels. We’ll have Roland Fiege of IPG Mediabrands speaking on methodologies and tools for measuring the earned value of brand social-media activity. If this topic interests you as well, you’ll want to learn more. A quick Q&A I recently conducted with Roland is a start; then I hope you’ll join us in Brussels. First a brief bio –

Roland Fiege is head of social strategy at Mediabrands Social, home of Performly. In his spare time, he is working on a PhD project researching methodologies for measuring the value add of marketing on Facebook and Twitter. And next –

Our interview with Roland Fiege

Q1: The topic of this Q&A is social media analytics. What’s your personal SMA background and your current work role?
Roland Fiege: My personal SMA background started with consulting projects evaluating social media listening systems back in 2009. In 2010-11, I was part of an international team at US technology company MicroStrategy that developed a solution that analyzed the social graphs of Facebook users to help brands to understand the interests and affinities of their “fans” better.

In my current work role, we analyze user interactions responding to brand messages on social media channels and have developed a model that attributes a monetary “earned media value” to these interactions. This allows brands to quantify and valuate the outcome of their social media investments.
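
As an editorial illustration only, the idea of attributing a monetary value to interactions can be sketched as a weighted sum: each interaction type is priced against a benchmark, and the counts are multiplied through. The interview does not describe the actual model, so everything below is an assumption: the `Post` class, the `earned_media_value_cents` function, the interaction types and all prices (in euro cents) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical benchmark prices per interaction, in euro cents.
# Illustrative numbers only; they do not come from the interview.
STATIC_BENCHMARKS = {"like": 10, "comment": 50, "share": 100}


@dataclass
class Post:
    channel: str
    interactions: dict  # interaction type -> number of interactions


def earned_media_value_cents(posts, benchmarks=STATIC_BENCHMARKS):
    """Sum count * benchmark price over every interaction on every post.

    Interaction types without a benchmark price contribute nothing.
    """
    total = 0
    for post in posts:
        for kind, count in post.interactions.items():
            total += count * benchmarks.get(kind, 0)
    return total


posts = [
    Post("facebook", {"like": 200, "comment": 10, "share": 5}),
    Post("twitter", {"like": 50, "share": 20}),
]

# Valuation against the static benchmark table.
static_value = earned_media_value_cents(posts)

# Valuation against a refreshed benchmark table, standing in for a
# price feed that is updated over time (again, made-up numbers).
refreshed_benchmarks = {"like": 12, "comment": 50, "share": 100}
refreshed_value = earned_media_value_cents(posts, refreshed_benchmarks)
```

Passing the benchmark table in as a parameter is one simple way to swap a fixed price list for prices that are refreshed over time, without touching the valuation logic itself.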
Q2: What are key technical and business goals of the analyses you’re involved in?
Roland Fiege: The technical challenges are to keep the solution up to date with ongoing API changes by the most popular social networks and to loop “real time” bidding price benchmarks back into our systems (vs. a static benchmark). Another challenge is to meet the EU data privacy standards that enterprises, German ones especially, try to comply with.

Business-wise, the challenge is to establish a common understanding of how to attribute and valuate user interactions.
Q3: And what particular analytics approaches or technologies do you favor, whether for text, network, geospatial, behavioral, or other analyses?
Roland Fiege: We basically gave up on automated text analysis when it comes to sentiment. It never worked in Europe with all the different languages, dialects, irony etc. There was too much manual work involved that clients were not willing to pay for.

Currently we concentrate on the quantification of user engagement and its financial valuation.

Q4: To what extent do you get into sentiment and subjective information?
Roland Fiege: Our experience is that if users like, share, and comment on brand content, the sentiment involved is mostly positive or neutral. By contrast, most user posts on brand channels are negative and correlate with negative customer experiences. Since we measure the monetary value of brand communication, we only measure fan/follower interactions on brand content.
Q5: How do you recommend dealing with high-volume, high-velocity, diverse social postings — to ensure that analyses draw on the most complete and relevant data available and deliver the most accurate results possible?
Roland Fiege: We rely not only on the APIs that Twitter, Facebook and YouTube (Google) provide but also use other (fire hose) data providers to get the most complete picture/dataset, also for retrospective analysis.
Q6: Could you provide an example (or two) that illustrates really well what you’ve been able to accomplish via SMA, that demonstrate strong ROI?
Roland Fiege: What we accomplish: Clients manage to optimize their content strategies in near real time, can compare the performance of their content (agencies) in different regions and countries, and can identify savings potential in the millions. It is the first time brands can calculate the total cost of ownership of their social media channels and have a clear Input vs. Outcome result all condensed into one KPI: Money.
Q7: I’m glad you’ll be speaking at LT-Accelerate. Please tell me about your presentation, briefly: What attendees will learn.
Roland Fiege: In this talk you will learn about the latest methodologies and tools to measure the Earned Media value of a brand’s activities on Facebook, Twitter and YouTube in hard currency.
Q8: Finally, do you have recommendations to share, regarding choice of data sources, metrics, analytical methods, and visualizations, in order to best align with desired business outcome?
Roland Fiege: I will share those in my presentation in as much detail as possible.
Thank you, Roland, for your responses. I’m looking forward to hearing more, at LT-Accelerate in Brussels.

This interview was conducted by Seth Grimes, the leading industry analyst covering text analytics, sentiment analysis, and analysis on the confluence of structured and unstructured data sources, and founder of Alta Plana Corp.