LT-Innovate - The Association of the Language Technology Industry: December 2014

LT-Innovate and Alta Plana, headed by text analytics community builder Seth Grimes, combined forces last week (4 - 5 Dec) to launch the first LT-Accelerate conference in Brussels. This attracted a broad range of analytics technologists and user companies to an in-depth conversation about the LT contribution to business opportunities in text and speech analytics, with a discreet emphasis on the multilingual European context.

For those who missed it, there’s a handy summary at Storify. The presentations and pictures of the event are on the event's website. You may also want to check out the @LTAccelerate Twitter channel and hashtag #LTA14.

Basic text analytics is now maturing, with a growing stable of tech companies offering APIs to their NLP solutions or dashboards that help user companies make sense of their “unstructured” data. At the same time, binary sentiment analysis models are reaching the limits of their usefulness for many users, who now need more insight into how human emotions and intentions are expressed linguistically in the decision-making process. And multilingual text data modelling continues to raise barriers for global players, either due to the inherent structure of languages or to a lack of reach.

Here are four takeaways that we shall explore further in future blogs:

The Future of Market Research: One of the biggest users of text analytics is the market research (MR) industry, worth $61.45B globally, which is currently morphing into a digital player by adopting new technologies for automated listening, mining and engaging. For MR, the future will involve, among other phenomena, a billion new Chinese tourists (think language, travel tech, tourist infrastructures, and communication generally) – an extraordinary opportunity for almost any business in Europe that knows how to address the challenge.

Getting Down to Semantics: The market opportunity for text analytics covers at least two very different families of data: business-generated text, such as that provided by publishers and every other customer-facing enterprise, and user-generated data, often sourced from social media and customer reviews.

Havas Media showed how they can now classify customer-generated data into one of the four stages of the “customer decision journey” on the basis of linguistic cues, with a success rate of some 74%. This allows them to automate the classification of short consumer messages and so give retailers and others vital information about the decision process those customers go through.
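To make the idea of classifying by linguistic cues concrete, here is a minimal sketch. The stage names and cue phrases are our own assumptions for illustration; Havas Media did not publish their model, and a production system would presumably use a trained classifier rather than a keyword list.

```python
from typing import Optional

# Hypothetical stages and cue phrases, invented for this sketch.
# They are NOT Havas Media's actual categories or features.
STAGE_CUES = {
    "awareness": ["heard about", "just saw", "what is"],
    "evaluation": ["compare", "versus", "which is better", "reviews"],
    "purchase": ["just bought", "ordered", "checkout"],
    "post_purchase": ["love my", "returning", "broke after"],
}


def classify_stage(message: str) -> Optional[str]:
    """Return the stage with the most matching cues, or None if no cue matches."""
    text = message.lower()
    # Count how many cue phrases of each stage occur in the message.
    scores = {
        stage: sum(cue in text for cue in cues)
        for stage, cues in STAGE_CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify_stage("Which is better, the X or the Y? Any reviews?"))
print(classify_stage("Just bought the new phone"))
```

Measuring a success rate like the 74% quoted above would mean comparing such predictions against a hand-labelled sample of messages.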

On the business content production side, Elsevier demonstrated how they use proprietary semantic technology, known as a Fingerprint Engine, to enrich existing text from authors, patents, and increasingly foreign language data so that specialised STM searches can apply concepts rather than words alone. This can enable a science author, for example, to find exactly the right journal that matches their research specialty.

We shall come back in a later blog to other semantic solutions in this space.

Generation A to Z: The most unexpected data point in the whole event might well have been the claim by Robert Dale (Arria) that “by 2020, more texts in the world will be produced by machine than by humans.” Three European content generation tech suppliers (Data2Content, Yseop, and Arria) addressed the apparently massive market for automatically generating content from data, rather than about data. The challenges here are to understand data as information (which is where semantics comes in) and then to turn that information into a narrative that tells a story. In a sense, therefore, what natural language generation will be able to do is take the results of data analytics – i.e. data – and use language technology solutions to turn it into content that humans (and also machines presumably) want to read. Watch this space!
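The two steps described above (reading data as information, then turning that information into narrative) can be sketched with a simple template-based generator. This is an illustrative toy under our own assumptions, not the approach of Data2Content, Yseop or Arria; the function names, the 1% threshold and the sample figures are all invented.

```python
def interpret(previous: float, current: float) -> dict:
    """Step 1: turn raw numbers into information (direction and size of change)."""
    change_pct = (current - previous) / previous * 100
    if abs(change_pct) < 1:  # assumed threshold for "no real change"
        direction = "flat"
    elif change_pct > 0:
        direction = "up"
    else:
        direction = "down"
    return {"direction": direction, "change_pct": abs(change_pct)}


def narrate(label: str, previous: float, current: float) -> str:
    """Step 2: turn the information into a sentence a human can read."""
    info = interpret(previous, current)
    if info["direction"] == "flat":
        return f"{label} held steady at around {current}."
    verb = "rose" if info["direction"] == "up" else "fell"
    return f"{label} {verb} {info['change_pct']:.0f}% from {previous} to {current}."


print(narrate("Sales", 200, 250))
print(narrate("Complaints", 200, 150))
```

Real natural language generation systems go much further, choosing what is worth saying and varying how they say it, but the split between interpretation and wording is the same.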

Relevant Data is not always Big: Although we were treated to some large numerical data points during the conference – IBM recorded 53 million social media posts during the 64 games of the Brazil World Cup this year and a 50-agent speech contact centre can generate about 11Tb of voice recordings a year – the oil company Total told a story about small data. It highlighted the extremely practical virtues of smart search, analysis and presentation of smallish sets of highly relevant data from a corpus on oil-well safety-standard issues. This showed how you can mine value from text data to optimise knowledge sharing within a business. And it demonstrated that in many cases business clients will want to tailor the solution to their own needs. A useful lesson in how to market certain kinds of text analytics solutions!

09 December 2014

LT-Accelerate: A Major Text Analytics-Meets-Multilingual Talkfest in Brussels