LT-Innovate and Alta Plana, headed by text analytics community
builder Seth Grimes, combined forces last week (4-5 December) to launch the
first LT-Accelerate conference in Brussels. The event attracted a broad range of analytics
technologists and user companies to an in-depth conversation about the language technology (LT) contribution to
business opportunities in text and speech analytics, with a particular emphasis on
the multilingual European context.
For those who missed it, there's a handy summary on Storify, and the presentations and photos are available on the conference website. You may also want to check out the @LTAccelerate Twitter channel and the hashtag #LTA14.
Basic text analytics is now maturing,
with a growing stable of tech companies offering APIs to their NLP solutions or
dashboards that help user companies make sense of their "unstructured" data.
At the same time, binary positive/negative sentiment analysis is
reaching its limits for many users, who now need deeper insight
into how human emotions and intentions are expressed linguistically during the decision-making
process. And multilingual text data modelling continues to raise barriers for
global players, whether due to the inherent structure of individual languages or to a lack
of coverage.
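To make that contrast concrete, here is a minimal, purely illustrative Python sketch (the lexicon entries and category names are invented for this example, not drawn from any vendor's system) of moving from a single positive/negative score to an emotion profile:

```python
# Illustrative only: a toy lexicon-based scorer showing why binary
# positive/negative labels lose information. Lexicon entries are invented.
EMOTION_LEXICON = {
    "love": "joy", "great": "joy",
    "refund": "frustration", "waiting": "frustration",
    "worried": "anxiety", "unsure": "anxiety",
}

def emotion_profile(text: str) -> dict:
    """Count hits per emotion category instead of collapsing to +/-."""
    counts: dict = {}
    for token in text.lower().split():
        emotion = EMOTION_LEXICON.get(token.strip(".,!?"))
        if emotion:
            counts[emotion] = counts.get(emotion, 0) + 1
    return counts

print(emotion_profile("Still waiting for my refund, worried it's lost!"))
# {'frustration': 2, 'anxiety': 1}
```

A binary model would label this message simply "negative"; the profile distinguishes frustration from anxiety, which call for different responses.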
Here are four takeaways that we shall explore further in
future blogs:
The future of market research:
One of the biggest users of text analytics is the Market Research industry (worth
$61.45B globally), which is currently morphing into a digital player by adopting new technologies
for automated listening, mining, and engaging. For MR, the future will involve, among other phenomena, a billion new Chinese tourists (think language, travel tech,
tourist infrastructure, and communication generally) – an extraordinary opportunity
for almost any business in Europe that knows how to address the challenge.
Getting Down to Semantics: The market opportunity for text analytics covers at least two
very different families of data: business-generated
text, such as that produced by publishers and other customer-facing
enterprises, and user-generated data,
often sourced from social media and customer reviews.
Havas Media showed how they can now classify customer-generated
data into one of the four stages of the "customer decision journey" on
the basis of linguistic cues, with a success rate of some 74%. This allows them
to automate the classification of short consumer messages and thereby give retailers and others vital insight into the decision process their customers go through.
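Havas did not disclose how their classifier works; purely as a sketch of the general technique, a four-way text classifier of this kind might be built along these lines (the stage names and training snippets below are invented):

```python
# A generic sketch (not Havas Media's actual system) of classifying short
# consumer messages into four decision-journey stages. Stage labels and
# training examples are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "Anyone know a good phone for photos?",        # awareness
    "Comparing the X200 and the Y10 cameras",      # consideration
    "Just ordered the X200, can't wait!",          # purchase
    "Three months in and the X200 still shines",   # post-purchase
] * 5  # repeat the toy data so the model has something to fit

stages = ["awareness", "consideration", "purchase", "post-purchase"] * 5

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(messages, stages)

print(model.predict(["Which laptop should I buy for travel?"]))
```

In practice such a system would be trained on large volumes of labelled messages and validated against human judgements, which is presumably where the 74% figure comes from.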
On the business content production side, Elsevier demonstrated how
they use proprietary semantic technology – known as the Fingerprint Engine – to
enrich existing text from authors, patents, and increasingly foreign-language
data, so that specialised STM (scientific, technical, and medical) searches can apply concepts rather than words
alone. This can enable a science author,
for example, to find exactly the right journal to match their research specialty.
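The core idea of matching on concepts rather than surface words can be sketched very simply (this is a simplified illustration in the spirit of, but not identical to, Elsevier's engine; the mini-thesaurus below is invented):

```python
# A simplified sketch of concept-based matching. The thesaurus mapping
# surface terms to concept IDs is invented for illustration.
THESAURUS = {
    "myocardial infarction": "C:HEART_ATTACK",
    "heart attack": "C:HEART_ATTACK",
    "cardiac arrest": "C:CARDIAC_ARREST",
}

def fingerprint(text: str) -> set:
    """Return the set of concepts mentioned in a text."""
    lowered = text.lower()
    return {cid for term, cid in THESAURUS.items() if term in lowered}

paper = "We studied outcomes after myocardial infarction in elderly patients."
query = "heart attack recovery"

# The word strings differ, but the concept fingerprints overlap.
print(fingerprint(paper) & fingerprint(query))  # {'C:HEART_ATTACK'}
```

A keyword search for "heart attack" would miss the paper entirely; the concept fingerprint finds it.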
We shall come back in a later blog to other semantic
solutions in this space.
Generation A to Z: The most unexpected data point of the whole event may well
have been the claim by Robert Dale (Arria) that "by 2020, more texts in the
world will be produced by machine than by humans." Three European content generation
tech suppliers (Data2Content, Yseop, and Arria) addressed the apparently massive
market for automatically generating content from
data, rather than about data. The challenges here are to understand data as information
(which is where semantics comes in) and then to turn that information into a
narrative that tells a story. In a sense, then, natural language generation
takes the results of data analytics – i.e. data – and uses language
technology to turn them into content that humans (and presumably machines
too) want to read. Watch this space!
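At its simplest, data-to-text generation can be a matter of templates plus a little arithmetic, as in this minimal sketch (real NLG systems such as Yseop's or Arria's are far richer; the field names here are invented):

```python
# A minimal data-to-text sketch: turning a row of analytics output into a
# sentence a human would read. Field names and figures are invented.
def narrate(region: str, sales: float, prev_sales: float) -> str:
    change = (sales - prev_sales) / prev_sales * 100
    direction = "rose" if change >= 0 else "fell"
    return (f"Sales in {region} {direction} {abs(change):.1f}% "
            f"to {sales:,.0f} units.")

print(narrate("Belgium", 12500, 11800))
# Sales in Belgium rose 5.9% to 12,500 units.
```

The hard part, as the speakers stressed, is everything above this level: choosing which facts matter, ordering them into a story, and keeping the prose varied over thousands of documents.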
Relevant Data is not
always Big: Although we were treated to some large numerical data points
during the conference – IBM recorded 53 million social media posts during the
64 games of the Brazil World Cup this year, and a 50-agent speech contact centre
can generate about 11 TB of voice recordings a year – the oil company Total told
a story about small data. It highlighted the extremely practical virtues of smart search, analysis, and
presentation of smallish sets of highly relevant data from a corpus on oil-well safety-standard
issues. This showed how you can mine value from text data to optimise knowledge
sharing within a business, and it demonstrated that in many cases business clients
will want to tailor the solution to their own needs. A useful lesson in how to market
certain kinds of text analytics solutions!
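Total's actual tooling was not detailed, but the "small but relevant" approach can be sketched generically: ranking a handful of internal documents against a query rather than trawling an ocean of social data (the safety-note snippets below are invented):

```python
# A minimal sketch of search over a small in-house corpus: rank a handful
# of documents against a query with TF-IDF cosine similarity.
# Document snippets are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Wellhead pressure test procedure and acceptance thresholds",
    "Blowout preventer maintenance intervals and inspection checklist",
    "Casing cement integrity standards for deepwater wells",
]

query = "pressure testing standards for well equipment"

vec = TfidfVectorizer()
doc_matrix = vec.fit_transform(corpus)
scores = cosine_similarity(vec.transform([query]), doc_matrix)[0]

for score, doc in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.2f}  {doc}")
```

With a corpus this focused, even simple ranking surfaces the right document quickly, which is precisely the point Total was making.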