Monday, October 27, 2014

LT-Accelerate conference highlights textual analytics and new solutions for leveraging the business value of content

Textual analytics is a relatively new business solution for monitoring, understanding and leveraging strategic information contained in documents and media. But it is already transforming the way businesses handle their own processes, learn about their customers’ desires and opinions (VoC), and mine their databases for unexpected insights to drive better performance. 

It might at first sight sound like a rather rarefied field of data science, but textual analytics (TA) is fast becoming a must-have tool for CIOs whose job it is to select technologies with strategic thrust. 

Its value lies in the opportunity to automate an information-gathering process that was previously nearly impossible because it was time-consuming and piecemeal. TA in fact invents a new business process: it is like having a tireless analyst plough through the massive amounts of language content generated around standard business operations to harness marketing and other insights. 

In contrast to many ways of generating business intelligence, textual analytics focuses on “unstructured” data. This quite literally means the life and times of text – or more generally language content. 

Although human language is a slippery medium, companies supplying TA are increasingly finding that content that might at first sight appear to be “unstructured” (i.e. when compared to numerical data) can be highly structured when viewed from the emerging expertise of natural language processing. 

Simple ‘positive’ and ‘negative’ assessments may be useful tags for certain types of textual information on social media, but textual analytics is making rapid progress towards more valuable insights into the meanings of human communications.


LT-Innovate sees three key touch points that will matter for suppliers of future text analytics solutions:
  • Drill-down semantics: Language technology is rapidly extending its powers of analysis. While the first generation of sentiment analysis tools tended to simplify opinions into binary distinctions, new semantic tools are radically expanding the power of TA solutions to deliver a finer-tuned picture of what is being communicated. One such field is “emotion recognition”: TA can now help identify more subtle expressions of approval, interest, concern or rejection than has been possible so far, offering a finer-grained understanding of what customers, partners and the market are saying. 
  • Multilingual processing: Multiple languages pose a key challenge for any TA application. Some TA products can run text analyses in 20 or more languages today, a remarkable feat given the subtleties of human expressive resources. But as companies start to target some of the long tail of markets and communities, the ability to decode thoughts, feelings and opinions expressed on social media in many dozens more languages may well attract more and more businesses. Rich linguistic resources will be needed for this, and at the same time the results will need to be translated and consolidated into a single central repository for analysis. This requires highly tailored translation technology at the right price (a toy pipeline illustrating this pattern follows this list). 
  • Specialisation: TA is increasingly targeting a broader palette of corporate language content. In some cases this will include contact centre conversations, internal conference calls, or responses to online surveys. This means that TA will need to integrate seamlessly with specialised technologies such as speech recognition and audio recording to extract the text implicit in new types of media. It will almost certainly lead to specialisations of TA in terms of business tasks, type of content, or industrial and commercial sector. 
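
To make the pattern behind these three touch points concrete, here is a minimal, purely illustrative Python sketch (not any vendor's actual product): it detects the source language of a message, normalises it into a single pivot language for a central repository, and tags it with an emotion label finer than plain positive/negative. The language detector, translator and emotion lexicon are toy placeholders standing in for production components.

from dataclasses import dataclass

# Toy emotion lexicon: real TA systems use trained classifiers, not keyword lists.
EMOTION_KEYWORDS = {
    "delight": {"love", "fantastic", "delighted"},
    "concern": {"worried", "unsure", "problem"},
    "rejection": {"refund", "cancel", "never again"},
}

@dataclass
class AnalysedMessage:
    original: str
    language: str
    pivot_text: str   # text normalised into the pivot language (English)
    emotion: str

def detect_language(text: str) -> str:
    """Placeholder: pretend anything containing 'très' or starting with 'Le ' is French."""
    return "fr" if ("très" in text or text.startswith("Le ")) else "en"

def translate_to_english(text: str, source_lang: str) -> str:
    """Placeholder for a machine translation component; here we just tag and pass through."""
    return text if source_lang == "en" else f"[translated from {source_lang}] {text}"

def tag_emotion(text: str) -> str:
    """Return the first emotion whose cue words appear in the text, else 'neutral'."""
    lowered = text.lower()
    for emotion, cues in EMOTION_KEYWORDS.items():
        if any(cue in lowered for cue in cues):
            return emotion
    return "neutral"

def analyse(messages: list[str]) -> list[AnalysedMessage]:
    """Detect language, normalise into one pivot language, then tag emotion."""
    results = []
    for msg in messages:
        lang = detect_language(msg)
        pivot = translate_to_english(msg, lang)
        results.append(AnalysedMessage(msg, lang, pivot, tag_emotion(pivot)))
    return results

if __name__ == "__main__":
    for record in analyse([
        "I was delighted with the support team, fantastic service.",
        "Le service est très lent, I am worried about my order.",
    ]):
        print(record.language, record.emotion, "->", record.pivot_text)

In a real deployment the placeholders would be replaced by trained emotion classifiers and production machine translation, but the shape of the pipeline stays the same: detect, normalise into one repository, then analyse.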
In a word, TA is set to move towards new forms of contextualisation that will enable it to capture the kind of semantic nuance needed to make the right distinctions for business strategy in very different domains. 

This is just the beginning. In the longer run, it is very likely that the kind of linguistic analysis currently offered by the TA community will be embedded in a broader “cognitive computing” environment. In this scenario, learning systems will be programmed to range over corporate data to find patterns that suggest directions to be taken or decisions to be made.

At the same time, CIOs will need to think about expanding the range of TA to media other than text – for example, video as image and as conversational content (and hence text) will also enter the content mix to be leveraged by analytics. Such developments will put huge pressure on the trust we hope to have in the value of our analyses as they extend over more data types.

Yet at the same time, these integrated, easy-to-use solutions will need to appeal to CIOs and others who need to rapidly engage with the voice of their customers, the voice of their staff members, the voice of their service suppliers, and more generally the competitive voice of their market segments. 

LT-Innovate believes that TA opens up a promising market in which highly specialised language technology can provide effective responses to a business need that simply cannot be met by traditional solutions. The vital step is to recognise that one size will not fit all. Solution providers therefore need to know exactly what their potential customers need. A compelling case for textual analytics indeed! 

This is why LT-Innovate -- the Forum for Europe's language technology industry -- and US consultancy Alta Plana Corporation, headed by industry analyst Seth Grimes, are holding a brand new event entitled LT-Accelerate. This will take place in Brussels (Belgium) on 4-5 December and is devoted to the people, players and end users of Textual Analytics.

Wednesday, October 15, 2014

LT-Innovate pioneers new conference with industry analyst Seth Grimes: LT-Accelerate

LT-Accelerate is a unique European conference that will both educate the market about the value of language technologies and help solution providers understand market needs. The conference is designed to bring together user organizations, solution providers, researchers and consultants in a single venue that will provide learning, networking, and deal-making opportunities.
A joint production of LT-Innovate -- the Forum for Europe's language technology industry -- and U.S. consultancy Alta Plana Corporation, headed by industry analyst Seth Grimes, LT-Accelerate will take place on 4-5 December 2014 in Brussels.

At LT-Accelerate, you'll hear from and network with:

ESOMAR president Dan Foreman -- Shree Dandekar, Dell Software's information management strategy director -- Elsevier content and innovation VP Michelle Gregory -- Prof. Stephen Pulman of the University of Oxford -- Tony Russell-Rose, founder and director of UXLabs, a research and design consultancy -- Lipika Dey, principal scientist at Tata Consultancy Services Innovation Labs... to name a few speakers.

The full agenda is online.

Whom else will you meet? Brand and agency speakers from Havas Media, IPG Mediabrands, Sony Mobile, Telefonica, and TOTAL. Leading technologists from IBM Research, Synthesio, and the Universities of Antwerp and Sheffield. Innovative solution providers including Basis Technology, Confirmit, CrossLang, Daedalus, Ontotext and TheySay.

LT-Accelerate is a unique opportunity in Europe. Do not miss it!

Friday, July 4, 2014

Language Prosthetics: After Life in the Digital Never Never

Media technology used to be thought of as simply an extension of the human sensorium. Now it will become an extension of our entire existence.

Back in the 1960s, media theorist Marshall McLuhan expounded a simple story about the evolution of media technology: (alphabetic) writing, print, photography, film, radio and TV are all extensions of our natural sensorium. Alphabet technology, for example, along with Chappe’s telegraph and similar devices, was a visual (hence inspectable) extension of human speech’s natural capacity to produce aural verbal messages. TV (along with microscopes, telescopes and X-rays) was another extension of our visual capacity to view events, while radio and telephones were an extension of our mouths and eardrums to distant contacts.

In this tale, the history of technology recounts the gradual extension of a sensory apprehension of the world into a hardware amplifier. Since the senses are few in number, McLuhanites had to produce complicated work-arounds to save the theory. For example, the post-electric world (yes, McLuhan rarely used the term ‘electronic’ to identify the microchip revolution happening around him) would be one of “secondary orality” – in the 1970s we were gradually shutting the library door on linear visual written knowledge, and gathering together in a tribe around the oral/aural campfire of CB radio and rock, and the promise down the road of podcasts, always-listening smartphones and speech translation.

Remember the “deaf, dumb and blind boy” of “Pinball Wizard” in The Who’s rock opera Tommy? Human sensory disabilities have systematically offered premonitory probes into the art of the technologically possible, and McLuhan’s ‘extensions-of-the-senses’ story became even more complicated when it engaged with disability.

Braille, for example, becomes a tactile extension of an alphabet which is itself a visual “extension” of spoken language. Yet in prehistory (i.e. before writing), sightless people would not have needed such a medium – they would have developed acute hearing to catch the semantic grace notes of the ambient aural world. Today, though, the auditory/oral channel enabled by smartphones as an “extension” of the ear is becoming a far more powerful communication medium than Braille for the visually handicapped.

Now take the theory of media extensions into the digital world in which we live - and will increasingly die. McLuhan’s “media” have morphed into technologies (or apps) that we can use to extend our digital lives and surmount our physical failings. Braille was once a wonderful tool for accessing knowledge for the visually impaired. But now we can extend spoken knowledge to the terminally sightless, and give a plausible artificial voice to those struck dumb.

And, more grotesquely but also more touchingly, we can give the primi inter pares disabled – i.e. the truly brain dead - a new voice. Hallelujah. McLuhan had not expected the electronic nexus to afford room for the physically injured, the congenitally handicapped, or the terminally moribund.

Digital now allows us to extend our “lives” into the virtual, and broadcast our “voices” far beyond situated friends and family into the deep echo chamber of forever.

In the best of possible worlds, text-to-speech technology can invent voices for the congenitally mute. Such voices will probably be built from a cunning mix of real recorded voices chosen from a digital pool and totally artificial voices crafted into a unique timbre for someone who has become, or always has been, voiceless. But this raises an interesting question for the voiceless speaker: which voice to choose? So watch out for “voice design” on your tech radar, especially for those who have always disliked their recorded voice.

Literature and historical movies give voices to dead souls. And we find it perfectly natural that Moses, Caesar, Elisabeth I, Catherine the Great and Mr. Bojangles have “spoken” to us from the stage or screen – the Greeks called it prosopopeia. Yet a newly crafted voice for a dead soul will eventually have to pick its way through the voice biometric devices that will underpin our online security systems.

Will these guard-dogs in future be able to handle artificial voices of real (yet currently speechless or even dead) humans? Or in the even longer run, the weirdly synthetic human voices of artificial beings – robotic avatars of the long gone?

Lastly, in a social media culture, who exactly will we be (with our digital identities and social graphs) when we shuffle off this mortal coil and go permanently virtual and post-human?

Will there be a sustainability app that keeps up our online presence as an eternally young speaking avatar (rather as actors tend to play Queen Elisabeth 1st as a forceful young woman when she was actually in her early dotage)?

Maybe this app could use intelligent methods to analyse the messages sent to us after our death and, by mining data from our previous content stack, guess what message we would have sent back. But can we or should we age that voice from sprightly youth to creaky old age when we use it (post-mortem) to answer the phone? Or should we think about personality cosmetics?

My digital being is necessarily a virtual “extension” of the physical me. And analytics will inevitably characterise and embody me as a plausible avatar, sending out social-media messages digested by a smart reader with the kind of stuff I had blogged, uttered, You-tubed, tweeted, emailed, or merely “written” before.

Having an agent mine the web and automatically generate new in-genre content, I (but is it “me” any longer?) could extend my life almost indefinitely, by virtue of the smart robot that parses my old words, and keeps churning out simulacra variora of my textual life.

This is a big leap from McLuhan’s media vision. Digital media do not simply extend the reach of my senses; they transform my very persona (remember that etymologically this word means “sounding (sona) through (per)” – for example, through a mask in a stage performance). Today, the irony is that I don’t have to die first. Theoretically I could sit back and watch my digital after-life evolve as an avatar of myself – and why not several different digital personae while I’m at it – and in a Joycean moment pare my fingernails from a stance of digital silence, exile and cunning. Perhaps I wouldn’t even need a “voice”.

Remember the old joke: you can never tell whether someone’s a dog on the web.

Woof woof!

Thursday, May 22, 2014

Language Technology is the drill to make Big Data "oil" flow in Europe!


It has always surprised me how much is written and talked about Big Data without pointing to the main barrier to the data revolution: our many languages (more than 60 in Europe alone). The numbers surely differ from sector to sector, but a fair guess would be that half of big data is unstructured, i.e. text. Most multimedia data is also converted to text (speech-to-text, tagging, metadata) before further processing. Text in Europe is always multilingual.

Europe prides itself on an “undeniable competitive advantage, thanks to [its] computer literacy level”. In fact, we have had this advantage for decades, but so far it hasn’t helped much. Good brains and companies are systematically bought by our American friends. No, we rather have to focus on what is specific to Europe. On what we have and the US doesn’t. Maybe even if it is a disadvantage – at first sight.

What makes Europe special and different is the fact that we are trying to build a Single Market in spite of our different cultures and systems. Our multilingualism is always seen as a challenge, a big disadvantage. Most Big Data applications only work well in English and, with some luck, okayish in German, Spanish or French. Smaller EU countries with lesser-spoken languages are basically excluded from the data revolution. The dominance of English in content and tools is the reason for the US lead in Big Data. Many European companies have reacted to this and now use English as their corporate language. But Big Data is often big because it originates from customers and citizens. And they tend to use their own languages.

What if we managed to turn this perceived handicap of a multilingual Europe into an asset? Overcoming the language barriers would be a great step towards a Single Market. We would make sure that smaller Member States participate and perhaps become drivers of the data revolution. Even more importantly, Europe would become the fittest for the global markets. The BRICs and all the other emerging economies no longer accept the dominance of English. Europe has a unique chance... if it solves a problem the Americans do not have, or discover too late.

The real opportunity is therefore to create the Digital Single Market for content/data independently of the latter's (linguistic) origin. This would require that we overcome the language-silos in which most data remains captive and make all data language-neutral.

To achieve this, we urgently need a European Language Cloud. For all text-based Big Data applications, the European Language Cloud is a web-based set of APIs that provides the basic functionality to build products for all languages of the Digital Single Market and Europe’s main trading partners. For more information, see my previous post.
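
No such cloud exists yet, so the endpoint, payload fields and response shape in the following Python sketch are entirely hypothetical; it only illustrates the kind of thin, shared API layer argued for here: a single call that accepts text in any language of the Single Market and returns language-neutral annotations that a Big Data application can consume.

import requests

# Hypothetical base URL: the European Language Cloud does not exist (yet).
ELC_BASE_URL = "https://api.language-cloud.example.eu/v1"

def annotate(text: str, services=("language-id", "entities", "translation")):
    """Send text to a (hypothetical) shared language API and return its annotations."""
    response = requests.post(
        f"{ELC_BASE_URL}/annotate",
        json={"text": text, "services": list(services), "pivot": "en"},
        timeout=10,
    )
    response.raise_for_status()
    # Assumed response shape, e.g. {"language": "de", "pivot_text": "...", "entities": [...]}
    return response.json()

if __name__ == "__main__":
    print(annotate("Die Lieferung kam zwei Wochen zu spät."))

The point of the sketch is the design choice, not the details: one common set of APIs behind which language identification, translation and annotation for every language are pooled, so that applications never need to care which language their data arrived in.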

While the European language technology industry might not have all the solutions readily available to deliver the European Language Cloud, many language resources could be pooled as a first step. In addition, many technologies are presently entering into a phase of maturity (after decades of European investment into R&D) and could be harnessed - through a set of common APIs - into a viral Language Infrastructure. This would go a long way towards delivering the European Language Cloud... without which the Big Data oil will only continue to flow from English grounds.

Jochen Hummel
CEO, ESTeam AB - Chairman, LT-Innovate

TKE 2014 – Ontology, Terminology and Text Mining 19-21 June, Berlin

The 11th International Conference on Terminology and Knowledge Engineering (TKE) takes place 19-21 June in Berlin. Learn from more than twenty selected presentations in two parallel tracks and attend two exciting keynotes about the most recent research in ontology, terminology and text mining:
  • "Ontology and the Illusion of Knowledge: Mines of text and nuggets of enlightenment" - Khurshid Ahmad, Trinity College Dublin.
  • "The Sphere of Terminology: Between Ontological Systems and Textual Corpora" - Kyo Kageura, University of Tokyo.
On Saturday, 21st June, a workshop on ISO language codes welcomes your active participation. Last but not least, TKE organizers and sponsors welcome you to a great conference dinner on Thursday, June 19th. Register now; available seats are limited!

The event is organized by GTW and DIN Deutsches Institut für Normung e. V. in cooperation with INRIA, Coreon, Cologne University of Applied Sciences, Copenhagen Business School, Termnet and other associations and consortia, national and international organizations.

Sunday, May 18, 2014

2014 - The year of the verticals for Europe's language technology industry

In a recent interview, the CEO of the Spanish firm Daedalus, José Carlos Gonzalez, said with great verve that his “goal for 2014 is to cover progressively the specific needs of our clients by helping them to develop solutions in vertical markets, freeing them from the complexity of language processing technology.”

Freeing verticals from the complexity of language technologies is a necessary step forward. But it means knowing about the specific needs of industries, and how solutions can be invented that address the infrastructural conditions of these often large-scale players, which typically require fairly long-term commitments.

At LT-Innovate, we believe that 2014 will be the year of the verticals. This means that instead of endlessly repeating what our language technology could do if there was, as the poet said, world enough and time (and above all money), we should deliver solutions that industries actually need.

We kick-started this process of market analysis some 18 months ago and have built up a useful body of knowledge about gaps, want-to-haves, on-going problems, and the sheer lack of awareness among various verticals of the potential benefits of LT. We recently published our findings on these markets to help our members compare their experience and insight with our own efforts at trying to identify opportunities.

Each industry naturally has its specific needs, even though all of them tend to follow the trend towards breaking down information silos and stepping up cross-lingual data sharing while keeping costs down.
We found that the increasingly globalising Manufacturing industry tended to expect massively unified information centres with localised interfaces; that Tourism needed deep, multilingual sentiment analysis applications; and that Media & Publishing increasingly required integrated multimodal (speech/text/image) monitoring, using multilingual speech recognition among other technologies.

We also learnt that whatever the structure of the industry, there are multiple touch points in most workflows where LT can play a role in lowering costs, improving efficiency and contributing to what we can call digital integration. Spoken interfaces can improve productivity in numerous industrial jobs, from store-room workers to clinicians making out reports on patients.

Likewise, the need for cross-lingual access to information of all sorts is now a constant in nearly every European vertical. Today it tends to be addressed by point applications; tomorrow we can expect far more integrated solutions that adapt more effectively to specific requirements in the online workplace. 

This year LT-Innovate hopes to leverage this initial knowledge base to build a clearer picture of where language & speech technology can play a differentiating, even disruptive, role in simplifying processes, adding value to operations, lowering costs and breaking down data silos in different industries in Europe. So stay in touch.

Friday, March 21, 2014

ROCKIT: Paving the Road to the Future of Conversational Interaction Technologies

New conversational interaction technologies open up many business and societal opportunities. European research can provide interactive agents that are proactive, multimodal, social, and autonomous. Moreover, it is now possible to draw data from many different sources together to provide very rich context and knowledge to use in applications. But how can the organisations who want to exploit this technology decide what products and services to develop, and where to invest their R&D? ROCKIT is a new strategic roadmapping initiative that will create a shared vision and innovation agenda to guide this process for all types of stakeholders in this emerging area.

Most technology roadmaps merely describe the future and speculate about what will happen if technology is left to evolve of its own accord. ROCKIT is different – we will decide what we want the future to hold in ten years' time, and create a structured, visual map of the steps we need to take in order to realize our vision. Markets and drivers, products and services, and enabling technologies will all form different layers so that readers can see the basics at a glance, find what interests them most, and drill down into detail. For example, if a company knows its market requires a particular service, it will be able to see exactly what technology developments and scientific research are required to make that service happen, complete with an assessment of the readiness of every item. Conversely, technology providers will be able to see wider possibilities for their components than they could on their own.
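
As an illustration of this layered structure, here is a minimal Python sketch (not SharpCloud's actual data model, and the items are invented): each roadmap entry belongs to one layer, carries a readiness level and links to the items it requires, so that drilling down from a market or service reveals every enabling technology beneath it.

from dataclasses import dataclass, field

@dataclass
class RoadmapItem:
    name: str
    layer: str                       # "market", "product/service" or "technology"
    readiness: int                   # e.g. a 1-9 technology readiness level
    requires: list["RoadmapItem"] = field(default_factory=list)

def drill_down(item: RoadmapItem, depth: int = 0) -> None:
    """Print an item and, recursively, everything it depends on."""
    print("  " * depth + f"{item.name} [{item.layer}, readiness {item.readiness}]")
    for dependency in item.requires:
        drill_down(dependency, depth + 1)

# Hypothetical example content.
asr = RoadmapItem("Far-field speech recognition", "technology", readiness=5)
dialogue = RoadmapItem("Proactive dialogue management", "technology", readiness=4)
assistant = RoadmapItem("In-car conversational assistant", "product/service",
                        readiness=3, requires=[asr, dialogue])
market = RoadmapItem("Connected-car market", "market", readiness=8,
                     requires=[assistant])

drill_down(market)

Traversing the links from a market down to its technologies is exactly the "drill-down" reading described above; traversing them in the opposite direction shows a technology provider all the products and markets its component could feed.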

We will start by defining our vision of the future – constructing a number of key “scenarios”, our future use cases – and identifying the drivers and constraints that determine where the community currently stands in its ability to deliver that vision. We will then establish the possible routes and required developments to fulfil that vision. With this mapping done we will be able to highlight key enablers, technology gaps, risks and resource gaps. Iteration will ensure our roadmap is robust and correct. The roadmapping process will be ably led by Vodera, which has produced roadmaps leading to sounder research and innovation programmes in diverse application domains such as automotive, aerospace, security, healthcare and environmental monitoring.

Getting all this information under control has traditionally been a problem for roadmaps – but ROCKIT pioneers the use of SharpCloud, a new online collaborative visualisation platform that makes it much easier to capture, edit, display, and disseminate roadmap contents than was previously possible. As a result, the knowledge of the community will be in an accessible format that allows easy identification of trends, gaps, opportunities, and resources.

A roadmap is only as good as the people who contribute to it. ROCKIT needs the right participants, and they have to cover every stakeholder community, from R&D and system integrators to component suppliers and usability experts, and more. SMEs are just as important as large companies and public sector research organisations, since most current commercial activity takes place there. If you want to be involved, join the Conversational Interaction Technology Innovation Alliance (CITIA) LinkedIn Group or speak to the ROCKIT partners. Workshops are planned in conjunction with major sector events such as LREC (Reykjavik, May 2014), ICASSP (Florence, May 2014) and the LT-Innovate Summit (Brussels, June 2014).

Article contributed by Costis Kompis, Vodera

Costis Kompis is the managing partner of Vodera, where he helps private and public organisations align their R&D activities, develop innovation strategies for emerging technologies and design new business models to capture market opportunities.