What's happening in the Language Technology industry, from LT-Innovate
Not searching, finding:
There was naturally a lot of coverage by the technorati of Google’s inclusion of semantic boxes in its search results, along with claims that Chinese search engine Baidu had already been doing this (and, some say, doing it better). Google’s Knowledge Graph reflects the ongoing evolution from showing keyword hits to processing results into meaningful categories: people, places, etc. LT watchers will know that this semantic shift has been on the agenda for a long time. It even appears to mirror one view of how humans learn to process language: first collecting data (by listening to mum and dad), and then gradually finding patterns in these data. Somehow we then categorize these patterns into meaningful units, aka concepts, that give us a virtual map of our communicative universe.
We then test these concepts by communicating about them with others. Google has also simultaneously published a concept dictionary (helped by work from the Ixa Group at the University of the Basque Country) that illustrates the relationships between word strings and the semantic concepts they denote. Also in the semantic space, PoolParty, the Semantic Information Management product from The Semantic Web Company, has announced a revamped version of its own semantic discovery application. It demos some of the conceptually related information functionality that we shall soon take for granted when searching with any of our engines.
Speaking of Market Size:
Global Industry Analysts has published a report forecasting that the speech technology market will be worth about $31B by 2017 ($3.3B of this for the IVR market). Coincidentally, this is the same size as the global language services (outsourced translation and interpreting) market in 2011, as estimated by Common Sense Advisory. By contrast, analytics expert Seth Grimes estimated last year that the text analytics market was “approaching $1B”, while Gartner says analytics is part of a $10B market for Business Intelligence software. Language services are growing by 7.4% a year, while some estimates put analytics growth at 25% a year. The speech tech market is also one of the fastest-growing areas of ICT, and is expanding into new application areas almost by the week: cars, healthcare, gaming, virtual assistants, security/identity, consumer interfaces and more. One example among many: BigHand, the UK recording/transcription supplier for lawyers and healthcare professionals, was acquired last week by investment company Bridgepoint Development Capital, which intends to support the growth of BigHand's technology base (especially in SaaS) and its footprint. This could perhaps spell less reliance on Nuance as BigHand's main supplier. BigHand launched a major awareness-raising campaign earlier this year and seems convinced that this market is exploding.

In what is clearly a big data world, speech tech suppliers like BigHand, and especially contact centre managers, will be able to exploit their vast collections of what analyst Dan Miller of Opus Research (specialists in the speech market) nicely calls “speechable” moments. Speech (and speech translation) technology developers need lots of data in various voices, dialects and styles, covering multiple subjects, to train their engines. One source could be recordings of interpreting; another could be contact centre conversations.
Speech data from interpreting services (medical and legal events, or conference situations) have so far not featured on the LT radar, possibly because of IP and privacy constraints. Contact centres, however, are now generating large recorded speech corpora. Both sources would offer a huge resource for companies building speech technology and, eventually, translation services.
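To give a feel for the gap between the annual growth rates quoted in the market-size item (7.4% a year for language services versus some estimates of 25% a year for analytics), here is a minimal Python sketch of compound growth. It assumes both rates stay constant, which real markets rarely do. The five-year horizon and the function names `growth_factor` and `doubling_time` are purely illustrative; this is simple arithmetic, not a forecast.

```python
import math

# Annual growth rates as quoted in the text; assumed constant for illustration only.
RATES = {
    "language services": 0.074,
    "text analytics (some estimates)": 0.25,
}


def growth_factor(rate: float, years: int) -> float:
    """Multiplier on market size after `years` of compound growth at `rate`."""
    return (1 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed to double in size at a constant compound annual `rate`."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    # Hypothetical five-year horizon, chosen only to make the comparison concrete.
    for name, rate in RATES.items():
        print(f"{name}: x{growth_factor(rate, 5):.2f} after 5 years, "
              f"doubles in ~{doubling_time(rate):.1f} years")
```

Under these assumptions, 7.4% a year compounds to roughly 1.43 times the starting size over five years (doubling in about 9.7 years), while 25% a year compounds to roughly 3.05 times (doubling in about 3.1 years).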