29 April 2013

Note to Diary for 26-27 June in Brussels: LT-Innovate is Holding the "Towards the Multilingual Digital Single Market" Summit

LT-Innovate is holding its Summit in Brussels at the end of June with the aim of bringing together the European LT community, reviewing our progress as a community and setting the stage for the next concrete steps in building a more integrated language technology vendor community. We’re very excited about this and we very much hope all our members will register, attend and make their voices heard.

LT-Innovate research estimates that in 2011 the worldwide LT market, including software and services, was worth €19.3B. By 2015, we estimate that this should grow to nearly €30B, much of the growth coming from translation software and services. This is why we are entitling the Summit “Towards the Multilingual Digital Single Market” – a compelling reason for leading LT players to explore ways of ensuring that European companies capture as much as possible of this unique multilingual market. 

Day 1 will focus on what LT-Innovate as a project has achieved in terms of its business model and administrative setup. In the afternoon we are inviting all the CEOs of our member organisations and selected panellists to explore five key questions about the status and potential of the European LT market. The idea is to confront vision with hard data so that we all operate in terms of a realistic but insightful model of the market we are addressing.

Day 2 will focus fully on the Summit theme - the challenge of technologically enabling the Multilingual Digital Single Market. Through plenary panel discussions and group discussions, we shall look closely at three key future issues: building LT innovation scenarios for the medium term; creating the European Language Cloud as an enabling LT infrastructure; and constructing core service platforms as collaborative LT marketplaces. 

Innovation is all around us, from apps, social media and cloud content management systems to virtual assistants and automated translation solutions. But all these developments still tend to be fragmented in scope and vision – addressing a market or a country here, a language or two there, this device but not that one, and so on.

Our research shows that it is time to move beyond fragmented, national marketplaces, and build synergies and commonalities that can lead to productive new technology alliances. We need to break down the “silo” organization of the way language is perceived in our communities, and organize a language industry that can enable any industry, sector or business to improve its competitiveness in Europe’s single marketplace.

Sound like a tough call? Just come and help us make the great leap forward, and launch the European Language Cloud as a powerful catalyst for a language-neutral content market in Europe. 

If you need the backstory to our ideas, please read the LT-I Industry Vision Statement and Market Report (see www.bit.ly/13UAaST). If you don’t agree and have better suggestions, that is an even better reason to come: we need to hear them!

19 April 2013

Textkernel Has the Wind in Its Sails in the Human Resources Management Industry

Pôle Emploi, the French national employment agency, has officially announced Textkernel and Accenture as winners of its European tender on the integration of semantic technology in CV and job processing. For Pôle Emploi, Textkernel will check whether vacancies meet the legal requirements, in particular in the field of anti-discrimination. Besides job parsing, Textkernel also delivers CV parsing and anonymisation for this assignment.

The new release of Search!, the semantic CV search software from Textkernel, offers, besides some spectacular new features, a new integration with the largest job site in the Netherlands: Jobbird.com.

On April 24th, Textkernel is organising its second webinar in the field of semantic recruitment technology. This free webinar, “Active Sourcing and the Role of Technology”, will take place online and will be held in English by Jakub Zavrel, CEO of Textkernel. Jakub will take a look at the concept of active sourcing, the road that semantic technology is taking in this area and the part Textkernel is playing with its semantic search tool ‘Search!’.

Textkernel is a member of the LT-Innovate network.
If you are a Language Technology company, register now!

10 April 2013

Download LT2013: Status and Potential of the European Language Technology Markets

Download your FREE copy here

LT2013: Status and Potential of the European Language Technology Markets - Key Extracts

Discover some key extracts of LT2013, LT-Innovate's landmark report on the Language Technology industry:

A new Ecosystem

New open paradigms, language‐neutral development platforms and multilingual development resources could foster disruption, particularly in Europe. (p. 5)

The ability to manage and process the tsunami of data across the world’s languages is one of the biggest challenges in the new ICT ecosystem, and one for which LT is a critical enabling technology. (p. 6)

LT is baked into the future of ICT in the mobile/social/global world of computing. (p. 7)

In the era of semantics – when we need to know the meaning of the data that flows around the digital universe – Language Technology is essential for innovation. (p. 8)

Although LT has been a commercial market for many years, only recently have technological conditions made it possible to exploit LT on a large scale. (p. 9)

Markets for Language Technology

The fastest growth is in non‐European languages, though Spanish and Portuguese gain significance because of Latin American markets. Aside from English, Spanish and Portuguese, only five other EU languages (German, French, Italian, Polish and Dutch), out of 60 or more spoken in the Union, are published on more than 1% of the top million sites. (p. 10)

While the potential is for a single European digital market with 500+ million customers, the reality is a series of fragmented linguistic markets, none bigger than possibly 70/80 million customers, most much smaller. (p. 11)

At present no company or website could be genuinely global using the localisation techniques currently at our disposal. Only with large‐scale automation will the limited multilinguality of the web be transformed into a genuinely globally accessible medium. (p. 11)

Where language is the very stuff of our digital system – customer interactions, employee conversations, technical and scientific knowledge, cultural and social objects of all kinds – the era of the Lingua Franca is over. Interacting across the many languages of the digital world is no longer optional. (p. 12)

Europe’s share of the worldwide market will increase slightly to 38% over the five year period. However, that share is significantly lower (24% in 2015) for the software portion of the market... Factors that could change this include:

  • Faster and more extensive deployment of content applications in more European languages, in a coherent framework for all languages
  • Development – and integration – of speech components (for recognition, generation and identification/verification) in more European languages, affordably available for European application and solution developers
  • Large‐scale deployment of open source machine translation in open environments using shared resources
  • Large‐scale sharing of resources (paid and free) throughout the European industry
  • Development of vertical and industry‐specific platforms for LT development and deployment, engaging whole industries in cooperative initiatives (analogous to SWIFT in banking) (p. 21)

Collaboration between the industry and data owners will be needed. (p. 22)

Many IT managers are still relatively unaware of the benefits that LT can provide them... Suppliers, LT vendors and IT integrators should work closer and harder to identify killer business cases, increase market awareness and deploy market strategies understanding how economic return affects clients, developing modular/incremental products, and forging cross‐industry alliances to strengthen market channels. (p. 24)

LT‐Innovate estimates that there are around 500 companies in Europe either actively developing Language Technology, or embedding its features in their products and services in an innovative way... The industry comprises mostly small companies, concentrated in the western and northern regions of the EU, with a mix of long‐established players but also a significant number of new entrants. A quarter of the companies are micro‐enterprises with fewer than 10 employees, while only 6% have more than 200 employees; almost the entire industry is composed of SMEs... Over half the industry comprises companies active for more than 10 years, many that remain small. The fact that so many companies fail to scale, even after years in business, is unusual in a technology industry, and indicative of the market context for LT in Europe: local/national companies with expertise in local languages serve local markets with services based on their own languages. This state of affairs is not likely to be sustainable, as cloud‐based language‐enabled services are launched on a large scale. At present, few European companies are in a position to compete in an ecosystem where access to technology, rather than narrow linguistic expertise, is the driving factor. (p. 25)

Innovation in the LT Industry

The dynamics of the general software market, and the limits of what is currently possible for niche LT SMEs, strongly suggest that a Digital Language Infrastructure for Europe could both unlock potential for the industry, and help meet the need for pervasive “multilinguality” in Europe’s digital economy. The industry itself should define the nature and content of the infrastructure, what features are appropriately shared and open, what should remain in the commercial IP realm. (p. 33)

The review of conditions in the LT industry suggests that collaborative approaches to the market could break through the fragmentation that is evident. (p. 33)

Asymmetric partnering for SMEs is a natural route to developing technologies in specialist areas with steep technical demands (heavy R&D), where domain expertise is key. Peer partnering takes the alternate route of creating new “breakout” categories of products or services through the collaborative combination of complementary technologies... Dominant markets are those where technical depth meets the greatest opportunity. (p. 38)

Interactive, Multilingual, Intelligent

The speech applications market shows immense potential, and it is expected to grow rapidly in the next few years... The market is mainly driven by the increased demand in the Mobile Devices segment. This segment is witnessing high demand for speech recognition applications because of the increase in the number of regulations on the use of mobile phones while driving. (p. 47)

We can debate whether the translation industry’s response to rapid globalisation and growth in content has been the right one. Has the industry made best use of technology to raise its capacity and stay profitable? Or has the content explosion marginalized an industry of artisans? (p. 55)

Intelligent content refers to content that is structurally rich and semantically aware, and is therefore discoverable, reusable, reconfigurable and adaptable and which is not limited to one purpose, technology or output. These technologies rely on underlying techniques and tools such as natural language processing (NLP), categorization and clustering engines, and statistical approaches for processing the outputs of human language, such as written or spoken texts. (p. 83)

Download the full report here.

LT2013: Status and Potential of the European Language Technology Markets - Executive Summary

On 8 April, LT-Innovate released a landmark report which provides a comprehensive survey of the state of the Language Technology (LT) market in Europe today and projections for the next five years. It is divided into six parts, covering global trends in the ICT ecosystem, an analysis of specific trends in the LT industry, an exploration of innovation options for European LT companies, and a detailed account of the three strategic technology segments of speech interaction, multilingual communication and translation, and intelligent content that make up the LT market.

Mobile communications, cloud service models and social media are transforming the way citizens, companies and public administrations act in the digital world. This report identifies three deep trends driving next‐generation ICT that will open up significant opportunities for LT:
  • Unified Communication: cross‐platform, multimodal and multilingual. Mobile connectivity and service unification across devices and platforms will offer business and consumer users seamless communications.
  • Unified Information Access: in any language and across languages. This will remove barriers to content and enable integrated messaging, conferencing, collaboration, content‐ and data‐sharing based on intelligent content and applications, multilingual and interactive systems and technologies.
  • Unified User Experience, based on natural interaction with machines and processes, in any language.
This will remove barriers to the access, use and understanding of information from large volumes of unstructured, semi‐structured, and structured data. LT is the critical enabling technology for each of these fundamental trends and stands to benefit from the emerging interconnections between interaction (speech), information processing (intelligent content) and automatic translation in a multilingual connected digital space. It is therefore vital for the European LT industry to embrace and foster the opportunities the rapidly evolving ICT ecosystem offers and to pursue a dynamic innovation agenda ahead of its competitors globally.
LT‐Innovate has developed a market model to estimate the size of the LT market in terms of sales and services.

The worldwide LT market is worth around €19.3B today and should grow to nearly €30B by 2015. The European speech technology market is growing by 9.7% and should grow to €8.6B by 2015. The intelligent content market is set to grow to €6.2B. The translation technology market is worth some €8.6B and should grow to €14.9B. The growth rates in the “Rest of the World (ROW)” markets should be significantly higher than in Europe and the Americas as these emerging markets mature. The translation technology segment will continue to dominate the European LT market.

In terms of market participants, there are some 500 European companies actively developing or integrating LT, most of them still small companies that all too often focus on niches in their national (language) markets. However, the European LT industry is gradually moving LT up the value chain into mainstream applications and markets. Furthermore, the gaps in language coverage for speech and content technology, and the potential to create a demand‐driven dynamic, hold significant potential for growth of the LT industry across Europe.

To facilitate this strategic growth, it is suggested that the pace of development could be accelerated through collaborative innovation bringing together LT companies with their peers and other corporate actors and buyers across the ICT value chain. Various scenarios for this process are explored in the Report. The final three sections analyse in detail the history, companies and product/service offerings in the key LT segments of speech, translation and intelligent content technology, providing a guide to key players and their role for the three different application areas.

Download the full report here.

04 April 2013

Interview with Lena Bayeva

Lena Bayeva is a Computational Linguist at Textkernel, an innovative company that specializes in multilingual semantic recruitment technology. In this interview, Lena shares her experience with job hunting and tells us her story and her opinions on the present and future of NLP and Computational Linguistics.

  • How did you get into the world of computational linguistics? Please tell us where and what you studied and what your previous work experience is.
I started out studying Computer Science at Portland State University. During this time I took a few Machine Learning courses, which got me really excited about the field. I then worked as a developer for IBM (Oregon), but never gave up the idea of studying Machine Learning. I went on to get a Masters in Artificial Intelligence at the University of Amsterdam with a focus on Machine Learning and Information Extraction. Machine Learning courses gave me a very good foundation for Computational Linguistics, which uses a lot of the general ML algorithms. Understanding the mechanics behind learning is quite important when applying it to solve a specific problem. Instead of blindly applying some learning algorithm, it helps to know things like what types of learning algorithms are out there and what the differences are, why some methods are better for certain domains/data sets than others, what the source of error is, and so on. A course in Elements of Language Processing and Learning was quite helpful as well. It was based on the book Speech and Language Processing by Jurafsky and Martin. I highly recommend it as a starting point.
  • What work do you do now? What are your company and your current responsibilities?
I’m currently working for Textkernel - a company that uses language technology to deliver solutions to the HR sector that include CV parsing, Semantic Search and Match (of candidates to vacancies and vice versa). I’m happy to say that I get to apply my Machine Learning skills at my job. Among my responsibilities are development of multi-lingual information extraction models, preprocessing of text and post-processing of the results, evaluation and error analysis. Of course there is a lot of software engineering involved as well.
  • How difficult was it to find a job in industry after graduation?
I was lucky to have found a job rather quickly. There are a few small companies in Amsterdam that use some form of AI, but many more of them across Europe and the US. If you are willing to relocate, there are plenty of job opportunities.
  • What is your advice to recent graduates looking for a job in industry?
There are a lot of articles on the web that provide useful advice to recent graduates and job seekers in general. I’m going to mention a few things that I found useful personally.
  1. Think of your dream job and focus your search keeping both your experience to date and your dream job in mind. Your first job might not be the dream job, but it should take you a step or more closer to the dream job and help you develop the necessary skills.  
  2. Networking always helps - talk to your friends and professors, and join the groups that specialize in your field.  
  3. Capitalize on your achievements - projects you’ve worked on, perhaps your thesis or publications, anything to show the employers you have a deep understanding of some aspect of the field and/or relevant experience.
  • What do you think about the near future of NLP – which areas are going to grow in the next few years?
First and foremost, I see the need for development of NLP methods that effectively use unlabeled data for learning. Much has been done in the direction of semi-supervised and unsupervised learning; however, there’s still a need for better understanding of when the unlabeled data can be advantageous and how it can be best incorporated.

Second, transfer and multi-task learning techniques are a promising way to reuse shared information across domains/tasks. It is often the case that some of the information from a given domain(s) for which we have a lot of data is shared with some other domains for which the data is less plentiful. Ideally we would like to reuse/transfer as much information as possible between domains to help build better models, but this should be accomplished without introducing noise (negative transfer). Exactly what information can be transferred and when is an important problem that needs to be investigated in depth.

Third, an interesting area is feature extraction and data representation. How can we (mostly) automatically extract features useful for a particular learning task? Much has been done in this direction as well, but this problem is far from solved.

Another interesting area is systems that are capable of easily incorporating new data and learning over time. In this regard, the NELL project by Tom Mitchell (Carnegie Mellon University) is quite inspiring.

In general, I would like to see different NLP and Machine Learning methods combined in order to build more complete NLP systems: starting with a representation appropriate for the learning task, then modeling data with methods that use transfer techniques to reuse common knowledge, unlabeled data to deal with sparseness, clustering and other techniques to deal with ambiguity in classes, and so on.

About the author:
Maxim Khalilov, PhD is the R&D manager at TAUS B.V and the co-founder of NLPPeople.com. He is a former post-doctoral researcher at the University of Amsterdam, intern at Macquarie University (Australia) and a PhD student at the Polytechnic University of Catalonia (Spain).

02 April 2013

Machine Translation Jobs

Our world is currently going through the most significant period of globalization in human history. Although English is a second language for many around the world, most people still feel more comfortable in their own native language or in the language that they know best.
At the same time, more and more people are talking about the exploding volume of multilingual content sitting on the web and on the servers of international corporations and smaller companies. A significant chunk of this content has a certain value, but in many cases it is not considered important enough to hire professionals to translate it.
The emergence of Machine Translation (MT) technology has made it possible to have these documents available in multiple languages. Sometimes the quality of translation can be relatively low, but without MT, this content would never have been translated at all.
Another attractive characteristic of MT is that it enables an increase in the productivity of translators without reducing the quality of translation. In this instance, translators are presented with the output of the MT engine (e.g. Google Translate) and are asked to simply post-edit the MT output instead of translating from scratch. For most languages, this is faster and demands far fewer resources.
These are only two of the reasons why MT is emerging as one of the most important technologies in the modern industrial world.
MT is constantly improving. It is a hot industry, with millions of strategic and technical challenges to be solved. A professional or a recent graduate with experience and/or background in MT has a lot of opportunities in the worldwide job market.

Let’s have a look at the types of companies who are often interested in hiring MT professionals:
  • Machine translation service providers. Big and small IT companies that provide their own MT solutions or customize off-the-shelf and commercial MT engines. Jobseekers can look to take this path if they are strong at research and are fast and efficient at implementing new features in the MT engine. Examples: Google (Google Translate R&D group), Microsoft (Microsoft Bing R&D group), Asia Online (customization of the open-source Moses engine), Systran (one of the oldest players on the market), ProMT (Hybrid MT).

  • International corporates adopting MT technology and localization departments and R&D labs of big companies. Recently, corporates that view multilingualism as a strategic issue have started adopting existing MT systems or developing their own MT solutions. They need MT specialists to partially or totally automate the translation process. This category of companies also includes public international organizations, like Patent Offices.
    Examples: Autodesk, eBay, AT&T, World and European Patent Offices.

  • Language Service Providers (LSPs). Currently LSPs, who represent a significant percentage of MT buyers, are in different phases of MT implementation and customer-oriented adoption. For their goals, I would expect a high level of correlation between MT and translation memory technology. Examples: HP ACG, SDL, Semantix.

  • Academia: universities and research centers. In academia, MT is considered one of the tasks of artificial intelligence research and a sub-domain of computational linguistics. MT has become an extremely popular research area. Over the last few years it has been intensively funded by the European Commission and the US and Canadian governments through a number of short- and mid-term projects. Jobseekers interested in pursuing an academic career, including those who want to complete a PhD, can try to find a position within one of these projects or join the permanent staff of a university.
    Examples: Institutions active in MT research: Carnegie Mellon University, University of Edinburgh, RWTH Aachen University. Projects: Moses Core, EXPERT, FAUST.

  • Government labs and research centers. MT is a strategic goal at the governmental level too. MT systems for this type of use traditionally incorporate MT solutions from different providers (commercial and open-source). Since the majority of the MT-related research for government applications is concentrated in the USA, the eligibility criteria often include US citizenship and security clearance. Examples: National Institute of Standards and Technology, Language Technologies Research Centre, MIT Lincoln Lab.

We’ve been collecting a number of statistics since our launch in May 2012. Let’s have a look at some that are applicable to MT:

Table 1. Distribution of MT-related jobs per type of hiring institution.
Type of company    Number of job ads
MT providers       13.33%
Of all the jobs advertised at nlppeople.com, more than half are submitted by corporates, international institutions and academia. A low level of hiring at LSPs indicates that translation agencies are not convinced enough about MT to run the risk of developing their own MT solutions.

Table 2. Distribution of MT-related jobs per country.
Country    Number of jobs

Almost half of the jobs are concentrated in the USA, followed by the European localization hot-spot of Ireland and the Benelux countries. The UK’s market share is a bit lower for the MT market than for the NLP market in general (6.25% versus an overall share of 10.21%).

Table 3. Distribution of MT-related jobs per type of employment.
Type of employment    Number of jobs

The lion’s share of jobs in industry are full-time, permanent positions, while in academia practically all the openings are fixed term jobs. Freelancing – a popular option for many IT people seeking to earn a flexible income – hardly features as a possibility in the MT market at the moment.
MT is booming. MT’s time has come. The number of companies and academic institutions hiring people with an MT background is increasing, so don’t miss your chance to jump in!

About the author:
Maxim Khalilov, PhD is the R&D manager at TAUS B.V and the co-founder of NLPPeople.com. He is a former post-doctoral researcher at the University of Amsterdam, intern at Macquarie University (Australia) and a PhD student at the Polytechnic University of Catalonia (Spain).