The Future of Language Resources for Machine Translation (LR4MT)

14 April 2016


In a recent brief survey of language service providers (LSPs), LT-Innovate asked members of the translation industry how they saw the future availability of language data specifically for machine translation. The results provide food for thought when it comes to planning for more usable digital text resources in the years ahead. It looks as if new developments in machine translation (MT) technology will go hand in hand with a growing need for the right data.

First, statistical machine translation is clearly on most LSPs’ radar screens. Hard data on the actual size of the user market for MT systems is impossible to obtain today, as is information on who uses which free or paid online services in their everyday work. But every respondent to our survey says they will be “using” MT within the next two to three years.

Preparing for this transition is therefore vital for the nascent language data resource sector.

Overall, 30% of our LSP respondents reckon that the data they will need to prime their MT engines will come from their clients, 78% will use their in-house translation memories and similar assets, and 70% will try to find third-party sources outside their immediate business nexus. (Respondents could give more than one answer, so these figures add up to more than 100%.)

Fifteen percent would be prepared to buy such data and 70% will crawl the web, while 83% expect more free resources to become available.

Judging by our current findings from mapping publicly available LR4MT in Europe, the chances of LSPs finding relevant resources easily look slim. So far, sharing language data has not been a high-visibility activity.

However, 39% said they did not have the in-house engineering resources needed to transform the content they might find into usable MT training data. This suggests there could be a small market for cleaning and aligning data harvested from the web or from well-known repositories.

When it comes to quality criteria for usable language resources, the most important by far was, unsurprisingly, domain relevance (84%). Indeed, small customer- or domain-specific language models for MT are widely considered to outperform general-purpose models by a large margin. This suggests that any language resource supply platform will need to put serious effort into pinpointing domain relevance, rather than relying on sheer volume as a virtue in itself.

Appropriately, there was also considerable emphasis on leveraging the semantic characteristics of the language data needed for MT. Semantically enriched data, as proposed by EC-funded projects such as LIDER (a Linked Open Data-based ecosystem of free, interlinked, and semantically interoperable language resources) and BabelNet (a multilingual dictionary underpinned by a rich semantic network), clearly has potential as a future resource. We therefore need to find the fastest and most efficient way to turn this potential technology stack into an operational reality. We can also expect to hear much more from Coreon about multilingual knowledge management as a fundamental business tool.

So what can we expect from a more effective and efficient deployment of LR4MT? In general, respondents are looking towards new hybrid MT models that integrate transfer/grammar and semantic modules into today's plain-vanilla statistical model. This suggests that language technology and data resource quality will need to evolve closely in parallel.

They also expect deep learning to be applied to MT, together with processes such as continuous retraining during post-editing. In other words, we are just at the beginning of a new cycle of more AI-driven MT systems that will learn as they go and extract even more value from relevant data resources. But as one respondent pointed out, the ultimate litmus test for the value of translation resource data is whether the original translation is any good. Tools to tame the elusive beast of rapid translation quality evaluation will still need to be part of the mix.

What specific needs for or constraints on MT data resources do you foresee in Europe in the near future? Tell us here or respond to our survey.

Jo Céline
