What is Do-It-Yourself (DIY) Machine Translation?
Do-It-Yourself (DIY) Machine Translation, also known as “Self Service Machine Translation”, has recently emerged as a popular model for customizing a machine translation engine. In the last 2-3 years, the DIY machine translation space has been populated by many “me too” vendors that offer little additional value over what can be downloaded as the open source Moses tools and that are difficult to differentiate from each other in terms of actual added value.
DIY machine translation provides only a subset of the customization capabilities of a full customization process. For a more detailed comparison, see the article “Understanding the Difference between Full Customization and Do It Yourself Customization”.
The appeal of the DIY model is that it is perceived to give the user complete control over the customization process. Many DIY MT vendors have marketed heavily on the claim that they empower users by letting them upload their own translation memories into a DIY machine translation system. In reality, however, these systems strip users of the power to control the engine. Simplicity often hides reality.
Ironically, the DIY model is frequently referred to as “upload and pray”: the user has no real control over the engine’s quality beyond the data being uploaded and, in most cases, does not even know the contents or quality of the translation memories being uploaded. Frequently the data comes from many sources, even third parties. Many Language Service Providers (LSPs) upload translation memories that were passed on to them by a client or another LSP, or even data acquired from organizations such as TAUS. The net result is a melting pot of inconsistent translation memories, difficult to manage or left unmanaged, that are combined and in turn produce inconsistent machine translation output. This issue is similar to what LSPs face when they use a third party translation memory or mix translation memories to pre-translate segments as part of a translation project.
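The inconsistency described above can be made concrete with a minimal sketch, assuming a merged translation memory is available as a simple list of (source, target) pairs. The function name `find_inconsistent_segments`, the sample data and the whitespace and case normalization are illustrative assumptions, not part of Moses or of any vendor's tooling. The sketch flags source segments that carry more than one distinct translation after several memories have been combined.

```python
from collections import defaultdict


def find_inconsistent_segments(tm_entries):
    """Return {source: sorted targets} for sources with conflicting translations.

    tm_entries is an iterable of (source, target) string pairs, for example
    the result of concatenating several translation memories.
    """
    targets_by_source = defaultdict(set)
    for source, target in tm_entries:
        # Assumption: sources that differ only in case or spacing are the same segment.
        key = " ".join(source.split()).lower()
        targets_by_source[key].add(" ".join(target.split()))
    # Keep only the sources that ended up with more than one distinct target.
    return {
        src: sorted(tgts)
        for src, tgts in targets_by_source.items()
        if len(tgts) > 1
    }


# Hypothetical sample: entries from two memories with different terminology.
merged_tm = [
    ("Click Save.", "Cliquez sur Enregistrer."),
    ("Click Save.", "Cliquer sur Sauvegarder."),
    ("Open the file.", "Ouvrez le fichier."),
    ("click  save.", "Cliquez sur Enregistrer."),
]

conflicts = find_inconsistent_segments(merged_tm)
for source, targets in conflicts.items():
    print(source, "->", targets)
```

A check like this only surfaces exact conflicts. Deciding which translation is correct for a given domain still requires review by someone who knows the content.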
The inherent issue with DIY machine translation is that it implies that users know how to do it themselves. Case studies like IOLAR’s demonstrate clearly that high quality machine translation requires considerably more effort, knowledge and skill than simply loading data into a system for training. Achieving a quality level that was usable for efficient post-editing was clearly not the simple task that TAUS and third-party DIY proponents had conveyed. The article “Understanding the Difference between Full Customization and Do It Yourself Customization” has a section that lists some of the skills necessary to build a high quality machine translation engine.
Simply being able to upload data is not user empowerment. Such DIY systems provide a “pretty” user interface that makes it very easy and fast to create low quality custom machine translation engines with only minimal levels of control of the data upon which the custom engine is built. User empowerment comes from having total control of the data and being able to manipulate and refine the data to get the optimal result.
There are many Natural Language Processing (NLP) specialists and computational linguists who have worked extensively with the Moses tools and also have an extensive research background. For most, however, the focus has been on algorithmic improvements that offer a small overall improvement. While the algorithms are certainly important, the essence of a high quality translation system that requires the least amount of effort to post-edit and delivers the highest Return On Investment (ROI) lies in the data upon which the engine is customized. Data quality and appropriateness are areas that academia has put far less effort into than algorithms. One of the essential factors in delivering near human quality machine translation is correct and optimal data, which is the focus of the Clean Data SMT model.
There are three common forms of DIY MT:
1. Downloading the open source Moses SMT system either directly from the site or with a third party installer.
2. Using a third party Software-as-a-Service (SaaS) platform that has packaged Moses behind a web portal, making it easy to use. Such services typically put a web portal over the top of Moses and allow users to upload data.
3. Licensing a packaged version of #2 above and installing it on your own computer systems.