One of the major challenges enterprises face when increasing automation in business translation is understanding the productivity and quality impact of any new automation strategy. Because discussion of quality, and even productivity, in the industry is often vague and ill-defined, it is useful to show an example where a company understands with great precision what the impact is before and after the use of new translation production technology.
The key questions that one needs to answer are:
- What is my current productivity (time taken, words produced) to achieve a defined quality level?
- What impact does new automation, e.g. an MT system, have on my existing productivity and final delivered quality?
A Profile of Effective Translation Quality & Productivity Measurement
Kevin Nelson, a Managing Director at Omnilingua Worldwide, recently presented part of the Omniscien Technologies webinar “Meaningful Metrics for Machine Translation”. Omnilingua prides itself on building teams that deliver definable quality and is recognized throughout the translation industry as a company that pays particular attention to process efficiency and accurate measurement.
Thus, when they embarked on their use of MT 5 years ago, they took great care to measure and establish that MT was in fact enhancing production efficiency before it was put into production. In the same way, before making any changes to their current MT deployment, they wanted to make sure that any new initiative was in fact a clear and measurable improvement over previous practice. As Kevin Nelson said: “The understanding of positive change is only possible when you understand the current system in terms of efficiency.”
During the webinar, Kevin discussed how and why Omnilingua performs detailed metrics. To demonstrate the benefits of measurement to Omnilingua, Kevin presented a case study that measures and compares an Omniscien Technologies Language Studio™ custom MT engine with a competitor’s MT engine and also studies its impact on human translators.
Omnilingua first adopted MT 5 years ago with Language Weaver and took great care to measure, understand and establish that MT was in fact enhancing production efficiency before using it in production. Recently, Omnilingua reached a point where they had to decide whether to retrain and upgrade their aging legacy MT engine or to invest in a new MT engine with Omniscien Technologies.
Omnilingua engaged Omniscien Technologies at the end of 2011 to build a custom MT engine in the technical automotive domain, translating from English into Spanish, using data similar to that used for the competitor’s legacy MT system. As this was Omnilingua’s first Language Studio™ custom MT engine, Omnilingua wanted to make sure that any new MT initiative was in fact a clear and measurable improvement over the competitor’s legacy MT technology before making any changes in their production environment.
Omnilingua has long-term experience in conducting valid “double-blind” studies that produce statistically relevant results that measure machine quality, human quality and effort. The same careful measurement process was embarked upon to determine if their new MT initiative with Omniscien was an improvement.
The understanding of positive change is only possible when you understand the current system in terms of efficiency. Any conclusion about consistent, meaningful, positive change in a process must be based on objective measurements, otherwise conjecture and subjectivity can steer efforts in the wrong direction.
– Kevin Nelson, Omnilingua Worldwide
At the heart of Omnilingua’s process and quality control procedures is a long-term and continuous use of the SAE J2450 quality assessment and measurement process. Long-term use of a metric like this provides well-understood quality benchmarks for projects, individual customers and MT output. These benchmarks are more trusted than automated metrics such as BLEU, TER and METEOR, which are available with the free Language Studio™ Pro measurement tools from Omniscien Technologies.
While there is effort and expense involved in implementing SAE J2450 as actively as Omnilingua does, the measurements allow for a deep understanding of translation quality and the associated effort. Long-term use of such a metric also dramatically improves the conversation about translation quality between all the participants in a translation project, as it is specific, impersonal and clear about what quality means.
Kevin listed the following benefits of the SAE J2450 measurement standard:
- Built as a Human Assessment System:
  - Provides 7 defined and actionable error classifications.
  - 2 severity levels to identify severe and minor errors.
- Provides a Measurement Score Between 0 and 1:
  - A lower score indicates fewer errors.
  - Objective is to achieve a score as close to 0 (no errors/issues) as possible.
- Provides Scores at Multiple Levels:
  - Composite scores across an entire set of data.
  - Scores for logical units such as sentences and paragraphs.
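The scoring idea in the list above can be sketched in a few lines of Python. This is a minimal illustration, not Omnilingua's implementation: the category names and the (serious, minor) weights are assumptions based on commonly cited SAE J2450 values and should be checked against the published standard, and the `Error`, `j2450_score` and `composite_score` names are hypothetical. The score is taken here as the sum of error weights divided by the number of source words, so 0 means no errors were recorded.

```python
from dataclasses import dataclass

# Assumed (serious, minor) weights per error category. Verify these against
# the published SAE J2450 standard before relying on them.
WEIGHTS = {
    "wrong_term": (5, 2),
    "syntactic_error": (4, 2),
    "omission": (4, 2),
    "word_structure_or_agreement": (4, 2),
    "misspelling": (3, 1),
    "punctuation": (2, 1),
    "miscellaneous": (3, 1),
}


@dataclass(frozen=True)
class Error:
    """One error marked by a human reviewer (hypothetical record type)."""
    category: str
    serious: bool


def error_weight(errors):
    """Sum the weights of all marked errors."""
    return sum(WEIGHTS[e.category][0 if e.serious else 1] for e in errors)


def j2450_score(errors, source_word_count):
    """Score for one logical unit: weighted errors per source word."""
    if source_word_count <= 0:
        raise ValueError("source_word_count must be positive")
    return error_weight(errors) / source_word_count


def composite_score(units):
    """Score across many units, each given as (errors, source_word_count)."""
    total_words = sum(words for _, words in units)
    if total_words <= 0:
        raise ValueError("total source word count must be positive")
    return sum(error_weight(errs) for errs, _ in units) / total_words
```

Computing the composite from summed weights and summed words, rather than averaging per-unit scores, keeps short segments from dominating the result.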
To determine whether MT has been successful, production efficiencies and improvements must be measurable. Measurement not only shows improvement in MT over time, but also ensures that the MT-based process is more efficient than the previous human-only process while delivering comparable translation quality. A recent survey by MemSource indicated that over 80% of MT users have no reliable way of measuring MT quality.
Omnilingua uses multiple metrics to precisely define the degree of effort required to post-edit MT to client-deliverable quality. This quantification of the Post Edited MT (PEMT) effort includes raw SAE J2450 scores for MT versus the equivalent historical human quality SAE J2450 scores, Time Study measurements, and Omnilingua’s own proprietary effort metric, OmniMT EffortScore™, which is based on 5 years of measuring PEMT effort at a segment level. These different metrics are combined and triangulated to deliver reliable and trusted measurements of the effort needed for each PEMT project.
Through these 3 metrics, Omnilingua is able to establish that the gains from changes in their production process are measurably greater than the cost of deploying MT. Omnilingua also makes efforts to “share cost savings and benefits across the value chain with clients and translators”. Through this approach, Omnilingua has been able to keep the same team of post-editors working with them for 5 years continuously. This is possibly the greatest benefit of understanding what you are doing and what impact it has.
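The Time Study part of this measurement can be pictured as a simple throughput comparison. The sketch below is an assumption about how such a comparison might be computed, not a description of Omnilingua's method or of the proprietary OmniMT EffortScore™; the function names are hypothetical.

```python
def words_per_hour(words, minutes):
    """Throughput for one timed translation or post-editing session."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return words * 60 / minutes


def productivity_change(baseline_wph, new_wph):
    """Relative change against the baseline, e.g. 0.5 means 50% faster."""
    if baseline_wph <= 0:
        raise ValueError("baseline_wph must be positive")
    return (new_wph - baseline_wph) / baseline_wph
```

With made-up inputs that are not taken from the study, 1,200 words in 180 minutes gives 400 words per hour, 1,200 words in 120 minutes gives 600 words per hour, and `productivity_change(400, 600)` returns `0.5`. A throughput gain only counts as an improvement if the quality score of the delivered text stays at the defined level, which is why the time data is read alongside the SAE J2450 scores.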
Omnilingua used the SAE J2450 standard to measure the improvement of the new Language Studio™ custom engine over the competitor’s legacy MT engine. SAE J2450 measurements were made on both the raw MT and the final output after post-editing the MT from both custom engines.
Measurement Results and Comparisons
After reviewing the detailed measurement data, Omnilingua drew the following conclusions:
- There were far fewer errors produced by the Language Studio™ custom MT engine than the competitor’s legacy MT engine.
- Notably, there were fewer wrong meanings, structural errors, and wrong terms in the Language Studio™ custom MT engine, which were “typical SMT problems” in the competitor’s legacy MT engine.
- 52% of the raw MT output from the Language Studio™ custom MT engine had no errors at all, compared to 26.8% for the competitor’s legacy MT engine.
- The Language Studio™ custom MT engine measured was the very first iteration of the engine, with no improvements or corrective feedback applied.
- Many of the errors from the Language Studio™ custom MT engine were minor spelling errors relating to capitalization. A majority of the “spelling errors” were traced back to a legacy portion of the client-supplied translation memory historically used for case-insensitive leverage.
- Omnilingua found the errors easy to correct with tools provided by Omniscien Technologies.
- The final translation quality after post-editing was better with the new Language Studio™ custom MT engine than with the competitor’s legacy MT engine, and also better than a human-only translation approach.
- Terminology was more consistent with a combined Language Studio™ custom MT engine plus human post-editing approach.
- When surveyed, post-editors perceived that both MT engines were about the same quality and effort to edit. However, human perceptions can often overlook what objective measurements capture.
- The measured results show that the Language Studio™ custom MT engine was considerably better in terms of translator productivity and produced a final product that had fewer errors because of the higher quality raw MT output provided to the post-editors.
- The following table summarizes the key results for both the raw MT and the final post-edited MT:
Omnilingua has already seen translation quality from the first version of their Language Studio™ custom MT engine improve beyond the above levels by providing basic feedback using the tools provided by Omniscien Technologies. As Omnilingua continues to measure quality periodically, the metrics specified above are expected to show further improvement.
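The error-free share cited in the conclusions (52% versus 26.8%) is the fraction of raw MT segments in which reviewers recorded no errors. A minimal sketch of that calculation follows, assuming each segment is represented by its count of marked errors; the function name and input shape are hypothetical.

```python
def error_free_share(error_counts):
    """Fraction of segments with zero recorded errors."""
    if not error_counts:
        raise ValueError("at least one segment is required")
    clean = sum(1 for count in error_counts if count == 0)
    return clean / len(error_counts)
```

For a comparison between engines to be fair, both must be scored on the same source segments by reviewers who do not know which engine produced which output, as in the double-blind setup described earlier.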
We found that 52% of the raw original output from Omniscien Technologies had no errors at all – which is great for an initial engine.
The final translation quality after post-editing was better with the new Language Studio™ custom MT engine than the competitor’s legacy MT engine and also better than a human only translation approach. Terminology was more consistent with a combined Language Studio™ custom MT engine plus human post-editing approach.
There were far fewer errors produced by the Language Studio™ custom MT engine than the competitor’s legacy MT engine. Notably, there were fewer wrong meanings, structural errors and wrong terms in the Language Studio™ custom MT engine, which were “typical SMT problems” in the competitor’s legacy MT engine.
– Kevin Nelson, Omnilingua Worldwide
|Location:|United States of America|
|Deployment Model:|Language Studio Cloud|
|Translation Volume:|Multiple millions of words|
|Language Pairs:|EN-ES, EN-FR|
|Domains:|Automotive, Life Sciences|
|Challenge:|The existing bespoke MT solution was delivering satisfactory results and had been matured over 5 years. Omnilingua wanted to ensure that they were achieving the best they could from machine translation and to find out whether alternative technologies could deliver better results.|
|Solution:|Omnilingua provided the same data to Omniscien that they provided to the competitor for customization. Detailed SAE J2450 and productivity metrics were performed and compared.|