As this was Sajan’s first project with Omniscien Technologies’ (formerly Asia Online) Language Studio, Sajan’s technical team performed a small amount of work to integrate it with Sajan’s Global Communication Management System (GCMS). This ensured a seamless workflow for project team members such as project managers and post-editors, as well as for Sajan’s customers. XLIFF files were transmitted between the Sajan GCMS and Language Studio via the Language Studio API. Because XLIFF is an industry-standard format, Sajan was able to develop its connector and have it production-ready in a very short time. For more information, see the Sajan GCMS Architecture at the end of this case study.
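To illustrate why an industry-standard format keeps such a connector small, the sketch below reads source and target segments from an XLIFF 1.2 document using only the Python standard library. It is a minimal illustration and not Sajan’s connector or the Language Studio API: the sample document, its segments, and the function name `read_trans_units` are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the XLIFF 1.2 standard.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical sample document; the file name and segments are invented.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="guide.txt" source-language="en" target-language="zh-CN" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Restart the database server.</source>
        <target>重新启动数据库服务器。</target>
      </trans-unit>
      <trans-unit id="2">
        <source>Open the configuration file.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_trans_units(xliff_text):
    """Return one dict per <trans-unit>: id, source text, target text (or None)."""
    root = ET.fromstring(xliff_text)
    units = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        units.append({
            "id": unit.get("id"),
            # itertext() also collects text nested inside inline markup elements.
            "source": "".join(source.itertext()) if source is not None else "",
            "target": "".join(target.itertext()) if target is not None else None,
        })
    return units


if __name__ == "__main__":
    for unit in read_trans_units(SAMPLE):
        status = "translated" if unit["target"] else "awaiting translation"
        print(unit["id"], status, unit["source"])
```

A unit without a `<target>` element is reported with a target of `None`, which is how a connector of this kind might tell segments still awaiting machine translation from those already translated.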
Post-Editing Machine Translation for IT Technical Documentation
One of Sajan’s many clients is a large multinational corporation that operates in nearly every country and offers a wide range of IT products, from database software to computer hardware. The client had many millions of words across these products that needed to be translated into Simplified Chinese.
As Sajan’s client has been in business for many years, it had around 3 million segments of translation memories that could be leveraged. Language Studio Data Cleaning was used to analyze the data and remove bad or inconsistent content. Approximately 26% of the content in the translation memories was discarded as either low quality or unsuitable for training the statistical models on which Language Studio Custom Translation Engines are built. The remaining content was analyzed and normalized where appropriate. Language Studio Advanced Data Manufacturing was then used to analyze the data and create additional data to fill gaps in coverage.
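The kind of filtering described above can be sketched with a few simple heuristics. The example below is purely illustrative and is not how Language Studio Data Cleaning works: the rules (empty sides, untranslated segments, extreme length ratios, conflicting translations of the same source, exact duplicates), the threshold value, and the function name `clean_translation_memory` are all assumptions chosen for this example.

```python
from collections import defaultdict


def clean_translation_memory(pairs, max_length_ratio=6.0):
    """Split (source, target) pairs into (kept, discarded) lists.

    max_length_ratio is an arbitrary illustrative threshold, not a tuned value.
    """
    # First pass: collect the distinct non-empty targets seen for each source.
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        if target.strip():
            targets_by_source[source.strip()].add(target.strip())

    kept, discarded, seen = [], [], set()
    for source, target in pairs:
        src, tgt = source.strip(), target.strip()
        pair = (src, tgt)
        if not src or not tgt:
            discarded.append(pair)  # one side is empty
        elif src == tgt:
            discarded.append(pair)  # segment was left untranslated
        elif max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_length_ratio:
            discarded.append(pair)  # implausible length mismatch
        elif len(targets_by_source[src]) > 1:
            discarded.append(pair)  # same source has conflicting translations
        elif pair in seen:
            discarded.append(pair)  # exact duplicate of a kept pair
        else:
            seen.add(pair)
            kept.append(pair)
    return kept, discarded


if __name__ == "__main__":
    # Made-up sample segments for illustration only.
    sample = [
        ("Restart the server.", "重新启动服务器。"),
        ("Restart the server.", "重新启动服务器。"),
        ("Open the file.", ""),
        ("Click OK.", "Click OK."),
        ("Save the file.", "保存文件。"),
        ("Save the file.", "储存档案。"),
    ]
    kept, discarded = clean_translation_memory(sample)
    print(f"kept {len(kept)}, discarded {len(discarded)}")
```

Discarding every pair whose source has more than one translation is deliberately aggressive. A production pipeline would more likely pick a preferred translation or route the conflict to a reviewer.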
The end client also has its own machine translation technology that it licenses to other companies and that has been packaged as a branded product for another Language Service Provider. However, after an initial pilot project in Simplified Chinese, it was determined that Language Studio provided a 60% cost saving and a 77% time saving when post-editing, based on the client’s own metrics and as validated by Sajan’s technical team.
“The client was a global multinational who had their own proprietary machine translation system, but they were not happy with certain languages. The client determined that they wanted to try different systems to see what happened with the outcome.
The overall result was that the client received a 60% reduction in cost and a 77% time-saving.”
Professional Services, Sajan
Of particular note was that, due to the technical nature of the translations, the machine translation was often higher in quality and faster to edit than a first-pass human translation. Additionally, due to the accuracy and consistency of the terminology, post-editing the raw machine translation output from Language Studio was twice as fast as post-editing the human translation. While the first iteration of the engine produced frequent grammar errors, its terminology was exceptionally accurate. After the engine received post-editing feedback, its quality rapidly improved and many of the early grammatical errors were resolved.
The end client was very impressed with the results of the Simplified Chinese pilot, specifically with the time and cost savings and with the terminology accuracy when compared to its own SMT-based technology. At the time of publishing this case study in September 2011, more than 27 million words had been processed by the client using Language Studio. The client has since proceeded to introduce Language Studio machine translation across a number of country- or region-specific business units and continues to process many millions of words each year.
The video of this case study was presented as part of the annual Localization Research Center (LRC) conference and was preceded by a conceptual overview presented by Dion Wiggins, CEO of Omniscien Technologies (formerly Asia Online). The full video, including both the Sajan and Asia Online presentations, can be seen on YouTube at http://bit.ly/trsyhg. Sajan’s slides from this presentation are available for download from http://bit.ly/r6BPkT.
Sajan GCMS Architecture
|Location:||United States of America|
|Deployment Model:||Software-as-a-Service (SaaS)|
|Translation Volume:||100 million+ words|
|Language Pairs:||EN-ES, EN-FR, EN-JA, EN-TH, EN-ZH|
|Domains:||IT / Technical Product Documentation|
|Challenge:||Wide range of products with very complex technical terms, most of which are specific to the client.|
|Solution:||Terminology extraction and normalization were key to this project’s success. When combined with Language Studio™ Data Cleaning and Advanced Data Manufacturing, superior results were possible, even in complex languages such as Simplified Chinese.|