While the use of machine translation (MT) is commonplace in the localization industry today, it remains largely unused for the localization of subtitles and closed captions. This is partly because very few localization providers serving the media industry use localization technology such as CAT tools and Translation Memory Systems (TMS), and partly because automating the localization of subtitles is technically more complex than "standard" localization. In the absence of CAT and TMS systems specialized for the media industry, some of the largest translation providers specializing in media content have built their own tools to fill the gap, but these offer limited support for machine translation, if any.
There are a number of reasons for this added complexity when processing subtitles and closed captions. Chief among them is that subtitles are typically short lines of dialogue rather than the longer sentences common in documents. Dialogue is often informal and frequently lacks end-of-sentence markers. To train a high-quality MT engine, the training data must match the purpose; for subtitles, that means dialogue training data, which is notably harder to obtain than data derived from documents. Additionally, subtitles break a sentence into small fragments per video frame, and these must be reassembled into a full sentence before a machine (or a human) can produce a good-quality translation in context. Without this important step, the machine sees only portions of a sentence (i.e. half a sentence), which cannot be translated properly.
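The reassembly step described above can be sketched as follows. This is a minimal illustration that assumes sentence boundaries are marked by terminal punctuation; as noted, real dialogue often lacks such markers, so a production pipeline would need additional heuristics.

```python
import re

def merge_subtitle_fragments(fragments):
    """Merge per-frame subtitle fragments into full sentences.

    `fragments` is a list of subtitle text lines in display order.
    A sentence is treated as complete when it ends with terminal
    punctuation (., !, ?), optionally followed by a closing quote.
    """
    sentences, buffer = [], []
    for text in fragments:
        buffer.append(text.strip())
        if re.search(r'[.!?]["\']?$', text.strip()):
            sentences.append(" ".join(buffer))
            buffer = []
    if buffer:  # trailing fragment without an end-of-sentence marker
        sentences.append(" ".join(buffer))
    return sentences

fragments = ["I never thought", "we would make it", "this far.", "Did you?"]
print(merge_subtitle_fragments(fragments))
# → ['I never thought we would make it this far.', 'Did you?']
```

The merged sentences can then be sent to the MT engine with full context, and the translations re-segmented for display afterwards.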
Further complexity arises because each country, language, and end customer may have its own specific style guides that must be followed, covering speaker changes, pauses and interruptions, scene-change adjustments, profanity handling, per-language reading speed, and much more.
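As an illustration, such style-guide constraints might be captured in a per-language profile. The field names and values below are hypothetical examples, not drawn from any actual customer style guide.

```python
# Hypothetical per-language style-guide profiles; the field names and
# values are illustrative only, not an industry standard.
STYLE_GUIDE = {
    "en": {
        "max_chars_per_line": 42,
        "max_lines": 2,
        "reading_speed_cps": 17,   # characters per second
        "speaker_change_marker": "- ",
        "mask_profanity": False,
    },
    "nl": {
        "max_chars_per_line": 40,
        "max_lines": 2,
        "reading_speed_cps": 14,
        "speaker_change_marker": "-",
        "mask_profanity": True,
    },
}

def fits_reading_speed(text, duration_seconds, lang):
    """Check that a subtitle can be read in its on-screen time."""
    cps = STYLE_GUIDE[lang]["reading_speed_cps"]
    return len(text) <= cps * duration_seconds

print(fits_reading_speed("Where have you been all night?", 2.0, "en"))
# → True
```

A localization workflow would load the profile for the target language and customer, then validate or adjust each translated subtitle against it.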
This affects how sentences are split apart for translation analysis and restructured for display after translation. For example, a sentence within a frame must be split at the optimal position, ensuring that names are not broken across lines and that a number of language-specific rules are adhered to, such as the role of articles, prepositions, and punctuation in determining where a sentence should be split.
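A simplified sketch of such a line-splitting heuristic is shown below. It assumes a maximum line length of 42 characters and uses a short, illustrative list of English words after which a break should be avoided; a real system would carry a full language-specific rule set, including rules for keeping names together.

```python
def split_subtitle_line(sentence, max_chars=42):
    """Split a sentence into two subtitle lines near the midpoint.

    Avoids breaking immediately after articles and prepositions and
    prefers a break after punctuation. The word list is illustrative,
    not a complete language-specific rule set.
    """
    if len(sentence) <= max_chars:
        return [sentence]
    words = sentence.split()
    no_break_after = {"the", "a", "an", "of", "in", "to", "for", "and"}
    best, best_score = None, None
    for i in range(1, len(words)):
        left = " ".join(words[:i])
        right = " ".join(words[i:])
        if len(left) > max_chars or len(right) > max_chars:
            continue  # candidate split exceeds the line-length limit
        score = abs(len(left) - len(right))       # prefer balanced lines
        if words[i - 1].lower() in no_break_after:
            score += 100                          # penalize a bad break point
        if words[i - 1].endswith((",", ";")):
            score -= 10                           # prefer a break after punctuation
        if best_score is None or score < best_score:
            best, best_score = [left, right], score
    return best or [sentence]

print(split_subtitle_line(
    "He walked to the old house, and nobody was waiting for him there."))
# → ['He walked to the old house,', 'and nobody was waiting for him there.']
```

Note how the heuristic breaks after the comma rather than after "and", even though the latter would give more evenly balanced lines.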
However, recent production-grade, large-scale deployments of Omniscien Technologies' Language Studio language-processing and machine translation workflow in subtitling and closed-captioning environments have proven that machine translation can be applied effectively to the localization of subtitles and can deliver substantial productivity improvements.
This white paper discusses both the challenges and the opportunities that this technology presents to subtitling localization providers.
Read more by downloading our white paper, "Effective Application of Machine Translation to Subtitling Localization".