Subtitle Optimized Machine Translation
Subtitle optimized machine translation is not intended as a replacement for human translators. It is a powerful tool that increases productivity. Customer case studies have shown that subtitle optimized machine translation can consistently produce output in which more than 50 percent of the sentences are either perfect or near-perfect (requiring only a single character or word change).
A human editor is always needed to review the output, because a human has additional knowledge of the topic and can see what is happening on the screen. This additional information is needed to perfect the translated output and fine-tune it to match the video content, style, and genre.
Media Studio provides detailed metric reports and tools for tracking costs and productivity. Real-world case studies that have collectively measured thousands of subtitles processed over several years have provided clear proof points of productivity gains. It is commonplace for productivity gains to exceed 200 percent.
Subtitle optimized machine translation has been developed by Omniscien over several years, taking into account detailed feedback from customers using earlier forms of subtitle optimization. The Omniscien development team has worked hard to address as many pain points and bottlenecks in human workflows as possible, with the focused goal of reducing the amount of human effort required to publish.
Subtitle Optimized Machine Translation is available through the Media Studio Project Management and Editing Platform user interface and subtitle editor, through the Media Studio Data Processing Platform user interface, or as a REST API for application integration.
Media Studio is available as two Platform Editions, each designed to match different business needs.
- Project Management and Editing Platform
Project, People, Resource, Video, and Subtitle Management
- Data Processing Platform
Data Creation, Analysis, Cleaning, and Organization
What is Subtitle Optimized Machine Translation?
Subtitle optimized machine translation is more than just translation. There are 4 core stages of processing:
- Pre-Processing: Sentence Joining
For machine translation to deliver the best possible translation quality, it is especially important to have complete sentences to translate. Many sentences span multiple subtitle boundaries. This process uses a combination of artificial intelligence and rules to join text fragments to make complete sentences.
- Pre-Processing: Metric Conversions and Text Fine Tuning
User-configurable rules determine how metrics, dates, and other content should be prepared for processing so that they are translated correctly into the target language. This includes converting metrics like inches to centimeters, miles to kilometers, and Fahrenheit to Centigrade. Additional user-configurable rules change the source text to remove unwanted vocabulary such as “Um”, “Oh” or similar. This is an important part of user-configured protocols and style guides.
- Machine Translation
Most machine translation platforms perform very poorly on subtitle content as they are trained primarily on documents. Subtitles are full of dialog, conversation, slang, and idioms. Media Studio subtitle machine translation engines have been specifically trained on tens of millions of bilingual sentences that are based around dialog and discussion. This produces significantly higher translation quality and ensures the resulting translation sounds natural. Different video genres can also be applied to produce different styles of output.
- Post-Processing: Sentence Splitting and Time Cue Adjustments
Artificial intelligence splits translated sentences across subtitles at the natural positions where a human would split them, so that the subtitles read naturally within user-configurable parameters such as Characters Per Line (CPL). User-configurable parameters also determine whether the time cues should be locked to the original times or adjusted automatically based on the target language text length and reading speed, along with a range of other adjustable parameters.
Benefits of Subtitle Optimized Machine Translation
Greater productivity not only delivers output faster, but it also translates directly into lower costs and bottom-line savings.
Fewer changes result in considerably less time spent editing, greater productivity, and faster delivery. More work can be produced per workday as a result. Each task is underpinned by artificial intelligence to produce human-like output that requires less effort to review and correct. Each feature and activity is designed to improve overall productivity and output quality.
- 200+ percent productivity gains.
- Faster delivery times.
- Machine + human translations deliver higher quality translation output than human only.
- Deliver rush jobs more quickly.
- Accept urgent or larger jobs with confidence.
- Reduced project management effort.
- Reduce or eliminate external third-parties.
- Fewer inserts, deletes, merges and time cue changes.
- Lower costs across the board from editors, quality assurance, project management and more.
Real World Example: Duration to Complete Translation, Editing and Quality Assurance
Specialized Types of Subtitle Optimized Machine Translation
Subtitle-optimized machine translation delivers faster, cheaper, higher quality, and more consistent translation output than a human-only solution. Fine-tune for Characters Per Line (CPL), part-of-speech, structure balancing, and more. Powerful artificial intelligence driven processes reduce human effort and increase productivity.
Subtitle Translation
Synopsis Translation
Document Translation
Translate Microsoft Office (Word, Excel, PowerPoint, Outlook emails), Open Office, Adobe PDF, images, HTML, plain text, XML, and many other standard document formats.
User Review Translation
Genre Specific Translation Styles within a Single MT Engine
User Configurable Formatting Controls
How It Works
Subtitle optimized machine translation is a comprehensive set of 4 core processes that are optimized for translating subtitles and dialog-based content and their associated file formats. While high-quality translation is part of the value of the process, the 3 pre- and post-processing stages further reduce the amount of human effort required to publish a translated subtitle. They do this by fine-tuning the data so that it translates better, by enforcing style guides and protocols, and by adjusting file formatting such as time cues.
Pre-Processing: Sentence Joining
Media Studio uses a comprehensive set of user-configurable rules, artificial intelligence, and user-defined parameters to determine how to create complete sentences.
For machine translation to deliver the best possible translation quality, it is especially important to have complete sentences to translate. Sentences within a subtitle are often incomplete and are split into sentence fragments. Sentence fragments could be split within 1 subtitle or across 2 or more subtitles. For a human or a machine to understand a sentence properly, the entire sentence needs to be translated as a single task, rather than translated in fragments, to get as much context as possible.
Consider the following example:
In this example, the sentence fragments are split across 2 lines in the first subtitle and 1 line in the second subtitle. A human or a machine could translate “I went to the bank” into a good translation. However, without knowing the text on the 2 blanked-out lines, the sense chosen for the word “bank” may be incorrect.
Depending on the context, the word for “bank” in another language may be completely different. If translated in isolation as a sentence fragment, “I went to the bank” would be translated using the statistically most common sense of the word. For example, in Spanish, it would translate to “Fui al banco.” If the context had been climbing out of the river, then “banco” would be incorrect. The correct translation would be “Fui a la orilla del río y salí.” The technical term for resolving this kind of ambiguity is "Word Sense Disambiguation". You can learn more about this on Wikipedia.
Additionally, many target languages do not follow the same word order as the source language. Having a complete sentence to translate allows words to be reordered into the correct position within the translated sentence. When sentence fragments are translated, words can only be reordered within the fragment rather than across the complete sentence.
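As a rough illustration of the rule-based part of sentence joining, the following Python sketch groups subtitle lines into complete sentences using only end-of-sentence punctuation. It is a minimal sketch under stated assumptions, not how Media Studio implements the step: the `Subtitle` class, the `join_fragments` function, and the punctuation rule are hypothetical, the sample dialog is invented for illustration, and the artificial intelligence component described above is not modeled.

```python
import re
from dataclasses import dataclass


@dataclass
class Subtitle:
    index: int  # position of the subtitle in the file
    text: str   # subtitle text; lines are separated by "\n"


# Hypothetical rule: a line ends a sentence when it finishes with ., ! or ?
# optionally followed by closing quotes or brackets.
SENTENCE_END = re.compile(r'[.!?]["\')\]]*$')


def join_fragments(subtitles):
    """Group subtitle lines into complete sentences.

    Returns a list of (sentence, source_indexes) pairs so that the
    translated sentence can later be split back over the same subtitles.
    """
    sentences = []
    parts, sources = [], []
    for sub in subtitles:
        for line in sub.text.splitlines():
            line = line.strip()
            if not line:
                continue
            parts.append(line)
            if sub.index not in sources:
                sources.append(sub.index)
            if SENTENCE_END.search(line):
                sentences.append((" ".join(parts), sources))
                parts, sources = [], []
    if parts:  # trailing fragment with no closing punctuation
        sentences.append((" ".join(parts), sources))
    return sentences


# Invented sample: one sentence spread over 2 subtitles, then a short one.
subs = [
    Subtitle(1, "I swam across the river,\nclimbed out,"),
    Subtitle(2, "and went to the bank."),
    Subtitle(3, "Then I rested."),
]
joined = join_fragments(subs)
# joined[0] == ("I swam across the river, climbed out, and went to the bank.", [1, 2])
```

Keeping the source subtitle indexes alongside each joined sentence is a design choice of this sketch: it records where the sentence came from so that a later splitting step can redistribute the translation. A purely punctuation-based rule will fail on unpunctuated dialog and on lines that contain more than one sentence, which is where additional rules or a trained model would be needed.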
Pre-Processing: Metric Conversions and Text Fine Tuning
User-configurable rules determine how metrics, dates, and other content should be prepared for processing so that they are translated correctly into the target language. This includes converting metrics like inches to centimeters, miles to kilometers, and Fahrenheit to Centigrade.
Style guides and protocols provide a set of rules for translators and quality assurance specialists to follow that keep translations consistent and are part of delivering an overall higher quality subtitle. Media Studio accepts user-configurable rules that guide the translation process.
Bilingual glossaries, non-translatable terms and phrases, and metric conversions can be specified at a global, project, or job level. Metric amounts and units can be automatically detected, converted, translated, and formatted to exactly match a style guide. Metrics include inches to centimeters, miles to kilometers, Fahrenheit to Centigrade, and many other commonly found metrics. Number formats, decimal points, rounding, and date and time formats can all be defined to produce consistently high-quality translations.
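The detect, convert, and format sequence for metrics can be sketched in a few lines of Python. This is an illustrative sketch only: the `CONVERSIONS` table, the `convert_units` function, and the `decimals` rounding parameter are assumptions made for the example and do not describe Media Studio's actual rule format. It handles only the plural unit words shown.

```python
import re

# Hypothetical conversion table: source unit -> (target unit, conversion function).
CONVERSIONS = {
    "miles": ("kilometers", lambda v: v * 1.609344),
    "inches": ("centimeters", lambda v: v * 2.54),
    "°F": ("°C", lambda v: (v - 32) * 5 / 9),
}

# A number (optionally negative or decimal) followed by a known unit.
PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)\s*(miles|inches|°F)")


def convert_units(text, decimals=1):
    """Replace each detected amount and unit with its converted equivalent."""
    def repl(match):
        value = float(match.group(1))
        unit, convert = CONVERSIONS[match.group(2)]
        # The number of decimals stands in for a style-guide rounding rule.
        return f"{convert(value):.{decimals}f} {unit}"
    return PATTERN.sub(repl, text)


print(convert_units("The town is 10 miles away."))
# The town is 16.1 kilometers away.
print(convert_units("The town is 10 miles away.", decimals=0))
# The town is 16 kilometers away.
```

Converting in the source text before translation, as this sketch does, means the machine translation engine only has to translate the unit name. Locale-specific number formatting, such as a decimal comma, would be a further rule applied to the output.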
Additional user-configurable rules change the source text to remove unwanted vocabulary such as "Um", "Oh" or similar. The same tools can apply adjustments that better guide the machine translation engine to produce higher quality translations. For example, changing “AC” to “air conditioner” would give a better context.
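A minimal Python sketch of such source-text rules is shown below. The `FILLERS` list, the `EXPANSIONS` table, and the `clean_source` function are hypothetical names chosen for this example; a real rule set would be driven by the style guide in use.

```python
import re

# Hypothetical rule set for illustration.
FILLERS = ["um", "oh"]                      # vocabulary to remove
EXPANSIONS = {"AC": "air conditioner"}      # abbreviations to expand


def clean_source(text):
    """Remove filler words and expand abbreviations before translation."""
    for filler in FILLERS:
        # Whole-word match, ignoring case, plus any punctuation that follows it.
        text = re.sub(rf"\b{re.escape(filler)}\b[,.!]?\s*", "", text,
                      flags=re.IGNORECASE)
    for short, full in EXPANSIONS.items():
        # Case-sensitive whole-word match, so "back" or "ac" are left alone.
        text = re.sub(rf"\b{re.escape(short)}\b", full, text)
    text = re.sub(r"\s{2,}", " ", text).strip()
    # Restore the leading capital if a filler was removed from the start.
    return text[:1].upper() + text[1:]


print(clean_source("Um, the AC is broken."))
# The air conditioner is broken.
```

The word-boundary anchors matter here: without them, removing "oh" would also damage words such as "Ohio".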
Media Studio provides tools for extracting terminology, names, organizations, and other glossary phrases that can be reviewed prior to generating bilingual glossaries.
Machine Translation
Most machine translation platforms perform very poorly on subtitle content as they are trained primarily on documents. Subtitles are full of dialog, conversation, slang, and idioms. Media Studio subtitle machine translation engines have been specifically trained on tens of millions of bilingual sentences that are based around dialog and discussion. This produces significantly higher translation quality and ensures the resulting translation sounds natural.
Because the range of topics that could be contained within a video is infinite, further translation quality improvements can be achieved by customizing a machine translation engine with your own existing subtitles. Media Studio is underpinned by the advanced language processing and machine translation engine features of Language Studio. This allows for the rapid customization and creation of custom machine translation engines. Language Studio machine translation engines use a hybrid machine translation approach that incorporates the benefits of the latest advances in neural machine translation and statistical machine translation.
Tools and processes specific to media data files have been created as part of the Media Studio Data Processing Platform to extract, refine, match, and synthesize data. Many organizations have very large volumes of legacy data files that may not be well organized or structured. The Media Studio Data Processing Platform will prepare, normalize, clean, and organize them in hours.
Even when you have little or no data for customization, Media Studio and Language Studio tools can be used to synthesize and manufacture millions of bilingual sentences, increasing translation quality to match the nature and purpose of your content. Unique to Media Studio, custom machine translation engines can include the ability to switch between different genres. This allows a single machine translation engine to switch between comedy, drama, sci-fi, action, and documentary writing styles.
Post-Processing: Sentence Splitting and Structuring
Because complete sentences are translated, the translated sentences must be split back into the correct structure within the subtitles. There are many factors to consider when splitting sentences into sentence fragments and distributing them across subtitles.
User-configurable rules such as Characters Per Line (CPL) determine the basic parameters for splitting. Artificial intelligence is guided by user-configurable parameters to determine how best to fit the text into lines within one or more subtitles. When splitting, the focus is on finding the optimal position where a human editor would choose to split a sentence so that it looks natural and does not require any change by the editing or quality assurance team.
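To make the idea of a CPL-constrained split concrete, here is a small Python sketch that breaks a sentence into at most two lines. It uses a simple hand-written heuristic (balanced line lengths, with a preference for breaking after punctuation) in place of the artificial intelligence described above. The function name `split_two_lines`, the default of 42 characters, and the scoring weights are assumptions for illustration.

```python
def split_two_lines(sentence, cpl=42):
    """Split a sentence into 1 or 2 lines of at most `cpl` characters.

    Heuristic: among all break positions that fit, prefer balanced lines,
    with a bonus for breaking right after a comma, semicolon, or colon.
    """
    if len(sentence) <= cpl:
        return [sentence]
    best = None
    for i, ch in enumerate(sentence):
        if ch != " ":
            continue
        left, right = sentence[:i], sentence[i + 1:]
        if len(left) > cpl or len(right) > cpl:
            continue  # this break would violate the CPL limit
        score = abs(len(left) - len(right))   # lower is better
        if left.endswith((",", ";", ":")):
            score -= 10                       # hypothetical punctuation bonus
        if best is None or score < best[0]:
            best = (score, [left, right])
    if best is None:
        raise ValueError("sentence does not fit in two lines at this CPL")
    return best[1]


print(split_two_lines("When the rain stopped, we walked back to the old house."))
# ['When the rain stopped,', 'we walked back to the old house.']
```

In this example a break after "we" would give more evenly balanced lines, but the punctuation bonus moves the break to the comma, which is closer to where a human editor would split. A sentence that cannot fit in two lines raises an error here. In practice it would instead be distributed across further subtitles.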
User-configurable parameters enforce a wide range of target language protocol and stylistic requirements. These take into account speaker change markers, multiple sentences on 1 line, the original source language split structure, and many other parameters to determine the most appropriate human-like splitting and formatting. User-configurable parameters also determine whether 2 shorter lines can be merged into a single line, and whether the time cues should be locked (unchanged) or automatically adjusted based on the target language text length using parameters such as reading speed. Correctly splitting sentences across subtitles can increase productivity by as much as 30 percent.
Additionally, post-processing allows for the normalization of content based on user-configured style guides. As an example, a customer deployed a rule for translating from English to Indonesian: if the target Indonesian sentence started with the word "Dan" (English "and"), the first word had to be replaced with "Lalu", a preferred alternative.
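The choice between locked and automatically adjusted time cues can be sketched as follows. This is an assumption-laden illustration, not Media Studio's algorithm: the function `adjust_end_ms`, the reading speed expressed as characters per second (`cps`), the default of 17, and the minimum gap before the next subtitle (`min_gap_ms`) are all hypothetical. Times are integers in milliseconds.

```python
import math


def adjust_end_ms(start_ms, end_ms, text, next_start_ms=None,
                  cps=17.0, min_gap_ms=80, locked=False):
    """Return a new end time for a subtitle.

    If `locked` is true, the original end time is kept. Otherwise the end
    time is extended (never shortened) so the text can be read at `cps`
    characters per second, without running into the next subtitle.
    """
    if locked:
        return end_ms
    chars = len(text.replace("\n", ""))
    needed_ms = math.ceil(chars * 1000 / cps)
    wanted_end = max(end_ms, start_ms + needed_ms)
    if next_start_ms is None:
        return wanted_end
    # Latest allowed end, but never earlier than the original end time.
    limit = max(end_ms, next_start_ms - min_gap_ms)
    return min(wanted_end, limit)


text = "x" * 34  # 34 characters need 2000 ms at 17 characters per second
print(adjust_end_ms(10_000, 11_000, text))                        # 12000
print(adjust_end_ms(10_000, 11_000, text, next_start_ms=11_500))  # 11420
print(adjust_end_ms(10_000, 11_000, text, locked=True))           # 11000
```

Translated text is often longer than the source, so an end time that suited the original line may be too short for the translation. The second call shows the adjustment being capped by the start of the following subtitle.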
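The "Dan" to "Lalu" rule described above could be expressed as a single anchored replacement. The Python sketch below is one hypothetical way to write it. The function name `replace_leading_word` and the sample Indonesian sentences are invented for illustration and are not taken from the customer's configuration.

```python
import re


def replace_leading_word(sentence, old="Dan", new="Lalu"):
    """Replace `old` with `new` only when it is the first word of the sentence."""
    # "^" anchors to the start and "\b" stops "Dan" from matching inside "Dana".
    return re.sub(rf"^{re.escape(old)}\b", new, sentence)


print(replace_leading_word("Dan dia pergi."))   # Lalu dia pergi.
print(replace_leading_word("Saya dan dia."))    # Saya dan dia.
print(replace_leading_word("Dana itu habis."))  # Dana itu habis.
```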
Bonus Subtitle Optimization Feature
Project Management and Editing and Quality Assurance
While the above 4 advanced processing stages provide significant productivity gains and cost savings, even greater savings can be made by using the Media Studio Project Management and Editing Platform. Project management becomes much easier and more streamlined, increasing the number of projects and translation professionals that each project manager can manage.
The Media Studio Subtitle Editor enables multiple editors and quality assurance team members to work on the same file at the same time. Delivery times are substantially reduced with style guides, glossaries and rules enforced by the editing and quality assurance environment.