Media Studio

Data Processing Platform Feature Overview

A complete toolkit for media-related file processing, data extraction, normalization, and organization.

Human Language Technology Enhanced by Artificial Intelligence
Media Studio Data Processing Platform provides an essential and unique set of data processing tools specifically designed to create value from your legacy language assets and to optimize manual data preparation tasks through artificial intelligence and automation. Many organizations have vast data assets that can be leveraged for a wide variety of purposes, ranging from training data for custom machine translation engines to translation memories, automated glossaries, and dialogs.

It is designed for media service providers, OTT providers, and translation organizations that have their own project management and editing platforms, as well as for occasional or ad hoc users of the services. Media Studio Data Processing Platform offers all the core processing functions both as a REST API and as a comprehensive web portal.

Media Studio Data Processing Platform provides a comprehensive toolkit for processing, analyzing, organizing, preparing, and creating subtitles and media assets.

Everything you need to organize, clean, normalize, and structure your subtitle and media assets is provided in a single package. A comprehensive set of data processing and analysis tools reduces human effort and increases productivity and accuracy for data preparation tasks such as source language template creation.

Legacy data can be analyzed, organized, matched, and extracted to create new data assets that can be further leveraged as translation memories or in machine learning and artificial intelligence tasks. Upload your legacy language assets and convert them into “data gold”.

Subtitle Optimized Machine Translation

Subtitle-optimized machine translation is more than just translation. Most machine translation platforms perform very poorly on subtitle content because they are trained primarily on document data, while subtitles are full of dialog, conversation, slang, and idioms. Media Studio subtitle engines have been specifically trained on tens of millions of bilingual sentences drawn from dialog and discussion. This produces significantly higher translation quality and ensures the resulting translation sounds natural. A video genre can also be specified to produce a different style of output.

Time savings go far beyond automatically translating subtitles. Configurable settings allow time cues to be adjusted based on target-language reading speed or locked to the original times, along with a range of other adjustable parameters. Artificial intelligence splits translated sentences across subtitles at the natural positions where a human would split them, so each subtitle reads naturally within user-configurable parameters such as Characters Per Line (CPL).
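To make reading-speed-based timing concrete, a subtitle's minimum display time can be derived from its character count and a characters-per-second (CPS) target. The sketch below is an illustrative assumption, not Media Studio's implementation: `min_duration_ms`, `adjust_end`, the 17 CPS target, and the one-second floor are all hypothetical choices.

```python
import math
from typing import Optional

# Hypothetical defaults; real reading-speed targets vary by client and language.
DEFAULT_CPS = 17.0       # characters per second a viewer is assumed to read
MIN_DURATION_MS = 1000   # never show a subtitle for less than one second


def min_duration_ms(text: str, cps: float = DEFAULT_CPS) -> int:
    """Return the shortest display time (ms) that keeps reading speed <= cps."""
    # Count visible characters only; line breaks take no reading time.
    chars = len(text.replace("\n", ""))
    needed = math.ceil(chars / cps * 1000)
    return max(needed, MIN_DURATION_MS)


def adjust_end(start_ms: int, end_ms: int, text: str,
               next_start_ms: Optional[int] = None,
               cps: float = DEFAULT_CPS) -> int:
    """Extend a cue's end time to satisfy reading speed without overlapping the next cue."""
    wanted_end = max(end_ms, start_ms + min_duration_ms(text, cps))
    if next_start_ms is not None:
        wanted_end = min(wanted_end, next_start_ms)
    return wanted_end
```

Clamping to the next cue's start prevents overlaps, but it can leave a cue that still reads too fast; a fuller tool would probably flag such cues or merge them with a neighbor.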

Together, these features more than double productivity for translators and editors. Used in conjunction with Media Studio’s Subtitle Editor, they enable even greater productivity gains.

Data Processing Server

Media Studio Data Processing Platform is included with Media Studio Project Management and Editing Platform.

Workflow

Workflow Studio is included with Media Studio to extend custom workflow capabilities and advanced automation tasks.

Media Studio is available in two Platform Editions, each designed to match different business needs.

Feature Overview

Each feature is built on a core of Artificial Intelligence, Machine Learning and Natural Language Processing

Machine learning enables machines to work more like humans so that humans don't have to work more like machines. Each feature is designed to augment human intelligence, enhance productivity, increase quality, and reduce cost. Artificial intelligence enables processing and organization of data that would simply not be cost-effective or feasible with a human-only approach.

Subtitle-Optimized Machine Translation

Subtitle-optimized machine translation delivers faster, cheaper, higher-quality, and more consistent translation output than a human-only solution. Fine-tune for Characters Per Line (CPL), part-of-speech, structure balancing, and more. Powerful artificial intelligence-driven processes reduce human effort and increase productivity.

Subtitle Translation

Automatically translate subtitles, adjust time cues, and comply with rules such as characters per line. This translation feature is specifically optimized for subtitles, with automated line joining and splitting, and it supports SRT, TTML, DFXP, and many other subtitle formats.

Synopsis Translation

Automatically translate video synopses with machine translation engines specifically optimized for this purpose.

Document Translation

Translate Microsoft Office (Word, Excel, PowerPoint), Open Office, Adobe PDF, Images, HTML, Plain Text, XML, and many other standard document formats.

User Review Translation

User reviews have their own unique style that includes slang and colloquialisms. Translate them within the context of movies to produce natural-sounding reviews across languages.

Genre Specific Translation Styles within a Single MT Engine

Media Studio has a unique feature for customized engines that allows the translation genre to be specified at the time of translation and the resulting translation to be stylized to match that genre (e.g., comedy vs. documentary). A full range of genres can be built into a single machine translation engine.

User Configurable Formatting Controls

Control how subtitles are structured after translation to another language using client-specific rules. Specify Characters Per Line (CPL), Speaker Change Indicators, Time Cue Adjustments, and Reading Speed. Artificial intelligence-based splitting logic distributes phrases along logical boundaries, splitting sentences that span multiple lines and subtitles in the same manner as a professional human translator.
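The product describes its splitting as AI-based; the sketch below swaps in a much simpler, well-known heuristic for illustration only. It places a single line break at the word boundary that keeps both lines within the CPL limit and makes them as close in length as possible. The function name `split_two_lines` is hypothetical.

```python
from typing import List, Optional


def split_two_lines(text: str, cpl: int) -> Optional[List[str]]:
    """Split text into at most two lines of <= cpl characters each.

    Among all word boundaries that satisfy the CPL limit, choose the one that
    makes the two lines closest in length (a common "balanced" heuristic).
    Returns None if the text cannot fit in two lines.
    """
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= cpl:
        return [text]
    words = text.split(" ")
    best: Optional[List[str]] = None
    best_diff: Optional[int] = None
    for i in range(1, len(words)):
        first = " ".join(words[:i])
        second = " ".join(words[i:])
        if len(first) > cpl or len(second) > cpl:
            continue  # this break point violates the CPL limit
        diff = abs(len(first) - len(second))
        if best_diff is None or diff < best_diff:
            best, best_diff = [first, second], diff
    return best
```

A linguistically aware splitter would additionally avoid breaks between, for example, an article and its noun, which is where phrase-boundary logic comes in.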

Extract and Normalize Dialogs, Scripts and Screenplays

Extract unstructured and non-standardized dialogs, scripts, and screenplays into useful formats such as dialogs, SRT, or other standardized and normalized document structures.

Dialog Extraction

Automatically extract text from director’s scripts to create dialogs, subtitles, or captions. Extract from raw, unstructured document formats such as PDF, images, Microsoft Word, and more.

Format Normalization

Convert a wide variety of script and subtitle formats into a standardized layout and structure. Convert your raw non-standard dialogs to structured as-spoken documents and other user-template driven formats.

Transcribe and Synchronize Video and Audio

Automated Speech Recognition (ASR) technologies can reduce the human effort and time needed to transcribe a video. When ASR is used in conjunction with a user-provided dialog or a dialog extracted with Media Studio tools, the time needed to accurately transcribe a 1-hour video drops from 8-10 hours to 2-3 hours.

Transcribe Audio

Transcribe directly from video or audio files to produce a timed-text output.

Dialog Synchronization

Synchronize dialog sentences with spoken times from a video or audio track to create timed-text files. Overcome voice recognition inaccuracies with synchronized dialog and highly accurate timed-text file output.


Compare video, audio, and subtitles to produce an automated Edit Decision List (EDL) report. Match time codes and other changes against the original video and audio source material to accurately determine what has changed.

Extract and Translate Glossaries and Terminology

Reduce the time to create a glossary for a series from hours to minutes with language professionals validating and fine-tuning the automated results. When creating bilingual glossaries, language professionals need only check and adjust suggested glossary phrases, increasing their productivity from an average of 20 to more than 160 terms per hour.

Glossary and Term Extraction

Analyze an individual subtitle or an entire series to automatically extract glossary terms in seconds.

Bilingual Glossary Creation

Automatically resolve glossary terminology across languages. Recall existing user-defined terminology translations and suggest new terminology translations.

Match and Synthesize Bilingual Data

Turn your legacy language data assets into “data gold”. Many media service providers have tens of thousands of files that were built using multiple platforms, sometimes with suboptimal management controls and filing systems. Simply upload the files to automatically identify and match bilingual document pairs and extract high-quality bilingual sentences. Even monolingual data can be leveraged as a driver for synthetic data. Use your data assets to bolster your own custom machine translation engine.

Bilingual Subtitle Pair Matching

Upload thousands of subtitles in different languages and formats. Artificial intelligence is used to determine which subtitles are translations of each other.
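One simple signal for pairing subtitle files, offered here as an assumption rather than a description of Media Studio's AI, is timing: translations made for the same video cut usually share nearly the same cue times. The hypothetical `timing_overlap_score` below measures what fraction of one file's cue time overlaps cues in another file.

```python
from typing import List, Tuple

Cue = Tuple[int, int]  # (start_ms, end_ms)


def timing_overlap_score(a: List[Cue], b: List[Cue]) -> float:
    """Fraction of total cue time in `a` that overlaps some cue in `b`.

    Both lists are assumed sorted by start time, with no overlapping cues
    inside the same file.
    """
    total = sum(end - start for start, end in a)
    if total == 0:
        return 0.0
    overlap = 0
    j = 0
    for start, end in a:
        # Skip cues in b that finish before this cue starts.
        while j < len(b) and b[j][1] <= start:
            j += 1
        # Add up every b cue that intersects [start, end).
        k = j
        while k < len(b) and b[k][0] < end:
            overlap += min(end, b[k][1]) - max(start, b[k][0])
            k += 1
    return overlap / total
```

Files with a score near 1.0 are strong pair candidates; because re-timed or differently cut versions break this signal, a content-based comparison would still be needed to confirm a match.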

Bilingual Sentence Matching

Automatically extract matching bilingual sentence pairs from bilingual documents. Output is produced in common translation memory formats such as TMX, XLIFF, CSV, and tab-separated pairs.
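For reference, TMX is an XML format in which each translation unit (`tu`) holds one `tuv` per language, each wrapping a `seg`. The sketch below writes sentence pairs as a minimal TMX 1.4 file using only Python's standard library; the `creationtool` value and the function name `build_tmx` are placeholders, not Media Studio's exporter.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# ElementTree serializes this namespaced name as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: Iterable[Tuple[str, str]], src_lang: str, tgt_lang: str) -> str:
    """Serialize (source, target) sentence pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per sentence pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```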

Data Synthesis

Synthesize artificial learning data to assist in teaching machine translation systems your style, genre, and vocabulary.

Customize your own Machine Translation Engine

Leverage your bilingual and monolingual data assets to create your own high-quality custom machine translation engines. Custom engines produce output in your vocabulary, writing style, and context, reducing human effort and increasing productivity.

Advanced Subtitle Analysis and Processing

Basic organization of subtitle and data files is often a challenge, especially for legacy or acquired third-party data. Automatically analyze, convert, extract, compare, measure, normalize, and rename data in bulk.

Language Identification

Automatically detect the language of a subtitle and organize the files into folders grouped by language.
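As a toy stand-in for automatic language detection (which in practice relies on trained models), the sketch below scores text by counting common function words per language and then groups file names by the winning language. The stopword lists, `guess_language`, and `group_by_language` are hypothetical and cover only three languages; `und` is the ISO 639 code for an undetermined language.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Tiny illustrative stopword lists; a real detector uses a trained model.
STOPWORDS = {
    "en": {"the", "and", "you", "is", "what", "to", "of", "it"},
    "es": {"el", "la", "que", "y", "de", "es", "no", "los"},
    "fr": {"le", "la", "et", "est", "que", "les", "vous", "pas"},
}


def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often, or 'und' if none match."""
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "und"


def group_by_language(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each detected language to a sorted list of the file names in it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, text in files.items():
        groups[guess_language(text)].append(name)
    return {lang: sorted(names) for lang, names in groups.items()}
```

Each group could then be moved into a folder named after its language code.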

Bulk Renaming

Use data you already have on file to rename all of your files consistently.
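A consistent naming scheme is easiest to apply as a reviewable plan before any file is touched. In this sketch the metadata fields (`title`, `season`, `episode`, `lang`, `ext`), the `Title_S01E02_lang.srt` pattern, and the functions `normalized_name` and `plan_renames` are assumptions chosen for illustration.

```python
import re
from typing import Dict


def normalized_name(meta: Dict[str, object]) -> str:
    """Build a consistent file name from metadata, e.g. Show_Name_S01E02_en.srt."""
    # Replace each run of non-word characters (spaces, punctuation) with one underscore.
    title = re.sub(r"\W+", "_", str(meta["title"])).strip("_")
    return "{}_S{:02d}E{:02d}_{}.{}".format(
        title, int(meta["season"]), int(meta["episode"]), meta["lang"], meta["ext"]
    )


def plan_renames(catalog: Dict[str, Dict[str, object]]) -> Dict[str, str]:
    """Map each old file name to its new name; raise if two files would collide."""
    plan: Dict[str, str] = {}
    seen: Dict[str, str] = {}
    for old, meta in catalog.items():
        new = normalized_name(meta)
        if new in seen:
            raise ValueError(f"{old} and {seen[new]} would both become {new}")
        seen[new] = old
        plan[old] = new
    return plan
```

Once the plan is approved, each entry could be applied with `pathlib.Path.rename`; raising on collisions avoids one file silently overwriting another.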

Genre Identification

Automatically detect multiple genres for any subtitle based on the subtitle text. Genres include Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, Film Noir, Game Show, History, Horror, Music, Musical, Mystery, News, Reality TV, Romance, Sci-Fi, Short Film, Sport, Sports, Superhero, Talk Show, Thriller, War, and Western.

Restructure and Reformat Subtitles

Automatically normalize, restructure and reformat subtitles and captions to meet specific requirements. Change Characters Per Line (CPL), minimum and maximum display time, gaps between subtitles, and formatting.
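The display-time and gap rules can be sketched as a single pass over sorted cues. The function `retime` and its default values (1 s minimum, 7 s maximum, and an 83 ms gap, about two frames at 24 fps) are hypothetical examples, not Media Studio settings.

```python
from typing import List, Tuple

Cue = Tuple[int, int]  # (start_ms, end_ms), sorted by start


def retime(cues: List[Cue], min_ms: int = 1000, max_ms: int = 7000,
           min_gap_ms: int = 83) -> List[Cue]:
    """Clamp each cue's duration to [min_ms, max_ms] and keep min_gap_ms before the next cue.

    The gap rule wins over the minimum duration: a cue is shortened if needed
    so that it never runs into the next one.
    """
    out: List[Cue] = []
    for i, (start, end) in enumerate(cues):
        duration = min(max(end - start, min_ms), max_ms)
        new_end = start + duration
        if i + 1 < len(cues):
            new_end = min(new_end, cues[i + 1][0] - min_gap_ms)
        out.append((start, max(new_end, start)))  # never end before the start
    return out
```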

Burn and Encode

Burn captions or subtitles into video. Render to MP4/MOV (H.264), ProRes HQ, and other formats.
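Burning subtitles into video is commonly done with FFmpeg's `subtitles` filter, which requires an FFmpeg build with libass. The sketch below only assembles the command and illustrates the general technique, not how Media Studio renders; the function name `burn_in_command` and the H.264 quality setting (`-crf 18`) are arbitrary examples, and file names containing characters such as `:` or `,` would need extra escaping inside the filter argument.

```python
import shlex
from typing import List


def burn_in_command(video: str, subtitles: str, output: str,
                    prores: bool = False) -> List[str]:
    """Build an ffmpeg argument list that burns subtitles into the picture."""
    cmd = ["ffmpeg", "-i", video, "-vf", f"subtitles={subtitles}"]
    if prores:
        # In the prores_ks encoder, profile 3 is ProRes 422 HQ.
        cmd += ["-c:v", "prores_ks", "-profile:v", "3"]
    else:
        cmd += ["-c:v", "libx264", "-crf", "18"]
    cmd += ["-c:a", "copy", output]  # pass the audio through unchanged
    return cmd


if __name__ == "__main__":
    print(shlex.join(burn_in_command("episode.mp4", "episode.srt", "episode_burned.mp4")))
```

The list can be run with `subprocess.run(cmd, check=True)`. ProRes output is normally written to a `.mov` container.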

Burned-In Subtitle Recognition

Analyze video for burned-in subtitles and convert them to SRT and other subtitle formats.

Bulk Encoding Conversion

Automatically detect the encoding of a subtitle and convert it to a specified encoding such as UTF-8.
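Reliable encoding detection usually needs a statistical detector (for example, the third-party `charset-normalizer` package). The standard-library sketch below uses a simpler fallback chain instead, which is an assumption for illustration: try UTF-8 (with or without a byte order mark), then Windows-1252, then Latin-1, which accepts any byte sequence. The names `decode_best_effort` and `to_utf8` are hypothetical.

```python
from typing import Tuple


def decode_best_effort(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate encoding that decodes cleanly."""
    for enc in ("utf-8-sig", "cp1252"):  # utf-8-sig also accepts UTF-8 without a BOM
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            pass
    # Latin-1 maps every byte to a character, so it cannot fail.
    return data.decode("latin-1"), "latin-1"


def to_utf8(data: bytes) -> bytes:
    """Re-encode subtitle bytes as UTF-8 without a byte order mark."""
    text, _ = decode_best_effort(data)
    return text.encode("utf-8")
```

Because Windows-1252 decodes most byte sequences, this chain cannot tell it apart from other single-byte code pages such as Windows-1251; that is exactly the case a real detector handles.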

Bulk Format Conversion

Convert between a wide variety of caption and subtitle formats such as SRT, TTML, DFXP, WebVTT, and more.
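SRT and WebVTT are close enough that a basic conversion is mostly mechanical: WebVTT needs a `WEBVTT` header followed by a blank line, and its timestamps use a period rather than a comma before the milliseconds. The hypothetical `srt_to_vtt` below handles only that core case.

```python
import re

# SRT timestamps use a comma before milliseconds; WebVTT uses a period.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert SubRip (SRT) text to WebVTT, keeping cue numbers, timings, and text."""
    lines = srt.replace("\r\n", "\n").strip("\ufeff").splitlines()
    out = ["WEBVTT", ""]  # required header plus the blank line after it
    for line in lines:
        if "-->" in line:  # only timing lines are rewritten
            line = _SRT_TIME.sub(r"\1.\2", line)
        out.append(line)
    return "\n".join(out).rstrip("\n") + "\n"
```

Styling such as SRT `<font>` tags, and formats like TTML or DFXP, would need a real parser rather than line rewriting.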

Compare Multiple Subtitles

Compare two subtitles to determine their differences, then accept the changes you prefer.
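Comparing two versions and accepting selected changes maps naturally onto Python's `difflib`. In this sketch each list item is one subtitle's text; `subtitle_diff`, `merge_versions`, and the `accept` callback are hypothetical, with the callback standing in for the user's choice on each changed block.

```python
import difflib
from typing import Callable, List


def subtitle_diff(old: List[str], new: List[str]) -> List[str]:
    """Return a unified diff of two subtitle versions (one subtitle text per item)."""
    return list(difflib.unified_diff(old, new, fromfile="version_a",
                                     tofile="version_b", lineterm=""))


def merge_versions(old: List[str], new: List[str],
                   accept: Callable[[List[str], List[str]], bool]) -> List[str]:
    """Keep unchanged subtitles; for each changed block, take `new` only if accept() says so."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    merged: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(old[i1:i2])
        elif accept(old[i1:i2], new[j1:j2]):
            merged.extend(new[j1:j2])  # accept the replacement, insertion, or deletion
        else:
            merged.extend(old[i1:i2])  # keep the original block
    return merged
```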

Measure Change Effort

Compare and measure multiple subtitle versions with scored metrics. Start with a source language file then add the machine translation, post-edited, and quality assurance versions for a full set of comprehensive metrics.
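One widely used way to score post-editing effort is a word-level edit rate between the machine translation and its post-edited version, in the spirit of TER/HTER. The sketch below is a simplified version (plain Levenshtein distance over words, without TER's block shifts) and is not Media Studio's scoring method; `word_edits` and `edit_rate` are hypothetical names.

```python
from typing import List


def word_edits(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,             # delete h
                         cur[j - 1] + 1,          # insert r
                         prev[j - 1] + (h != r))  # substitute, or match for free
        prev = cur
    return prev[-1]


def edit_rate(machine: str, post_edited: str) -> float:
    """Edits needed to turn the MT output into the post-edited text, per post-edited word."""
    hyp, ref = machine.split(), post_edited.split()
    return word_edits(hyp, ref) / max(len(ref), 1)
```

Scoring the MT output against the post-edited file, and the post-edited file against the QA version, gives one number per stage of the workflow.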