
When using Automated Speech Recognition (ASR), a slight lag can occur because the system must first capture the spoken words and then take time to process them. Real-time captioning is useful for events that have no written script or prepared captions, such as lectures, classes, congressional or council meetings, news programs, non-broadcast meetings organized by professional associations, and video conferences on platforms such as Microsoft Teams, Zoom, WebEx, and GoToMeeting. In live television broadcasts, the video and audio may also be deliberately delayed, giving the ASR time to process the spoken words so that the live closed captions can be re-inserted into the video stream using time synchronization, creating the appearance of real-time closed captioning.
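As a rough illustration of that re-insertion step, the sketch below buffers ASR output and releases each caption only when the deliberately delayed video reaches the same point in time. All names and the fixed delay value are assumptions for illustration, not any specific broadcast system:

```python
from collections import deque

BROADCAST_DELAY_S = 5.0  # assumed: video/audio are held back this long

class CaptionResynchronizer:
    """Buffers ASR captions and releases them in sync with delayed video."""

    def __init__(self, delay_s: float = BROADCAST_DELAY_S):
        self.delay_s = delay_s
        self.pending = deque()  # (capture_time, caption_text) pairs

    def on_asr_result(self, capture_time: float, text: str) -> None:
        # The ASR tags each caption with the time the audio was captured.
        self.pending.append((capture_time, text))

    def captions_due(self, now: float):
        # A caption "airs" delay_s after capture, matching the delayed video.
        while self.pending and self.pending[0][0] + self.delay_s <= now:
            capture_time, text = self.pending.popleft()
            yield capture_time + self.delay_s, text

# Example: a caption captured at t=10.0 airs at t=15.0 alongside the video.
sync = CaptionResynchronizer()
sync.on_asr_result(10.0, "Hello and welcome to the evening news.")
print(list(sync.captions_due(now=15.0)))
```

Because the picture is delayed by the same fixed amount, caption and video line up on air even though the ASR needed several seconds to transcribe the speech.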
Remote real-time captions are generated at a remote location and sent to the site of the event or broadcast. For instance, an instructor in a lecture hall can speak into a microphone connected via the Internet to a remote ASR server. The ASR server then sends the caption text back over the Internet to a specified location, where it is displayed to the intended audience.
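That round trip can be sketched as a small streaming client. The endpoint, message framing, and file-based audio source below are hypothetical stand-ins (any real ASR service defines its own protocol); the sketch only shows the shape of the exchange: audio chunks go out, caption text comes back.

```python
import asyncio
import websockets  # third-party: pip install websockets

ASR_SERVER = "wss://asr.example.com/stream"  # hypothetical endpoint
CHUNK_BYTES = 3200  # 100 ms of 16 kHz, 16-bit mono PCM

async def stream_captions(pcm_path: str) -> None:
    async with websockets.connect(ASR_SERVER) as ws:
        async def send_audio():
            with open(pcm_path, "rb") as f:
                while chunk := f.read(CHUNK_BYTES):
                    await ws.send(chunk)      # binary audio frame
                    await asyncio.sleep(0.1)  # pace the upload at real time
            await ws.send(b"")  # assumed end-of-stream marker

        async def receive_text():
            async for message in ws:  # server pushes caption text as it is ready
                print("CAPTION:", message)

        # Upload audio and receive captions concurrently until the server
        # closes the connection.
        await asyncio.gather(send_audio(), receive_text())

asyncio.run(stream_captions("lecture.pcm"))
```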
Because live events and live broadcast content must be captioned in real time, automated speech recognition is somewhat less accurate than when processing batch files or pre-recorded audio. A pre-recorded audio file has the benefit of the complete sentence or sentences that were spoken and a longer time window in which to process the data. Real-time captioning software, such as the ASR features in Language Studio, has less time to process the data and in some cases must guess the context and spelling of a word when several words sound the same. For example, “Thai” and “tie” are pronounced identically; with a full sentence, the ASR is far more likely to have the context needed to determine which form to use.
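A toy sketch of that disambiguation shows how neighboring words settle the choice between homophones. The scores below are invented, standing in for a real language model:

```python
BIGRAM_SCORES = {  # made-up scores standing in for a real language model
    ("a", "thai"): 0.2, ("thai", "food"): 0.9,
    ("a", "tie"): 0.7, ("tie", "food"): 0.01,
    ("his", "tie"): 0.8, ("his", "thai"): 0.05,
}

def pick_homophone(prev_word: str, candidates: list[str], next_word: str) -> str:
    # Score each candidate by how well it fits between its neighbors.
    def score(word: str) -> float:
        return (BIGRAM_SCORES.get((prev_word, word), 0.001)
                * BIGRAM_SCORES.get((word, next_word), 0.001))
    return max(candidates, key=score)

# With full-sentence context the right spelling wins:
print(pick_homophone("a", ["thai", "tie"], "food"))   # -> thai
print(pick_homophone("his", ["thai", "tie"], "was"))  # -> tie
```

A real-time system that must commit to a word before the rest of the sentence arrives has far less of this context to work with, which is where the accuracy gap comes from.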
Closed captioning also benefits individuals who understand written language better than the spoken language in which a presentation is delivered, as well as people watching a program in a noisy environment (such as an airport or sports bar) or a quiet one (such as a work cubicle or public transport). Non-real-time captions include those on television programming and on pre-recorded videos that can be rented or purchased. Such captions are structured differently for timing and line length than subtitles. Subtitles serve a different purpose and are intended for content that has been translated from one language to another.
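To make the timing and line-length point concrete, here is a small sketch that breaks a transcript into SRT-style caption cues. The 32-character line limit and 15-characters-per-second reading rate are common rules of thumb assumed here for illustration, not a fixed standard:

```python
import textwrap

MAX_CHARS_PER_LINE = 32  # assumed line-length limit for captions
MAX_LINES_PER_CUE = 2
READING_SPEED_CPS = 15   # assumed comfortable reading rate, chars/second

def format_timestamp(seconds: float) -> str:
    # SRT timestamps look like 00:00:03,250
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def transcript_to_srt(transcript: str, start: float = 0.0) -> str:
    # Wrap to short lines, then group lines into timed cues.
    lines = textwrap.wrap(transcript, MAX_CHARS_PER_LINE)
    cues, t = [], start
    for i in range(0, len(lines), MAX_LINES_PER_CUE):
        block = lines[i:i + MAX_LINES_PER_CUE]
        duration = sum(len(l) for l in block) / READING_SPEED_CPS
        cues.append(f"{i // MAX_LINES_PER_CUE + 1}\n"
                    f"{format_timestamp(t)} --> {format_timestamp(t + duration)}\n"
                    + "\n".join(block) + "\n")
        t += duration
    return "\n".join(cues)

print(transcript_to_srt("Closed captions are broken into short, "
                        "timed cues so viewers can read them comfortably."))
```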