
What is real-time captioning?

Captions, also known as closed captions, are lines of text designed to assist individuals who are deaf or hard of hearing in accessing audio content. Real-time captions, also referred to as Communication Access Real-time Translation (CART), are produced live during an event. While the primary purpose of real-time captioning is to assist deaf and hard-of-hearing viewers, there are other use cases where it is valuable. One such case is live sports events, where many viewers struggle to hear the commentary because of where they are watching, such as bars, restaurants, public transport, or an office environment. See “Adding Real-Time Captioning to Sports Commentary in Live Video Streams” for a real-world example.

When using Automated Speech Recognition (ASR), a slight lag may occur because the system needs time to capture the audio and process the spoken words into text. Real-time captioning can be used for events that have no written script or prepared captions, such as lectures, classes, congressional or council meetings, news programs, non-broadcast meetings organized by professional associations, and video conferences on platforms such as Microsoft Teams, Zoom, WebEx, and GoToMeeting. In live television broadcasts, the video and audio are often deliberately delayed by other processing, giving the ASR time to process the spoken words; the live closed captions are then re-inserted into the video stream using time synchronization, creating the appearance of real-time closed captioning.
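The time-synchronized re-insertion described above can be illustrated with a short sketch. The function name, the fixed delay value, and the timing logic are illustrative assumptions, not the workings of any particular broadcast system:

```python
# Illustrative sketch: align ASR caption timestamps with a deliberately
# delayed video stream so captions appear synchronized to the viewer.
# BROADCAST_DELAY is an arbitrary example value.

BROADCAST_DELAY = 4.0  # seconds the video/audio is held back before airing

def align_caption(spoken_start, spoken_end, asr_latency):
    """Shift caption times so they line up with the delayed video.

    spoken_start / spoken_end: times (s) at which the words were spoken.
    asr_latency: time (s) the ASR needed to produce this caption.
    Returns the on-air display window, or None if the ASR was too slow
    to catch the delayed frames.
    """
    if asr_latency > BROADCAST_DELAY:
        return None  # caption would appear after its frames have aired
    return (spoken_start + BROADCAST_DELAY, spoken_end + BROADCAST_DELAY)

# Words spoken at t=10.0-12.5s, captioned with 2.1s of ASR latency,
# are displayed alongside the same frames in the delayed stream.
print(align_caption(10.0, 12.5, 2.1))  # → (14.0, 16.5)
```

Because the whole stream is delayed by the same fixed amount, captions that arrive within that window can be placed exactly on the frames where the words were spoken.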

Remote real-time captions are generated from a remote location and sent to the location of the event or broadcast. For instance, an instructor in a lecture hall can speak into a microphone connected via the Internet to a remote ASR server. The ASR server sends the captioned text via the internet to a specified location where it can be transmitted to the intended audience.
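The remote flow above can be sketched as a loop that sends small audio chunks to a recognizer and emits captions as they are produced. Everything here is a local stand-in for illustration; `recognize_chunk` is a hypothetical stub, not a real ASR API, and in a real deployment the chunks would travel over a network connection to the remote server:

```python
# Minimal sketch of a streaming caption pipeline. A stub recognizer
# stands in for the remote ASR server described in the text.

def recognize_chunk(chunk):
    """Stand-in for a remote ASR call: each chunk 'decodes' to text."""
    return chunk.get("transcript", "")

def stream_captions(audio_chunks):
    """Send chunks one at a time and yield captions as they arrive."""
    for chunk in audio_chunks:
        text = recognize_chunk(chunk)
        if text:
            yield text

# Simulated audio chunks, each tagged with the text the recognizer
# would return for it (empty string = silence, no caption emitted).
chunks = [
    {"pcm": b"...", "transcript": "good evening"},
    {"pcm": b"...", "transcript": ""},
    {"pcm": b"...", "transcript": "and welcome"},
]
print(list(stream_captions(chunks)))  # → ['good evening', 'and welcome']
```

The generator structure mirrors the real workflow: captions are forwarded to the audience incrementally rather than waiting for the whole recording to finish.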

Due to the real-time nature of captioning live events and live broadcast content, automated speech recognition is somewhat less accurate than when processing batch or pre-recorded audio files. A pre-recorded audio file gives the system the benefit of the entire sentence or sentences that were spoken, plus a longer time window in which to process the data. Real-time captioning software such as Language Studio’s ASR features has less time to process the data and may sometimes have to guess the context and spelling of a word when multiple words sound the same. For example, “Thai” and “tie” sound identical; with a full sentence, the ASR is more likely to have the context needed to determine which form to use.
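The “Thai” versus “tie” case can be illustrated with a toy context scorer. The word lists and scoring are invented for illustration; production ASR systems use full language models rather than keyword overlap, but the principle is the same: more surrounding context makes the right spelling easier to pick.

```python
# Toy homophone disambiguation: choose the spelling whose typical
# context words overlap most with the rest of the sentence.
# The hint sets below are invented example data.

CONTEXT_HINTS = {
    "Thai": {"food", "restaurant", "curry", "bangkok", "thailand"},
    "tie":  {"wear", "suit", "knot", "shirt", "match", "score"},
}

def disambiguate(candidates, sentence_words):
    """Return the candidate spelling with the most context overlap."""
    words = {w.lower() for w in sentence_words}
    return max(candidates, key=lambda c: len(CONTEXT_HINTS[c] & words))

sentence = "we ordered curry at the new restaurant".split()
print(disambiguate(["Thai", "tie"], sentence))  # → 'Thai'
```

With only a single isolated word there would be no overlap to score, which is exactly why real-time systems, working on short fragments, misspell homophones more often than batch systems that see whole sentences.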

Closed captioning also benefits individuals who comprehend written language better than the spoken language in which a presentation is delivered, as well as people watching a program in a noisy environment (such as an airport or sports bar) or a quiet one (such as a work cubicle or on public transport). Non-real-time captions include those on television programming and pre-recorded videos that can be rented or purchased. Such captions are structured differently in timing and line length compared to subtitles. Subtitles serve a different purpose and are intended for content translated from one language to another.
