First, a note.
ASL interpretation and English captioning serve two different audiences. One is not a substitute for the other, and both should be provided whenever possible. This document addresses only issues related to captioning for events, and takes no position on whether ASL or captioning should be provided if only one is possible.
Who uses captioning?
It’s not just Deaf and hard-of-hearing people; captions benefit a broad spectrum of people. For individuals whose native language is not English, it is often easier to understand content by reading rather than by listening. Many neurodivergent people also benefit from captions: those with ADHD may find that captions help them focus, autistic people may find them useful when wearing ear protection, and so forth. Captions can even help staff members who are multitasking; a stage manager with an earpiece can use captions to keep up with onstage content while continuing headset communications.
Live captions by a human are always preferred.
The reason is simple: speech recognition technology cannot approach the accuracy of a human transcriptionist. A human can handle accents, overlapping conversation, puns, muffled audio, speech disfluencies, and other complications while still producing an accurate, legible transcription. Computers struggle with all of these and will produce unreadable captions that deprive people of access to the content. Even when the words are captured accurately, the punctuation is generally subpar, making the captions confusing and hard to follow.
Isn’t AI transcription getting better?
It is: YouTube’s original transcription produced completely illegible captions, while today’s auto-captions are largely intelligible. Microsoft Teams produces reasonably good captions, Google Live Transcribe does very well (albeit without punctuation), and Otter.ai offers a fairly good product that is now used on Zoom. But all of these assume a best-case scenario for readable captions, and they are designed for computer-mediated interactions. Attempting to use AI transcription for in-person events is asking for trouble, and even for virtual events, speakers must talk at a measured pace, must not overlap one another, must have no background noise in their environment, and must have strong, consistent internet connectivity.
What are some examples of poor AI transcription?
The 2019 Hugo Awards for science fiction and fantasy works were live-captioned by an AI in the Dublin Conference Centre. The system’s transcription was only about 60% accurate, which left presenters and awardees flustered as the distracted audience laughed at the captioning errors on screen rather than paying attention to the ceremony. When an author took the stage to receive an award, the AI turned their name into a sexual reference, and the captions were shut off entirely after that, leaving anyone relying on them in the dark, including a DeafBlind author who also won an award that night but was deprived of access to the full ceremony by the transcription’s failures.
Cost is not as relevant as you may think.
AI transcription companies promise access at a fraction of the cost of live captioning. Unfortunately, that promise cannot be fulfilled, because the technology they use cannot provide access. While some companies may indeed aspire to more than a quick buck, and their employees may believe they are helping people, AI transcription is still not ready for prime time. In the US, where laws specifically call for effective accommodations, modern AI simply cannot fulfill this mission. The requirement that accommodations be “reasonable” does take cost into account, but AI transcription is likely enough to fail that, in practice, no accommodation is provided at all, which is a violation of the ADA.
Providing readable, useful captions is simply the right thing to do, to include everyone in our community who wishes to be there.