Speech Recognition: What’s Left? Dr Michael Picheny 12 November 2019

Duration: 59 mins 29 secs

About this item
Description: This talk examines open issues in speech recognition, comparing and contrasting them with what is known about human perception. Recent advances in Deep Learning suggest that Word Error Rates comparable to those of human listeners are now achievable. The talk specifically highlights issues with accented speech, noisy speech, different speaking styles, multilingual speech recognition, and more. Through demonstrations and comparison with human perception, it argues that significant work in speech recognition research is still required from the community.
 
Created: 2019-11-25 13:43
Collection: Information Engineering Distinguished Lecture Series
Publisher: University of Cambridge
Copyright: Dr Michael Picheny
Language: eng (English)
 
Abstract: Recent results on the SWITCHBOARD corpus suggest that, because of advances in Deep Learning, we can now achieve Word Error Rates comparable to those of human listeners. Does this mean the speech recognition problem is solved and the community can move on to a different set of problems? In this talk, we examine speech recognition issues that still plague the community and compare and contrast them with what is known about human perception. We specifically highlight issues in accented speech, noisy/reverberant speech, speaking style, rapid adaptation to new domains, and multilingual speech recognition. We try to demonstrate that, compared to human perception, there is still much room for improvement, so significant work in speech recognition research is still required from the community.
Available Formats
Format         Quality     Bitrate            Size
MPEG-4 Video   1280x720    2.99 Mbits/sec     1.30 GB
MPEG-4 Video   640x360     1.93 Mbits/sec     864.45 MB
WebM           1280x720    2.35 Mbits/sec     1.02 GB
WebM           640x360     406.25 kbits/sec   177.04 MB
iPod Video     480x270     520.17 kbits/sec   226.63 MB
MP3            44100 Hz    249.75 kbits/sec   108.93 MB