AVFormer: Injecting vision into frozen speech models for zero-shot AV-ASR

Posted by Arsha Nagrani and Paul Hongsuck Seo, Research Scientists, Google Research

Automatic speech recognition (ASR) is a well-established technology that is widely adopted for various applications such as conference calls, streamed video transcription and voice commands. While the challenges for this technology are centered around noisy audio inputs,
Shared by Google AI Technology, June 2, 2023