Evaluating speech synthesis in many languages with SQuId

Posted by Thibault Sellam, Research Scientist, Google

Previously, we presented the 1,000 languages initiative and the Universal Speech Model with the goal of making speech and language technologies available to billions of users around the world. Part of this commitment involves developing high-quality speech synthesis technologies, which build upon

Shared by Google AI Technology June 7, 2023

Visual captions: Using large language models to augment video conferences with dynamic visuals

Posted by Ruofei Du, Research Scientist, and Alex Olwal, Senior Staff Research Scientist, Google Augmented Reality

Recent advances in video conferencing have significantly improved remote video communication through features like live captioning and noise cancellation. However, there are various situations where dynamic visual augmentation would be useful to better

Shared by Google AI Technology June 6, 2023

Retrieval-augmented visual-language pre-training

Posted by Ziniu Hu, Student Researcher, and Alireza Fathi, Research Scientist, Google Research, Perception Team

Large-scale models, such as T5, GPT-3, PaLM, Flamingo and PaLI, have demonstrated the ability to store substantial amounts of knowledge when scaled to tens of billions of parameters and trained on large text and

Shared by Google AI Technology June 1, 2023