RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers

Posted by Dahun Kim and Weicheng Kuo, Research Scientists, Google

The ability to detect objects in the visual world is crucial for computer vision and machine intelligence, enabling applications like adaptive autonomous agents and versatile shopping systems. However, modern object detectors are limited by the manual annotations of their

Shared by Google AI Technology August 28, 2023

Responsible AI at Google Research: Perception Fairness

Posted by Susanna Ricco and Utsav Prabhu, co-leads, Perception Fairness Team, Google Research

Google’s Responsible AI research is built on a foundation of collaboration between teams with diverse backgrounds and expertise, between researchers and product developers, and ultimately with the community at large. The Perception Fairness team drives

Shared by Google AI Technology August 25, 2023

Teaching language models to reason algorithmically

Posted by Hattie Zhou, Graduate Student at MILA, and Hanie Sedghi, Research Scientist, Google

Large language models (LLMs), such as GPT-3 and PaLM, have shown impressive progress in recent years, driven by scaling up models and training data sizes. Nonetheless, a long-standing debate has been whether

Shared by Google AI Technology August 24, 2023

Language to rewards for robotic skill synthesis

Posted by Wenhao Yu and Fei Xia, Research Scientists, Google

Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach

Shared by Google AI Technology August 22, 2023

Google at Interspeech 2023

Posted by Catherine Armato, Program Manager, Google

This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland. It is one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Experts in speech-related research

Shared by Google AI Technology August 21, 2023

Autonomous visual information seeking with large language models

Posted by Ziniu Hu, Student Researcher, and Alireza Fathi, Research Scientist, Google Research, Perception Team

There has been great progress towards adapting large language models (LLMs) to accommodate multimodal inputs for tasks including image captioning, visual question answering (VQA), and open-vocabulary recognition. Despite such achievements, current state-of-the-art visual

Shared by Google AI Technology August 18, 2023

Neural network pruning with combinatorial optimization

Posted by Hussein Hazimeh, Research Scientist, Athena Team, and Riade Benbaki, Graduate Student at MIT

Modern neural networks have achieved impressive performance across a variety of applications, such as language, mathematical reasoning, and vision. However, these networks often use large architectures that require lots of computational resources. This can

Shared by Google AI Technology August 17, 2023