Vid2Seq: a pretrained visual language model for describing multi-event videos

Posted by Antoine Yang, Student Researcher, and Arsha Nagrani, Research Scientist, Google Research