Vid2Seq: a pretrained visual language model for describing multi-event videos
Posted by Antoine Yang, Student Researcher, and Arsha Nagrani, Research Scientist, Google Research
