End-to-end Generative Pre-training for Multimodal Video Captioning
Posted by Paul Hongsuck Seo and Arsha Nagrani, Research Scientists, Google Research, Perception Team
