Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin1,
Zhicheng Sun1,
Kun Xu2,
Kun Xu2,
Liwei Chen2,
Hao Jiang1,
Quzhe Huang1,
Chengru Song2,
Yuliang Liu2,
Di Zhang2,
Yang Song2,
Kun Gai2,
Yadong Mu1
1Peking University,
2Kuaishou Technology