Zhan Tong
Title
Cited by
Year
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Z Tong, Y Song, J Wang, L Wang
36th Conference on Neural Information Processing Systems (NeurIPS), 2022
Cited by: 933 · Year: 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
S Chen, C Ge, Z Tong, J Wang, Y Song, J Wang, P Luo
36th Conference on Neural Information Processing Systems (NeurIPS), 2022
Cited by: 478 · Year: 2022
TDN: Temporal Difference Networks for Efficient Action Recognition
L Wang, Z Tong, B Ji, G Wu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1895-1904, 2021
Cited by: 459 · Year: 2021
Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations
Y Liang, C Ge, Z Tong, Y Song, J Wang, P Xie
International Conference on Learning Representations (ICLR), 2022
Cited by: 279 · Year: 2022
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
L Wang, B Huang, Z Zhao, Z Tong, Y He, Y Wang, Y Wang, Y Qiao
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Cited by: 271 · Year: 2023
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
MMA Contributors
https://github.com/open-mmlab/mmaction2, 2020
Cited by: 191 · Year: 2020
MGSampler: An Explainable Sampling Strategy for Video Action Recognition
Y Zhi, Z Tong, L Wang, G Wu
IEEE/CVF International Conference on Computer Vision (ICCV), 1513-1522, 2021
Cited by: 74 · Year: 2021
Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
C Ge, J Wang, Z Tong, S Chen, Y Song, P Luo
International Conference on Learning Representations (ICLR), 2023
Cited by: 28 · Year: 2023
Advancing Vision Transformers with Group-Mix Attention
C Ge, X Ding, Z Tong, L Yuan, J Wang, Y Song, P Luo
arXiv preprint arXiv:2311.15157, 2023
Cited by: 12 · Year: 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
L Chen, Z Tong, Y Song, G Wu, L Wang
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Cited by: 12 · Year: 2023
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Z Gao, Z Tong, L Wang, MZ Shou
International Conference on Learning Representations (ICLR), 2024
Cited by: 7 · Year: 2024
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Z Zeng, Z Tong, X Liu, B Chen, ST Xia, Y Ge
arXiv preprint arXiv:2305.14173, 2023
Cited by: 6 · Year: 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
L Chen, Z Tong, Y Song, G Wu, L Wang
arXiv preprint arXiv:2303.16118, 2023
Cited by: 4 · Year: 2023
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
Q Liu, K Zheng, W Wu, Z Tong, Y Liu, W Chen, Z Wang, Y Shen
arXiv preprint arXiv:2312.14149, 2023
Cited by: 3 · Year: 2023
Contextual AD Narration with Interleaved Multimodal Sequence
H Wang, Z Tong, K Zheng, Y Shen, L Wang
arXiv preprint arXiv:2403.12922, 2024
Cited by: 1 · Year: 2024
Bootstrapping SparseFormers from Vision Foundation Models
Z Gao, Z Tong, KQ Lin, J Chen, MZ Shou
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Year: 2024
SpeedAug: A Simple Co-Augmentation Method for Unsupervised Audio-Visual Pre-training
J Wang, J Jiao, Y Song, S James, Z Tong, C Ge, P Abbeel, YH Liu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Sight …, 2023
Year: 2023