Object-centric Video Prediction Algorithm Based on Dynamic Memory and Motion Information
[1] HAN Chenchen, LU Xiankai, WANG Zhicheng, et al. Object-centric Video Prediction Algorithm Based on Dynamic Memory and Motion Information[J]. Journal of Zhengzhou University (Engineering Science), 2025, 46(05): 51-59. [doi:10.13705/j.issn.1671-6833.2025.02.011]
References:
[1] LI W J, ZHANG X Y, GAO Y X, et al. Video frame prediction model based on gated spatio-temporal attention[J]. Journal of Zhengzhou University (Engineering Science), 2024, 45(1): 70-77, 121.
[2] MARTINEZ J, BLACK M J, ROMERO J. On human motion prediction using recurrent neural networks[C]//2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2891-2900.
[3] CASTREJON L, BALLAS N, COURVILLE A. Improved conditional VRNNs for video prediction[C]//2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7608-7617.
[4] DAI K, LI X T, YE Y M, et al. MSTCGAN: multiscale time conditional generative adversarial network for long-term satellite image sequence prediction[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 1-16.
[5] SUN F, BAI C, SONG Y, et al. MMINR: multi-frame-to-multi-frame inference with noise resistance for precipitation nowcasting with radar[C]//The 26th International Conference on Pattern Recognition. Piscataway: IEEE, 2022: 97-103.
[6] PAN T, JIANG Z Q, HAN J N, et al. Taylor saves for later: disentanglement for video prediction using Taylor representation[J]. Neurocomputing, 2022, 472: 166-174.
[7] LEE W, JUNG W, ZHANG H, et al. Revisiting hierarchical approach for persistent long-term video prediction[EB/OL]. (2021-04-14)[2024-08-10]. https://doi.org/10.48550/arXiv.2104.06697.
[8] LOCATELLO F, WEISSENBORN D, UNTERTHINER T, et al. Object-centric learning with slot attention[J]. Advances in Neural Information Processing Systems, 2020, 33: 11525-11538.
[9] LIN Z H, LI M M, ZHENG Z B, et al. Self-attention ConvLSTM for spatiotemporal prediction[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11531-11538.
[10] WANG Y B, LONG M S, WANG J M, et al. PredRNN: recurrent neural networks for predictive learning using spatiotemporal LSTMs[J]. Advances in Neural Information Processing Systems, 2017, 30: 879-888.
[11] VILLEGAS R, YANG J M, HONG S, et al. Decomposing motion and content for natural video sequence prediction[EB/OL]. (2017-07-25)[2024-08-10]. https://doi.org/10.48550/arXiv.1706.08033.
[12] VOLETI V S, JOLICOEUR-MARTINEAU A, PAL C. MCVD: masked conditional video diffusion for prediction, generation, and interpolation[J]. Advances in Neural Information Processing Systems, 2022, 35: 23371-23385.
[13] AKAN A K, ERDEM E, ERDEM A, et al. SLAMP: stochastic latent appearance and motion prediction[C]//2021 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2021: 14708-14717.
[14] WANG T C, LIU M Y, ZHU J Y, et al. Video-to-video synthesis[EB/OL]. (2018-08-20)[2024-08-10]. https://doi.org/10.48550/arXiv.1808.06601.
[15] BEI X Z, YANG Y C, SOATTO S. Learning semantic-aware dynamics for video prediction[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2021: 902-912.
[16] WU Y, GAO R R, PARK J, et al. Future video synthesis with object motion prediction[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 5538-5547.
[17] WU Y F, YOON J, AHN S. Generative video transformer: can objects be the words?[EB/OL]. (2021-07-20)[2024-08-10]. https://doi.org/10.48550/arXiv.2107.09240.
[18] WU Z Y, DVORNIK N, GREFF K, et al. SlotFormer: unsupervised visual dynamics simulation with object-centric models[EB/OL]. (2022-10-12)[2024-08-10]. https://doi.org/10.48550/arXiv.2210.05861.
[19] VILLAR-CORRALES A, WAHDAN I, BEHNKE S. Object-centric video prediction via decoupling of object dynamics and interactions[C]//2023 IEEE International Conference on Image Processing (ICIP). Piscataway: IEEE, 2023: 570-574.
[20] ELSAYED G F, MAHENDRAN A, VAN STEENKISTE S, et al. SAVi++: towards end-to-end object-centric learning from real-world videos[EB/OL]. (2022-06-15)[2024-08-10]. https://doi.org/10.48550/arXiv.2206.07764.
[21] WATTERS N, MATTHEY L, BURGESS C P, et al. Spatial broadcast decoder: a simple architecture for learning disentangled representations in VAEs[EB/OL]. (2019-06-21)[2024-08-10]. https://doi.org/10.48550/arXiv.1901.07017.
[22] ZHONG Y Q, LIANG L M, ZHARKOV I, et al. MMVP: motion-matrix-based video prediction[C]//2023 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2023: 4250-4260.
[23] LIN Z X, WU Y F, PERI S, et al. Improving generative imagination in object-centric world models[EB/OL]. (2020-10-05)[2024-08-10]. https://doi.org/10.48550/arXiv.2010.02054.
[24] YI K X, GAN C, LI Y Z, et al. CLEVRER: CoLlision events for video REpresentation and reasoning[EB/OL]. (2019-10-03)[2024-08-10]. https://doi.org/10.48550/arXiv.1910.01442.
[25] ZADAIANCHUK A, SEITZER M, MARTIUS G. Object-centric learning for real-world videos by predicting temporal feature similarities[J]. Advances in Neural Information Processing Systems, 2023, 36: 61514-61545.
[26] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[27] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 586-595.
[28] JIN B B, HU Y, TANG Q K, et al. Exploring spatial-temporal multi-frequency analysis for high-fidelity and temporal-consistency video prediction[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 4553-4562.
[29] GAO Z Y, TAN C, WU L R, et al. SimVP: simpler yet better video prediction[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 3160-3170.
Last Update: 2025-09-19
Copyright © 2023 Editorial Board of Journal of Zhengzhou University (Engineering Science)