Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue 5: 90-103. doi: 10.16088/j.issn.1001-6600.2022022501
梁启花1, 胡现韬1, 钟必能1*, 于枫1,2, 李先贤1
LIANG Qihua1, HU Xiantao1, ZHONG Bineng1*, YU Feng1,2, LI Xianxian1
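Abstract: Object tracking is one of the core fundamental research problems in computer vision. It supports higher-level video analysis applications, has significant theoretical value, broad practical value, and a strongly interdisciplinary character, and has become a focus of attention for academia, industry, and national strategy. Because tracking scenes are highly complex and full of interference, target appearance varies widely, and multimodal information must be fused, a tracker has to balance robustness, accuracy, and real-time performance. Many existing works address the challenges of object tracking from different perspectives, but when measured against multi-dimensional performance metrics they still cannot fully overcome tracking problems in complex scenes. Focusing on Siamese-network-based object tracking algorithms, this paper reviews the current state of the field, discusses the remaining challenges, and outlines research directions that deserve attention, in order to provide a reference for future work in this area.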