Journal of Guangxi Normal University (Natural Science Edition), 2025, Vol. 43, Issue (6): 69-79. doi: 10.16088/j.issn.1001-6600.2024121902
• Intelligence Information Processing •
XIE Sheng1, MA Haifei1, ZHANG Canlong1,2*, WANG Zhiwen3, WEI Chunrong4
| [1] | LUO Zengli, ZHANG Canlong, LI Zhixin, WANG Zhiwen, WEI Chunrong. Cross-modal Semantic Collaborative Learning for Text-based Person Re-identification [J]. Journal of Guangxi Normal University(Natural Science Edition), 2025, 43(5): 145-157. |
| [2] | WANG Xuyang, WANG Changrui, ZHANG Jinfeng, XING Mengyi. Multimodal Sentiment Analysis Based on Cross-Modal Cross-Attention Network [J]. Journal of Guangxi Normal University(Natural Science Edition), 2024, 42(2): 84-93. |
| [3] | DU Jinfeng, WANG Hairong, LIANG Huan, WANG Dong. Progress of Cross-modal Retrieval Methods Based on Representation Learning [J]. Journal of Guangxi Normal University(Natural Science Edition), 2022, 40(3): 1-12. |