Journal of Guangxi Normal University (Natural Science Edition), 2025, Vol. 43, Issue (5): 145-157. doi: 10.16088/j.issn.1001-6600.2024112901
罗赠丽1, 张灿龙1,2*, 李志欣1,2, 王智文3, 韦春荣4
LUO Zengli1, ZHANG Canlong1,2*, LI Zhixin1,2, WANG Zhiwen3, WEI Chunrong4
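Abstract: Existing text-based person re-identification methods are limited mainly by feature alignment and semantic ambiguity. To address these problems, this paper proposes a cross-modal semantic collaboration method for person re-identification (CMSC), which learns the semantic information shared by images and text and builds correspondence constraints between local visual regions and text, improving the efficiency of image-text matching. First, a text semantic clustering module is introduced to automatically extract the text information related to local visual semantics, and image self-supervised learning is used to strengthen the semantic expression of local features. Next, a common semantic collaboration module is built to capture the differences and commonalities between images and their descriptions and to establish a semantically consistent mapping in the embedding space. Finally, a semantic constraint reasoning module is introduced, which performs retrieval using the semantic consistency score between image and text, thereby improving efficiency. Experiments on three benchmark datasets show that the proposed method effectively improves model performance: compared with existing methods, Rank-1 improves by 0.75, 1.43, and 0.88 percentage points, and precision improves by 0.64, 2.56, and 3.96 percentage points, respectively.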
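To make the retrieval step and the Rank-1 metric concrete, the following is a minimal sketch of score-based text-to-image retrieval. It is not the CMSC model: cosine similarity stands in for the semantic consistency score, and the function names (`cosine`, `rank_gallery`, `rank_k_accuracy`) and the toy embeddings are hypothetical, chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_gallery(text_vec, image_vecs):
    """Return gallery indices sorted by descending similarity to the text query."""
    scores = [cosine(text_vec, img) for img in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)

def rank_k_accuracy(text_vecs, text_ids, image_vecs, image_ids, k=1):
    """Fraction of text queries whose top-k images include the same identity."""
    hits = 0
    for vec, pid in zip(text_vecs, text_ids):
        top = rank_gallery(vec, image_vecs)[:k]
        hits += any(image_ids[i] == pid for i in top)
    return hits / len(text_vecs)

# Toy embeddings (assumed, not from the paper): three gallery images and
# three text queries, each labelled with a person identity.
image_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
image_ids = [0, 1, 2]
text_vecs = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
text_ids = [0, 1, 2]

rank1 = rank_k_accuracy(text_vecs, text_ids, image_vecs, image_ids, k=1)
rank2 = rank_k_accuracy(text_vecs, text_ids, image_vecs, image_ids, k=2)
print(f"Rank-1: {rank1:.4f}, Rank-2: {rank2:.4f}")
```

In this toy setup the third query is closest to the wrong identity, so it counts as a miss at Rank-1 and a hit at Rank-2. A gain reported in percentage points, as in the abstract, is the absolute difference between two such accuracies expressed in percent.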
CLC number: TP391.1