Journal of Guangxi Normal University (Natural Science Edition) ›› 2022, Vol. 40 ›› Issue (5): 418-432. DOI: 10.16088/j.issn.1001-6600.2022013101
LI Zhixin*, SU Qiang