Journal of Guangxi Normal University (Natural Science Edition), 2022, Vol. 40, Issue 4: 91-103. DOI: 10.16088/j.issn.1001-6600.2021101803
WANG Yuhang1, ZHANG Canlong1*, LI Zhixin1, WANG Zhiwen2
[1] LI Z X, WEI H Y, HUANG F C, et al. Image caption generation combining visual features and scene semantics[J]. Chinese Journal of Computers, 2020, 43(9): 1624-1640 (in Chinese). DOI: 10.11897/SP.J.1016.2020.01624.
[2] ZHOU D M, ZHANG C L, LI Z X, et al. Image captioning model based on multi-level visual fusion[J]. Acta Electronica Sinica, 2021, 49(7): 1286-1290 (in Chinese). DOI: 10.12263/DZXB.20191296.
[3] WEI Z Y, FAN Z H, WANG R Z, et al. From vision to text: a survey of image captioning[J]. Journal of Chinese Information Processing, 2020, 34(7): 19-29 (in Chinese). DOI: 10.3969/j.issn.1003-0077.2020.07.002.
[4] AGRAWAL H, DESAI K, CHEN X, et al. nocaps: novel object captioning at scale[C]// Proceedings of the IEEE International Conference on Computer Vision. Los Alamitos, CA: IEEE Computer Society, 2019: 8947-8956. DOI: 10.1109/ICCV.2019.00904.
[5] CORNIA M, BARALDI L, CUCCHIARA R. Show, control and tell: a framework for generating controllable and grounded captions[C]// Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2019: 8299-8308. DOI: 10.1109/CVPR.2019.00850.
[6] WANG J H, LUO Y F. Enriching image captions with fine-grained semantic features and Transformer[J]. Journal of East China Normal University (Natural Science), 2020(5): 56-67 (in Chinese). DOI: 10.3969/j.issn.1000-5641.202091004.
[7] ENGILBERGE M, CHEVALLIER L, PÉREZ P, et al. Finding beans in burgers: deep semantic-visual embedding with localization[C]// Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2018: 3984-3993. DOI: 10.1109/CVPR.2018.00419.
[8] GU J, CAI J, JOTY S, et al. Look, imagine and match: improving textual-visual cross-modal retrieval with generative models[C]// Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2018: 7181-7189. DOI: 10.1109/CVPR.2018.00750.
[9] ZHANG J S, HONG Y, LI Z F, et al. Image caption generation based on a bidirectional attention mechanism[J]. Journal of Chinese Information Processing, 2020, 34(9): 53-61 (in Chinese). DOI: 10.3969/j.issn.1003-0077.2020.09.008.
[10] MA S L, ZHANG G B, JIAO Y, et al. An improved image captioning method with a global attention mechanism[J]. Journal of Xidian University, 2019, 46(2): 17-22 (in Chinese). DOI: 10.19665/j.issn1001-2400.2019.02.004.
[11] MA K Y, LIN J Z, PANG Y. Image semantic description model combining guided decoding and visual attention[J]. Application Research of Computers, 2020, 37(11): 3504-3506, 3515 (in Chinese). DOI: 10.19734/j.issn.1001-3695.2019.06.0243.
[12] HUANG Y W, YOU Y D, ZHAO P. Image caption generation model incorporating a convolutional attention mechanism[J]. Journal of Computer Applications, 2020, 40(1): 23-27 (in Chinese). DOI: 10.11772/j.issn.1001-9081.2019050943.
[13] SHI Y L, YANG W Z, DU H X, et al. Survey of image captioning based on deep learning[J]. Acta Electronica Sinica, 2021, 49(10): 2048-2060 (in Chinese). DOI: 10.12263/DZXB.20200669.
[14] DENG Z R, ZHANG B J, JIANG Z Q, et al. Image captioning model combining word2vec and an attention mechanism[J]. Computer Science, 2019, 46(4): 268-273 (in Chinese). DOI: 10.11896/j.issn.1002-137x.2019.04.042.
[15] HUANG Y, BAI C, LI H K, et al. Image caption generation method based on conditional generative adversarial networks[J]. Journal of Computer-Aided Design & Computer Graphics, 2020, 32(6): 911-918 (in Chinese). DOI: 10.3724/SP.J.1089.2020.18003.
[16] KARPATHY A, FEI-FEI L. Deep visual-semantic alignments for generating image descriptions[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2015: 3128-3137. DOI: 10.1109/CVPR.2015.7298932.
[17] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(4): 652-663. DOI: 10.1109/TPAMI.2016.2587640.
[18] WU Q, SHEN C, LIU L, et al. What value do explicit high level concepts have in vision to language problems?[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2016: 203-212. DOI: 10.1109/CVPR.2016.29.
[19] XU K, BA J L, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]// Proceedings of the 32nd International Conference on Machine Learning: Volume 37. Lille: PMLR, 2015: 2048-2057.
[20] LU J, XIONG C, PARIKH D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2017: 375-383. DOI: 10.1109/CVPR.2017.345.
[21] CHEN L, ZHANG H W, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2017: 5659-5667. DOI: 10.1109/CVPR.2017.667.
[22] RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2017: 7008-7024. DOI: 10.1109/CVPR.2017.131.
[23] ANDERSON P, HE X, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2018: 6077-6086. DOI: 10.1109/CVPR.2018.00636.
[24] WANG Y S, LIU C X, ZENG X H, et al. Scene graph parsing as dependency parsing[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018: 397-407. DOI: 10.18653/v1/N18-1037.
[25] ZELLERS R, YATSKAR M, THOMSON S, et al. Neural motifs: scene graph parsing with global context[C]// 2018 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2018: 5831-5840. DOI: 10.1109/CVPR.2018.00611.
[26] YANG X, TANG K H, ZHANG H W, et al. Auto-encoding scene graphs for image captioning[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2019: 10677-10686. DOI: 10.1109/CVPR.2019.01094.
[27] PARK C C, KIM B C, KIM G H. Attend to you: personalized image captioning with context sequence memory networks[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2017: 895-903. DOI: 10.1109/CVPR.2017.681.
[28] GAN C, GAN Z, HE X, et al. StyleNet: generating attractive visual captions with styles[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2017: 3137-3146. DOI: 10.1109/CVPR.2017.108.
[29] SHUSTER K, HUMEAU S, HU H, et al. Engaging image captioning via personality[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2019: 12516-12526. DOI: 10.1109/CVPR.2019.01280.
[30] CHEN S, JIN Q, WANG P, et al. Say as you wish: fine-grained control of image caption generation with abstract scene graphs[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2020: 9962-9971. DOI: 10.1109/CVPR42600.2020.00998.
[31] LIU B, FU J, KATO M P, et al. Beyond narrative description: generating poetry from images by multi-adversarial training[C]// Proceedings of the 26th ACM International Conference on Multimedia. New York, NY: Association for Computing Machinery, 2018: 783-791. DOI: 10.1145/3240508.3240587.
[32] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]// 2015 IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2015: 4566-4575. DOI: 10.1109/CVPR.2015.7299087.
[33] ANDERSON P, FERNANDO B, JOHNSON M, et al. SPICE: semantic propositional image caption evaluation[C]// Computer Vision - ECCV 2016: Lecture Notes in Computer Science 9909. Cham: Springer International Publishing AG, 2016: 382-398. DOI: 10.1007/978-3-319-46454-1_24.
[34] LIN C Y. ROUGE: a package for automatic evaluation of summaries[C]// Text Summarization Branches Out. Stroudsburg, PA: Association for Computational Linguistics, 2004: 74-81.
[35] WANG Q, CHAN A B. Describing like humans: on diversity in image captioning[C]// 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2019: 4195-4203. DOI: 10.1109/CVPR.2019.00432.
[36] CORNIA M, STEFANINI M, BARALDI L, et al. Meshed-memory transformer for image captioning[C]// 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Los Alamitos, CA: IEEE Computer Society, 2020: 10578-10587. DOI: 10.1109/CVPR42600.2020.01059.