Journal of Guangxi Normal University (Natural Science Edition), 2023, Vol. 41, Issue (2): 1-18. doi: 10.16088/j.issn.1001-6600.2022083002
YANG Shuozhen1, ZHANG Long1*, WANG Jianhua2, ZHANG Hengyuan1