Volume 50 Issue 5
May  2024
SUN Qing, YANG Chaoyu. A multi-modal detection method for holding ladders in underground climbing operations[J]. Journal of Mine Automation,2024,50(5):142-150.  doi: 10.13272/j.issn.1671-251x.2024010068

A multi-modal detection method for holding ladders in underground climbing operations

doi: 10.13272/j.issn.1671-251x.2024010068
  • Received Date: 2024-01-22
  • Rev Recd Date: 2024-05-20
  • Available Online: 2024-06-13
  • Abstract: Most current research on recognizing unsafe behaviors of underground personnel focuses on improving precision through computer vision. However, underground areas are prone to occlusion, unstable lighting, and reflections, making it difficult to recognize unsafe behaviors accurately with computer vision alone. In particular, similar actions such as climbing a ladder and holding a ladder during climbing operations are easily confused during recognition, posing safety hazards. To solve these problems, a multi-modal detection method for holding ladders in underground climbing operations is proposed. The method analyzes surveillance video data in two modalities: visual and audio. For the visual modality, the YOLOv8 model is used to detect whether a ladder is present. If a ladder is detected, its position coordinates are obtained and the video segment is fed into the OpenPose algorithm for pose estimation to extract the features of the human skeletal joints. The skeletal joint sequences are then fed into an improved spatial attention temporal graph convolutional network (SAT-GCN) to obtain human action labels and their corresponding probabilities. For the audio modality, the PaddlePaddle automatic speech recognition system is used to convert speech into text, and the bidirectional encoder representations from transformers (BERT) model is used to analyze and extract features from the text, yielding a text label and its corresponding probability. Finally, the information obtained from the visual and audio modalities is fused at the decision level to determine whether personnel are holding the ladder during underground climbing operations. The experimental results show that, in action recognition based on skeleton data, the optimized SAT-GCN model improves the recognition precision of three types of actions (holding, climbing, and standing) by 3.36%, 2.83%, and 10.71%, respectively. The multi-modal detection method achieves a higher recognition accuracy than single-modal methods, reaching 98.29%.
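
    As a rough illustration of how the two branches come together at the decision level, the sketch below wires the pipeline described above into plain Python. The model calls (YOLOv8 ladder detection, OpenPose, SAT-GCN, PaddlePaddle speech recognition, BERT) are replaced by clearly labeled dummy stand-in functions, and the fusion weights and threshold are illustrative assumptions rather than the authors' exact formulation.

    ```python
    # Minimal sketch of the multi-modal decision logic. Every *_stub function is a
    # dummy stand-in for the real model (YOLOv8, OpenPose, SAT-GCN, PaddlePaddle
    # ASR, BERT); the fusion weights and threshold are illustrative assumptions.

    from dataclasses import dataclass


    @dataclass
    class ModalityResult:
        label: str          # e.g. "holding", "climbing", "standing"
        probability: float  # confidence reported by that modality


    def detect_ladder_stub(frame) -> bool:
        """Stand-in for YOLOv8: report whether a ladder appears in the frame."""
        return True


    def visual_branch_stub(video_segment) -> ModalityResult:
        """Stand-in for OpenPose pose estimation + the improved SAT-GCN classifier."""
        return ModalityResult("holding", 0.91)


    def audio_branch_stub(audio_clip) -> ModalityResult:
        """Stand-in for PaddlePaddle speech recognition + BERT text classification."""
        return ModalityResult("holding", 0.78)


    def fuse_decisions(visual: ModalityResult, audio: ModalityResult,
                       w_visual: float = 0.6, w_audio: float = 0.4,
                       threshold: float = 0.5) -> bool:
        """Decision-level fusion: weighted vote for the 'holding ladder' label."""
        score = 0.0
        if visual.label == "holding":
            score += w_visual * visual.probability
        if audio.label == "holding":
            score += w_audio * audio.probability
        return score >= threshold


    def detect_holding(video_segment, audio_clip) -> bool:
        """Return True when the fused evidence indicates someone holding the ladder."""
        if not detect_ladder_stub(video_segment[0]):
            return False  # no ladder detected, so pose estimation is skipped entirely
        visual = visual_branch_stub(video_segment)
        audio = audio_branch_stub(audio_clip)
        return fuse_decisions(visual, audio)


    if __name__ == "__main__":
        print(detect_holding(video_segment=["frame_0"], audio_clip="clip_0"))  # True
    ```

    Weighted voting is only one plausible decision-level rule; the paper's fusion step could equally be realized with a maximum rule or learned weights, so the numbers above should be read as placeholders.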

