Abstract:
Tunnel scenarios such as mine roadways and traffic tunnels are frequently threatened by fire. Image-based intelligent fire detection methods are therefore of great significance for rapidly locating a fire source in its early stages. However, existing methods suffer from poor time-series consistency and are highly sensitive to changes in camera pose, resulting in degraded detection performance in complex and dynamic environments. To address these issues, a tunnel fire source depth estimation method based on the fusion of infrared (IR) and visible light (RGB) images was proposed. A pose network within a self-supervised learning framework was introduced to predict pose changes between adjacent frames, and a depth estimation network with two-stage training was constructed. Based on the UNet architecture, IR and RGB features were extracted and fused at multiple scales, ensuring a balanced depth estimation process. A camera height loss was introduced to further enhance the accuracy and reliability of fire source detection in complex and dynamic environments. Experimental results on a self-constructed tunnel flame dataset demonstrated that, with ResNet50 as the backbone network, the proposed self-supervised monocular depth estimation model achieved an absolute relative error of 0.102, a squared relative error of 0.835, and a mean square error of 4.491, outperforming mainstream models such as Lite-Mono, MonoDepth, MonoDepth2, and VAD. Its overall accuracy was the best under the accuracy thresholds of 1.25, 1.25², and 1.25³, and it produced better predictions for objects in both close-range and long-range regions than the DepthAnything, MonoDepth2, and Lite-Mono models.