2025 Volume 46 Issue 05
ZHANG Zhen1, XIAO Zongrong2, LI Youhao3, HUANG Weitao3
Abstract: To address the safety risks posed by construction vehicle operations in high-risk areas near natural gas pipelines, particularly the physical impacts and environmental disturbances caused by heavy vehicles, an improved YOLOv7-based construction vehicle recognition algorithm was proposed in this study. Six common types of construction vehicles, namely dump trucks, rollers, mixers, forklifts, excavators, and loaders, were selected as the research objects. A custom dataset containing images captured in various environments and from various angles was used to train the model and ensure its performance across these conditions. Firstly, the CBAM attention mechanism was introduced into the YOLOv7 head, and an improved GAM attention mechanism was added to the max pooling layer to enhance the model's focus on key image features and improve detection accuracy. Secondly, the DySample dynamic upsampling module replaced nearest neighbor interpolation to boost precision. Finally, an improved SPPCSPC module was designed to enhance feature extraction efficiency, reduce computational costs, and accelerate inference. These modifications enabled the model to maintain high detection accuracy even in challenging scenarios such as low-quality images or distant targets. Experimental results demonstrated that the proposed algorithm achieved a precision P of 97.7%, recall R of 94.7%, mAP@0.5 of 98.6%, and mAP@0.5:0.95 of 90.4%. Compared to the original YOLOv7 algorithm, these metrics improved by 1.3, 1.4, 1.4, and 3.7 percentage points, respectively.
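Because the gains above are stated in percentage points, the baseline YOLOv7 values they imply can be recovered by simple subtraction. The short sketch below does only that arithmetic; the dictionary and function names are illustrative and do not come from the paper.

```python
# Metrics reported for the improved model (percent) and the gains over the
# original YOLOv7 (percentage points), both copied from the abstract.
improved = {"P": 97.7, "R": 94.7, "mAP@0.5": 98.6, "mAP@0.5:0.95": 90.4}
gain_pp = {"P": 1.3, "R": 1.4, "mAP@0.5": 1.4, "mAP@0.5:0.95": 3.7}


def implied_baseline(improved, gain_pp):
    """Subtract each percentage-point gain to get the implied baseline value."""
    return {name: round(improved[name] - gain_pp[name], 1) for name in improved}


baseline = implied_baseline(improved, gain_pp)
for name, value in baseline.items():
    print(f"{name}: {value}%")
```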
WEI Mingjun1,2, CHEN Xiaoru1, LIU Ming1, LIU Yazhi1,2, LI Hui1
Abstract: Camouflaged object detection (COD) faces significant challenges due to the high similarity between target objects and their backgrounds, such as blurred edge predictions, incomplete detection results, and interference caused by the insufficient use of edge and texture information. To address these issues, a novel edge-texture guided enhancement network (ETGENet) was proposed to improve COD performance through explicit and sufficient edge-texture guidance strategies. Firstly, a key feature guided enhancement module (FGEM) was used in ETGENet, which processed and enhanced object features through parallel feature refinement branches. The guide branch obtained object features by correlating attention with edge and texture cues, enhancing the network's understanding of object details and suppressing noise interference, while the self-enhancement branch used the self-attention mechanism to refine the features of camouflaged objects from a global perspective. Secondly, a feature interaction fusion module (FIFM) was proposed to progressively fuse adjacent features. FIFM used an attention interaction mechanism and a weighted fusion strategy to learn complementary information between features and generate more complete prediction maps. Experiments on three public datasets, CAMO, COD10K, and NC4K, demonstrated that the proposed network outperformed state-of-the-art methods across metrics such as structure measure S, adaptive enhanced matching measure E, weighted F-measure, and mean absolute error M. Notably, on the largest test set, NC4K, the weighted F-measure surpassed that of FSPNet, the best-performing of the 12 advanced COD methods compared, by 2.2 percentage points.
LIU Zhaoying1, CHEN Zhiyuan1, ZHANG Ting1, SHI Yanan2, CHEN Yingchun3
Abstract: Aiming at the problems of limited resources and low contrast of surface defect images in industrial scenarios, an improved YOLOv5 method for industrial product surface defect detection was proposed. Firstly, a receptive field enhancement module was introduced in the backbone network to extract richer visual features from receptive fields at different levels. Secondly, a shuffle attention module was added to the feature fusion network to fuse feature maps of different dimensions more effectively. Finally, a task-decoupled detection head was adopted, allowing the classification and regression tasks to use independent networks for prediction, which reduced mutual interference and improved detection accuracy. The experimental results showed that the parameter count and computational complexity of this network were lower than those of models such as YOLOX, YOLOv7, and deformable DETR. On the pipeline Digital Ray (DR) defect image dataset and the NEU-DET dataset, mAP@0.5 increased by 2.23 and 2.99 percentage points, respectively, balancing the requirements for real-time and accurate defect detection in industrial scenarios.
CHEN Liwei, PENG Yifei, YU Renping, SUN Yuancheng
Abstract: Existing methods for automatic hippocampus image segmentation cannot make good use of context information, which makes it difficult to improve segmentation accuracy and leads to large memory consumption during training and detection. To address this problem, a new model called MVF-2.5D U-Net, based on multi-view fusion and 2.5D U-Net, was introduced. Firstly, the model improved the 2D U-Net by incorporating a Triplet Attention module and adjusting the depth of the network. Secondly, the traditional single-slice input was replaced by a three-channel 2.5D image composed of adjacent slices. Finally, a volume fusion network was constructed to replace the conventional majority voting scheme. The model was validated by cross-validation on the HarP dataset. The experimental results showed that the average Dice coefficient and Hausdorff distance of the model on the hippocampus image segmentation task were 0.902 and 3.02, respectively. Its accuracy and stability were better than those of the traditional U-Net model and the comparison algorithms, and it was also suitable for resource-constrained situations, which showed that the proposed model could segment the hippocampus on MRI more effectively.
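The three-channel 2.5D input and the Dice coefficient mentioned above can be illustrated with a small NumPy sketch. It assumes the slice axis is axis 0 and clamps the neighbour index at the volume edges; both choices, and the function names, are assumptions for illustration rather than details taken from the paper.

```python
import numpy as np


def make_25d_input(volume, index):
    """Stack slices index-1, index, index+1 as three channels.

    volume: array of shape (depth, height, width).
    At the first and last slice the missing neighbour is replaced by the
    slice itself (edge clamping, an assumption of this sketch).
    """
    depth = volume.shape[0]
    neighbours = [max(index - 1, 0), index, min(index + 1, depth - 1)]
    return np.stack([volume[i] for i in neighbours], axis=0)


def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

For a volume of shape (D, H, W), `make_25d_input(volume, k)` returns an array of shape (3, H, W) that can be fed to a 2D network in place of a single slice.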
CAO Yangjie, WANG Weiping, LI Zhenqiang, XIE Jun, LYU Runfeng
Abstract: To address the issues of excessive reliance on annotated data and high computational complexity in 3D scene editing algorithms, a multimodal scene editing method named CLIP2Gaussian, which integrated CLIP with 3D Gaussians, was proposed in this study. Firstly, the algorithm employed SAM to extract target masks from multi-view images and introduced a bidirectional propagation strategy to ensure mask consistency across different views. Secondly, the extracted masks were assigned semantic labels using CLIP and mapped to 3D Gaussian points to enable semantic embedding in the 3D scene. Finally, a differentiable rendering mechanism was used to optimize the parameters of the 3D Gaussians, and a clustering-based spatial consistency regularization strategy was introduced to enhance the consistency and stability of semantic labels in 3D space. Experimental results showed that CLIP2Gaussian achieved 61.23% IoU on the LERF dataset and a per-query response time of 0.57 seconds in semantic segmentation tasks, a 54-fold speedup over LERF together with higher accuracy. Ablation studies further verified that the proposed method enabled precise editing of target regions with minimal disturbance to the original scene.
ZHANG Haibin1, WEI Hongji1, WANG Chao2, XIANG Changbo3, YANG Mingyang3, LI Xiaolong4
Abstract: In complex electromagnetic environments, interference signals can severely degrade the detection and recognition performance of frequency-hopping signals. To address the issues of false detection, missed detection, and over-detection in traditional methods, an improved signal detection and recognition algorithm based on time-frequency diagrams was proposed in this study by modifying the YOLOv5s network. Firstly, a composite dataset containing both frequency-hopping signals and interference signals was constructed, comprising 4 modulation types of frequency-hopping signals and 6 interference types, with 300 high-resolution time-frequency diagram samples generated for each combination (totaling 7 200 groups). Secondly, considering that interference and signals have similar features in time-frequency diagrams, and that the frequency variation pattern of hopping signals makes the background information around signals crucial for differentiation, a context hierarchy module was proposed to process background information hierarchically. This module employed depthwise separable convolution to extract surrounding background features and used a gated aggregation mechanism to perform weighted fusion of background information and signal features, thereby generating more discriminative composite features. Finally, the backbone network of YOLOv5s was modified by integrating the context hierarchy module and gated aggregation mechanism to develop an improved frequency-hopping signal detector. Simulation results showed that, compared with the original YOLOv5s network, the proposed method improved the recall rate R by 15.9 percentage points, the mean average precision mAP@0.5:0.95 by 8.9 percentage points, and F1 by 9 percentage points, while significantly reducing false and missed detections.
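The dataset size quoted above follows directly from the stated counts, as the short check below shows. The modulation and interference names are placeholders, since the abstract gives only how many of each there are.

```python
from itertools import product

# Placeholder labels: the abstract states the counts, not the actual types.
modulations = [f"modulation_{i}" for i in range(1, 5)]      # 4 modulation types
interferences = [f"interference_{j}" for j in range(1, 7)]  # 6 interference types
samples_per_combination = 300

# Every modulation type is paired with every interference type.
combinations = list(product(modulations, interferences))
total_samples = len(combinations) * samples_per_combination
print(len(combinations), total_samples)
```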
HAN Chenchen, LU Xiankai, WANG Zhicheng, XIONG Xiaozhou
Abstract: In response to the challenges of maintaining structural and temporal consistency between video frames in video prediction tasks, an object-centric video prediction algorithm based on dynamic memory and motion information was proposed. Firstly, an object-centric model was introduced to decouple the objects in the scene, ensuring consistent and stable long-term dynamic modeling of video objects and effectively maintaining their structural consistency. Secondly, an object dynamic memory module was designed to capture the long-term dependencies of videos and model object dynamics, overcoming the shortcomings of existing video prediction methods in predicting dynamic interactions between objects and enhancing the temporal consistency of video objects. Thirdly, the feature similarity matrix of adjacent frames was used to capture the motion information between frames and model the spatiotemporal relationships of the video sequence, further strengthening the temporal consistency of video objects. Finally, a cross-attention mechanism was used to integrate the temporal and structural information of video objects, further improving video prediction performance. Video prediction experiments were conducted on the Obj3D and CLEVRER datasets, which contain complex object interactions. The results showed that, compared to state-of-the-art object-centric video prediction algorithms, the proposed algorithm increased the PSNR and SSIM metrics by 4.5% and 1.4%, respectively, and reduced the LPIPS metric by 20%.
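One common way to build a feature similarity matrix between adjacent frames is pairwise cosine similarity over spatial positions. The sketch below takes that approach; the abstract does not state which similarity measure the authors used, so cosine similarity, the (C, H, W) layout, and the function name are assumptions.

```python
import numpy as np


def frame_similarity(feat_t, feat_next, eps=1e-8):
    """Cosine similarity between all spatial positions of two feature maps.

    feat_t, feat_next: arrays of shape (C, H, W) for frames t and t+1.
    Returns an (H*W, H*W) matrix whose entry (i, j) compares position i in
    frame t with position j in frame t+1.
    """
    channels = feat_t.shape[0]
    a = feat_t.reshape(channels, -1).T      # (H*W, C)
    b = feat_next.reshape(channels, -1).T   # (H*W, C)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T
```

Large off-diagonal entries indicate that the content at one position in frame t matches a different position in frame t+1, which is the kind of motion cue the abstract describes.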
DONG Weiyu1, LIU Pengkun2, LIU Chunling1, TANG Yonghe1, MA Yupu2
Abstract: In the field of automated penetration testing, most existing attack path decision algorithms are based on partially observable Markov decision processes (POMDP) and suffer from problems such as high algorithm complexity, slow convergence, and a tendency to get stuck in local optima. In this study, a reinforcement learning algorithm, NoisyNet-A3C, was proposed based on the Markov decision process (MDP) and applied to automated penetration testing. The algorithm trained actor-critic networks in multiple threads, and the results of each thread were fed back to the main neural network. At the same time, each thread obtained the latest parameter updates from the main neural network, which fully utilized computer performance, reduced data correlation, and improved training efficiency. In addition, adding noise parameters, together with weight parameters updated during network training, to the training network increased the randomness of the behavior strategy, facilitated faster exploration of effective paths, reduced the impact of data disturbances, and enhanced the robustness of the algorithm. The experimental results showed that, compared with the A3C, Q-learning, DQN, and NDSPI-DQN algorithms, the NoisyNet-A3C algorithm converged more than 30% faster.
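The idea of adding learnable noise to network weights can be sketched with a single noisy linear layer. The version below uses the factorised Gaussian noise commonly associated with NoisyNet; the abstract does not say which noise form NoisyNet-A3C uses, so this form and all names here are assumptions for illustration.

```python
import numpy as np


def scaled_noise(size, rng):
    """Factorised-noise helper: f(x) = sign(x) * sqrt(|x|) on Gaussian samples."""
    x = rng.standard_normal(size)
    return np.sign(x) * np.sqrt(np.abs(x))


def noisy_linear(x, w_mu, w_sigma, b_mu, b_sigma, rng):
    """Compute y = (w_mu + w_sigma * eps_w) @ x + (b_mu + b_sigma * eps_b).

    w_mu and w_sigma have shape (out_dim, in_dim); b_mu and b_sigma have
    shape (out_dim,). The mu and sigma arrays would be learned parameters;
    fresh noise is drawn on every call.
    """
    out_dim, in_dim = w_mu.shape
    eps_in = scaled_noise(in_dim, rng)
    eps_out = scaled_noise(out_dim, rng)
    eps_w = np.outer(eps_out, eps_in)   # (out_dim, in_dim) weight noise
    weight = w_mu + w_sigma * eps_w
    bias = b_mu + b_sigma * eps_out
    return weight @ x + bias
```

With the sigma parameters set to zero the layer reduces to an ordinary linear layer; larger sigma values make the policy output, and hence the chosen actions, more random.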
HAN Huijian, XING Huaiyu, ZHANG Yunfeng, ZHANG Rui
Abstract: To address the challenges posed by the varying scales of steel surface defects and the limited multi-scale feature processing capabilities and accuracy of existing detection algorithms, a steel surface defect detection method that integrated hybrid sampling and multi-attention collaboration was proposed in this study. Firstly, an efficient channel feature extraction backbone was constructed to emphasize defect feature extraction against the complex background of steel surfaces. Secondly, a dual-attention collaborative feature pyramid was introduced to expand the network's receptive field, thereby enhancing the capture of multi-scale defect features and improving the detection performance for small targets. Finally, a Transformer-based hybrid sampling strategy was designed to dynamically perceive defect regions, thereby boosting the overall detection performance of the model. Experimental comparisons on the NEU-DET dataset revealed that, compared to the baseline DETR algorithm, the improved algorithm achieved a 6.1 percentage point increase in mean average precision, reaching 81.4%, thereby enhancing the model's accuracy in detecting steel surface defects. Additionally, with a detection speed of 44.2 frame/s, the proposed algorithm struck a good balance between detection speed and performance.
LIU Kai, WANG Jiaqin, LI Hantao
Abstract: Vehicle trajectory prediction (VTP) is a significant research subject in the field of transportation technology. Traditional VTP methods require extensive feature engineering and struggle to adapt to complex and dynamic environments in real time. Deep learning (DL) overcomes the limitations of traditional methods by achieving efficient data representation through multi-layer neural networks. Therefore, a comprehensive review of DL-based VTP methods was carried out in this study to explore their applications and performance in VTP. Firstly, the traditional and DL-based VTP methods were explored, and the main considerations and problem formulations in VTP were introduced. Secondly, various VTP schemes, including their input data, output results, and prediction methods, were analyzed and compared. Subsequently, commonly used evaluation metrics were introduced, the experimental results of these VTP approaches were compared, the applications of VTP were analyzed, and the superior performance of DL in VTP was demonstrated. Finally, future research directions of VTP were discussed in terms of datasets, modeling approaches, and computational efficiency. The review identified collaborative modeling of vehicle interactions, model generalization, and multimodal fusion as the primary challenges and research frontiers in the field.
QIN Dongchen, LUO Qingzhou, YANG Junjie, CHEN Jiangyi, WU Hongxia
Abstract: The issues of low charging speed, rapid temperature rise, lithium plating, and overcharging in lithium-ion batteries were addressed in this study. A model predictive control (MPC) charging strategy based on an improved particle swarm optimization (PSO) algorithm was proposed. Firstly, an equivalent circuit-thermal-electrochemical-aging coupled model was established, combining the advantages of equivalent circuit and electrochemical models to accurately predict terminal voltage, temperature variations, and aging mechanisms (e.g., SEI film growth, active material loss, and capacity loss from lithium plating). Secondly, the coupled model was discretized to build a state-space model, with added safety constraints to prevent thermal runaway, lithium plating, and overcharging. Based on the state-space model, future battery states were predicted, and a cost function for charging time and energy loss was formulated. Finally, the improved PSO algorithm was used to solve for the optimal charging current sequence, enabling real-time charging optimization. MATLAB/Simulink simulations showed that the strategy significantly reduced charging time while effectively controlling battery temperature, terminal voltage, and lithium plating overpotential, avoiding issues such as thermal runaway, lithium plating, and overcharging. Experimental comparisons with traditional strategies showed a reduction in charging time of 17.3% to 61.1% and in capacity decay of 7.6% to 36%. This research provided a new direction for lithium-ion battery charging optimization.
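The optimization step above relies on particle swarm optimization. The sketch below is a plain, textbook PSO applied to a toy quadratic cost, not the improved PSO or the battery cost function from the paper; the inertia and acceleration values, the box bounds, and the function names are all assumptions chosen for illustration.

```python
import numpy as np


def pso_minimize(cost, lower, upper, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic PSO: minimise cost(x) subject to lower <= x <= upper."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                              # personal bests
    best_val = np.array([cost(p) for p in pos])
    g_best = best_pos[np.argmin(best_val)].copy()      # global best
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, lower, upper)         # keep particles in bounds
        val = np.array([cost(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g_best = best_pos[np.argmin(best_val)].copy()
    return g_best, float(best_val.min())


def toy_cost(x):
    """Stand-in cost with its minimum at x = (1, 1)."""
    return float(np.sum((x - 1.0) ** 2))


best_x, best_cost = pso_minimize(toy_cost, lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

In an MPC setting the decision vector would be the charging current sequence over the prediction horizon and the cost would come from the battery model; here both are replaced by stand-ins.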
LIU Haibo1,2, LIANG Peng1, LIU Ziqian1
Abstract: In view of the time delay caused by signal transmission in large-scale interconnected power systems, the robust stability of power systems with stochastic time delay was investigated. Firstly, by considering the stochastic characteristics of the time delay and the uncertain factors in practical systems, and assuming that the probability of the time delay obeyed a Bernoulli distribution, a new system model was established. Secondly, to introduce more probability distribution information about the stochastic time delay, a new augmented vector and a Lyapunov-Krasovskii (L-K) functional with more state information and multiple integral terms were constructed. Then, by using the generalized free weight matrix inequality (GFWMI) to handle the quadratic integral term, a more accurate upper bound was obtained. As a consequence, a less conservative mean square asymptotic stability criterion was derived by using the Schur complement lemma and the linear matrix inequality method. Simulation results showed that the proposed method improved the time-delay stability margin of the system by 73% and reduced the time to achieve stability by 40% under parameter disturbances. The conservatism of the results was significantly reduced in comparison with the existing literature.
LI Qing, LI Hong, DONG Haiying, WANG Hao
Abstract: To solve the issues of high computational and communication costs, as well as the difficulty of achieving fully distributed control in the optimal scheduling of multiple distributed generation (DG) units, a source-load collaborative scheduling method based on a supernode collaborative consistency algorithm was proposed in this study, in which frequency modulation control was combined to enhance the consistency algorithm. Firstly, the topology of the original distributed power network was reconstructed based on the single-hop sampling method. Supernodes were chosen to solve for the decision variables, while the ordinary nodes within each partition communicated only with these supernodes. Secondly, an adaptive global correction coefficient frequency modulation control method was introduced to improve the discrete consistency algorithm, so that it could better adapt to real-time scheduling and fully distributed control of active distribution networks. This method was used to solve the optimal scheduling model that minimized the system's operating cost. Finally, the effectiveness of the proposed scheduling method in topology switching and source-load mutation scenarios was verified through MATLAB simulation. The simulation results showed that the proposed scheduling algorithm achieved rapid convergence of the incremental cost to consistency, effectively keeping each DG's incremental cost below 8.95 yuan when the system underwent sudden changes. Without sudden changes in the system, the algorithm needed 197 iterations to achieve scheduling allocation, offering an effective solution for the scheduling department.
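A discrete consistency (consensus) iteration of the kind this abstract builds on can be sketched as repeated neighbour averaging. The example below is the generic algorithm on a four-node ring; it omits the supernode partition and the frequency-based correction term, and the weight matrix and starting incremental costs are hypothetical values chosen for illustration.

```python
import numpy as np


def run_consensus(x0, weights, tol=1e-6, max_iters=10_000):
    """Iterate x(k+1) = W x(k) until all node values agree within tol.

    weights: row-stochastic matrix W; W[i, j] > 0 only if node i listens to
    node j. Returns the final values and the number of iterations used.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iters + 1):
        x = weights @ x
        if x.max() - x.min() < tol:
            return x, k
    return x, max_iters


# Four nodes on a ring; each node averages itself with its two neighbours.
ring_weights = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])
initial_costs = [8.0, 8.5, 9.0, 9.5]   # hypothetical incremental costs
final_costs, iterations = run_consensus(initial_costs, ring_weights)
```

Because this weight matrix is symmetric and doubly stochastic, every node converges to the average of the initial values using only neighbour-to-neighbour communication.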
ZHOU Enze1, HUANG Daochun2, WANG Lei1, PENG Tianhao2, LIU Shuqin1, WANG Hao1, CHEN Chao1
Abstract: Against the complex background of transmission line corridors, traditional early warning methods for wildfires showed poor performance, slow detection speeds, and high rates of false positives and missed detections in image recognition. In this study, a wildfire detection method for transmission corridors based on an improved YOLOv8s model was introduced. Firstly, a wildfire image dataset featuring wilderness backgrounds was obtained through network collection and screening of existing datasets, providing a better match for the target background. Secondly, the ODConv module was introduced, and the original C2f module was replaced with the C2f_OD module for feature extraction in the Backbone and Neck sections of the baseline model, thereby enhancing the model's detection performance for flames and smoke. Thirdly, the Head section was replaced with the DyHead module, which integrated three attention mechanisms (scale, spatial, and task) to further improve detection accuracy. In addition, the WIoU loss function was employed to focus bounding box regression on prediction boxes of ordinary quality, enhancing the model's generalization performance in complex backgrounds. Finally, three ablation experiments and one comparative experiment were designed. The results demonstrated that, compared to the original YOLOv8s model, the proposed algorithm achieved a 5.6% increase in mAP@0.5, a 4.51% increase in P, a 5.41% increase in R, and a detection speed of 34.9 frames per second, meeting the requirements for accurate wildfire detection along transmission corridors.
ZHANG Bei, XU Shuo, ZHONG Yanhui, CAI Hongjian, ZANG Quansheng, LI Xiaolong
Abstract: To address the issues of low detection accuracy and speed of ground-penetrating radar for loose defects under complex environmental conditions, a loose defect recognition method based on an improved YOLOv8 algorithm (YOLOv8-DN) was proposed. A DN module integrating a dynamic deformable convolution module and a multi-scale feature fusion module was designed to replace the C2f module. The receptive fields of the dynamic deformable convolution kernels were adapted to accommodate the morphological complexity of defect features, while the multi-scale feature fusion path was employed to enhance the model's ability to capture small and blurred defect regions. By replacing the original C2f module with the DN module, the recognition capability for complex defects was significantly improved, and computational overhead was effectively reduced. Experimental results showed that, compared with the original YOLOv8 algorithm, the improved algorithm achieved a 5.29 percentage point increase in mAP, a 5.2 percentage point reduction in missed detection rate, and a 4.9 frame/ms improvement in inference speed. In addition, the integrity and accuracy of the detected mask regions were significantly enhanced, which validated the effectiveness and feasibility of the proposed algorithm and provided a novel solution for the rapid and precise detection of loose defects in the semi-rigid base layers of asphalt pavements.
BAO Tengfei1, CHENG Jianyue1, XING Yu2, ZHOU Xiwu3, CHEN Yuting1, ZHAO Xiangyu1
Abstract: In view of the difficulty existing inversion methods have in balancing convergence speed and nonlinear regression accuracy, a new method combining gradient descent (GD) and particle swarm optimization (PSO) was proposed for inversion analysis of the initial in-situ stress field. In this method, the gravity field and five tectonic stress fields that affect the initial in-situ stress field were considered as eight basic boundary conditions. The stress values of the measuring points under each boundary condition were calculated using finite element software. The measured in-situ stress values were taken as the target values, and the influence coefficients of each boundary condition were obtained by regression analysis using the GD-PSO algorithm. The regression stress values of each point of the model were calculated and input into the 3D finite element model as the initial stress field to balance the in-situ stress. The example analysis showed that, compared with the results of the PSO algorithm, the cubic regression polynomial obtained by the GD-PSO algorithm had the highest accuracy, with a mean square error of 0.579. The regression results fit the measured in-situ stress values well. After the in-situ stress balance, except for the vertical stress value, the difference between the calculated and measured in-situ stress values at the measurement points was relatively small, the directional displacement of the surrounding rock was basically zero, and the maximum displacement was only 5.26 mm.
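The regression step above fits influence coefficients so that a combination of the stresses computed under each boundary condition matches the measured stresses. The sketch below shows only the gradient-descent half, on a linear model with synthetic data; the PSO coupling and the cubic polynomial form are not reproduced, and all names, the learning rate, and the data are assumptions for illustration.

```python
import numpy as np


def fit_influence_coefficients(basis_stress, measured, lr=0.05, n_iters=2000):
    """Gradient descent on the mean squared error of measured ~ basis_stress @ coef.

    basis_stress: (n_points, n_conditions) stresses at the measuring points,
                  one column per boundary condition.
    measured:     (n_points,) measured in-situ stress values.
    """
    n_points, n_conditions = basis_stress.shape
    coef = np.zeros(n_conditions)
    for _ in range(n_iters):
        residual = basis_stress @ coef - measured
        grad = 2.0 * basis_stress.T @ residual / n_points
        coef -= lr * grad
    return coef


# Synthetic, noise-free data standing in for finite element output.
rng = np.random.default_rng(0)
basis = rng.normal(size=(20, 3))
true_coef = np.array([1.2, -0.5, 0.8])
measured = basis @ true_coef
estimated = fit_influence_coefficients(basis, measured)
mse = float(np.mean((basis @ estimated - measured) ** 2))
```

Plain gradient descent can stall in poor regions of a nonlinear cost surface, which is presumably why the paper pairs it with a population-based search such as PSO.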
FANG Wei1,2,3,4, WANG Haoxi1
Abstract: To address the limitations of existing deep learning-based radar echo extrapolation algorithms in spatial-temporal feature extraction and long-term dependency modeling, a novel SRU-Former model that integrated a spatial-temporal reconstruction unit (SRU) and Transformer was proposed for radar echo extrapolation. Firstly, a newly designed SRU was introduced into the model's encoder and decoder to extract fine-grained spatiotemporal features from radar images via separation, transformation, and reconstruction. Secondly, a Transformer variant, Poolformer, was introduced between the encoder and decoder, using global average pooling to replace the self-attention mechanism, thereby assisting the model in modeling highly dynamic radar sequences. Finally, SRU-Former was trained and tested on two meteorological radar datasets, from Jiangsu Province and Shanghai City, and compared with current mainstream deep learning models. In the 2-hour extrapolation task, SRU-Former achieved the best values in four metrics: CSI, FAR, MSE, and SSIM. Specifically, CSI was improved by 0.020 on the Jiangsu Province dataset and by 0.048 on the Shanghai City dataset. Experimental results showed that SRU-Former effectively improved prediction accuracy, with more precise capture of strong echo regions and clearer detail textures in later extrapolation stages.
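CSI and FAR are computed from hits, misses, and false alarms after thresholding the predicted and observed radar fields. The sketch below uses the usual definitions, CSI = hits / (hits + misses + false alarms) and FAR = false alarms / (hits + false alarms); the threshold value and the sample arrays are hypothetical, and the abstract does not state the thresholds used in the paper.

```python
import numpy as np


def csi_far(pred, obs, threshold):
    """Return (CSI, FAR) for one predicted and one observed field."""
    pred_event = np.asarray(pred) >= threshold
    obs_event = np.asarray(obs) >= threshold
    hits = int(np.sum(pred_event & obs_event))
    misses = int(np.sum(~pred_event & obs_event))
    false_alarms = int(np.sum(pred_event & ~obs_event))
    csi_den = hits + misses + false_alarms
    far_den = hits + false_alarms
    csi = hits / csi_den if csi_den else float("nan")
    far = false_alarms / far_den if far_den else float("nan")
    return csi, far


# Hypothetical 2x2 reflectivity fields and threshold.
predicted = np.array([[40.0, 10.0], [35.0, 50.0]])
observed = np.array([[45.0, 30.0], [5.0, 55.0]])
csi, far = csi_far(predicted, observed, threshold=30.0)
```

A higher CSI and a lower FAR both indicate better extrapolation, so the two metrics are read in opposite directions.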
Copyright © 2023 Editorial Board of Journal of Zhengzhou University (Engineering Science)