Submitted: 02 September 2025
Posted: 03 September 2025
Abstract
Keywords:
1. Introduction
2. Experimental Setup
3. CNN Training and Inference from DDX Images
3.1. CNN Training
3.1.1. Training Environment
3.1.2. Manual Labeling of DDX Images
- Attenuation value
- Negative pulse
- PD level value
- Positive pulse
- Voltage value
3.1.3. Dataset Setup and Training
- Training: 594 images (71%).
- Validation: 146 images (17%).
- Test: 100 images (12%).
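As an illustration of how such a split is consumed, the following is a minimal Ultralytics training sketch; the model variant (yolov8n.pt), epoch count, and image size are assumptions for illustration, not the authors' reported settings.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")     # pretrained checkpoint as the starting point
model.train(
    data="data.yaml",          # YAML file pointing to the train/valid/test splits
    epochs=100,
    imgsz=640,
)
metrics = model.val()          # precision, recall and mAP on the validation set
```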
3.1.4. Analysis of Loss Curves
3.1.4.1. Box Loss

The box loss is the Complete IoU (CIoU) loss, Eq. 1:

$$\mathcal{L}_{box} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v \qquad (1)$$

where:
- $\mathcal{L}_{box}$: value of the CIoU loss. The goal of YOLOv8 is to minimize it.
- $IoU$: Intersection over Union, Eq. 2. It measures the overlap between the predicted and actual boxes. Its value ranges from 0 (no overlap) to 1 (perfect overlap). It is calculated as:

$$IoU = \frac{\mathrm{area}(b \cap b^{gt})}{\mathrm{area}(b \cup b^{gt})} \qquad (2)$$

- $b$: bounding box predicted by the model (coordinates $x_{center}$, $y_{center}$, $width$, $height$).
- $b^{gt}$: real (ground-truth) bounding box (coordinates $x_{center}^{gt}$, $y_{center}^{gt}$, $width^{gt}$, $height^{gt}$).
- $\rho^{2}(b, b^{gt})$: squared Euclidean distance between the center points of the predicted box $b$ and the actual box $b^{gt}$; $\rho$ represents that distance.
- $c$: length of the diagonal of the smallest bounding box that completely encloses both $b$ and $b^{gt}$. It normalizes the distance penalty.
- $\alpha$: positive weighting parameter that adjusts the importance of the aspect-ratio consistency term.
- $v$: measure of the consistency of the aspect ratio between the predicted and real box. It is calculated through Eq. 3:

$$v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2} \qquad (3)$$
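To make the role of each term concrete, here is a minimal Python sketch of the CIoU loss for a single box pair in (x_center, y_center, width, height) format; it is an illustrative re-derivation of Eqs. 1-3, not the vectorized implementation used inside YOLOv8.

```python
import math

def to_corners(box):
    """(x_center, y_center, w, h) -> (x1, y1, x2, y2)."""
    xc, yc, w, h = box
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

def ciou_loss(b, b_gt):
    """CIoU loss of Eq. 1 for one predicted / ground-truth box pair."""
    bx1, by1, bx2, by2 = to_corners(b)
    gx1, gy1, gx2, gy2 = to_corners(b_gt)

    # IoU (Eq. 2): overlap area divided by union area.
    iw = max(0.0, min(bx2, gx2) - max(bx1, gx1))
    ih = max(0.0, min(by2, gy2) - max(by1, gy1))
    inter = iw * ih
    union = b[2] * b[3] + b_gt[2] * b_gt[3] - inter
    iou = inter / union if union > 0 else 0.0

    # Squared center distance rho^2, normalized by c^2, the squared
    # diagonal of the smallest box enclosing both b and b_gt.
    rho2 = (b[0] - b_gt[0]) ** 2 + (b[1] - b_gt[1]) ** 2
    c2 = (max(bx2, gx2) - min(bx1, gx1)) ** 2 + (max(by2, gy2) - min(by1, gy1)) ** 2

    # Aspect-ratio consistency v (Eq. 3) and its weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(b_gt[2] / b_gt[3]) - math.atan(b[2] / b[3])) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - iou + rho2 / c2 + alpha * v

# Example: a prediction shifted one unit off a 10x10 ground-truth box.
print(ciou_loss((6.0, 5.0, 10.0, 10.0), (5.0, 5.0, 10.0, 10.0)))
```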
3.1.4.2. Classification Loss

The classification loss is a binary cross-entropy (BCE) over the classes:

$$\mathcal{L}_{cls} = -\sum_{i} \left[ y_{i} \ln(p_{i}) + (1 - y_{i}) \ln(1 - p_{i}) \right]$$

where:
- $\mathcal{L}_{cls}$: classification loss value.
- $y_{i}$: ground-truth label for class $i$; $y_{i} = 1$ if the object belongs to class $i$, $y_{i} = 0$ if it does not.
- $p_{i}$: probability predicted by the model that the object belongs to class $i$. It is the result of a sigmoid function, with value in $[0, 1]$.
- $\ln$: natural logarithm.
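A minimal sketch of this sum of per-class BCE terms, assuming one sigmoid output per class; values are illustrative:

```python
import math

def bce_loss(labels, probs, eps=1e-7):
    """Sum of per-class binary cross-entropy terms; probs are sigmoid outputs."""
    total = 0.0
    for y_i, p_i in zip(labels, probs):
        p_i = min(max(p_i, eps), 1 - eps)   # clamp to avoid log(0)
        total -= y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return total

# Example with five classes, where the object belongs to class 1:
print(bce_loss([0, 1, 0, 0, 0], [0.1, 0.8, 0.2, 0.05, 0.1]))
```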

3.1.4.3. Distribution Focal Loss

The Distribution Focal Loss (DFL) models each box edge as a discrete probability distribution over adjacent bins:

$$\mathcal{L}_{DFL}(S_{i}, S_{i+1}) = -\left[ (y_{i+1} - y) \ln(S_{i}) + (y - y_{i}) \ln(S_{i+1}) \right]$$

where:
- $\mathcal{L}_{DFL}$: DFL value.
- $y$: continuous ground-truth coordinate of a box edge.
- $y_{i}$: label of the discrete bin immediately to the left of $y$.
- $y_{i+1}$: label of the discrete bin immediately to the right of $y$.
- $S_{i}$: probability predicted by the model for bin $y_{i}$.
- $S_{i+1}$: probability predicted by the model for bin $y_{i+1}$.
- The terms $(y_{i+1} - y)$ and $(y - y_{i})$ act as interpolation weights.
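A minimal Python sketch of the DFL for one box edge; it assumes the two adjacent bins and their predicted probabilities are given, and normalizes the interpolation weights by the bin spacing (which reduces to the expression above for unit-spaced bins):

```python
import math

def dfl_loss(y, y_left, y_right, p_left, p_right, eps=1e-7):
    """DFL for one box edge, with y lying between bins y_left and y_right."""
    spacing = y_right - y_left          # equals 1 for unit-spaced bins
    w_left = (y_right - y) / spacing    # weight pulling mass toward the left bin
    w_right = (y - y_left) / spacing    # weight pulling mass toward the right bin
    return -(w_left * math.log(p_left + eps) + w_right * math.log(p_right + eps))

# Example: edge at y = 3.3, bins 3 and 4, most probability mass on bin 3.
print(dfl_loss(3.3, 3, 4, p_left=0.7, p_right=0.3))
```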
3.1.5. Performance Metrics
3.1.5.1. Precision and Recall in Training
3.1.5.2. mAP on the Validation Set

The average precision (AP) for a class is obtained as the discrete sum:

$$AP = \sum_{k=1}^{n} P(k) \, \Delta r(k)$$

where:
- $AP$: AP for a specific class.
- $k$: index of predictions ordered by confidence (from highest to lowest).
- $n$: total number of thresholds or data points considered.
- $P(k)$: precision calculated at the k-th recall point, i.e., considering the $k$ highest-confidence detections.
- $\Delta r(k)$: change in recall from point $(k-1)$ to point $k$, i.e., $\Delta r(k) = r(k) - r(k-1)$.

The mAP is the mean of the AP values over all classes.
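A minimal sketch of this discrete AP sum, assuming precision/recall pairs already computed from confidence-ordered detections; the numbers in the example are illustrative:

```python
def average_precision(precisions, recalls):
    """Discrete AP sum: AP = sum_k P(k) * delta_r(k), with detections
    ordered by confidence from highest to lowest."""
    ap, prev_recall = 0.0, 0.0
    for p_k, r_k in zip(precisions, recalls):
        ap += p_k * (r_k - prev_recall)   # P(k) * delta_r(k)
        prev_recall = r_k
    return ap

# mAP is the mean of the per-class AP values:
ap_per_class = [average_precision([1.0, 0.5, 0.67], [0.33, 0.33, 0.67])]
map_value = sum(ap_per_class) / len(ap_per_class)
```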
3.2. Confusion Matrix
3.2.1. Confusion Matrix on the Validation Set
3.2.2. Confusion Matrix on the Test Set
3.3. Inference in Operational Scenarios
3.3.1. Inference Flowchart
- Startup and configuration: the process begins by loading user-defined settings, such as the input video and YOLOv8 model paths, the confidence thresholds, and the list of classes of interest that will trigger optical character recognition (OCR).
- Engine loading: the two main inference engines, the YOLOv8 object detection model and the EasyOCR engine for Python, are initialized and loaded into memory. This loading is performed only once at startup to optimize system performance. The number of GPUs to be used is also determined.
- Opening files: the input video stream is opened and the output files are created, including the new video with the visual annotations and the text file that will record its detailed data.
- YOLOv8 inference: the current frame is fed into the YOLOv8 model, which identifies and locates all classes of interest that exceed the confidence threshold, returning their bounding boxes, class labels, and confidence scores.
- Detection loop: the system iterates through each of the detections found in the frame.
- OCR class check: for each detection, a decision is made based on its class label. If the class is one of the predefined OCR targets (pd_level_value, voltage_value, or attenuation_value), the system proceeds with OCR inference.
- OCR inference: this critical step extracts the quantitative data:
  a. ROI cropping: the exact portion of the image contained within the detection bounding box is extracted from the frame.
  b. OCR application: the OCR engine analyzes this small ROI to recognize the textual information present.
  c. Value interpretation: the extracted text is processed to convert it into a numerical value.
- Output log: all detection data is logged. Bounding boxes and corresponding labels —confidence and OCR value, if applicable— are drawn on the output video frame. Detailed information about each detection, including the numerical value analyzed by the OCR, is added as a new line to the text file.
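The following is a minimal Python sketch of the loop described in the steps above, using the Ultralytics and EasyOCR APIs; the file names, confidence threshold, and value-parsing rule are illustrative assumptions rather than the authors' actual script, and writing of the annotated output video is omitted for brevity.

```python
import cv2
import easyocr
from ultralytics import YOLO

OCR_CLASSES = {"pd_level_value", "voltage_value", "attenuation_value"}

# Engines are loaded once at startup.
model = YOLO("best.pt")
reader = easyocr.Reader(["en"])

cap = cv2.VideoCapture("input_video.mp4")
log = open("detections.txt", "w")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = model(frame, conf=0.5)[0]           # YOLOv8 inference on the frame
    for box in result.boxes:                     # detection loop
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        label = result.names[int(box.cls)]
        conf = float(box.conf)
        value = None
        if label in OCR_CLASSES:                 # OCR only for numeric read-outs
            roi = frame[y1:y2, x1:x2]            # a. ROI cropping
            texts = reader.readtext(roi, detail=0)        # b. OCR application
            try:                                 # c. value interpretation
                value = float(texts[0].replace(",", "."))
            except (IndexError, ValueError):
                value = None
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        log.write(f"{label} conf={conf:.2f} value={value}\n")
    # (writing the annotated frame to the output video is omitted for brevity)

cap.release()
log.close()
```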
3.3.2. Results and Discussion
- At 10 kV, the model detects moderate discharge activity, with well-defined but relatively compact green clusters of negative and magenta positive pulses. This corresponds to an instrumental reading of PD Level 0.426 nC and Voltage 10.1 kV in Figure 17a.
- At 13 kV, with increasing voltage, a significant increase in the density and spatial extent of detections is observed. Both the negative and positive cumulative pulses are visibly larger and denser. This increased visual activity directly correlates with the increased discharge level measured by the instrument, which now shows PD Level 0.701 nC and Voltage 13.1 kV in Figure 17b.
- At 16 kV, the phenomenon intensifies dramatically. The cumulative image shows a much larger and more saturated area of activity, indicating a very severe PD regime. This exponential increase in visual activity is consistent with the instrumental reading, which reaches a PD Level of 3.88 nC and a Voltage of 16.6 kV, as shown in Figure 17c.
4. CNN Training and Inference from HQ Images
4.1. CNN Training
4.1.1. Semi-Automatic Generation of the Dataset
Configuration and Initialization
PD Detection and Extraction
- Background establishment: the first frame of the video is assumed to represent the static background of the scene. This frame is converted to grayscale and stored for reference.
- Background subtraction: for each subsequent frame, the absolute difference with the background frame is calculated. The result is an image that highlights only the regions where changes have occurred (i.e., new PD).
- Thresholding and morphological cleaning: the resulting image is binarized using a fixed threshold to convert subtle changes into well-defined, white-on-black regions. A morphological operation is then applied to remove noise.
- Contour detection: the contour-finding algorithm of the OpenCV Python library [19] is applied to this cleaned image to extract the contours of all change regions. Each contour represents a candidate PD (see the sketch after this list).
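A minimal OpenCV sketch of this detection stage follows; the video path, binarization threshold, and kernel size are illustrative assumptions.

```python
import cv2

cap = cv2.VideoCapture("hq_video.mp4")
ok, first = cap.read()
background = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)   # first frame = static background

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, background)               # background subtraction
    _, binary = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)   # fixed threshold
    clean = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)      # remove noise
    contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # each contour is a candidate PD, handed to the filtering stage described next

cap.release()
```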
Filtering, Validation and Data Collection
- Minimum area filter: contours with an area smaller than a predefined threshold of 5 pixels are discarded to remove residual noise.
- Manual exclusion filter: the contour centroid is calculated. If this centroid falls within any of the manual exclusion zones defined in the configuration, the contour is classified as a false positive and discarded.
- Data collection: if a contour passes both of the above filters, it is considered a valid PD. For each valid PD, the following is extracted and stored:
  - The bounding box.
  - The centroid coordinates, area, and average RGB color intensity, written to a .dat text file for further analysis.
- A copy of the original, unprocessed frame and the list of bounding boxes for all valid events found are saved. This (image, labels) pair is the input data in YOLOv8 format. A sketch of this filtering stage follows the list.
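A minimal sketch of this filtering and data-collection stage, assuming the contours and frame from the detection step above; the exclusion rectangle is a hypothetical placeholder, while the 5-pixel area threshold comes from the text.

```python
import cv2

MIN_AREA = 5                              # minimum contour area, in pixels
exclusion_zones = [(200, 50, 320, 120)]   # hypothetical (x1, y1, x2, y2) zones

valid_boxes = []
for cnt in contours:                      # contours from the detection stage
    if cv2.contourArea(cnt) < MIN_AREA:   # minimum-area filter
        continue
    m = cv2.moments(cnt)
    if m["m00"] == 0:
        continue
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]   # contour centroid
    if any(x1 <= cx <= x2 and y1 <= cy <= y2
           for x1, y1, x2, y2 in exclusion_zones):      # manual exclusion filter
        continue                          # false positive, discard
    x, y, w, h = cv2.boundingRect(cnt)    # bounding box of a valid PD
    valid_boxes.append((x, y, w, h))
    roi = frame[y:y + h, x:x + w]
    mean_color = cv2.mean(roi)[:3]        # average color intensity for the .dat log
```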
Generating the Dataset in YOLOv8 Format
Once the entire video has been processed, the script uses the collection of frames with valid PDs to build the final dataset. The following steps are performed:
- Directory structuring: a folder structure compatible with the YOLOv8 framework is created, with the subdirectories train, valid, and test, each containing folders for images and labels.
- Data splitting: the data collection —images and their labels— is randomly shuffled and split into training, validation, and test sets according to the ratios defined above.
- File generation for each image: the original frame is saved as image_name.jpg in the corresponding images folder, and a label file with the same base name is created in the corresponding labels folder. Within this file, each line represents an event detected in that image, in the format [class_index, x_center_norm, y_center_norm, width_norm, height_norm]. All bounding box coordinates are normalized by dividing them by the frame width and height, as required by YOLOv8.
- Configuration file (data.yaml): finally, a data.yaml file is generated at the root of the dataset. This file is essential for YOLOv8 to locate the datasets and identify the number of classes and their names.
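A minimal sketch of the label-file and data.yaml generation described above, assuming a single class named 'pd' and pixel-space boxes; file names and paths are illustrative.

```python
from pathlib import Path

def write_yolo_label(path, boxes, frame_w, frame_h, class_index=0):
    """One line per event: class x_center_norm y_center_norm w_norm h_norm."""
    lines = []
    for x, y, w, h in boxes:                 # boxes in pixel coordinates
        xc = (x + w / 2) / frame_w           # normalize by frame width
        yc = (y + h / 2) / frame_h           # normalize by frame height
        lines.append(f"{class_index} {xc:.6f} {yc:.6f} {w / frame_w:.6f} {h / frame_h:.6f}")
    Path(path).write_text("\n".join(lines) + "\n")

# data.yaml telling YOLOv8 where the splits are and what the classes are:
DATA_YAML = """\
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 1
names: ['pd']
"""
Path("data.yaml").write_text(DATA_YAML)
```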
Summary Flowchart of the Process
1. Setup and loading: in this initial phase, all resources are prepared. The script reads the file paths, loads the video, and reads the manually defined exclusion zones, which are key to filtering out known false positives.
2. Video processing loop: this is the core of the script. It operates frame by frame, performing two main tasks in sequence: PD detection and extraction, followed by filtering and validation.
3. YOLOv8 dataset generation: once the entire video has been analyzed, this final phase takes all the valid data collected and organizes it into the folder structure and file formats required by YOLOv8. This includes splitting the data into training/validation/test sets, normalizing the coordinates, and creating the .yaml configuration file.
4.1.2. Training Results
- Training set: 2,967 images.
- Validation set: 982 images.
- Test set: 508 images.
4.2. Inference in Operational Scenarios
a) Correlation between voltage and discharge activity: there is a clear relationship between the voltage applied to the electrodes and the number of detected PDs. At 10 kV, 1,582 PDs were accumulated (Figure 23a). As the voltage is increased to 13 kV, the activity increases significantly, recording 2,050 PDs (Figure 23b). However, at 16 kV, the total number of detected PDs drops slightly to 1,981 (Figure 23c). A reasonable hypothesis for this small decrease is that at higher energies the PDs are larger and may merge, being detected by the model as a single PD with a larger area instead of multiple smaller PDs.
b) Spatial expansion of activity: the scatter plots shown in Figure 24, Figure 25 and Figure 26 visually confirm that the area of discharge activity expands with increasing voltage. The cluster of points, initially highly concentrated in the dielectric space at 10 kV, expands both vertically and horizontally at 13 kV and, more pronouncedly, at 16 kV. This suggests that at higher voltage levels in the dielectric, PDs are not only more frequent but also occupy a larger volume.
c) Increase in detection area and correlation with PD confidence: the most revealing analysis comes from the direct comparison between the area and the confidence of the PDs in Figure 24, Figure 25 and Figure 26:
- Area distribution (Figure 24a, Figure 25a and Figure 26a): at 10 kV, the vast majority of PDs are small in area (blue and green dots). At 13 kV, a slight increase in the average area is observed. The change becomes pronounced at 16 kV, where a significant presence of large-area PDs appears, represented by yellow and orange colors.
- Confidence distribution (Figure 24b, Figure 25b and Figure 26b): as a complement, the analysis of detection confidence provides a new layer of information. A strong positive correlation is observed between the area of a PD and the confidence with which it is detected. Larger PDs (warm colors in Figure 24a, Figure 25a and Figure 26a) consistently correspond to high-confidence detections (warm colors close to 1.0 in Figure 24b, Figure 25b and Figure 26b). This is physically consistent: larger and more energetic PDs are visually clearer and therefore more confidently identified by the model. Conversely, low-confidence points (cool colors in Figure 24b, Figure 25b and Figure 26b) tend to correspond to smaller PDs, which are harder to distinguish from background noise.
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Saravanan, B.; D, P. K. M.; Vengateson, A. Benchmarking Traditional Machine Learning and Deep Learning Models for Fault Detection in Power Transformers, 2025. https://arxiv.org/abs/2505.06295.
2. Zhang, R.; Zhang, Q.; Zhou, J.; Wang, S.; Sun, Y.; Wen, T. Partial Discharge Characteristics and Deterioration Mechanisms of Bubble-Containing Oil-Impregnated Paper. IEEE Trans. Dielectr. Electr. Insul. 2022, 29, 1282–1289.
3. IEC 60270:2000+AMD1:2015 CSV; High-Voltage Test Techniques—Partial Discharge Measurements; Edition 3.1, Consolidated Version; International Electrotechnical Commission: Geneva, Switzerland, 2015.
4. Thobejane, L. T.; Thango, B. A. Partial Discharge Source Classification in Power Transformers: A Systematic Literature Review. Appl. Sci. 2024, 14.
5. Madhar, S. A.; Mor, A. R.; Mraz, P.; Ross, R. Study of DC Partial Discharge on Dielectric Surfaces: Mechanism, Patterns and Similarities to AC. Int. J. Electr. Power Energy Syst. 2021, 126, 106600.
6. Wotzka, D.; Sikorski, W.; Szymczak, C. Investigating the Capability of PD-Type Recognition Based on UHF Signals Recorded with Different Antennas Using Supervised Machine Learning. Energies 2022, 15.
7. Sikorski, W. Development of Acoustic Emission Sensor Optimized for Partial Discharge Monitoring in Power Transformers. Sensors 2019, 19.
8. Shahsavarian, T.; Pan, Y.; Zhang, Z.; Pan, C.; Naderiallaf, H.; Guo, J.; Li, C.; Cao, Y. A Review of Knowledge-Based Defect Identification via PRPD Patterns in High Voltage Apparatus. IEEE Access 2021, 9, 77705–77728.
9. Khan, M. A. M. AI and Machine Learning in Transformer Fault Diagnosis: A Systematic Review. Am. J. Adv. Technol. Eng. Solut. 2025, 1, 290–318.
10. Khaleghi, B.; Khamis, A.; Karray, F. O.; Razavi, S. N. Multisensor Data Fusion: A Review of the State-of-the-Art. Inf. Fusion 2013, 14, 28–44.
11. Deng, X.; Jiang, Y.; Yang, L. T.; Lin, M.; Yi, L.; Wang, M. Data Fusion Based Coverage Optimization in Heterogeneous Sensor Networks: A Survey. Inf. Fusion 2019, 52, 90–105.
12. Xing, Z.; He, Y. Multi-Modal Information Analysis for Fault Diagnosis with Time-Series Data from Power Transformer. Int. J. Electr. Power Energy Syst. 2023, 144, 108567.
13. Guo, J.; Zhao, S.; Huang, B.; Wang, H.; He, Y.; Zhang, C.; Zhang, C.; Shao, T. Identification of Partial Discharge Based on Composite Optical Detection and Transformer-Based Deep Learning Model. IEEE Trans. Plasma Sci. 2024, 52, 4935–4942.
14. Yin, K.; Wang, Y.; Liu, S.; Li, P.; Xue, Y.; Li, B.; Dai, K. GIS Partial Discharge Pattern Recognition Based on Multi-Feature Information Fusion of PRPD Image. Symmetry 2022, 14.
15. Abubakar, A.; Zachariades, C. Phase-Resolved Partial Discharge (PRPD) Pattern Recognition Using Image Processing Template Matching. Sensors 2024, 24.
16. Dempster, A. P. Upper and Lower Probabilities Induced by a Multivalued Mapping. In Classic Works of the Dempster-Shafer Theory of Belief Functions; Yager, R. R., Liu, L., Eds.; Springer: Berlin, Heidelberg, 2008.
17. Sentz, K.; Ferson, S. Combination of Evidence in Dempster-Shafer Theory, 2002. https://www.stat.berkeley.edu/~aldous/Real_World/dempster_shafer.pdf (accessed on 21 July 2025).
18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection, 2016. https://arxiv.org/abs/1506.02640.
19. opencv-python 4.12.0.88. Available online: https://pypi.org/project/opencv-python/ (accessed on 21 July 2025).
20. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger, 2016. https://arxiv.org/abs/1612.08242.
21. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement, 2018. https://arxiv.org/abs/1804.02767.
22. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y. M. YOLOv4: Optimal Speed and Accuracy of Object Detection, 2020. https://arxiv.org/abs/2004.10934.
23. Ultralytics. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance Segmentation, 2022.
24. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; Li, Y.; Zhang, B.; Liang, Y.; Zhou, L.; Xu, X.; Chu, X.; Wei, X.; Wei, X. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications, 2022. https://arxiv.org/abs/2209.02976.
25. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y. M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors, 2022. https://arxiv.org/abs/2207.02696.
26. Jocher, G.; Qiu, J.; Chaurasia, A. Ultralytics YOLO, 2023. https://github.com/ultralytics/ultralytics.
27. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
28. Alif, M. A. R.; Hussain, M. YOLOv1 to YOLOv10: A Comprehensive Review of YOLO Variants and Their Application in the Agricultural Domain, 2024. https://arxiv.org/abs/2406.10139.
29. Kang, S.; Hu, Z.; Liu, L.; Zhang, K.; Cao, Z. Object Detection YOLO Algorithms and Their Industrial Applications: Overview and Comparative Analysis. Electronics 2025, 14, 1104.
30. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-Time Flying Object Detection with YOLOv8, 2024. https://arxiv.org/abs/2305.09972.
31. Monzón-Verona, J. M.; González-Domínguez, P.; García-Alonso, S. Characterization of Partial Discharges in Dielectric Oils Using High-Resolution CMOS Image Sensor and Convolutional Neural Networks. Sensors 2024, 24.
32. Raspberry Pi HQ Camera. Available online: https://www.raspberrypi.com/documentation/accessories/camera.html#hq-camera (accessed on 17 June 2025).
33. Rasband, W. ImageJ, 1997. https://imagej.net/ij/ (accessed on 21 July 2025).
34. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000.
35. Janocha, K.; Czarnecki, W. M. On Loss Functions for Deep Neural Networks in Classification, 2017. https://arxiv.org/abs/1702.05659.
36. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection, 2020. https://arxiv.org/abs/2006.04388.
37. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. L. Microsoft COCO: Common Objects in Context. In Computer Vision – ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, 2014; pp. 740–755.
38. Che, Q.; Wen, H.; Li, X.; Peng, Z.; Chen, K. P. Partial Discharge Recognition Based on Optical Fiber Distributed Acoustic Sensing and a Convolutional Neural Network. IEEE Access 2019, 7, 101758–101764.
39. Rodgers, J. L.; Nicewander, W. A. Thirteen Ways to Look at the Correlation Coefficient. Am. Stat. 1988, 42, 59–66.
40. Roboflow. Available online: https://www.roboflow.com (accessed on 21 July 2025).
| Attribute 1 | Attribute 2 | Coefficient (r) |
|---|---|---|
| Strong positive correlations (r > 0.7) | | |
| ocr_voltage | ocr_pd_level | 0.90 |
| num_pulse_negatives | ocr_voltage | 0.77 |
| Area | Height | 0.90 |
| Area | Width | 0.78 |
| CenterX | CenterY | 0.77 |
| Significant negative correlations (r < -0.3) | | |
| ocr_voltage | Width | -0.41 |
| ocr_pd_level | Width | -0.39 |
| num_pulse_negatives | Width | -0.34 |
| Other moderate positive correlations (0.5 < r < 0.7) | | |
| num_pulse_negatives | ocr_pd_level | 0.59 |
| Confidence | Width | 0.55 |
| Area | Confidence | 0.53 |
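A correlation table of this kind can be reproduced from the logged detection data with a few lines of pandas; the file name and column selection below are illustrative assumptions about the log format, not the authors' exact script.

```python
import pandas as pd

# Load the per-detection log produced during inference (whitespace-separated).
df = pd.read_csv("detections.dat", sep=r"\s+")
cols = ["ocr_voltage", "ocr_pd_level", "num_pulse_negatives",
        "Area", "Width", "Height", "CenterX", "CenterY", "Confidence"]
corr = df[cols].corr()    # Pearson r, as in the table above
print(corr.round(2))
```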
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).