Submitted:
18 December 2025
Posted:
19 December 2025
Abstract
Keywords:
Introduction
Literature Review
Workflow Figure
Datasets Metadata & Data Cleaning
Exploratory Data Analysis (EDA)
Modeling and Methodology
Tools and Technologies
Isolation Forest
Anomaly Analysis and Reporting on 2024 Data
Anomalies by Customers – 2024 Data
Anomalies by Shipping Point – 2024 Data
Anomalies by Destination Country – 2024 Data
Model Accuracy Evaluation Using 2024 Data
Contamination Severity Analysis
Predicting Anomalies in 2025 Using the Trained 2024 Isolation Forest Model
Overview of Predicted 2025 Anomalies
Root Cause Analysis of 2025 Shipment Anomalies
Shipping-Point Contribution to 2025 Anomalies
Distribution Country Contribution to 2025 Anomalies
One-Class SVM
One-Class SVM Modeling Approach
One-Class SVM – Root Cause Analysis of 2025 Shipment Anomalies
Model Comparison
Conclusions
Bias, Limitations, and Ethical Considerations
Future Work and Practical Recommendations
Acknowledgments
Appendix A. Data Dictionary
| Variable Name | Description | Type | Unit / Format | Role in Analysis |
|---|---|---|---|---|
| Requested Ship Date | Date the customer requested the shipment to be sent | Date Time | YYYY-MM-DD | Used to calculate Delivery Delay |
| Del.Actual GI Date | Actual date when shipment was picked up from the Shipping Point | Date Time | YYYY-MM-DD | Used in calculating Transit Time |
| Del.Delivery Date | Actual Delivery Date of the order | Date Time | YYYY-MM-DD | Used to calculate total Lead Time |
| Shipping Point (Location) | Location or warehouse where the order originated | Categorical | String | Used for performance grouping |
| Product Line | Category or type of product shipped | Categorical | float64 | Used to detect anomalies by product segment |
| Order Quantity | Number of items requested per order | Numerical | float64 | Predictor variable for delivery performance |
| Delivery Delay = Del.Delivery Date − Requested Ship Date | Number of days between Requested Ship Date and actual delivery | Numerical | Day | Derived Feature |
| On-Time Flag = 1 if Delivery Delay ≤ 0 else 0 | Indicator showing whether the delivery was on time | Binary | 1 = On Time, 0 = Delayed | Derived Feature- Only for EDA Analysis. |
| In-Full Flag = 1 if Delivered Quantity ≥ Requested Quantity else 0 | Indicates if the order was fulfilled completely or shipped partially. | Binary | 1 = Complete, 0 = Partial | Derived Feature- Quality performance indicator |
| In-Full Rate = mean(In-Full Flag) × 100 per Customer or Location | Percentage of orders delivered in full for each period | Numeric | Percentage (%) | Derived Feature- KPI for fulfillment performance |
| Order Frequency = count(Orders per Customer per Month) | Number of orders placed by a customer within a time period | Numeric | Integer (Count per month) | Derived Feature- Behavioral variable for customer demand |
| Shipping Point Performance = ((On_Time_Flag & In_Full_Flag).sum() / len(x)) * 100 | Percentage of orders that were both on time and complete | Numeric | Percentage (%) | Derived Feature- Only for EDA Analysis. Excluded from the model to avoid data leakage. |
| Lead Time = Del.Actual GI Date − Requested Ship Date | Measures time from request to shipment; identifies early process delays. | Numeric | Integer | Derived Feature- Predictor variable |
| Transit Time = Del.Delivery Date − Del.Actual GI Date | Measures shipment transport efficiency. | Numeric | Integer | Derived Feature- Predictor variable |
| Delay Severity Score= Delivery Delay / mean(Delivery Delay) | Quantifies the severity of each delay relative to average performance. | Numeric | Float | Derived Feature |
| Fulfillment Gap = Requested Quantity − Delivered Quantity | Identifies shortages or under-delivery cases. | Numeric | Integer | Derived Feature- Predictor variable |
| Reliability = ((on_time_sum + in_full_sum) / (2 * total_orders)) * 100 | Average of the on-time rate and the in-full rate per Shipping Point | Numeric | Percentage (%) | Derived Feature- Only for EDA Analysis. Excluded from the model to avoid data leakage. |
| Year | Year of the data | Categorical | Integer | Derived Feature |
| Del.Planned Goods Issue Date | Planned PGI date (based on Delivery Date) | Date Time | YYYY-MM-DD | Exploratory Data Analysis |
| Total Requested - $ Value of Line Items | Monetary value of each ordered line item | Numeric | USD | Used in EDA to explore order value patterns |
| Del.Act.delivery qty | Actual Delivered Quantity per delivery document | Numeric | Units | Used to compute In-Full Flag and Fulfillment Gap |
| Del.Net Shipments | Net shipped quantity from delivery record | Numeric | Units | EDA variable for shipment completeness |
| Distribution Channel | Sales Distribution Channel | Categorical | Numeric/Text | EDA segmentation variable, not used in model |
| Item category | Classification of product item | Categorical | Text | EDA variable for product-level grouping |
| ABC indicator | ABC classification (A/B/C categorization) | Categorical | Text | EDA variable for product segmentation |
| Ship to Country | Delivery destination country | Categorical | Text | Used for anomaly analysis and country contribution charts |
| ExtMatGroup4.ExtMatGroup | Extended material group | Categorical | Text | EDA product grouping field |
| HigherLvlCust5.HigherLvlCust | Higher-level customer group | Categorical | Text | EDA grouping by customer hierarchy |
| Material3.Material.1 | Material identifier | Categorical | Text | Item-level differentiation |
| ProductFamily1.ProductFamily | Product family classification | Categorical | Text | EDA segmentation field |
| ProductSubFamily2.ProductSubFamily | Product sub-family classification | Categorical | Text | EDA segmentation field |
| ShippingPoint6.ShippingPoint | Shipment origin point | Categorical | Text | Used extensively for anomaly contribution analysis |
| Del.Created On Date | Delivery creation date | Datetime | YYYY-MM-DD | Used to validate delivery flow in the dataset |
| SoldTo | Numeric customer identifier | Categorical Identifier | Integer like ID | Used to group orders by customer, analyze customer-level anomalies, and check repeat patterns |
| Sold-to Party | Alternative customer identifier version; used in the sales order header | Categorical | Text / ID | Used to link delivery records back to sales orders and validate customer grouping |
| Reason for Rejection | SAP field indicating why an order line was rejected (if applicable) | Categorical | Text code | Helps identify incomplete or invalid orders; excluded rows during preprocessing |
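
The formulas in the data dictionary above can be illustrated with a minimal sketch. It uses only the Python standard library; the dictionary keys (`requested_ship_date`, `actual_gi_date`, `delivery_date`, `requested_qty`, `delivered_qty`) and the two sample orders are hypothetical stand-ins for the real fields, not the authors' code or data.

```python
from datetime import date
from statistics import mean

def derive_features(order):
    """Compute the derived fields of Appendix A for one order (a dict).

    Key names are hypothetical; they map to Requested Ship Date,
    Del.Actual GI Date, Del.Delivery Date and the two quantity fields.
    """
    delivery_delay = (order["delivery_date"] - order["requested_ship_date"]).days
    lead_time = (order["actual_gi_date"] - order["requested_ship_date"]).days
    transit_time = (order["delivery_date"] - order["actual_gi_date"]).days
    return {
        "delivery_delay": delivery_delay,
        "lead_time": lead_time,
        "transit_time": transit_time,
        "on_time_flag": 1 if delivery_delay <= 0 else 0,
        "in_full_flag": 1 if order["delivered_qty"] >= order["requested_qty"] else 0,
        "fulfillment_gap": order["requested_qty"] - order["delivered_qty"],
    }

# Two made-up orders: one on time and complete, one late and short-shipped.
orders = [
    {"requested_ship_date": date(2024, 3, 1), "actual_gi_date": date(2024, 3, 1),
     "delivery_date": date(2024, 3, 1), "requested_qty": 100, "delivered_qty": 100},
    {"requested_ship_date": date(2024, 3, 1), "actual_gi_date": date(2024, 3, 4),
     "delivery_date": date(2024, 3, 9), "requested_qty": 50, "delivered_qty": 40},
]
features = [derive_features(o) for o in orders]

# Delay Severity Score = Delivery Delay / mean(Delivery Delay)
mean_delay = mean(f["delivery_delay"] for f in features)
for f in features:
    f["delay_severity"] = f["delivery_delay"] / mean_delay if mean_delay else 0.0

# In-Full Rate = mean(In-Full Flag) x 100
in_full_rate = mean(f["in_full_flag"] for f in features) * 100
```

With these definitions, Lead Time plus Transit Time equals Delivery Delay for every order, which offers a quick consistency check on the three date fields.
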
Appendix B. Code Repo

| Step | Steps in This Research | Description |
|---|---|---|
| 1 | Identify operational anomalies that contribute to shipment delays and partial fulfillment within B2B logistics. | Established the business problem and explained how anomaly detection supports improved delivery performance. |
| 2 | Data Sources & Integration | Outlined the data structure and how it was prepared for analysis. |
| 3 | Scope Selection (2024) | Although EDA used 2020–2025 data to understand long-term patterns, model training was intentionally restricted to 2024 data, with 2025 data reserved for testing. |
| 4 | Metadata & Variable Definition | Defined the variables used in EDA and model training, along with their roles. |
| 5 | Exploratory Data Analysis | Examined variable distributions, correlations, and data patterns using visualizations. |
| 6 | Cleaning & Processing | Conducted data quality checks and cleaned the data. |
| 7 | Feature Engineering | Created derived variables. |
| 8 | Model Development | Trained Isolation Forest and One-Class SVM models. |
| 9 | Evaluate Results | Compared and visualized model results. |
| 10 | Root Cause and Business Interpretation | Analyzed anomaly trends to uncover potential process inefficiencies and improvement opportunities. |
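
The model development step above can be sketched roughly as follows, assuming scikit-learn and NumPy. This is not the authors' code: the synthetic data, the four feature columns, the injected outliers and every hyperparameter are assumptions for illustration. The value `contamination=0.05` is chosen only because the first confusion matrix in this document flags 15,501 of 310,013 records, about 5%; `nu=0.05` mirrors it for the One-Class SVM.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

def synthetic_shipments(n):
    """Hypothetical stand-in for the real shipment data.

    Columns: lead time, transit time, order quantity, fulfillment gap.
    """
    return rng.normal(loc=[5.0, 2.0, 100.0, 0.0],
                      scale=[2.0, 1.0, 20.0, 1.0],
                      size=(n, 4))

X_2024 = synthetic_shipments(2000)   # training year
X_2025 = synthetic_shipments(500)    # scoring year
# Inject ten extreme shipments into the 2025 sample (illustrative only).
X_2025[:10] += np.array([60.0, 30.0, 400.0, 50.0])

# Fit the scaler on 2024 only so that no 2025 information leaks into training.
scaler = StandardScaler().fit(X_2024)
train = scaler.transform(X_2024)
test = scaler.transform(X_2025)

iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=42).fit(train)
svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(train)

# Both estimators return -1 for anomalies and +1 for normal points.
iso_flags = iso.predict(test) == -1
svm_flags = svm.predict(test) == -1

print("Isolation Forest flagged:", int(iso_flags.sum()))
print("One-Class SVM flagged:", int(svm_flags.sum()))
print("Flagged by both:", int((iso_flags & svm_flags).sum()))
```

Scaling matters mainly for the RBF-kernel One-Class SVM; Isolation Forest is insensitive to monotonic rescaling, so sharing the scaled matrix simply keeps the two pipelines comparable.
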
| Variable Name | Description | Type | Unit | Role in Analysis |
|---|---|---|---|---|
| Distribution Channel | Sales Distribution Channel representing various channels of movement (wholesale, distributor, direct, etc.) | Categorical | Code | Used to detect channel-specific delivery behavior patterns. |
| Order Quantity | Number of units requested in the sales document order line. | Numerical | Float | Core predictor of shipment complexity and processing load. |
| Del.Net Shipments | Net quantity shipped according to the delivery record. | Numerical | Units | Helps detect under-shipments and partial fulfillment behavior. |
| SoldTo | Customer identifier used for grouping deliveries. | Categorical Identifier | Integer | Captures customer-specific ordering behavior and recurring anomaly patterns. |
| Source_Sheet | Indicates the source dataset or batch from which the record originated. | Categorical | Text | Tracks data origin to ensure consistency across merged datasets. |
| Lead Time* | Number of days between Requested Ship Date and actual goods issue date. | Numerical | Days | Predictor capturing early-process delay dynamics before shipment. |
| Transit Time* | Number of days between actual goods issue date and actual delivery. | Numerical | Days | Predictor reflecting transportation efficiency and logistics delays. |
| Order_Frequency* | Count of sales orders placed by a customer in a given month, based on the Sales Document Creation Date. | Numerical | Integer | Behavioral predictor capturing demand intensity and ordering patterns. |
| Fulfillment Gap* | Difference between Requested Quantity and Delivered Quantity. | Numerical | Integer | Identifies shortages and partial deliveries contributing to delays. |
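
As an illustration of the Order_Frequency definition above, the following sketch counts sales orders per customer per calendar month. The field names (`sold_to`, `sales_document`, `created_on`) and the sample rows are hypothetical, and counting each sales document once even when it has several order lines is an assumption about how the feature was built.

```python
from collections import Counter
from datetime import date

def order_frequency(order_lines):
    """Count distinct sales documents per (customer, year, month)."""
    seen = set()
    counts = Counter()
    for line in order_lines:
        month_key = (line["sold_to"], line["created_on"].year, line["created_on"].month)
        doc_key = (month_key, line["sales_document"])
        if doc_key not in seen:          # count each sales document only once
            seen.add(doc_key)
            counts[month_key] += 1
    return counts

# Made-up order lines: customer 1001 places two orders in January 2024,
# one of which has two lines; customer 1002 places one order in February.
order_lines = [
    {"sold_to": 1001, "sales_document": "A1", "created_on": date(2024, 1, 5)},
    {"sold_to": 1001, "sales_document": "A1", "created_on": date(2024, 1, 5)},
    {"sold_to": 1001, "sales_document": "A2", "created_on": date(2024, 1, 20)},
    {"sold_to": 1002, "sales_document": "B1", "created_on": date(2024, 2, 3)},
]
frequency = order_frequency(order_lines)
```
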
| | Row Count | Mean | STD | Min | Max |
|---|---|---|---|---|---|
| Delivery Delay | 1,745,869 | 6.0 | 20.0 | 0.0 | 416.0 |
| Transit Time | 4,710,125 | 2.0 | 7.0 | 0.0 | 369.0 |
| Lead Time | 1,746,055 | 6.5 | 20.0 | 0.0 | 375.0 |
| Count | Mean | STD | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|
| 15,501 | -0.07 | 0.06 | -0.27 | -0.10 | -0.05 | -0.02 | 0 |
| Shipping Point | Anomaly Concentration % |
|---|---|
| ShippingPoint1 | 23.2 |
| ShippingPoint25 | 16.3 |
| ShippingPoint17 | 16.0 |
| ShippingPoint7 | 12.1 |
| ShippingPoint31 | 9.9 |
| ShippingPoint2 | 8.8 |
| ShippingPoint27 | 4.6 |
| ShippingPoint4 | 4.0 |
| ShippingPoint12 | 1.1 |
| ShippingPoint19 | 1.0 |
| ShippingPoint15 | 0.6 |
| Other | 2.4 |
| | Predicted Normal | Predicted Anomaly |
|---|---|---|
| Actual Normal | 279,014 (TN) | 7,379 (FP) |
| Actual Anomaly | 15,498 (FN) | 8,122 (TP) |
| | Predicted Normal | Predicted Anomaly |
|---|---|---|
| Actual Normal | 235,799 (TN) | 10,803 (FP) |
| Actual Anomaly | 29,145 (FN) | 9,683 (TP) |
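
The two confusion matrices above can be summarized with standard classification metrics. The sketch below treats "anomaly" as the positive class, which is an assumption about how the results are meant to be read; the counts are copied from the tables, and the labels `first` and `second` refer only to their order of appearance.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard metrics for a binary confusion matrix (anomaly = positive)."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "flagged_share": (tp + fp) / total,   # share of records predicted anomalous
    }

first = classification_metrics(tn=279_014, fp=7_379, fn=15_498, tp=8_122)
second = classification_metrics(tn=235_799, fp=10_803, fn=29_145, tp=9_683)

for name, metrics in (("first", first), ("second", second)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

Because normal records dominate both matrices, accuracy stays high even though recall is modest: roughly 0.34 for the first matrix and 0.25 for the second, with precision near 0.52 and 0.47. Precision and recall are therefore the more informative figures here.
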
| Shipping Point | 2024 Anomaly Concentration % | 2025 Anomaly Concentration % |
|---|---|---|
| ShippingPoint11 | 0.4 | 24.8 |
| ShippingPoint1 | 23.2 | 24.1 |
| ShippingPoint17 | 16.0 | 11.0 |
| ShippingPoint31 | 9.9 | 9.5 |
| ShippingPoint2 | 8.8 | 8.9 |
| ShippingPoint25 | 16.3 | 8.0 |
| ShippingPoint27 | 4.6 | 5.2 |
| ShippingPoint7 | 12.1 | 5.0 |
| ShippingPoint4 | 4.0 | 1.5 |
| ShippingPoint12 | 1.1 | 0.6 |
| ShippingPoint19 | 1.0 | 0.4 |
| ShippingPoint15 | 0.6 | 0.4 |
| Other | 2.0 | 0.6 |
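
One way to read the year-over-year table above is to rank shipping points by the change in their share of anomalies. The sketch below does this in percentage points, with the values copied from the table; the ranking approach itself is an illustrative choice, not a method stated by the authors.

```python
# (2024 %, 2025 %) anomaly concentration per shipping point, from the table.
concentration = {
    "ShippingPoint11": (0.4, 24.8),
    "ShippingPoint1": (23.2, 24.1),
    "ShippingPoint17": (16.0, 11.0),
    "ShippingPoint31": (9.9, 9.5),
    "ShippingPoint2": (8.8, 8.9),
    "ShippingPoint25": (16.3, 8.0),
    "ShippingPoint27": (4.6, 5.2),
    "ShippingPoint7": (12.1, 5.0),
    "ShippingPoint4": (4.0, 1.5),
    "ShippingPoint12": (1.1, 0.6),
    "ShippingPoint19": (1.0, 0.4),
    "ShippingPoint15": (0.6, 0.4),
    "Other": (2.0, 0.6),
}

# Change in percentage points, 2025 minus 2024.
shifts = {sp: round(y2025 - y2024, 1) for sp, (y2024, y2025) in concentration.items()}

# Largest absolute movers first.
ranked = sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)
for shipping_point, delta in ranked[:4]:
    print(f"{shipping_point}: {delta:+.1f} pp")
```

ShippingPoint11 stands out with a rise of 24.4 percentage points, while ShippingPoint25 and ShippingPoint7 show the largest declines.
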
| Shipping Point | Isolation Forest (%) | One-Class SVM (%) |
|---|---|---|
| ShippingPoint11 | 25 | 4 |
| ShippingPoint17 | 11 | 1 |
| ShippingPoint1 | 24 | 8 |
| ShippingPoint25 | 8 | 5 |
| ShippingPoint31 | 9 | 1 |
| ShippingPoint7 | 5 | 1 |
| ShippingPoint2 | 9 | 22 |
| ShippingPoint27 | 5 | 54 |
| ShippingPoint4 | 2 | 1 |
| ShippingPoint19 | 1 | 1 |
| ShippingPoint12 | 1 | 2 |
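
To quantify how differently the two models distribute anomalies across shipping points, one option is half the sum of absolute differences between the two percentage columns (the total variation distance, expressed here in percentage points). This measure is an illustrative choice rather than something the source reports; the values are copied from the comparison table above.

```python
# (Isolation Forest %, One-Class SVM %) per shipping point, from the table.
shares = {
    "ShippingPoint11": (25, 4),
    "ShippingPoint17": (11, 1),
    "ShippingPoint1": (24, 8),
    "ShippingPoint25": (8, 5),
    "ShippingPoint31": (9, 1),
    "ShippingPoint7": (5, 1),
    "ShippingPoint2": (9, 22),
    "ShippingPoint27": (5, 54),
    "ShippingPoint4": (2, 1),
    "ShippingPoint19": (1, 1),
    "ShippingPoint12": (1, 2),
}

# Shipping point with the largest share under each model.
iso_top = max(shares, key=lambda sp: shares[sp][0])
svm_top = max(shares, key=lambda sp: shares[sp][1])

# Share of anomalies that would have to move between shipping points
# to turn one distribution into the other (percentage points).
disagreement = 0.5 * sum(abs(iso - svm) for iso, svm in shares.values())

print("Top under Isolation Forest:", iso_top)
print("Top under One-Class SVM:", svm_top)
print("Disagreement (pp):", disagreement)
```

The two models disagree on the leading contributor (ShippingPoint11 versus ShippingPoint27), and the distance of 63 percentage points suggests they are flagging largely different groups of shipments.
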
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).