Submitted: 11 December 2025
Posted: 14 December 2025
Abstract
Keywords:
1. Introduction
- examine the semantic, procedural, and computational challenges that shape current verification workflows, focusing on the dominance of code-related errors and the limitations of existing software tools;
- design and iteratively refine a GPT-based assistant that operationalises verification rules through deterministic Python-executed logic, including exact-match diagnostics for identifying mis-typed or mis-assigned codes; and
- establish a transparent, audit-ready framework for LLM-supported verification, demonstrating how LLMs can be constrained to behave as dependable rule-execution systems rather than probabilistic inference engines.
2. Background
2.1. Regulatory Context and the Practice of Cost Verification in Vietnam
2.2. Nature of Direct Cost Verification Workflows
- If an exact match is found, the system classifies the item as a typographical or mis-assigned code and suggests the correct code to replace the erroneous one.
- If no exact match exists, the item is classified as TT. TT items are routed to the TT verification pathway, where compliance depends on human evaluation of supporting price quotations (which the system reports but does not validate).
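
The two-branch rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that an "exact match" means equality on the UoM together with the full MTR - LBR - MCR price vector, and the `NormEntry` type, the `diagnose_unmatched_item` function, and all sample codes and prices are hypothetical.

```python
from typing import NamedTuple


class NormEntry(NamedTuple):
    """One normative entry (hypothetical structure)."""
    code: str
    uom: str
    mtr: int  # material price component
    lbr: int  # labour price component
    mcr: int  # machinery price component


# Hypothetical normative entries, for illustration only.
UPB = [
    NormEntry("XX.00001", "m3", 650000, 120000, 30000),
    NormEntry("XX.00002", "m2", 15000, 45000, 0),
]


def diagnose_unmatched_item(uom, mtr, lbr, mcr, upb=UPB):
    """Diagnose an item whose code failed the normative lookup.

    Assumption: an exact match is strict equality on UoM and on all
    three price components. All matching codes are returned, because
    nothing guarantees that a price vector is unique.
    """
    matches = [
        e.code for e in upb
        if (e.uom, e.mtr, e.lbr, e.mcr) == (uom, mtr, lbr, mcr)
    ]
    if matches:
        return {
            "classification": "MISTYPED_OR_MISASSIGNED_CODE",
            "suggested_codes": matches,
        }
    # No exact match: route to the TT pathway for human review of quotations.
    return {"classification": "TT", "suggested_codes": []}
```

Returning a list of candidate codes rather than a single code is a design choice of this sketch: if several normative entries shared one price vector, the reviewer would still see every candidate.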
2.3. Key Challenges in Code-Centred Verification
2.4. Why Determining the Intended Code Matters
3. Literature Review
3.1. Cost Verification and Norm Alignment: The Central Role of Code Correctness
3.2. Digital and AI Approaches for Code Matching and Numerical Similarity
3.3. Large Language Models and Their Potential for Structured Verification Tasks
- checking whether entries align with external rules or normative datasets;
- identifying inconsistencies in tabular values;
- verifying multi-component numerical structures; and
- generating transparent explanations that reflect human-like verification logic.
3.4. Research Gap
4. Methodology: Action Research Design
4.1. Rationale for Choosing Action Research
4.2. Planning Phase
4.3. Acting Phase
4.4. Observing Phase
4.5. Reflecting Phase: Refining Verification Rules and Reasoning Patterns
4.6. Summary of Action Research Cycles
5. System Implementation
5.1. Data Extraction and Normalisation
5.2. Deterministic Verification Operations
5.3. Exact-Match Detection for Mis-Typed or Mis-Assigned Codes
5.4. Non-Listed Items Classification and Structured Verification Output
6. Results
6.1. Dataset and Test Setup
6.2. Deterministic Verification Performance
- Valid Code - Full Normative Match,
- Valid Code - UoM Mismatch,
- Valid Code - Normative Price Mismatch, and
- Invalid or Non-existent Code.
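
A minimal sketch of how these four outcomes could be assigned deterministically. It assumes the checks run in the order code lookup, then UoM, then price components, and that prices are compared by strict equality. The `Outcome` enum, the `verify_item` function, and the sample entry are hypothetical names introduced for illustration.

```python
from enum import Enum


class Outcome(Enum):
    FULL_MATCH = "Valid Code - Full Normative Match"
    UOM_MISMATCH = "Valid Code - UoM Mismatch"
    PRICE_MISMATCH = "Valid Code - Normative Price Mismatch"
    INVALID_CODE = "Invalid or Non-existent Code"


# Hypothetical normative data: code -> (UoM, (MTR, LBR, MCR)).
UPB = {"XX.00001": ("m3", (650000, 120000, 30000))}


def verify_item(code, uom, prices, upb=UPB):
    """Return exactly one outcome category for an already-normalised item."""
    entry = upb.get(code)
    if entry is None:
        return Outcome.INVALID_CODE      # fails at code lookup
    norm_uom, norm_prices = entry
    if uom != norm_uom:
        return Outcome.UOM_MISMATCH      # code found, unit differs
    if tuple(prices) != norm_prices:
        return Outcome.PRICE_MISMATCH    # code and unit match, prices differ
    return Outcome.FULL_MATCH            # all checks pass
```

Because each branch returns immediately, every item receives one and only one category, which keeps the output reproducible across runs.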
6.3. Exact-Match Recovery Performance
6.4. Improvements Across Action Research Cycles
6.5. Practitioner-Facing Insights from Real Estimates
7. Discussion
7.1. Theoretical and Technical Contributions
7.2. Practical Implications for Cost Verification Practice
7.3. Limitations
7.4. Implications for Scaling and Future Deployment
8. Conclusions



| # | Metric | Value |
|---|---|---|
| 1 | Number of estimates | 20 |
| 2 | Total work-item rows | 16,100 |
| 3 | Row range in which header rows appear | 4-7 |
| 4 | Range of rows per estimate | 27 - 4,677 |
| 5 | TT items | 222 |
| 6 | Variants in header configurations | 35 |
| 7 | Subtotal / non-data rows | 39 |
| 8 | Variants in code format | 18 |
| 9 | Variants in UoM format | 12 |
| 10 | Variants in the formatting and labeling of MTR - LBR - MCR price fields | 9 |

| Error Type | No. of Samples | Description | Verification Aspect Tested |
|---|---|---|---|
| Mis-typed / malformed codes | 2 | Valid normative code with typographical deviations (extra characters, missing digits, spacing anomalies) | Code normalization & deterministic code lookup |
| Valid codes with altered MTR - LBR - MCR values | 2 | Code exists in UPB but price components differ from normative values | Deterministic price comparison |
| Incorrect UoM | 2 | UoM deviates from normative unit (e.g., m replaced by m²) | Deterministic UoM validation |
| Invalid code (no exact match) | 2 | Code not present in UPB; UoM/price vector fails to match any normative entry | Invalid-code detection & TT classification when no exact match exists |
| TT-type synthetic items | 2 | Items requiring nearest-match benchmarking | Price-vector similarity & TT inference |
| Mixed-pattern combined errors | 2 | Combined subtle deviations (code + price or code + UoM) | Integrated robustness of deterministic + exact-match pipeline |

| Outcome Category | Description | Triggered When… |
|---|---|---|
| 1. Valid Code - Full Normative Match | Code, UoM, and all price components match normative data. | Deterministic pathway passes all checks. |
| 2. Valid Code - UoM Mismatch | Code exists, but the UoM differs from the normative UoM. | Code correct → UoM incorrect. |
| 3. Valid Code - Normative Price Mismatch | Code and UoM correct, but MTR - LBR - MCR values differ. | Price-profile inconsistency detected. |
| 4. Invalid or Non-existent Code | Code cannot be matched to any normative entry after normalization. | Deterministic pathway fails at code lookup. |

| Error Category | No. Cases | Exact Match Found | Correct Recovery | TT Classification | Notes |
|---|---|---|---|---|---|
| Mis-typed / malformed codes | 2 | 2 | 2 | 2 | Full price-vector match enabled correct recovery |
| Valid codes with altered prices | 2 | 0 | 0 | 2 | Deterministic price mismatch; no diagnostic recovery invoked |
| Incorrect UoM | 2 | 0 | 0 | 2 | Incorrect UoM |
| Invalid code (no exact match) | 2 | 0 | 0 | 2 | Invalid code + no equality match → classified as TT |
| TT-type synthetic items | 2 | 0 | 0 | 2 | Correct TT classification; not the result of code errors |
| Mixed-pattern combined errors | 2 | 0 | 0 | 2 | One case preserved normative values → recovered; one case inconsistent |

| Category | Variant Type | Description | Implications for Code Verification |
|---|---|---|---|
| Sheet-level variants | Displaced header rows | Header begins at row 3 - 12 due to cover text or project metadata | System must detect true header row; incorrect detection leads to misaligned extraction of code/UoM/MTR - LBR - MCR fields |
| Sheet-level variants | Multiple tables or segmented blocks | Cost table broken into sections separated by empty rows or notes | Requires segmentation logic to avoid interpreting non-data rows as work items |
| Sheet-level variants | Subtotal / narrative rows | Rows summarizing subtotals or containing notes | Must be filtered to prevent false error flags and misclassification of non-work items |
| Header-level variants | Multi-row headers | 1 - 3 header rows with nested labels or merged cells | System must parse merged/nested structures to map correct columns |
| Header-level variants | Inconsistent description labels | “Description”, “Content”, “Work Item Name” (in Vietnamese) etc. | Requires term-normalization dictionary to identify description column consistently |
| Header-level variants | Variants in price-field labels | “VL/NC/MTC,” “Material/Labour/Machinery,” “VL - NC - MTC” (in Vietnamese) | Affects detection of MTR - LBR - MCR fields needed for price-vector construction |
| Row-level variants | Non-standard code patterns | Codes with extra spaces, hyphens, trailing characters, or merged-cell formatting | Can trigger false invalid-code flags without canonicalization |
| Row-level variants | UoM representation variants | “m2,” “M2,” “m²,” “m^2,” etc. | Must be normalized for correct deterministic UoM verification |
| Row-level variants | Incomplete price fields | Missing or zero MTR - LBR - MCR values | Impairs normative price comparison and prevents exact-match recovery for typo detection |
| Row-level variants | Numeric formatting inconsistencies | Text-formatted numbers, comma/period mismatch, hidden formulas | Requires robust numeric parsing to avoid mis-computing price vectors |
| Row-level variants | Intermixed non-data rows | Blank rows, section titles, group headings | Must be excluded to prevent contamination of code-verification results |
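
The row-level variants listed in this table imply a normalisation step before any deterministic lookup. The sketch below shows one way such a step might look. It is an assumption-laden illustration, not the system described in the paper: the function names are hypothetical, codes are assumed to consist only of letters, digits, and dots, and numeric text is assumed to follow the Vietnamese convention of a period as thousands separator and a comma as decimal mark.

```python
import re
import unicodedata
from decimal import Decimal


def normalise_code(raw):
    """Canonicalise a work-item code: upper-case it and drop spaces,
    hyphens, and any other character outside A-Z, 0-9, and '.'."""
    text = unicodedata.normalize("NFKC", str(raw)).upper()
    return re.sub(r"[^A-Z0-9.]", "", text)


def normalise_uom(raw):
    """Map variants such as 'M2', 'm²', and 'm^2' to a single form.
    NFKC folds the superscript two into a plain '2'."""
    text = unicodedata.normalize("NFKC", str(raw)).lower()
    return re.sub(r"[\s^]", "", text)


def parse_amount(raw):
    """Parse a price cell into a Decimal.

    Assumption: in text cells '.' groups thousands and ',' marks
    decimals, so '1.234.567,5' means 1234567.5. Files using the
    opposite convention would need a different rule.
    """
    if isinstance(raw, (int, float)):
        return Decimal(str(raw))
    text = re.sub(r"\s", "", str(raw))
    return Decimal(text.replace(".", "").replace(",", "."))
```

Aggressive canonicalisation can also hide genuine errors, so a production pipeline would likely record the raw value alongside the normalised one for audit purposes.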

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).