Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

A Method to Enable Automatic Extraction of Cost and Quantity Data from Hierarchical Construction Information Documents to Enable Rapid Digital Comparison and Analysis

Version 1 : Received: 16 August 2023 / Approved: 16 August 2023 / Online: 17 August 2023 (09:25:22 CEST)

A peer-reviewed article of this Preprint also exists.

Adanza Dopazo, D.; Mahdjoubi, L.; Gething, B. A Method to Enable Automatic Extraction of Cost and Quantity Data from Hierarchical Construction Information Documents to Enable Rapid Digital Comparison and Analysis. Buildings 2023, 13, 2286. Adanza Dopazo, D.; Mahdjoubi, L.; Gething, B. A Method to Enable Automatic Extraction of Cost and Quantity Data from Hierarchical Construction Information Documents to Enable Rapid Digital Comparison and Analysis. Buildings 2023, 13, 2286.

Abstract

Context: Despite the effort put into developing standards for structuring construction cost, and the strong interest into the field. Most construction companies still perform the process of data gathering and processing manually. That provokes inconsistencies, different criteria when classifying, misclassifications, and the process becomes very time-consuming, particularly on big projects. Additionally, the lack of standardization makes very difficult the cost estimation and comparison tasks. Objective: To create a method to extract and organize construction cost and quantity data into a consistent format and structure, to enable rapid and reliable digital comparison of the content. Method: The approach consists of a two-step method: Firstly, the system implements data mining to review the input document and determine how it is structured based on the position, format, sequence, and content of descriptive and quantitative data. Secondly, the extracted data is processed and classified with a combination of data science and experts’ knowledge to fit a common format. Results: A big variety of information coming from real historical projects has been successfully extracted and processed into a common format with 97.5% of accuracy, using a subset of 5770 assets located on 18 different files, building a solid base for analysis and comparison. Conclusion: A robust and accurate method was developed for extracting hierarchical project cost data to a common machine-readable format to enable rapid and reliable comparison and benchmarking.

Keywords

data mining; data extraction; data science; cost infrastructure projects

Subject

Engineering, Transportation Science and Technology

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our Diversity statement.

Leave a public comment
Send a private comment to the author(s)
* All users must log in before leaving a comment
Views 0
Downloads 0
Comments 0
Metrics 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.