Physics-based Graphics Models in 3D Synthetic Environments Enabling Autonomous Vision-Based Structural Inspections

Abstract: Manual visual inspections typically conducted after an earthquake are high-risk, subjective, and time-consuming. Delays from inspections often exacerbate the social and economic impact of the disaster on affected communities. Rapid and autonomous inspection using images acquired from unmanned aerial vehicles offers the potential to reduce such delays. Indeed, a vast amount of research has been conducted toward developing automated vision-based methods to assess the health of infrastructure at the component and structure level. Most proposed methods typically rely on images of the damaged structure, but seldom consider how the images were acquired. To achieve autonomous inspections, methods must be evaluated in a comprehensive end-to-end manner, incorporating both data acquisition and data processing. In this paper, we leverage recent advances in computer-generated imagery (CGI) to construct a 3D synthetic environment for simulation of post-earthquake inspections that allows for comprehensive evaluation and validation of autonomous inspection strategies. A critical issue is how to simulate and subsequently render the damage in the structure after an earthquake. To this end, a high-fidelity nonlinear finite element model is incorporated in the synthetic environment to provide a representation of earthquake-induced damage; this finite element model, combined with photo-realistic rendering of the damage, is termed herein a physics-based graphics model (PBGM). 
The 3D synthetic environment with PBGMs provides a comprehensive end-to-end approach for the development and validation of autonomous post-earthquake inspection strategies using UAVs, including: (i) simulation of path planning of virtual UAVs and image capture under different environmental conditions; (ii) automatic labeling of captured images, potentially providing an infinite amount of data for training deep neural networks; (iii) availability of the ground truth damage state from the results of the finite element simulation; and (iv) direct comparison of different approaches to autonomous assessments. Moreover, the synthetic data generated has the potential to be used to augment field datasets. To demonstrate the efficacy of PBGMs, models of reinforced concrete moment-frame buildings with masonry infill walls are examined. The 3D synthetic environment employing PBGMs is shown to provide an effective testbed for development and validation of autonomous vision-based post-earthquake inspections that can serve as an important building block for advancing autonomous data-to-decision frameworks.


Introduction
The inspections of structures that are necessary after earthquakes are laborious, high-risk, and subject to human error. Describing the nature of inspections in a post-disaster scenario, the ATC-20 field manual [1] states that post-earthquake safety evaluation of buildings is "grueling work," resulting in a high level of stress on the volunteer inspectors that may lead to "burn-out." Entry into damaged structures for inspections also poses a serious risk to the inspectors.

Data acquisition with unmanned aerial vehicles and data processing using deep learning algorithms have shown tremendous potential for advancing the level of autonomy in post-earthquake inspections. A frequently studied problem is the application of machine learning algorithms, with specific focus on the use of deep convolutional neural networks (CNNs), for damage identification after earthquakes. Yeum et al. [10] proposed the use of a region-based CNN for spalling detection in earthquake-damaged buildings. Mondal et al. [11] implemented the Faster R-CNN [12] algorithm to compare different network architectures for multi-class detection of damage in earthquake-affected buildings. Xu et al. [13] utilized Faster R-CNN for the identification of cracks, spalling, and exposed rebar in concrete columns. Researchers have also sought to incorporate the context of the damage and information from the entire structure into structural assessments using deep learning methods. For example, Hoskere et al. [14,15] proposed the use of deep-learning-based semantic segmentation for multiple types of damage and materials; the proposed methodology was extended to the semantic segmentation of scenes, components, and damage in reinforced concrete buildings in [16]. Narazaki et al. [17,18] proposed the use of fully convolutional neural networks to identify bridge components for post-earthquake inspections. Narazaki et al. 
[19] employed recurrent neural networks with video data to help better understand the structural component context of close-up videos during bridge inspections. Gao et al. [20] developed the PEER-Hub dataset, incorporating multiple classification challenges for the post-earthquake assessment of buildings. Liang et al. [21] proposed a three-level image-based approach for post-disaster inspection of reinforced concrete bridges using deep learning. Dizaji et al. [22] conducted preliminary research on using 3D data to train a network for defect identification of cracks and spalling on concrete columns. Pan et al. [23] presented a framework to combine performance assessment with repair cost evaluation using deep learning, extending the types of information that can be extracted from image data to aid decision makers. A detailed review of advances in vision-based inspection and monitoring can be found in Spencer et al. [24].
While the efficacy of deep learning methods has been demonstrated for autonomous inspection subtasks such as damage identification and component identification, end-to-end validation is rarely conducted, due in part to the rare occurrence of earthquakes and the difficulty in obtaining ground-truth data. For example, the images employed in the studies discussed above are often well-lit close-ups of the damage; how such images could be acquired in the field is not considered.
The significant progress in computer graphics software over the past decade has enabled the creation of photo-realistic images and videos from 3D synthetic environments. The data generated from these graphics models, termed synthetic data, has been used to validate applications in robotic simulation (e.g., Gazebo [25]) and for reinforcement learning in autonomous vehicles (e.g., AirSim [26]). Such synthetic data has also been used for semantic segmentation of urban scenes, showing promising performance on field-collected data (Ros et al. [27]). Moreover, improving the diversity and photorealism of the simulated worlds has helped improve the performance of methods trained on synthetic data and subsequently applied to field data, as demonstrated by recent results in self-driving [28][29][30]. 3D synthetic environments can thus offer strong potential for end-to-end validation of strategies for automated post-earthquake damage assessment.
When generating damaged structures in 3D synthetic environments, a key challenge is determining how to realistically display the damage on the structure. Hoskere et al. [31][32][33] proposed ideas on physics- and heuristics-based damage models to inform the accurate rendering of damage on structures. Physics-based graphics models (PBGMs) are graphics models in a synthetic environment informed by underlying finite element model(s) that determine the damage representation and/or the mechanical behavior of structural and non-structural elements. Heuristics-based models instead utilize rules based on empirical observations to determine the damage representations. Narazaki et al. [34] utilized heuristics-based models in a 3D synthetic environment to obtain a dataset of damaged RC viaduct images and train a deep neural network for damage detection. Narazaki et al. [35,36] also developed physics-based graphics models for displacement and strain measurement of miter gates and laboratory bridge structures. Zdziebko et al. [37] developed a physics-based graphics model of a laboratory beam structure for the development of vision-based displacement measurement algorithms. However, the use of PBGMs to generate realistic, physics-based visual damage patterns of structures has not yet been explored. Damage textures generated in this manner will be more physically accurate than heuristically generated damage, which is expected to reduce dataset bias in the long run. Incorporating models for the visual representation of damage together with finite element models could unlock new possibilities for end-to-end inspection simulation, leading to more intelligent, efficient, rapid, and autonomous structural assessments. This paper develops a comprehensive and automated framework for generating physics-based graphics models (PBGMs) as part of a 3D synthetic environment with the ability to simulate earthquakes and render physically accurate damage. 
The framework provides novel strategies for the incorporation of non-linear finite element analysis with constitutive models, together with graphics models for the visual representation of the damage. The proposed framework is applied to generate renderings of twelve reinforced concrete buildings subjected to earthquake excitation. The utility of the proposed framework is demonstrated through three applications and experiments, namely, the creation of a large-scale synthetic dataset of earthquake-damaged structures with annotations for semantic segmentation, the augmentation of real data with synthetic data, and the comparison of damage state estimation using UAV and ground-based images. The manuscript is organized into the following sections: (i) 3D synthetic environments for inspections, (ii) implementation of the proposed framework for RC buildings, (iii) applications and experiments, (iv) results, and (v) conclusions, followed by references.

3D synthetic environments for visual inspections
3D synthetic environments (Figure 3) are defined as modeling software with the ability to simulate object geometries and textures, lighting sources, and cameras. Using synthetic environments, image capture from a UAV during an inspection is simulated by rendering images from camera locations along planned flight trajectories. Different flight paths and data acquisition schemes can be evaluated in the synthetic environment to identify flight parameters such as the distance from the structure for optimal identification accuracy of both damage and components, the flight path for complete coverage, etc. Before such tests can be carried out, a key challenge is to model the structure and environment of interest. In this study, PBGMs are proposed as an effective tool for modeling the structures of interest in 3D synthetic environments. Generation of synthetic data using PBGMs allows for the creation of useful annotated datasets of damaged structures, as any data from algorithmically generated graphics models is automatically labeled, both at the pixel and image level, using the damage locations and states implicit in the PBGM. Different conditions, such as ground excitation, lighting, paint colors, dirt, etc., can be simulated to generate a wider variety of training data robustly representing different realistic environments (Figure 2). The generated data can be used to train a deep network for semantic segmentation, facilitating automation of multiple tasks. As the damage is informed by a finite element model, the generated data can be used to conduct overall assessments, because the ground truth of the structural condition is available. Finally, as the visual representations are linked to the results of the finite element model, they provide one means of developing finite element model updating strategies. Figure 4 lists applications of PBGMs in synthetic environments for various visual inspection tasks. 
PBGMs and synthetic environments will provide a testbed for vision algorithms with readily repeatable conditions. Algorithms that are effective in these virtual testbeds will be more likely to work well on field-collected datasets. The developed datasets can also be used to augment field datasets to enhance accuracy. This section first describes a new framework for the generation of physics-based graphics models, followed by scene assembly and image rendering in 3D synthetic environments.

Physics-based graphics models
In this section, a framework for the generation of physics-based graphics models (PBGMs) for inspections is presented. For clarity, the framework is illustrated in a schematic presented in Figure 5, with reinforced concrete buildings with masonry infill walls as the structure type; however, the same procedures may be followed for other structures whose physics can be simulated through finite element models. The framework consists of five steps: (i) graphics mesh creation, (ii) non-linear finite element analysis, (iii) damage mask generation, (iv) damage texture generation, and (v) texture assignment.

The geometry of the structure of interest in the PBGM is represented by a 3D mesh, which may be created in any 3D creation software. The features of buildings incorporated in the 3D mesh will enable networks trained on synthetic data to learn representations for those features in real images. For structural inspections of buildings, structural components like beams, columns, and shear walls, and non-structural components like infill walls, doors, windows, and balconies, are highly relevant, as damage to these components provides visual indicators of structural health. Similar lists can be made for other types of structures to be inspected. All these components should be created programmatically through parameterization, or, as referred to in the field of computer graphics, created "procedurally". Procedural generation of the mesh allows programmatic implementation of the subsequent steps, thus enabling randomization of both geometry and textures. Randomization has been shown to improve performance in related tasks like robotic perception when learning from synthetic data (Tobin et al. [38]) and is regarded as an effective way to learn generalizable representations [29].
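As a concrete illustration of this parameterization, the sketch below samples one randomized building realization and derives a column grid from it. The class fields, value ranges, and names are hypothetical stand-ins for illustration, not the parameterization actually used in this work.

```python
import random
from dataclasses import dataclass

@dataclass
class BuildingParams:
    """Hypothetical parameter set; field names and ranges are illustrative."""
    n_stories: int
    n_bays_x: int
    n_bays_y: int
    bay_width: float     # m
    story_height: float  # m

def random_building(seed=None):
    """Sample one randomized realization, mirroring the geometry
    randomization used to diversify the synthetic dataset."""
    rng = random.Random(seed)
    return BuildingParams(
        n_stories=rng.randint(2, 8),
        n_bays_x=rng.randint(2, 5),
        n_bays_y=rng.randint(2, 5),
        bay_width=rng.uniform(3.0, 6.0),
        story_height=rng.uniform(2.8, 3.5),
    )

def column_grid(p):
    """Column base locations implied by the bay layout; the same parameters
    can drive both the graphics mesh and the finite element model."""
    return [(i * p.bay_width, j * p.bay_width)
            for i in range(p.n_bays_x + 1)
            for j in range(p.n_bays_y + 1)]

params = random_building(seed=0)
columns = column_grid(params)
```

Because every downstream step (meshing, finite element model creation, texturing) consumes the same parameter object, a new random seed yields an entirely new, consistently labeled building.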

Non-linear finite element analysis
From the perspective of PBGM generation, non-linear finite element analyses provide valuable insight into the regions of a structure where damage is most likely to occur. The same parameters used to construct the mesh procedurally are also used to generate the finite element models. In the particular case of post-earthquake inspections, a two-step analysis approach is proposed: first, a simplified global response of the structure is obtained, and then a high-fidelity analysis is conducted for the visible components to generate accurate damage patterns. The main pieces of information derived from these analyses are the plastic strain contours and other damage indicators, such as the compression damage index from a concrete damaged plasticity model, which provide direct indicators for cracking and spalling of members, two of the main visual indicators of structural health after an earthquake. As the distribution of plastic strain is not likely to change for small changes in the loading, the number of analyses can be further reduced for large structures with little effect on the final result (i.e., the rendered PBGM) by taking advantage of the fact that components often repeat in a structure (e.g., across floors in a building). The next subsection describes the proposed methodology to identify physics-based damage hotspots using the non-linear analysis.
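The reuse of analyses for repeated components can be sketched as a simple cache keyed by component parameters: the expensive local analysis runs once per unique key, and repeated components reuse the stored result. The key fields and the toy stand-in for the FE solver below are illustrative assumptions, not this study's implementation.

```python
def component_key(comp):
    """Components with identical section, length, and (binned) demand are
    assumed to produce the same plastic-strain field (illustrative keying)."""
    return (comp["section"], comp["length"], round(comp["drift"], 3))

def analyze_all(components, run_local_analysis):
    """Run the expensive local FE analysis only once per unique component."""
    cache = {}
    results = []
    for comp in components:
        key = component_key(comp)
        if key not in cache:
            cache[key] = run_local_analysis(comp)
        results.append(cache[key])
    return results, len(cache)

calls = []
def fake_fe(comp):
    """Toy stand-in for the expensive local finite element analysis."""
    calls.append(comp)
    return {"max_plastic_strain": 0.01 * comp["drift"]}

# four components, only two unique parameter combinations
comps = [{"section": "C30x30", "length": 3.0, "drift": d}
         for d in (0.5, 0.5, 1.0, 0.5)]
results, n_unique = analyze_all(comps, fake_fe)
```

Here the solver is invoked twice for four components; for a real building with many identical members across floors, the savings scale accordingly.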

Damage masks generation
Damage masks are 2D binary images that indicate the presence of damage on component surfaces. Several damage parameters need to be determined before these masks can be generated using the conducted analysis; these parameters relate to the number, size, shape, and location of the damage. Each of the relevant parameters may be determined through (i) the physics-based response, or (ii) defined heuristics. Both modes come with their own merits and demerits. Heuristic methods are the only viable option for many damage cases that are difficult to model (e.g., due to the lack of suitable material models or load representations) or for which no empirical data are available. Methods stemming from empirical data are reliable because they are based directly on observations, but identifying good heuristics is challenging. When realizable, physics-based damage masks provide a rigorous approach that links the visual representation to the results of finite element analyses, leveraging the efforts of researchers in developing state-of-the-art constitutive models. Incorporating the physics enables applications such as estimating the structural response (e.g., interstory drift, damage state, etc.), identifying failure mechanisms, and model updating. We first propose a general framework to determine the damage parameters and then demonstrate the generation of masks for the common damage types of cracks and spalling using the structural response.
The damage parameters are obtained by Monte Carlo sampling from empirical or heuristic distributions. The first step is to determine the damage state (DS) of the component based on some structural response measure λ. The response measure may be any quantity that is sensitive to visual damage; for example, for reinforced concrete buildings with masonry infill, the interstory drift, a commonly used damage indicator, may serve as the response measure. The relationship between DS and λ is then modeled through a probability distribution p(DS | λ), which represents uncertainties in the geometry, method of construction, and material properties. The component damage state is determined by sampling from this distribution, as shown in equation (1):

DS ~ p(DS | λ) (1)

Various damage parameters θ (e.g., number of cracks, crack width, crack length, etc.) are then calculated by sampling from their corresponding distributions, which represent the variation in damage observed given a particular damage state, as shown in equation (2).

θ ~ p(θ | DS) (2)

While it may be possible to estimate the parameters θ directly from λ, this two-step approach allows for a more intuitive method, facilitating the construction of the distributions p(DS | λ) and p(θ | DS) from empirical data. For parameters whose value varies depending on the location in the member, the sampled values are further modified by a multiplicative factor derived from the structural response, as shown in equation (3):

θ′ = θ · f(ψ) (3)

where f is a function of some structural response parameter ψ (e.g., plastic strain) varying over the component. Examples of selecting λ, θ, and f for RC buildings with masonry infill are provided in section 3. The next subsections discuss the generation of masks for cracks and spalls, two common types of defects, once the damage parameters have been determined.
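The two-step sampling procedure (damage state given the response measure, then damage parameters given the damage state, with a local multiplicative modifier) can be sketched as follows. The fragility medians, dispersions, and crack-width distributions below are placeholder values for illustration only, not those used in this study.

```python
import math, random

def fragility(lam, theta, beta):
    """P(DS >= ds_i | response lam) for a lognormal fragility curve."""
    return 0.5 * (1.0 + math.erf(math.log(lam / theta) / (beta * math.sqrt(2.0))))

# hypothetical (median drift %, dispersion) per damage state -- NOT Table 2
FRAGILITY_PARAMS = [(0.2, 0.5), (0.6, 0.5), (1.2, 0.5)]

def sample_damage_state(drift, rng):
    """Step 1 (eq. 1): sample the damage state given the interstory drift."""
    u = rng.random()
    ds = 0
    for i, (theta, beta) in enumerate(FRAGILITY_PARAMS, start=1):
        if u < fragility(drift, theta, beta):  # exceedance probs decrease with i
            ds = i
    return ds

# hypothetical lognormal (median mm, dispersion) of crack width per damage state
CRACK_WIDTH = {0: None, 1: (0.3, 0.4), 2: (1.0, 0.4), 3: (3.0, 0.4)}

def sample_crack_width(ds, rng):
    """Step 2 (eq. 2): sample a damage parameter given the damage state."""
    if CRACK_WIDTH[ds] is None:
        return 0.0
    med, beta = CRACK_WIDTH[ds]
    return med * math.exp(beta * rng.gauss(0.0, 1.0))

def local_value(nominal, strain, strain_max):
    """Step 3 (eq. 3): modulate the nominal value by the local response."""
    return nominal * (strain / strain_max)

rng = random.Random(1)
ds = sample_damage_state(drift=0.8, rng=rng)
width = sample_crack_width(ds, rng)
```

Replacing the placeholder tables with empirically derived distributions recovers the procedure used for the RC buildings in section 3.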
Stochastic blobs are amorphous shapes generated to select subregions of the generated masks. The plastic strain map is normalized to take the form of a probability distribution, and a center point (x_c, y_c) is obtained by sampling from this distribution. An amorphous blob-shaped region is then marked around the center point using a stochastic radius, defined as the cumulative sum of a periodic function with random amplitude and phase. The blob-generating function takes as input the number of waves along the circumference, n_w. The precise equations proposed can be found in Figure 8, where x ~ U[a, b] represents sampling from a uniform distribution between a and b, and w is the crack width in pixels. Once these parameters are determined, the following pipeline for the generation of crack masks from the plastic strain map, shown in Figure 7, can be applied. A Gaussian blur is applied, followed by a Canny edge detector [39], to obtain an edge image. The edges are dilated by a factor of w. Finally, to add randomness to each component, a stochastic blob is generated, and the intersection of the blob with the dilated edge image is included in the crack mask. This process is repeated once for each of the sampled number of cracks.

Figure 7. Crack mask generation process
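One plausible reading of the stochastic blob construction, with the radius perturbation taken as the normalized cumulative sum of a random-amplitude, random-phase sinusoid, is sketched below; the precise equations are those given in Figure 8, so the constants and the coarse numpy rasterization here are illustrative only (the actual pipeline would more naturally use cv2.fillPoly).

```python
import numpy as np

def stochastic_blob(center, r0, n_waves, rng, n_pts=256):
    """Blob outline: the radius perturbation is the cumulative sum of a
    sinusoid with random amplitude and phase. Over a whole number of waves
    the cumulative sum returns to ~0, so the outline closes smoothly."""
    t = np.linspace(0.0, 2.0 * np.pi, n_pts, endpoint=False)
    amp = rng.uniform(0.5, 1.0)
    phase = rng.uniform(0.0, 2.0 * np.pi)
    perturb = np.cumsum(amp * np.sin(n_waves * t + phase))
    perturb /= np.abs(perturb).max() + 1e-9        # bounded in [-1, 1]
    r = r0 * (1.0 + 0.4 * perturb)                 # radius stays positive
    return np.stack([center[0] + r * np.cos(t),
                     center[1] + r * np.sin(t)], axis=1)

def blob_mask(shape, pts):
    """Coarse rasterization of the (near star-shaped) outline: each pixel is
    inside if it is closer to the centroid than the rim at its angle."""
    cx, cy = pts[:, 0].mean(), pts[:, 1].mean()
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    ang = np.arctan2(yy - cy, xx - cx) % (2.0 * np.pi)
    idx = (ang / (2.0 * np.pi) * len(pts)).astype(int) % len(pts)
    rim = np.hypot(pts[:, 0] - cx, pts[:, 1] - cy)[idx]
    return np.hypot(xx - cx, yy - cy) <= rim

rng = np.random.default_rng(0)
pts = stochastic_blob(center=(32.0, 32.0), r0=10.0, n_waves=5, rng=rng)
mask = blob_mask((64, 64), pts)
```

The resulting boolean mask can then be intersected with the dilated edge image to produce one crack mask.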
Spalling is another common damage type for reinforced concrete and masonry structures, affecting the integrity of components. The damage parameters to be determined here are the number of spalled regions and the nominal spall radius. To generate spall masks with these parameters, the area of pixels corresponding to the spall must first be defined. A stochastic blob is generated following the process outlined in Figure 6. In addition to the blob, another region is constructed corresponding to pixels with compression damage greater than the mean compression damage within the blob. The spall region is then set as the intersection of these two regions. Rebar is made visible under spalled regions with some probability. The process is illustrated in Figure 8.

Damage textures are image textures of damaged components and need to be generated so as to provide a realistic visual representation of the damaged structure. The following points are discussed to illustrate the process followed in generating damage textures: (i) bidirectional scattering distribution functions, (ii) material textures, (iii) damage textures, and (iv) annotation textures.
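The spall-mask intersection described above reduces to a few array operations; the damage map, blob, and rebar probability below are illustrative stand-ins rather than outputs of the actual finite element analysis.

```python
import numpy as np

def spall_region(blob, damage):
    """Intersect the blob with pixels whose compression damage exceeds the
    mean damage inside the blob."""
    inside = blob.astype(bool)
    if not inside.any():
        return inside
    return inside & (damage > damage[inside].mean())

def expose_rebar(spall, p_rebar, rng):
    """Reveal rebar under the spall with some probability (p_rebar is a
    hypothetical value, not taken from this study)."""
    return spall if rng.random() < p_rebar else np.zeros_like(spall)

rng = np.random.default_rng(2)
blob = np.zeros((8, 8), bool)
blob[2:6, 2:6] = True                              # stand-in stochastic blob
damage = np.linspace(0.0, 1.0, 64).reshape(8, 8)   # stand-in CDP damage map
spall = spall_region(blob, damage)
rebar = expose_rebar(spall, p_rebar=0.5, rng=rng)
```

In the actual pipeline, `damage` would be the compression damage index exported from the local component analysis, and `blob` a rasterized stochastic blob.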
Bidirectional scattering distribution functions: The visual appearance of an object depends on how light incident on its surfaces is scattered, transmitted, or reflected. In computer graphics, the behavior of light incident on a given material is represented by a bidirectional scattering distribution function, or BSDF [40]. BSDFs can be measured through optical experiments with structured light sources, and based on such experiments, researchers have proposed different methods to model them. A widely implemented model available in many 3D creation software packages, known as the Principled BSDF, was proposed by Burley et al. [41]; it is a physically-based rendering (PBR) model with artistically intuitive parameters. Apart from the base color, the Principled BSDF has 10 parameters describing each pixel, including properties such as roughness, metallic, specularity, and anisotropy. Depending on the type of material, several of these may not be applicable; for example, a concrete surface has negligible metallic scattering properties. In addition to the values defining the scattering, the incorporation of surface normal directions at every point plays a significant role in accurate renderings. If the surface were modeled at a very small scale, incorporating its undulations, the surface normals could be computed directly from the geometry; however, such detailed surface modeling is seldom feasible, and an alternative way to achieve the same effect is to use a predefined surface normal map.
Material textures: PBR textures, encompassing maps with BSDF parameters for base color, roughness, metallicity, etc., and surface normals, can be used to adequately represent materials for the purpose of structural inspection simulation. PBR textures for common construction materials, created through height-field photogrammetry, are available on websites like CC0textures [42]. A sample image texture of a brick wall from [42], rendered using Blender [43], an open-source 3D model creation software, is shown in Figure 9. The example incorporates three maps: the base color, a roughness map, and a normal map. The roughness changes how the light is reflected, especially near the edges of the bricks, and the normal map helps visualize the fine surface undulations and the protrusion of the bricks from the mortar plane. In addition to photogrammetry-based textures, textures can also be procedurally generated in material authoring programs like Substance [44], which provide the ability to create multiple textures with different random seeds. As noted in [29,38], randomization is a crucial means of enforcing generalization. We utilize both types of PBR maps (photogrammetric and procedural) in the construction of the PBGMs. When multiple layers of materials are present (e.g., cement plaster over masonry, paint over concrete, etc.), maps are selected for each material layer, and the displayed layer is selected based on the presence of damage at any given pixel.

Figure 9. Illustration of PBR texture using base color, roughness and normal maps

Damage textures: The damage textures for the PBGM are obtained using the material textures as the base and modifying the region within the generated physics-based damage masks using opencv-python [45]. A crack is textured by modifying the corresponding surface normals through a bump map; its depth is set as a heuristic function of the plastic strain, similar to the width and length of the crack. 
The spall region is textured by applying a Musgrave [46] texture to create a bump map controlling the variation of depth within the spall region. For reinforced concrete components, rebar is exposed with some probability, depending on the damage state of the material. The rebars are modeled as cylinders with surface variation and a metallic map.
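The per-pixel selection of the displayed material layer described under material textures can be sketched as a simple mask-based composite; the colors and array sizes below are illustrative.

```python
import numpy as np

def composite_layers(finish, substrate, damage_mask):
    """Per-pixel layer selection: the substrate (e.g., masonry) is displayed
    wherever damage removes the finish layer (e.g., plaster or paint)."""
    return np.where(damage_mask.astype(bool)[..., None], substrate, finish)

paint = np.full((4, 4, 3), 200, np.uint8)     # finish-layer base color
masonry = np.full((4, 4, 3), 90, np.uint8)    # substrate revealed by damage
mask = np.zeros((4, 4), bool)
mask[1:3, 1:3] = True                         # damaged pixels
texture = composite_layers(paint, masonry, mask)
```

The same selection applies to every PBR map (base color, roughness, normals), so the damaged region inherits the substrate's full scattering behavior, not just its color.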
Annotation textures: For deep learning methods, the ground truth for the synthetic data is rendered using an emission BSDF. As opposed to the Principled BSDF with its 10 parameters, an emission BSDF has a single color parameter and acts as a light source. The emission shader is useful for rendering homogeneous colors, which is what is required as ground truth for tasks like semantic segmentation. Depending on whether image data or annotation data is being rendered, the appropriate texture types are selected during the rendering process.
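The annotation rendering and its inversion back to training labels can be sketched as a palette lookup; the class list and colors below are hypothetical, not the annotation scheme actually used in this work.

```python
import numpy as np

# hypothetical class palette for the emission pass (colors are illustrative)
PALETTE = {0: (0, 0, 0),       # background
           1: (255, 0, 0),     # crack
           2: (0, 255, 0),     # spalling
           3: (0, 0, 255)}     # exposed rebar

def labels_to_rgb(labels):
    """Flat per-class colors, as an emission shader would render them."""
    lut = np.zeros((max(PALETTE) + 1, 3), np.uint8)
    for k, c in PALETTE.items():
        lut[k] = c
    return lut[labels]

def rgb_to_labels(rgb):
    """Invert a rendered annotation image back to integer class ids."""
    labels = np.zeros(rgb.shape[:2], np.int64)
    for k, c in PALETTE.items():
        labels[(rgb == np.asarray(c, np.uint8)).all(axis=-1)] = k
    return labels

label_map = np.array([[0, 1], [2, 3]])
annotation = labels_to_rgb(label_map)
```

Because the emission pass produces exactly these flat colors with no shading, the inversion is lossless, which is what makes the rendered annotations usable directly as segmentation ground truth.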

Texture application
The generated textures are applied to the components after UV unwrapping them. For 3D models, the application of 2D textures requires a correspondence such that locations on the 2D texture map to corresponding locations on the 3D surface. This process of "unwrapping" the 3D model is termed UV unwrapping and is conducted by selecting the edges that are to serve as seams to break up the 3D model. In most programs, once the seams are selected, the resulting 2D surfaces are arranged to fit within a square surface. The obtained damage masks are assembled in the same arrangement to create a direct correspondence to the UV map, and thus to the 3D model; other masks, such as the rebar mask, are arranged in the same way.

Scene, Lights, Camera & Rendering
The steps discussed thus far describe the construction of a single PBGM. To obtain photo-realistic images, the background scene also needs to be populated; for post-earthquake building inspections, this includes multiple buildings, roads, sidewalks, light poles, electric cables, trees, etc. Randomization of geometry and textures is important toward the ultimate goal of generalizability of deep learning models trained in the synthetic environment; thus, procedural methods are adopted for scene assembly as well.
The final step is to render the images. Two modes of rendering are commonly available in 3D creation software: path tracing and rasterization. Path tracing simulates the paths of light rays in the scene and is more computationally expensive than rasterization, but it is preferred here as it produces more photo-realistic representations. To render images, a light source and the camera locations and orientations must be set in the synthetic environment. To simulate realistic outdoor lighting, high dynamic range image (HDRI) maps are used to light the scene [47].
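A minimal sketch of placing virtual cameras along an inspection flight path, assuming simple circular orbits around the target building, is shown below; this is an illustrative stand-in, not a planner used in this work.

```python
import math

def orbit_waypoints(center, radius, altitudes, n_per_ring):
    """Camera positions on circular rings around the target, each yawed to
    face the ring center: a minimal stand-in for a UAV inspection path."""
    waypoints = []
    for z in altitudes:
        for i in range(n_per_ring):
            a = 2.0 * math.pi * i / n_per_ring
            x = center[0] + radius * math.cos(a)
            y = center[1] + radius * math.sin(a)
            yaw = math.atan2(center[1] - y, center[0] - x)  # look-at heading
            waypoints.append((x, y, z, yaw))
    return waypoints

wps = orbit_waypoints(center=(0.0, 0.0), radius=15.0,
                      altitudes=[3.0, 9.0], n_per_ring=8)
```

Each waypoint would be assigned to the virtual camera before rendering one frame, so varying the radius and ring density directly tests how standoff distance and coverage affect downstream identification accuracy.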

Implementation of 3D synthetic environment with RC buildings
The proposed framework was implemented using multiple software applications. 3D model construction was conducted in Blender [43], the finite element analysis was conducted using ABAQUS [48], material authoring using Substance [44], and image processing in python using OpenCV [45]. This section discusses details about the construction of the synthetic environment.

Details of modeled buildings
3D models were created for 12 different fictional reinforced concrete buildings with masonry infill walls. The layouts of these buildings were loosely based on 3 buildings (shown in Figure 11) that were affected by the 2017 Mexico City earthquake, with some simplifications made for parametric modeling. The buildings were parameterized, and different realizations of each building were constructed with varying dimensions. Photographs of the buildings were obtained from three different sources: datacenterhub [49], Google Street View [50], and direct photography by the authors. The parameterization of the building layout included the dimensions and locations of columns, beams, walls, windows, and balconies. The building properties were stored in a single class object that was used both for finite element model creation and 3D model generation.

Finite element model
As mentioned in section 2, both the global and local response of the structure are required for the generation of the PBGM. The global analysis of the buildings was conducted using OpenSeesPy [51]. The creation of the mesh was automated based on the building layout parameters developed in the previous section. The structure was modeled using the confined concrete model in OpenSeesPy with the parameters in Table 1. The shear reinforcement was assumed to be at a maximum spacing of min(100, d/3). The shear strength from the reinforcement is assumed to be 3√f′c, and the corresponding rebar area as per ACI 318 is given in equation (6).
The first three global mode shapes of a parametrically generated building are shown in Figure 12.

Figure 12. First three horizontal modes of a simulated structure

Each building was subjected to the Tabas earthquake record with varying intensity; an example ground motion is shown in Figure 13. A full analysis was then conducted for the local response of the components using Abaqus. A python script was developed to automate the creation of the components of the structure. The component models included the masonry wall and the confining columns and beams, as shown in Figure 15, all modeled with solid elements. The models also included the rebar, which was modeled with beam elements with a circular cross-section. The nodal displacements at the corners of components were used as inputs for the detailed local component models. For the concrete and masonry members, the concrete damaged plasticity (CDP) model proposed by [53] was used. The material parameters for the concrete were based on values reported in Jankowiak et al. [54], and those for the masonry on the values reported in Bolhassani et al. [55]. The masonry yield stress for tensile behavior was factored down so that the tensile strength of the masonry was less than that of the concrete. The steel was modeled as a plastic material with a yield stress of 200 MPa. The stress-strain curves used are shown in Figure 15. The rebars are embedded within the concrete members using the embedment interaction option in Abaqus, and the walls are tied to their immediate confining members using the tie constraint. A multi-point constraint is applied to tie the top and bottom surfaces of the beams and columns together. The bottom surface is fixed, and the top surface is subjected to the interstory drift. The amplitudes are chosen to represent 4 different damage states derived from values reported by [56].
An Abaqus explicit analysis was run for each unique component, and the plastic strains at each of the amplitude levels were stored as images for input to the texturing process discussed in the next subsection.

Damage parameters
As mentioned in section 2, the first step in identifying the damage parameters is to determine the damage state of the component. The probability distribution for the different damage states given the interstory drift is taken from Chiozzi et al. [56], where data from over 150 tests on masonry walls subject to in-plane loading were analyzed. A lognormal distribution is used to model the conditional probability of exceeding a given damage state, as shown in equation (7).
P(DS ≥ ds_i | λ) = Φ(ln(λ/θ_i)/β_i) (7)

where θ_i and β_i represent the central tendency and the dispersion parameters of the cumulative standard normal distribution Φ, and λ is the interstory drift. The values used for the different damage states are presented in Table 2.
Once the damage state is determined for a component, the various damage parameters are computed by sampling from their corresponding lognormal distributions. The statistics of the distributions used are provided in Table 3, and the corresponding distributions are plotted in Figure 15. The values for the crack width are based on the descriptions of damage states given in Chiozzi et al. [56]. The crack length, height, and number of cracks for the different damage states are approximated from the component damage classification guides for concrete frames with masonry infill given in FEMA 306 [57]. The spall radius ratio and area have been generalized for both walls and columns based on examples provided in [58]. If more rigorous experimental data become available, the corresponding distributions may be replaced to better represent the damaged structure.

PBR textures are used for all the construction materials. The textures for the paint, walls, beams, and columns were all generated parametrically using Adobe Substance Designer [44]. The visual features parameterized include color properties, the amount and type of dirt, and the size and orientation of the bricks. For each generated building, parameters including the paint color, concrete color, brick size, and brick color are first selected; then, for each component, these parameters are perturbed to provide variability among components.
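The per-component sampling described above can be sketched as follows. The medians and dispersions below are hypothetical stand-ins for the Table 3 statistics, and the parameter names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (median [mm], dispersion) of crack width per damage state;
# the paper's actual statistics are those reported in Table 3.
CRACK_WIDTH = {1: (0.5, 0.4), 2: (1.5, 0.4), 3: (4.0, 0.5)}

def sample_crack_width(ds, n=1):
    """Draw crack widths for damage state `ds` from a lognormal.

    NumPy parameterizes the lognormal by the mean/sigma of the
    underlying normal, so the median maps to exp(mean)."""
    median, beta = CRACK_WIDTH[ds]
    return rng.lognormal(mean=np.log(median), sigma=beta, size=n)
```

The same pattern would apply to crack length, number of cracks, and spall radius, each with its own tabulated statistics.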

Assembly and lighting
The assembly and construction of the PBGM and synthetic environment are automated using Python scripts. In each scene, one PBGM building is created; then the sidewalks, trees, roads, and other buildings are added to complete the scene using the SceneCity Blender plugin. The scene background and lighting were set using HDRI maps downloaded from [42]. An emission shader was used for the annotations, and the images were rendered using the Cycles renderer.

Applications and experiments
The developed procedure for PBGMs is used to generate synthetic images for automated visual inspection studies. Three applications are illustrated: (i) the QuakeCity dataset, a large-scale synthetic dataset of earthquake-damaged buildings; (ii) augmenting real data with synthetic data; and (iii) comparing post-disaster UAV data acquisition with ground-camera data acquisition.

QuakeCity Dataset: Large-scale synthetic dataset of earthquake damaged buildings
Images were rendered from multiple simulated UAV surveys of 11 damaged buildings in a city environment. Each survey replicates a field scenario in which a UAV circles the building at different altitudes to cover the entire height, width, and length of the building. Each image captured by the simulated UAV is associated with six sets of annotations: three damage masks (cracks, spalling, exposed rebar), components, component damage states, and a depth map. The QuakeCity dataset was released as part of the International Competition on Structural Health Monitoring 2021 for Project Task 2 [59]. In total, the dataset includes 4,688 images of size 1920×1080, each with six annotations, with 3,684 images for training and 1,004 for testing.
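The circling survey pattern can be sketched as a simple waypoint generator. This is not the authors' flight-planning code: the function name, the fixed orbit radius, and the per-ring view count are assumptions made for illustration.

```python
import math

def orbit_waypoints(center_xy, radius, altitudes, views_per_ring=12):
    """Camera positions (x, y, z, yaw) for a survey that circles a
    building at several altitudes, each waypoint facing the building axis."""
    cx, cy = center_xy
    waypoints = []
    for z in altitudes:
        for k in range(views_per_ring):
            theta = 2.0 * math.pi * k / views_per_ring
            x = cx + radius * math.cos(theta)
            y = cy + radius * math.sin(theta)
            yaw = math.atan2(cy - y, cx - x)  # point camera at the building
            waypoints.append((x, y, z, yaw))
    return waypoints
```

Rings at several altitudes ensure the full height of the facade is covered with overlapping views.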

Augmenting real data with synthetic data
To reliably train an autonomous visual inspection system, a large amount of training data with damaged structures is required. Frequently, however, the amount of such training data available is limited, and careful annotation of the available images is itself a challenge. In this experiment, we study whether incorporating synthetic data can boost the accuracy of networks on unseen real data when annotated real data are scarce.

Real image dataset
A dataset for semantic segmentation of real earthquake-damaged buildings was developed for the purpose of this study. The images were acquired by the authors after the 2017 Mexico City Earthquake using a DJI Phantom 3 and a Nikon D3300. The images were annotated for the presence of spalling using InstaDam [60]. In total, 150 images of resolution 1920×1080 were annotated as part of the dataset.

Network architecture
A deep network for semantic segmentation is constructed using a ResNet [61] architecture with 45 layers. The details of the encoder part of the architecture are provided in Figure 17. Residual connections sum the output of prior layers into subsequent layers to encourage the learning of new information; these connections are used between alternate layers (e.g., Conv0 to Conv2, Conv2 to Conv4, etc.). A rectified linear unit is used as the nonlinearity for all layers of the network. The details of the decoder part of the architecture are provided in Model training. The skip connections with 1×1 convolutions described in the previous subsection are taken after the Conv8, Conv20, and Conv32 layers.

The network parameters were trained by minimizing the cross-entropy loss between the predicted softmax probabilities and the corresponding one-hot labels, with an L2-regularization weight decay [53]. The weight-decay term gives preference to smaller weights and helps counter overfitting. Batch normalization was applied to address the covariate shift that occurs during training [24], where each feature dimension is shifted by a weighted mean and standard deviation learned during training.

The percentage of pixels in each class varies significantly; for example, classes such as cracks have far fewer pixels than spalling or corrosion due to the nature of the damage. To balance the class frequencies and prioritize all classes equally, median class balancing [26] was applied by reweighting each class in the cross-entropy loss. Data augmentation by resizing and cropping was incorporated to increase the efficacy and efficiency of training and to prevent overfitting. The training was conducted using the Adam optimizer [54] implemented in PyTorch [62].
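The median class balancing step can be sketched as follows, computing per-class loss weights from pixel statistics. The function name and inputs are illustrative; `pixel_counts[c]` is the number of pixels of class c, and `image_counts[c]` is the total number of pixels in the images where class c appears.

```python
import numpy as np

def median_frequency_weights(pixel_counts, image_counts):
    """Median-frequency balancing: weight_c = median(freq) / freq_c,
    where freq_c = (pixels of class c) / (pixels in images containing c).
    Rare classes (e.g., cracks) receive weights above 1."""
    freq = np.asarray(pixel_counts, dtype=float) / np.asarray(image_counts, dtype=float)
    return np.median(freq) / freq
```

The resulting vector would typically be passed as the per-class weight of the cross-entropy loss.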

Model training
Eight different models were trained to evaluate the potential role of synthetic data in enhancing the overall performance of the models on real data. The eight models comprised four pairs of training schemes, where each scheme had one model trained purely on real data and another trained on real plus synthetic data. In each pair, the train/test split of the real data was varied, from 0.2 train + 0.8 test to 0.8 train + 0.2 test, in increments of 0.2. The same amount of synthetic training data was used in all four schemes, namely the training images from the QuakeCity dataset (i.e., 3,684 images).
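The construction of the four paired training schemes can be sketched as below. The function and dictionary keys are hypothetical; only the split fractions (0.2 through 0.8) and the idea of pairing real-only with real-plus-synthetic training sets come from the text.

```python
import random

def make_schemes(real_images, synthetic_images, seed=0):
    """Build the four train/test splits of real data (0.2..0.8 in steps
    of 0.2), each paired with a real-only and a real+synthetic set."""
    rng = random.Random(seed)
    images = list(real_images)
    rng.shuffle(images)
    schemes = []
    for frac in (0.2, 0.4, 0.6, 0.8):
        n_train = int(frac * len(images))
        train, test = images[:n_train], images[n_train:]
        schemes.append({
            "real_fraction": frac,
            "real_only": train,
            "real_plus_synthetic": train + list(synthetic_images),
            "test": test,
        })
    return schemes
```

Each scheme then yields two trained models evaluated on the same held-out real test set, isolating the effect of the synthetic data.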

Comparing damage state estimation using UAV and ground-based images
While implementing autonomous visual inspection systems after disasters, a trained model using a dataset conducted prior to the disaster would be used to process new data acquired after the disaster. The quality of the predictions on new data may however vary widely depending on the image acquisition distance. For example, it may not always be possible to have consistent data acquisition modes or distances for various structures of interest. This is especially so in crowded cities where many obstacles are present. To better study the robustness of the trained models, practitioners may want to evaluate the model's performance for different camera distances to see where data gaps are present in the model, or to inform their field acquisition strategies. In such a scenario, using a PBGM would prove very useful, as images could be acquired with different camera paths, and the accuracy of predictions of a fixed trained model can be studied.
In this experiment, we train two different ResNet-45 models to predict component damage states. One model is trained using only the QuakeCity training dataset and tested on images from another building. Two test sets are prepared: one simulating a UAV camera for data acquisition (UAV-B12), and another simulating a person on the ground collecting images of the structure by pointing the camera forward and upward (Ground-B12). Together, the two datasets are referred to as B12. The other model is trained with the QuakeCity training dataset plus 25% of the images from B12 (QuakeCity + 0.25 B12) and evaluated on the remaining 75% of the B12 data (0.75 B12). Performance is reported separately for the UAV and Ground parts of B12.

Results
The results from the experiments outlined in the previous section are now presented.

QuakeCity Dataset
Example images from the generated dataset are shown in Figure 18. The images demonstrate the diversity of the damaged buildings in the dataset in terms of layout, color, and damage level. Images in the scenes are taken from different viewpoints and under different lighting conditions. Figure 20 shows another image generated with spalling, crack, and rebar annotations for each pixel. The annotation color key for both sets of annotations is provided in Table 4.

Augmenting real data with synthetic data
The results from the different models trained are shown in Figure 21 and Figure 22. Figure 21 compares the test accuracy on 60% of the real data while training on 40% of the real images, with and without the QuakeCity data. While the initial accuracy with only real data is higher, after about 75 epochs there is a significant increase in the performance of the model trained with the QuakeCity data. This result clearly highlights the benefits of using synthetic data to improve the performance of deep learning models on unseen real data.

Figure 21. Comparison of test set accuracy on 60% of real data while training on 40% of the real images with and without QuakeCity data

The addition of synthetic data also improved the performance of the deep neural network for varying splits of training and testing data. Figure 22 shows the difference between the two values plotted in Figure 21 for all four models trained. After 400 epochs, every model trained with the QuakeCity dataset outperforms its counterpart trained without it, with improvements in IoU of as much as 10%. Table 5 shows examples of images where the 0.4 Real model trained with QuakeCity data performs better than the model without; the quality of the predictions is clearly improved, and the borders of the predictions are more accurate.

Figure 22. Difference between test accuracy with and without QuakeCity data for varying fractions of real training data

Table 6 shows the test IoU for the different damage states for the various models trained. The model trained only on the QuakeCity dataset, which is limited to UAV views, performs poorly on the Ground-B12 images. For comparison, the performance of the model on 75% of UAV-B12 is also shown.
With the addition of 25% of B12 to the training dataset, the model performs much better on the remaining 75% of the data and is much closer to the performance on 75% of the UAV B12 set. While the results are along expected lines, the study nonetheless highlights the benefits of using a PBGM for tasks where the value and type of additional information to be incorporated into the network needs to be quantified. Given that there will be some cost associated to incorporating new data into the training dataset, a performance-based approach for data inclusion can be set-up using a PBGM as a reference. Table 7 and Table 8 show examples of predictions for the Ground and UAV B12 test datasets respectively.
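The intersection-over-union metric reported in Tables 6 through 8 can be computed per class as sketched below; the function name and array-based interface are illustrative.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Per-class IoU from predicted and ground-truth label maps
    (integer class indices of identical shape)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        # A class absent from both maps yields an undefined IoU
        ious.append(inter / union if union else float("nan"))
    return ious
```

Averaging the per-class values (ignoring undefined entries) gives the mean IoU often used to summarize segmentation performance.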

Conclusions
This paper proposed a framework for generating physics-based graphics models (PBGMs) as part of a 3D synthetic environment that can support the development of automated inspection strategies for civil infrastructure. The framework combines the response of a nonlinear finite element model with realistic visual rendering of different damage types. It was implemented for eleven reinforced concrete building structures subjected to earthquake excitation, with rendered damage types including cracks, spalling, and exposed rebar. Three applications were demonstrated. First, images were rendered from the damaged structures, and pixel-level ground truth was generated for the damage types, components, component damage states, and depths; the resulting dataset, termed QuakeCity, was released as part of the International Competition on Structural Health Monitoring 2021 and will serve as a benchmark for studying deep learning algorithms in automated post-earthquake inspections of building structures. Second, the efficacy of the framework in generating synthetic data to augment real data was demonstrated: models trained with both synthetic and real data performed up to 10 IoU points better than models trained with real data only. Finally, a third experiment compared the performance of trained models on ground-based and UAV-based data, demonstrating the utility of the framework for quantifying the value of additional information for models trained for visual inspections. The results demonstrate the strong potential of PBGMs as an end-to-end tool for the development and study of visual inspection systems.
Author Contributions: Conceptualization, VH, YN and BFS; methodology, VH and YN.; software VH.; validation, VH.; resources, BFS and VH.; data curation, VH.; writing-original draft preparation, VH.; writing-review and editing, VH, YN and BFS. All authors have read and agreed to the published version of the manuscript.
