Concrete Crack Monitoring Using Deep Learning-Based Multi-resolution Analysis

In this paper, we propose a new methodology for monitoring cracks in concrete structures. This approach is based on a multi-resolution analysis of a sample or specimen of the studied material subjected to several types of loading. The image obtained by ultrasonic investigation and processed by a dedicated wavelet is analyzed at several scales in order to detect internal cracks and crack initiation. The ultimate goal of this work is to propose an automatic crack-type identification scheme based on convolutional neural networks (CNNs). In this context, crack propagation can be monitored without access to the concrete surface, and the aim is to detect cracks before they become visible on that surface. The key idea enabling such performance is the combination of two major data analysis tools: wavelets and deep learning. This original procedure achieves a high accuracy close to 0.90. In this work, we have also implemented a second approach for the automatic detection of external cracks by deep learning from publicly available datasets.


Introduction
To say that concrete is the most widely used man-made material in the world is stating the obvious. Nevertheless, the real challenge is the search for simple, effective and inexpensive techniques to optimize the performance of concrete and to control its mechanical behavior. In the interests of safety and economy, methods for predicting the performance of concrete structures over the long term (e.g. decades) are in great demand, especially in developing countries. Mechanical overload is one of the most frequently cited causes of damage to concrete, but other equally devastating factors will be discussed in this work. Micro-cracks (see, for example, Figure 1 a)) can be caused by excessive mechanical stress, even when this stress is confined to a restricted area. If such stress overload continues, cracks will continue to form and/or expand, which can lead to extensive damage or even mechanical collapse of the structure. For this reason, crack monitoring is crucial to ensure long-term viability. In current engineering practice, this monitoring is performed by regularly measuring crack openings at the surface using optical measurements or extensometers. Based on these observations, it is now known that internal damage can lead to leakage or corrosion in large walls, even when only limited cracks are visible on the surface. It is therefore essential to be able to assess and monitor cracks in reinforced concrete constructions at an early stage, especially in special constructions where durability and containment are significant issues.
Concrete is a mixture of four main materials: Portland cement, coarse aggregate, fine aggregate and water; for industrial use, mineral and chemical admixtures are added to accelerate or delay its setting and to improve its performance [1,2]. The proportions of these constituents are regulated according to the quality required by the intended structure, such as long-span bridges (see Figure 1 b)), special underground structures (see Figure 1 c)) or nuclear power plants (see Figure 2). An excess or deficit of one of the constituents, or inappropriate vibration of the fresh mixture, causes defects such as segregation or premature shrinkage cracks, while trapped air bubbles create discontinuities in the material (see Figure 1). These defects affect the strength and durability of concrete [3]. When exposed to aggressive environments or temperature variations, visible and invisible defects appear, and the quality and strength of the material decrease. Concrete behaves well under compressive stress, unlike tensile stress. In a concrete specimen under compression, the stresses concentrate on rigid elements with an appreciable modulus of elasticity. Since the material is heterogeneous, an external load creates a complex internal stress state and a concentration of stresses around air voids [4]. There is a wide variety of methods for evaluating materials or components, and non-destructive methods are an important category with multiple applications. The field of non-destructive evaluation (NDE) or non-destructive testing (NDT) [5] involves the identification and characterization of damage on the surface and in the interior of materials without cutting or otherwise altering the material.
In other words, NDT refers to the process of evaluating and inspecting materials or components for characterization [6,7], or of searching for defects and flaws against certain standards, without altering the original attributes or damaging the test object. NDT techniques provide a cost-effective means of testing a sample for individual investigation, or can be applied to the entire material for verification in a production quality control system. Thus, NDT is the set of methods that can characterize the state of integrity of structures or materials without degrading them (without altering their function in use). The development of NDT methods began around 1960-1970 to meet the demands of sectors such as nuclear energy, aeronautics and space. NDT has gradually widened its field of application, moving from the strict domain of detection, recognition and sizing of localized defects to the evaluation of the intrinsic characteristics of materials. The notion of defect (or fault) is defined according to the use that will be made of the product (satisfaction of the final customer). NDT methods can be applied to the same elements and structures several times and at different times, which makes them suitable for diagnostic testing of building structures, both during construction and during their many years of service.
Detection of cracks is an important task in monitoring the structural health of concrete structures. If cracks develop and continue to propagate, they reduce the effective load bearing surface area and can over time cause failure of the structure.
For this reason, non-destructive testing of concrete now has two main objectives: to detect micro-cracks at an early stage and to monitor stresses in the structures [8,9].
The main objective of this work is to propose a new approach for the detection of structural cracks in concrete using an ultrasonic non-destructive testing system to scan the concrete and an original methodology based on multi-resolution analysis and deep learning.
The remainder of this paper is organized as follows. Section 2 is devoted to the foundations of our approach:
- it presents and recalls NDT methods and techniques as well as the experimental setup used,
- it introduces the main properties of the wavelet transform and the corresponding multiresolution analysis,
- it recalls the foundations of neural networks and CNN-based deep learning and proposes the architecture adopted to build a classifier for detecting internal cracks from the resulting space-scale images.
Section 5 focuses on the implementation aspects and the analysis of the results. Finally, Section 6 concludes this study.
NDT Methods

NDT methods are the most desirable and well-developed methods of concrete diagnosis. A distinction can be made between stroke methods, electrical methods, visual evaluation and acoustic methods. The latter, also called wave methods, are based mainly on the analysis of the propagation of ultrasonic waves [22,23]. Acoustic methods can be divided into passive methods, in which the source of the waves is the structure itself under changing load (the acoustic emission method), and active methods based on sending and receiving ultrasonic waves. Currently, active methods are not yet sufficiently developed and tested to be widely used in the field, and in most cases they require access to two (opposite) sides of the test element or knowledge of its exact dimensions. The originality of our work lies in the fact that we use, on the one hand, an ultrasound-based NDT method to identify possible cracks and, on the other hand, combine this method with a wavelet-based multi-resolution analysis to finely analyze cracks and their size at different scales, especially at the beginning of the concrete cracking process. The final objective is to automatically classify these cracks by deep learning and to follow their evolution.

Multiresolution Analysis Based on Wavelets
The concept of multiresolution analysis [24] provides a framework for the decomposition (and reconstruction) of a signal in the form of a series of approximations of decreasing scale, completed by a series of details. To illustrate this idea, let us take the case of an image constructed from a succession of approximations; the details enhance this image. Thus, coarse vision becomes finer and more precise.
The concept of multiresolution analysis (MRA) is an effective tool that is applicable to all of the above-mentioned fields. This tool produces an immediate, easily interpretable and exploitable result. However, for specific applications that require the extraction of targeted information, it is clear that advanced methods will have to be developed and "merged" that exploit existing techniques or optimize analyses (e.g. in compression) by taking edges or contours into account, using 2nd- and 3rd-generation wavelets such as ridgelets [40], curvelets [41], contourlets [42], bandelets [43], etc. Indeed, these anisotropic wavelets are automatically oriented and dilated to follow the geometry of a potential edge or contour.

This conception of multi-resolution analysis is comparable to that of a camera that moves closer to a subject, or uses a zoom, to distinguish its details, and moves away to capture larger structures: the famous concept of the mathematical microscope. Figure 4 summarizes the principle of multi-resolution analysis (here for three levels of resolution) based on wavelets. The signal S is first decomposed at the 1st resolution level into an approximation A1 and a detail D1; at the 2nd resolution level, approximation A1 is decomposed into an approximation A2 and a detail D2; and finally, at the 3rd resolution level, approximation A2 is in turn decomposed into an approximation A3 and a detail D3. The signal thus analyzed can be written as:

S = A3 + D3 + D2 + D1     (1)

Let ψ(t) denote a reference pattern called the mother wavelet. It is generally required that ψ(t) has jointly highly concentrated time and frequency supports.
ψ(t) satisfies the following equation:

∫ t^k ψ(t) dt = 0,   k = 0, 1, …, n − 1     (2)

where n controls the number of oscillations of ψ(t). This relation means that ψ(t) is orthogonal to polynomial components of degree less than n.
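This property can be verified numerically. The sketch below (plain numpy; the Haar wavelet is chosen purely for illustration, since the dedicated wavelet used in this work is not specified at this point) checks that the zeroth moment of the Haar wavelet vanishes while its first moment does not, i.e. that Haar has exactly n = 1 vanishing moment:

```python
import numpy as np

# Discretize the Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1).
N = 1000
t = np.linspace(0.0, 1.0, N, endpoint=False)
dt = 1.0 / N
psi = np.where(t < 0.5, 1.0, -1.0)

# Zeroth moment (k = 0): integral of psi(t) dt. It vanishes, so Haar is
# orthogonal to constants (polynomials of degree < 1).
m0 = np.sum(psi) * dt

# First moment (k = 1): integral of t * psi(t) dt. It does NOT vanish
# (it is close to -0.25), so Haar reacts to linear trends.
m1 = np.sum(t * psi) * dt
```

Here m0 is zero up to rounding while m1 is close to -0.25, confirming that a Haar analysis suppresses constant components of the signal but responds to anything less regular.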
The wavelet transform W_X(u, s) of a signal X at time u and scale s is defined by (3):

W_X(u, s) = ∫ X(t) (1/√s) ψ*((t − u)/s) dt     (3)
where ψ* denotes the complex conjugate of ψ. Looking at expressions (2) and (3), it is clear that W_X(u, s) will be insensitive to the most regular behaviors of the signal, assimilated to a polynomial of degree less than n (the number of vanishing moments of ψ). Conversely, W_X(u, s) captures the irregular behavior superimposed on polynomial trends. This important property plays a key role in the detection of signal singularities, especially in the detection and tracking of cracks.
Clearly, to reduce or eliminate redundancy, the family {ψ_{j,k}}_{(j,k)∈ℤ²}, with ψ_{j,k}(t) = 2^{−j/2} ψ(2^{−j} t − k), must constitute an orthonormal basis of L²(ℝ), where L²(ℝ) denotes the vector space of measurable, square-integrable one-dimensional functions. This property of the wavelet makes it possible to obtain a fast wavelet transform. The fast wavelet transform is computed by a cascade of low-pass filtering by h and high-pass filtering by g, each followed by downsampling (or decimation) by a factor of 2 (see Figure 5).
In Figure 5, a_j (or A(j, n), where n represents time) and d_j (or D(j, n)) are called respectively the approximation coefficients and the wavelet coefficients (or details) of the signal at level j. Moreover, the symbol ↓2 represents decimation by a factor of 2, in other words, keeping one sample out of two. The impulse response of the mirror low-pass filter is h̄(n) = h(−n) and that of the mirror high-pass filter is ḡ(n) = g(−n). These two impulse responses [10] are linked by

g(n) = (−1)^n h(1 − n)     (4)

whose coefficients are obtained directly from the chosen wavelet ψ.
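The cascade of Figure 5 can be sketched in a few lines. The code below (plain numpy; the Haar filter pair is assumed purely for illustration, not as the wavelet used in this work) performs the low-pass/high-pass filtering and the decimation by 2 at each level:

```python
import numpy as np

def dwt_level(x):
    """One level of the fast wavelet transform with the Haar filter pair:
    low-pass h = [1, 1]/sqrt(2) and high-pass g = [1, -1]/sqrt(2)
    (note that g(n) = (-1)^n h(1 - n)), each followed by decimation by 2."""
    pairs = x.reshape(-1, 2)
    a = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)  # approximation coefficients
    d = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)  # detail (wavelet) coefficients
    return a, d

# Cascade over 3 resolution levels, as in Figure 5.
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)   # a signal with 1,000 samples
a1, d1 = dwt_level(s)           # 500 samples each
a2, d2 = dwt_level(a1)          # 250 samples each
a3, d3 = dwt_level(a2)          # 125 samples each

# The transform is orthonormal, so the signal energy is preserved:
energy_in = np.sum(s**2)
energy_out = np.sum(a3**2) + np.sum(d3**2) + np.sum(d2**2) + np.sum(d1**2)
```

Because the Haar pair is orthonormal, the energy of the signal is exactly redistributed over A3, D3, D2 and D1, which is what makes the decomposition lossless.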
In Figure 6, it should be noted that the original signal has 1,000 samples, while the detail (and approximation) signals are decimated by a factor of 2 at each resolution level. Hence, after 3 levels of resolution, a signal of 1,000 samples yields an approximation A3 and a detail D3 of only 125 samples each. In this study, the scalogram of the investigative ultrasound signal will be used to determine and analyze cracks in concrete. The scalogram of the signal X(t) is defined by

Sc_X(u, s) = |W_X(u, s)|²   ∀ (u, s) ∈ ℤ × ℤ     (5)

Figure 7 shows the scalogram of a signal capturing the initiation of a crack, materialized by intense energy. This fracture also propagates, even if more weakly, to other scales, which can lead in the long term to a rupture.

Neural Networks

Neural networks are modeled as collections of neurons connected in an acyclic graph: the outputs of some neurons become the inputs of other neurons. For classic neural networks, the most common type of layer is the fully connected layer, where all inputs from one layer are connected to each activation unit of the next layer. In most common machine learning models, the last layers are fully connected layers that compile the features extracted by the previous layers to form the final output. Traditional neural networks use a fully connected architecture, as illustrated in Figure 8 (left), where every neuron in one layer connects to all the neurons in the next layer. A fully connected architecture is inefficient when it comes to processing image data, as in our case: for an average image with hundreds of pixels and three channels (red, green and blue, RGB), a traditional neural network generates millions of parameters, which can lead to overfitting.
Such a model would also be very computationally intensive, and it may be difficult to interpret its results, debug it, and tune it to improve its performance.
Modern convolutional networks contain on the order of 100 million parameters and are typically made up of around 10 to 20 layers (hence "deep" learning). However, as we will see, the number of effective connections is significantly higher due to parameter sharing.
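A quick back-of-the-envelope computation illustrates both points; the layer sizes below are illustrative and are not taken from the architecture used in this work:

```python
# Parameters of a single fully connected layer from a 256x256 RGB image
# (the input size used later in this work) to a hypothetical 1000-unit layer.
height, width, channels = 256, 256, 3
hidden_units = 1000

inputs = height * width * channels                  # 196,608 input values
fc_params = inputs * hidden_units + hidden_units    # weights + biases

# A 3x3 convolution with 64 output channels over the same image, by contrast,
# shares its weights across all spatial positions.
conv_params = 3 * 3 * channels * 64 + 64            # weights + biases

print(fc_params)    # 196,609,000
print(conv_params)  # 1,792
```

The fully connected layer alone costs almost 200 million parameters, while the convolutional layer costs fewer than two thousand, yet the latter is applied at every spatial position; this is exactly the parameter sharing mentioned above.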

CNN and Deep Learning
Deep learning is the new state of the art in artificial intelligence. A deep learning architecture is composed of an input layer, hidden layers, and an output layer; the word "deep" means there are more than two fully connected layers. Convolutional networks are a specialized type of neural network that use convolution in place of general matrix multiplication in at least one of their layers.
The Convolutional Neural Network (CNN) is one of the main architectures for image recognition and image classification [44-47], as well as crack damage detection [48]. The name indicates that the network uses a mathematical operation called convolution, which filters the original image with a filter (or kernel) in order to extract features.
There are seven main operations in the CNN architecture shown in Figure 9 (the numbers below the layers indicate the output size of each convolution or fully connected layer):
1. CONV: Convolution. The convolution layer is the core building block of the CNN and carries the main portion of the network's computational load. This layer performs a dot product between two matrices, where one matrix is the set of learnable parameters, otherwise known as the kernel, and the other is the restricted portion of the receptive field. The kernel is spatially smaller than the image but extends through its full depth: if the image is composed of three (RGB) channels, the kernel height and width are spatially small, but its depth covers all three channels.

2. BN: Batch Normalization. Batch normalization is a technique to standardize the inputs to a network, applied either to the activations of a prior layer or to the inputs directly. It accelerates training, in some cases halving the number of epochs or better, and provides some regularization, reducing the generalization error.
3. RELU: Rectified Linear Unit. Since convolution is a linear operation and images are far from linear, non-linearity layers are often placed directly after the convolutional layer to introduce non-linearity into the activation map. Among the several types of non-linear operations, ReLU has become very popular in recent years. It computes the function f(x) = max(0, x) (see Figure 10); in other words, the activation is simply thresholded at zero.

4. POOL: Pooling. There are several pooling functions, such as the average of a rectangular neighborhood, the L2 norm of a rectangular neighborhood, and a weighted average based on the distance from the central pixel. However, the most popular is max pooling, which reports the maximum output from the neighborhood.

5. FC: Fully Connected Layer. Neurons in this layer have full connectivity with all neurons in the preceding and succeeding layers, as in a regular fully connected network, so the layer can be computed as usual by a matrix multiplication followed by a bias offset. The FC layer helps to map the representation between the input and the output.

6. Dropout. Dropout is a regularization technique for neural network models in which randomly selected neurons are ignored during training: their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and no weight updates are applied to them on the backward pass. As a neural network learns, neuron weights settle into their context within the network and become tuned for specific features, providing some specialization. Neighboring neurons come to rely on this specialization, which, if taken too far, can result in a fragile model over-specialized to the training data. This reliance on context during training is referred to as complex co-adaptation.
One can imagine that if neurons are randomly dropped out of the network during training, other neurons will have to step in and handle the representation required to make predictions in place of the missing neurons. This is believed to result in the network learning multiple independent internal representations. The effect is that the network becomes less sensitive to the specific weights of individual neurons, which in turn yields a network that generalizes better and is less likely to overfit the training data.
7. Softmax. Softmax is implemented as a neural network layer just before the output layer and must have the same number of nodes as the output layer. In probability theory, the output of the softmax function represents a categorical distribution, that is, a probability distribution over the possible classes of the classification. In our application there are 2 possible cases, for example p1 = 0.95 for crack and p2 = 0.05 for non-crack.

Figure 9 shows a CNN architecture adapted to our problem of monitoring concrete structures. It consists of 4 CONV layers, 4 BN layers, 4 ReLU layers and 4 POOL layers, followed by FC, ReLU and Dropout layers. Finally, an FC layer decides, via Softmax activation, the final classification of the image into crack or non-crack.
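As a minimal illustration of these operations (and not the implementation used in this work), the forward computations can be sketched in plain numpy; all shapes and values below are toy examples:

```python
import numpy as np

def conv2d(image, kernel):
    """CONV: dot product of the kernel with each receptive field (valid mode)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """BN: standardize the activations, then scale by gamma and shift by beta."""
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

def relu(x):
    """RELU: activation thresholded at zero, f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """POOL: report the maximum of each size x size neighborhood."""
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]
    return x.reshape(H // size, size, W // size, size).max(axis=(1, 3))

def dropout(x, rate, rng):
    """Dropout: randomly silence a fraction `rate` of activations (training only)."""
    return x * (rng.random(x.shape) >= rate)

def softmax(z):
    """Softmax: turn class scores into a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy forward pass on an 8x8 single-channel "image" with one 3x3 kernel.
rng = np.random.default_rng(1)
img = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

feat = conv2d(img, kernel)                # 6x6 feature map
feat = relu(batch_norm(feat))
feat = max_pool(feat)                     # 3x3 after pooling
feat = dropout(feat, rate=0.5, rng=rng)
probs = softmax(feat.flatten() @ rng.standard_normal((9, 2)))  # FC + softmax
```

The final `probs` vector sums to 1 and plays the role of the crack/non-crack probabilities discussed above.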

Metrics and Data
In this section, we define the metrics used to evaluate the performance of our deep learning approach and explain that there are two methodologies to adopt depending on the nature of the concrete crack: internal or external.
To achieve our goal of classifying cracked/non-cracked concrete images, evaluation measures are needed to assess the performance of our approach. Accuracy, the most intuitive measure, is the ratio of the number of correctly predicted cracked and uncracked images to the total number of input images:

Accuracy = (TP + TN) / (TP + TN + FP + FN)     (6)

where TP (True Positive) and TN (True Negative) denote images with and without cracks, respectively, that are correctly classified.
FP (False Positive) denotes uncracked images wrongly classified as cracked, and FN (False Negative) denotes cracked images wrongly classified as uncracked.
Precision, also known as confidence or positive predictive value, is the number of correctly predicted crack images divided by the total number of images predicted as cracked by the classifier. Precision can be interpreted as an indicator of robustness. It is defined as:

Precision = TP / (TP + FP)     (7)

Recall, also known as sensitivity or true positive rate, is the ratio of the number of correctly predicted crack images to the total number of crack images:

Recall = TP / (TP + FN)     (8)
The F_β score is a weighted harmonic mean that comprehensively reflects Precision and Recall. It is defined as:

F_β = (1 + β²) · Precision · Recall / (β² · Precision + Recall)     (9)

where β is a coefficient that trades off precision against recall. β is set to 1 here to give the precision rate and the recall rate the same weight. In this case, the F1 score is the harmonic mean of Precision and Recall:

F1 = 2 · Precision · Recall / (Precision + Recall)     (10)

In this work, we use two sources of images of concrete cracks:
1. The first source is derived from ultrasonic non-destructive testing images of internal cracks analyzed by wavelets. The multiresolution images are then classified into crack/no crack by deep learning.
The procedure for these images is described in Section 2. This is our main contribution here.
2. The second source of images comes from the SDNET2018 dataset. SDNET2018 is an annotated image dataset for training, validation, and benchmarking of artificial-intelligence-based crack detection algorithms for concrete. It contains over 56,000 images of cracked and non-cracked concrete bridge decks, walls, and pavements (see Figure 11). The dataset includes cracks as narrow as 0.06 mm and as wide as 25 mm, as well as images with a variety of obstructions, including shadows, surface roughness, scaling, edges, holes, and background debris (see Figure 12). SDNET2018 will be useful for the continued development of concrete crack detection algorithms based on deep convolutional neural networks.
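For both image sources, the metrics defined above reduce to a few arithmetic operations on the confusion counts. The sketch below uses hypothetical counts, not the results reported in this paper:

```python
def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """Compute Accuracy, Precision, Recall and F_beta from the confusion counts
    of a crack/non-crack classifier."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return accuracy, precision, recall, f_beta

# Hypothetical confusion counts for a 2,000-image test set.
acc, p, r, f1 = classification_metrics(tp=900, tn=880, fp=120, fn=100)
```

With beta = 1, the returned f_beta value is exactly the harmonic mean of precision and recall, i.e. the F1 score of equation (10).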

Implementation aspect and results analysis
For both data sources, i.e., the experimentally obtained NDT images processed by the wavelet transform and the images from the SDNET2018 dataset [49], we selected 1,000 images with cracks and 1,000 images without cracks. The size of each image is 256×256 pixels in RGB.
For the SDNET2018 dataset, which consists of images of concrete bridges, it was possible to introduce various changes, such as changes in lighting conditions, crack characteristics and crack surface texture, in order to further test the generalizability of the model and carry out a more comprehensive evaluation.
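Such variations can be emulated by simple data augmentation. The sketch below (plain numpy; the flip probability and gain range are hypothetical choices, not the settings used in this work) varies the lighting and orientation of an image:

```python
import numpy as np

def augment(image, rng):
    """Randomly vary the lighting and orientation of an RGB image in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]            # horizontal flip
    gain = rng.uniform(0.7, 1.3)         # simulated lighting change
    out = np.clip(out * gain, 0.0, 1.0)  # keep pixel values in range
    return out

rng = np.random.default_rng(42)
img = rng.random((256, 256, 3))          # stand-in for a dataset image
aug = augment(img, rng)
```

Applying such transformations at training time exposes the classifier to lighting and texture conditions absent from the raw dataset.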
In order to have a comparative evaluation of the performance of the deep learning architecture to adopt, we used both the version of AlexNet [50] represented in Figure 9 and an architecture of the ResNet type [51]. ResNet, short for Residual Network, is a classical neural network used as the backbone for many computer vision tasks; this model won the ImageNet competition in 2015. The fundamental advance of ResNet is that it successfully trains extremely deep neural networks with over 150 layers. Before ResNet, training extremely deep neural networks was difficult due to the problem of vanishing gradients. AlexNet, the winner of ImageNet 2012 and the model that arguably kicked off the deep learning era, had only 8 layers, the VGG network had 19, Inception (GoogLeNet) had 22, and ResNet-152 has 152. In our work, we use ResNet-50, a reduced version of ResNet-152 that is frequently used as a starting point for transfer learning.
ResNet is a powerful backbone model that is used very frequently in many computer vision tasks. Its originality is the use of skip connections, which add the output of an earlier layer to that of a later layer and thus help mitigate the vanishing gradient problem.
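The skip connection can be sketched with a toy residual block (plain numpy, fully hypothetical layer shapes; the real ResNet-50 blocks are convolutional):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A ResNet-style block: the output of an earlier layer (x) is added
    to the output of the stacked transformations before the final activation."""
    out = relu(x @ w1)        # first transformation of F(x)
    out = out @ w2            # second transformation
    return relu(out + x)      # skip connection: add x back in

rng = np.random.default_rng(7)
x = rng.standard_normal((4, 16))
w1 = rng.standard_normal((16, 16)) * 0.1
w2 = rng.standard_normal((16, 16)) * 0.1
y = residual_block(x, w1, w2)

# With zero weights the block reduces to the identity (up to the ReLU),
# which is why very deep stacks of such blocks remain trainable.
identity = residual_block(x, np.zeros((16, 16)), np.zeros((16, 16)))
```

Because the block can fall back to (approximately) the identity mapping, adding more blocks cannot easily make the network worse, which is the intuition behind training networks with over 150 layers.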
In this study, we used Keras to load the pre-trained ResNet-50 model (see Figure 13).

Figure 13. The ResNet-50 model
The ResNet-50 model consists of 5 stages, each with a convolution block and an identity block. Each convolution block has 3 convolution layers, and each identity block also has 3 convolution layers. ResNet-50 has over 23 million trainable parameters.
The classical Adam optimizer, a variant of stochastic gradient descent, is used for training [52]. Table 1 shows the results of the NDT procedure based on multiresolution analysis for detecting internal cracks in a concrete structure, and Table 2 shows the results on the SDNET2018 dataset. In both tables, the two deep learning architectures are compared in terms of Accuracy, Precision, Recall and F1 score. Table 1 shows that the performance of the ResNet-50 architecture is superior to that of AlexNet. This was to be expected, but the difference is not very pronounced. It should be noted that the Accuracy of the method proposed here, i.e., the detection of internal cracks by NDT followed by a wavelet-based multiresolution analysis, is capped at 90%.
On the other hand, Table 2 shows high performance, and the difference between the ResNet-50 and AlexNet architectures is clearer.
The apparent limitation of the NDT-multiresolution analysis method is explained by the fact that an internal crack is more difficult to detect than a surface crack. In reality, the proposed method is very valuable precisely because it can detect a crack that is invisible to optical means, which could prevent many disasters in sensitive structures.

Conclusions
In this work, we proposed an original method for monitoring cracks in concrete structures. This method focuses on internal cracks or on the beginning of cracks invisible from the outside.
Such cracks are detected by ultrasonic NDT and analyzed by wavelets, providing a space-scale image that localizes the crack in space and at each resolution.
The resulting multiresolution image is then subjected to a crack/non-crack classification process based on Deep Learning (AlexNet, ResNet).
We have shown that it is possible to reach an accuracy of 90%. This is a very positive result and shows that our approach is essential when it comes to "securing" vital economic structures such as nuclear power plants and dams, where the initiation of an optically invisible crack can cause major disasters.