Approximating the Steady-state Temperature of 3D Electronic Systems with Convolutional Neural Networks

Abstract: Thermal simulations are an important part of the design of electronic systems, especially as systems with high power density become common. In simulation-based design approaches, a considerable amount of time is spent on repeated simulations. In this work, we present a proof-of-concept study of the application of convolutional neural networks to accelerate thermal simulations. The goal is not to replace standard simulation tools but to provide a method to quickly select promising samples for more detailed investigations. Based on a training set of randomly generated circuits with corresponding finite element solutions, the full 3D steady-state temperature field is estimated using a fully convolutional neural network. A custom network architecture is proposed which captures the long-range correlations present in heat conduction problems. We test the network on a separate dataset and find that the mean relative error is around 2% and the typical evaluation time is 35 ms per sample (2 ms for evaluation, 33 ms for data transfer). The benefit of this neural-network-based approach is that, once training is completed, the network can be applied to any system within the design space spanned by the randomised training dataset (which includes different components, material properties, different positioning of components on a PCB, etc.).


Introduction
Physics simulations are becoming an essential aspect of the design of electronic systems. For an optimally designed electronic system, the interplay of many different physical domains has to be taken into account. For instance, for the design of an efficient power converter, co-simulations including the generation of heat via electronic losses as well as its conduction and rejection via forced convection have to be studied [1]. Such coupled simulations, however, lead to large computational requirements. Often, the long simulation times render thorough automatic optimisation of designs impossible.
Machine learning (ML) techniques have been identified as possible candidates to increase the computational speed. The application of machine-learning techniques to the design of electronic systems focuses mainly on two aspects [2]. The first aims at reducing the number of required design iterations, see e.g. [3,4]. As an example of such an approach, finding an optimal (from a thermal point of view) placement of a chip on a system required around 1000-10000 FEM simulations when applying genetic algorithms [5], which still takes a considerable amount of time. A second use of ML techniques consists of building neural-network (NN) based models for components that can capture nonlinear behaviour in circuit simulations. Among others, this method has been successfully applied to model an inductor [6], a MOSFET [7], a power diode [8], and a supercapacitor [9]. For design optimization, sweeps over the design space to evaluate the Pareto front [6] or a combination with conventional design optimization methods [10] are feasible due to the fast evaluation time of these NN-based models.
In addition to thermal simulations, the application of different NN architectures to perform physics simulations has been the focus of much research. For instance, the use of NNs has been explored for fluid dynamics problems [11][12][13], for electroconvection [14], transport problems [15,16], magnetic field estimation [17], classical mechanics [18] and in replicating multi-particle evolutions by learning the pair-wise interaction function [19].
Although these physical phenomena clearly have a three-dimensional nature, the machine learning community has mostly focused on approximating physical simulations in lower dimensions. In particular, the representation of three-dimensional full-field solutions for complex geometries remains a challenge. One major difficulty when going from 2D to 3D representations of a system is the limited memory of the GPUs [20] used to train and evaluate NNs. One way to reduce memory requirements would be to use a more compact representation of the geometry instead of images [21]. Although some alternatives, such as point clouds based on CAD geometries [22][23][24] or octrees [25], have been proposed, most available NN architectures are still developed for images.
For this type of application, so-called physics-informed NNs have been developed, which aim at embedding prior physics knowledge into the design of the NN [26]. For this purpose, the automatic differentiation feature of the NN is used to construct a loss term based on the underlying differential equation. These techniques require the network to be defined in terms of absolute coordinate positions, i.e. given the input coordinates, the NN is trained to return the solution at that position. These techniques lead to promising results in simple geometries, e.g. for flow problems [27]. However, it is not clear how complex geometries could be included. Additionally, advanced ML features like convolutional kernels (which automatically preserve translational invariance [28]) cannot be used.
In this paper we explore the possibility of generating full-field 3D approximations of the temperature distribution in an electronic system using NNs. We consider simplified representations of electronic components and systems with no electronic functionality, keeping only the essential properties for the present work to serve as a proof of principle. Here, we focus on the steady-state, equilibrium thermal configuration of the systems since training data can be easily obtained from FEM simulations.

Data set generation
In this work, a fully-convolutional neural network (FCN) was trained to approximate the 3D FEM solutions of electronic systems using supervised learning. This procedure requires a large dataset of random systems with corresponding FEM solutions. The dataset needs to be representative of the design space. This entails that all properties that one wants to change during the design process (component type/number/placement, material properties, heat sink specifications, external temperature) are randomised in the dataset. In total, 464 unique systems were generated using an automated workflow in Python (see Fig. 1 for a graphical summary):

1. System generation: For each system, the number and type of components are randomly chosen and the components are placed at random locations on a PCB. Material parameters are assigned to the different parts of each component.

2. Generation of FEM solutions: Initial and boundary values are randomly chosen, heat sources (i.e. losses) are assigned to some of the components, and the mesher and FEM solver are called.

3. Voxelization: The systems and the FEM solutions are postprocessed to generate a set of 3D images per system as input for the NN.
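The three steps above can be sketched as a simple driver loop. This is a minimal sketch only; the helper callables (`generate_system`, `run_fem`, `voxelize`) are hypothetical stand-ins for the FreeCAD, gmsh/ElmerSolver and voxelization scripts described in the following subsections.

```python
def build_dataset(n_systems, generate_system, run_fem, voxelize):
    """Drive the three-step dataset pipeline: generate a random system,
    solve it with FEM, and voxelize system + solution into 3D images."""
    dataset = []
    for seed in range(n_systems):
        system = generate_system(seed)              # step 1: random placement
        solution = run_fem(system)                  # step 2: mesh + FEM solve
        dataset.append(voxelize(system, solution))  # step 3: NN-ready 3D images
    return dataset
```

In the actual workflow each of these steps is a separate external tool invocation, but the control flow is the same.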

System generation
The main challenge in generating systems was to replicate the complexity of the 3D geometries and the variation of material properties while ensuring that the resulting temperature fields stay within a physically plausible range. To this end, all systems were based on six different components. The components were designed manually using FreeCAD (one large and one small IC, one large and one small capacitor, and copper layers of different shapes and sizes, see Fig. 1a). Most components (e.g. the ICs) have an internal structure (e.g. legs, die). Each of these sub-parts was assigned its typical material (e.g. copper for the legs). To account for variations in the material properties, a physically plausible range was specified for each material type as the average value given in Tab. 1 ± 25%. The actual material properties of each part were randomly chosen (uniformly) from these ranges. The components were randomly placed on a fixed-size PCB of dimensions 25 mm × 25 mm × 2 mm (Fig. 1b). Such a random placement procedure does not produce functional electronic circuits. However, as long as the dataset used for training the NN captures the impact that different component placements have on heat transfer, the NN is expected to generalise also to (unseen) functional electronic circuits. The size of the dataset was increased fourfold via three 90-degree in-plane rotations of each system, so that the total dataset consisted of 1856 systems.
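The uniform ± 25% sampling of material properties can be sketched as follows. The numeric values for copper used here are illustrative placeholders only, not the actual entries of Tab. 1.

```python
import random

def sample_material(base_props, spread=0.25, rng=None):
    """Draw each property uniformly from [base*(1-spread), base*(1+spread)],
    mirroring the randomisation described in the text."""
    rng = rng or random.Random()
    return {name: rng.uniform(base * (1 - spread), base * (1 + spread))
            for name, base in base_props.items()}

# Illustrative average values for a copper-like material (k in W/(m K),
# density in kg/m^3, specific heat in J/(kg K))
copper = {"k": 400.0, "rho": 8960.0, "c_p": 385.0}
props = sample_material(copper, rng=random.Random(0))
```

Each part of each component gets its own independent draw, so two copper parts in the same system may carry slightly different conductivities.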

Finite-element simulations
Three types of boundary conditions (BCs) were imposed on the systems for the finite-element (FE) simulations: (1) a Dirichlet-type BC at the bottom of the PCB, which was set to a constant external temperature T_ext, taken randomly per system from the range 300 K ± 25%; (2) a Neumann-type BC represented by a heat sink on top of the large IC, modelled via a heat transfer coefficient α_sink randomly taken from the range 2538.72 W/(K m²) ± 25% and the external (far-field) temperature (in the steady state this corresponds to an implicit flux BC); and (3) on all other outer surfaces, a heat transfer coefficient α was prescribed, taken randomly from the range 14 W/(K m²) ± 25%. Electronic losses were modelled as constant, predefined volumetric heat sources located at the silicon die of the ICs and at the core of the capacitors, with values chosen randomly from the ranges specified in Tab. 2.
A conformal mesh was generated using gmsh [29], consisting on average of 3 million elements. Steady-state solutions were obtained with the open-source FEM solver ElmerSolver [30,31]. The main advantage of ElmerSolver is that a scriptable interface connecting FreeCAD (for automatic geometry generation), ElmerGrid/gmsh (for automatic tetrahedral mesh generation) and ElmerSolver exists, which enables an automated workflow for the system generation.

Voxelization
For the NN, the 3D systems were represented by a stack of 3D images. This enabled us to apply NN methods previously developed for vision tasks to our problem (Fig. 1d). Specifically, the input of the NN consisted of multiple channels, where each channel was a 3D image representing one of the physical properties of the system (heat conductivity, external temperature, density, heat capacity, BC values). To create these 3D images, the geometries of the components were voxelized independently using a custom FreeCAD-Python script. The voxel size was 0.19 × 0.19 × 0.05 mm, chosen so that a batch of ten images fits in the GPU memory while still resolving sufficient structural detail. The finer resolution in the z-direction was chosen in order to resolve the thin copper layers. Each component in the original systems was then replaced by its voxelized representation. To create the different channels, the voxelized geometry in the images was replaced part by part by the corresponding material parameter used in the FEM simulation. For our systems, this procedure led to images with 128 × 128 × 128 voxels.
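The part-by-part replacement of geometry labels by material parameters can be sketched with NumPy. This is a toy reconstruction under assumed conventions (integer part ids per voxel, a dictionary of per-part parameters), not the actual FreeCAD script.

```python
import numpy as np

def build_channels(part_ids, part_params, channel_names):
    """Turn a voxelized geometry (one integer part id per voxel) into one
    3D image per physical property, as described in the text.  Voxels whose
    id is absent from part_params (e.g. air) stay at zero."""
    channels = {}
    for name in channel_names:
        img = np.zeros(part_ids.shape, dtype=np.float32)
        for pid, params in part_params.items():
            img[part_ids == pid] = params[name]
        channels[name] = img
    return channels

# Toy 4x4x4 system: id 0 = air, id 1 = PCB slab, id 2 = a small copper part
ids = np.zeros((4, 4, 4), dtype=np.int64)
ids[:, :, 0] = 1
ids[1:3, 1:3, 1] = 2
params = {1: {"k": 0.3}, 2: {"k": 400.0}}
ch = build_channels(ids, params, ["k"])
```

Stacking the resulting per-property images along a channel axis gives the multi-channel 3D input described above.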
As labels for the supervised learning procedure, the temperature solution obtained by the FEM solver was used. The 3D temperature field was voxelized directly during post-processing by Elmer. The FEM solution is only available within the geometry (where a mesh exists). The outer image regions (which we call "air" regions) were labelled as −1 and excluded from the loss definition.

NN architecture
The input of the NN consists of multi-channel 3D images and the output is a single-channel 3D temperature map. To this end, a 3D fully-convolutional NN is used in this work.
Note, first of all, that one of the design goals was a network with a relatively small number of parameters and, hence, low memory requirements. The reason is a technical one: the fastest implementations of NNs are currently optimised to run on GPUs, where the available memory is limited (typically of the order of tens of GB). Since the 3D systems of this work are represented as 3D images with several channels, which on their own already require a significant amount of the available memory, it is convenient for the network itself to consume a minimum amount of memory. The NN architecture should, thus, be chosen to match the physical problem.

Properties of heat propagation
For the choice of an appropriate NN, it is important to understand the main features of the heat propagation problem. Assuming that the material properties in the system do not change with temperature, the thermal behaviour of the system is determined by the heat equation

ρ c_P ∂T/∂t + ρ c_P (u · ∇)T = ∇ · (k ∇T) + ρ h,

where ρ is the density, c_P the constant-pressure specific heat, k the heat conductivity and h the heat source per unit mass. All of them depend on the position x (although in the generated systems they are constant over the different parts of each component). The term (u · ∇)T represents convection, which is ignored for simplicity. A solution of the heat equation is fully determined only after imposing boundary conditions, which in this case are the Dirichlet and Neumann boundary conditions discussed in Sec. 2.1.2.

As can be seen, density, heat capacity and heat source do not appear separately in the equation, but rather as the products ρc_P and ρh. This implies that any solution of the equation will depend on those products only and, thus, it is sensible to use them as input to the NN, as explained below. Moreover, this work considers the stationary (thermal equilibrium) case only, where the temperature field T(x) is no longer time dependent. If convection is ignored as well, the term ρc_P disappears from the equation, which implies that the solution will not depend on that factor either.

It is also instructive to explore exact analytical solutions of the heat equation, even though they apply in very idealised situations only. A fundamental solution of the heat equation, called a heat kernel solution, is a time-dependent solution of the equation for an initial point-like heat source at a known position. This can be extended to a space- and time-dependent heat source by convolving point-like solutions under an integral. With κ = k/(ρc_P) the thermal diffusivity, a 1D heat kernel solution for an extended heat source h(x, t) reads

T(x, t) = ∫₀ᵗ dt′ ∫ dx′ h(x′, t′) / √(4πκ(t − t′)) · exp(−(x − x′)² / (4κ(t − t′))).

Formally, a general solution of the heat equation can be found in terms of the heat kernel solution.
Inspecting the heat kernel solution, one can anticipate the presence of exponentials as a general feature of solutions of the heat equation, and the fact that contributions from different spatial points are aggregated via an integral. In order to fulfil the requirement of a small network whilst still preserving key aspects of the heat propagation problem, a custom FCN was designed, as described in the following subsections.
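The fundamental (point-source) solution underlying the heat kernel can be checked numerically. The small sketch below, with an assumed diffusivity κ = 1, evaluates the 1D kernel on a grid and verifies that its spatial integral stays at 1 for all times, i.e. that the kernel spreads heat out without creating or destroying it.

```python
import numpy as np

def heat_kernel_1d(x, t, kappa=1.0):
    """Fundamental solution of T_t = kappa * T_xx for a unit point
    source released at x = 0, t = 0."""
    return np.exp(-x**2 / (4.0 * kappa * t)) / np.sqrt(4.0 * np.pi * kappa * t)

# The kernel widens over time but conserves the total heat:
# its spatial integral (Riemann sum here) stays at 1 for every t > 0.
x = np.linspace(-50.0, 50.0, 20001)
dx = x[1] - x[0]
totals = [heat_kernel_1d(x, t).sum() * dx for t in (0.1, 1.0, 10.0)]
```

The exponential profile and the integral aggregation visible here are exactly the two features the custom architecture tries to reproduce.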

Long-range correlations. Fusion blocks.
One of the most important aspects of the heat problem in the steady-state regime that the NN must capture, in order to successfully approximate the results of a standard simulation, is the presence of long-range effects. Most convolutional NNs (CNNs) available in the literature have been developed for the classification of images (in the case of 2D CNNs) or of video sequences (in the case of 3D CNNs). The performance of several standard low-resource 3D CNNs (e.g. [32]) on the dataset described in Sec. 2.1.2 was analysed, but the temperature fields obtained with such networks were very sharp, not resembling a realistic situation, where a temperature gradient is to be expected around a hot spot. Moreover, heat propagation from a heat source to a distant but thermally connected point in the system was absent.
With the goal of recovering long-range effects in the system, another feature of convolutional layers, called dilation, was explored, see Eq. A2. In short, in a CNN the kernels usually act on contiguous pixels. The dilation parameters d_d, d_h and d_w in Eq. A2 reflect how many pixels (or voxels) are skipped when applying the kernel, with d = 1 meaning a standard convolution, d = 2 that one pixel is skipped, etc. Moreover, as stressed above with regard to the heat kernel solution, the solution not only takes long-range effects into account, it also aggregates them (via an integration in the case of the heat kernel solution). We thus defined what we call Fusion blocks: blocks that apply convolutions with several different dilations in parallel and aggregate their outputs, so as to obtain smooth enough predictions from the network.
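The effect of the dilation parameter can be illustrated in 1D (Eq. A2 is the 3D analogue). The sketch below implements a "valid" dilated convolution by hand: the kernel taps are spaced `d` samples apart, so the same three weights cover a receptive field of 7 samples at `d = 3` instead of 3 at `d = 1`.

```python
import numpy as np

def dilated_conv1d(signal, kernel, d=1):
    """'Valid' 1D convolution with dilation d: kernel taps are spaced d
    samples apart, enlarging the receptive field without extra parameters."""
    k = len(kernel)
    span = (k - 1) * d + 1            # receptive field on the input
    out = np.empty(len(signal) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * signal[i + j * d] for j in range(k))
    return out

x = np.arange(10.0)
avg3 = np.ones(3) / 3.0
y1 = dilated_conv1d(x, avg3, d=1)   # receptive field 3
y3 = dilated_conv1d(x, avg3, d=3)   # receptive field 7, same 3 weights
```

Running several such convolutions with different `d` in parallel and summing their outputs is the aggregation idea behind the Fusion blocks.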

Choice of activation functions
As explained in Sec. A, a NN is composed of a sequence of linear operations, each followed by a fixed non-linear operation called an activation function. In a regression problem like the present one, a judicious choice of the type of non-linearity helps keep the network small. From the heat kernel solution above, it can be seen that general solutions make use of the exponential function. It is thus convenient to use non-linearities that contain an exponential. Using too many of them, however, turns out to damage the performance of the network: several consecutive layers containing an exponential lead to exp(−exp(−exp(...)))-type functions acting on the input. After some exploration, a SELU activation function [33] was chosen for one of the layers only (see Fig. 3), with α ≈ 1.6733 and σ ≈ 1.0507. For the rest of the layers, a LeakyReLU [34] with negative slope s = 0.01 was used.
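The two activation functions, with the constants quoted in the text, can be written in a few lines of NumPy:

```python
import numpy as np

ALPHA, SCALE = 1.6733, 1.0507   # SELU constants quoted in the text
SLOPE = 0.01                    # LeakyReLU negative slope

def selu(x):
    """Scaled exponential linear unit: exponential on the negative side."""
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

def leaky_relu(x):
    """Piece-wise linear activation with a small negative slope."""
    return np.where(x > 0, x, SLOPE * x)
```

SELU contributes the exponential behaviour motivated by the heat kernel, while LeakyReLU is purely piece-wise linear, which is why stacking many SELU layers (nested exponentials) is avoided.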

Input to the network
Another important aspect to take into account in the development of NN-based solutions is the choice of the input given to the network. As highlighted in Sec. 2.2.1, the density and heat source appear in the heat equation only as a product. For steady-state solutions, the heat capacity does not contribute and is, thus, not fed to the network in this work.
As discussed, boundary conditions of the Dirichlet and Neumann types were imposed on every system: a Dirichlet boundary condition on the bottom of the PCB, whose temperature is fixed to the external temperature value, and a Neumann boundary condition on every other exposed surface, where the heat flux is fixed via a heat transfer coefficient,

−k (∇T · n) = α (T − T_ext),

with n the normal direction. Moreover, the input of NNs is typically rescaled to the [0, 1] range, since that makes the training of the networks numerically more stable. This can be easily achieved by dividing each channel by its maximum value. In this work, however, a more physically meaningful rescaling scheme was chosen, at the expense that each input channel is not strictly in the [0, 1] range but still numerically small. Namely, characteristic values were introduced for each of the relevant dimensional quantities.
These are the maximum heat power P_max = 20 W, the maximum temperature T_max = 1000 K, and a length scale l_0 = 0.4 mm. Note that the maximum values defined here do not represent a hard limit at which the network fails to work, since the network does not strictly require all values to be smaller than one. The dimensionless channels are then constructed from these characteristic values. The heat source channel C_1 is defined in terms of the heat source density per mass h, the density ρ and the total volume V_body of the part where the heat density is applied. Similarly, the flux boundary condition channel C_3 is defined via the heat transfer coefficient α times the total surface area A_body over which the loss is specified. Note that the Dirichlet boundary condition is not fed to the network as an input channel, but softly imposed via the loss (see below).

Network architecture.
For this work, a fairly thorough exploration of possible network architectures containing the features discussed so far was performed. As a result, the architecture shown in Fig. 3 is proposed. In the figure, k refers to the kernel size, s to the stride, d to the dilation and C to the number of output channels of each layer (in all three cases the same for all dimensions). This network has only ∼370K trainable parameters, a tiny number compared to standard architectures (e.g. the 3D version of SqueezeNet [32,35] has 2.15M parameters). A brief explanation of the choices made is in order:

• It was observed that too many downsampling layers have a damaging effect on the accuracy of the output. Downsampling in CNNs is used to extract useful features from images. In our case, the most relevant features are already part of the input, as discussed above. The main effects of downsampling here are to aggregate long-range effects, in addition to the dilation in the Fusion blocks, and to reduce memory requirements. Thus, only two downsampling layers were used.

• As is well known for FCNs, skip connections help avoid the usual checkerboard artifacts in the output. In this work, three additive skip connections were used, from the output of the Fusion blocks to the input of the two upsampling layers (transpose convolutions; see Sec. A) and the final convolutional layer, respectively.

• An initial depthwise Fusion block (depthwise means that channels are not mixed) provides the necessary additional preprocessing of the input data.

Objective function
For the loss function, a mean relative L_1 loss term was combined with a penalty term for the Dirichlet boundary condition at the bottom layer of the PCB. The air region was excluded from the loss evaluation because no FEM solution is available there (note, however, that the values in the air region indirectly influence the temperature distribution within the system via the convolutions, which, as we will see, has a potentially negative impact on the accuracy of the result). A physics-informed loss as proposed in [13] did not improve the solutions. In fact, it drove the system to a uniform temperature distribution, since in all parts of the system where no heat source is present the differential heat equation error can be minimised by minimising the temperature gradient. This was somewhat unexpected, and further exploration of physics-informed losses for the present problem will be the subject of future work.
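A minimal sketch of such an objective is given below. It assumes (our convention, not necessarily the authors') that the last array axis is z with index 0 at the PCB bottom, that air voxels carry the label −1, and that the two terms are combined with a weight `w_bc`.

```python
import numpy as np

def objective(pred, label, t_ext, air_value=-1.0, w_bc=1.0):
    """Mean relative L1 loss over non-air voxels plus a soft penalty for
    the Dirichlet BC at the bottom slice (assumed to be [..., 0])."""
    mask = label != air_value                    # air voxels carry no FEM label
    rel_l1 = (np.abs(pred - label)[mask] / np.abs(label[mask])).mean()
    bc_penalty = np.abs(pred[..., 0] - t_ext).mean()
    return rel_l1 + w_bc * bc_penalty
```

In a training loop the same expression would be written with differentiable tensor operations, but the masking and the soft boundary penalty carry over unchanged.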

Results
Of the generated dataset of 3D systems and corresponding FEM solutions, only a fraction is used to train the NN. Specifically, 75% of the dataset was used as training set and the remaining 25% as test set. It is the accuracy in reproducing the latter that truly measures the quality of the network solution.
In Fig. 4, a histogram of the mean absolute value per voxel of the relative error between the FEM and NN solutions is shown. As can be seen, the mean relative error is at the percent level, with values below 2% for most systems in the test set. Additionally, the Dirichlet boundary condition was softly imposed via the loss function on the bottom of the PCB; there, the network reached an average relative error of 0.1%. The present work was initiated as a proof of principle for an NN-based tool to evaluate the viability of potential system designs from a thermal point of view. In terms of accuracy, we judge the values achieved as sufficient for that goal.
The mean value of the relative error, however, does not fully represent the (in)accuracy of the NN results, as large but very localised errors can be smeared out when averaging over the large number of voxels in a system. It is thus more insightful to compare FEM and NN solutions directly in their 3D representations. In Fig. 4, a representative collection of 3D solutions and 3D maps of the relative temperature differences is shown, picking one sample from some of the bins of the histogram discussed above, as indicated in the figure. For the majority of systems, the NN solution reproduces the FEM solution reasonably well. The main discrepancies appear in relation to the small chip and on the surfaces of the different components and the PCB.
Further details are shown in Fig. 5, where a section of a selected system is presented. There, on the surface of the PCB and around the small chip, the L_1 error is somewhat larger. Also, in the inner part of the large chip, some localised discrepancies can be seen, which are reminiscent of the checkerboard artefacts of FCNs. However, these are clearly localised and do not seem to affect the overall temperature of the chip.
The second aspect to consider when judging whether the NN solution presented here is a valuable tool in the design workflow of electronic systems is the evaluation speed. In Tab. 3, a comparison of the evaluation speeds of the NN and the FEM solutions is shown. The NN was run on an NVIDIA Titan RTX GPU, whilst the FEM solver was run on a single thread of an Intel Xeon W-2145 CPU. This implies that the FEM evaluation time in Tab. 3 does not correspond to the time a fully optimised and parallelised FEM simulation would require. Nevertheless, we see that the NN provides a solution in ∼35 ms, with most of that time in fact spent transferring the 3D system to the GPU.

It is worth discussing now some of the limitations of the FCN architecture suggested in this work. The first limitation stems from the fact that the architecture sets a maximum effective receptive field on the input images, determined by the combination of the maximum dilation and the number of downsamplings. Thus, even though the architecture is fully convolutional, which implies that it can immediately process larger systems than those it has been trained on, it will not be able to capture heat transfer over distances larger than those covered by the original receptive field. We have not tested what the implications of this limitation are. A second limitation arises from the typical length scales in the system: a minimum length scale is implicitly set by the choice of voxel resolution and, in principle, if a different voxel resolution has to be chosen, the network would have to be retrained.
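The receptive field limitation mentioned above can be quantified with standard receptive-field arithmetic. The layer tuples in the example are illustrative, not the actual layer list of Fig. 3.

```python
def receptive_field(layers):
    """Receptive field (in input voxels) of a stack of convolutions.
    layers: list of (kernel_size, stride, dilation) tuples, applied in order."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (k - 1) * d * jump   # each layer widens the field by (k-1)*d
        jump *= s                  # strides enlarge the step between taps
    return rf

# Illustrative: a stride-2 downsampling conv followed by a dilated conv
example = [(3, 2, 1), (3, 1, 4)]
rf = receptive_field(example)
```

Any heat-transfer path longer than this many voxels cannot be captured by the network, no matter how large the input image is.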
The main advantage of the approach presented here is that, once the network has been trained, it can approximate the temperature field of any system within the design space spanned by the randomised training dataset. This means that the network can be applied to systems with a different number of components, different placements of those components, different material properties and different BCs than those it has been trained on. Therefore, the simulation time spent in generating the dataset is compensated by the gain in future simulations, as long as many of them are needed for a certain design task.

Confidence Estimation
Finally, we would like to discuss a possible confidence estimator for the NN solutions. The goal of a confidence estimator in this context is to assess the quality of the NN approximation when no FE result exists to compare with, as would be the case in any realistic use of our tool.
Our suggested confidence estimator is based on an integral form of the steady-state heat equation,

∮_∂V k (∇T · n) dA + ∫_V ρ h dV = 0,

where Gauss' theorem was used to transform the integral over the system volume V into a surface integral in the first term. Since the NN prediction is available at each voxel, the equation is discretised to obtain an approximation per voxel,

∑_{i=1}^{6} k (∂_i T) n_i ΔA_i + ρ h ΔV ≈ 0,

where the sum runs over the six faces of the voxel, with face areas ΔA_i and voxel volume ΔV.
For the temperature gradient (∂_i T), the finite difference between the temperature (as predicted by the NN) of the voxel under consideration and that of its adjacent voxel (in the direction of the normal n_i) is inserted, using a piece-wise linear interpolation of the temperature in cases where the two voxels have differing k values. The bottom of the PCB is kept at a fixed temperature. On all other outer surfaces, the flux boundary condition is included by evaluating the first term with the prescribed heat transfer, k (∂_i T) n_i = −α (T − T_ext). This heat equation error should measure the local violation of the steady-state heat equation.
If the net heat flux of a voxel is non-zero, this implies that the temperature obtained from the NN does not represent a steady state. The local L_1 loss values are compared with the local heat equation error for a particular sample (Fig. 6) for different sections of the system along the z-direction. The proposed estimator appears to qualitatively identify many of the regions where the error of the NN temperature estimate, relative to the FEM solution, is high. It is also apparent, however, that not all voxels with high error values are identified.
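A simplified per-voxel heat-equation error can be sketched as follows. This is our own minimal reconstruction under assumed conventions: a uniform voxel size, a harmonic mean of k at cell faces in place of the piece-wise linear interpolation described above, and no treatment of the outer boundary conditions (the outermost voxels are simply zeroed out).

```python
import numpy as np

def heat_eq_error(T, k, source, dx):
    """Per-voxel residual of the discretised steady-state flux balance:
    sum of the six face fluxes k * dT/dn * dA plus the source term rho*h*dV.
    Assumes a uniform voxel size dx and positive cell-wise k; the outermost
    voxels, where np.roll would wrap around, are set to zero."""
    dA, dV = dx**2, dx**3
    err = np.zeros_like(T)
    for axis in range(3):
        for sign in (+1, -1):
            nb_T = np.roll(T, -sign, axis=axis)      # neighbour along +/- axis
            nb_k = np.roll(k, -sign, axis=axis)
            k_face = 2.0 * k * nb_k / (k + nb_k)     # harmonic mean at the face
            err += k_face * (nb_T - T) / dx * dA     # inward face flux
    err += source * dV                               # rho * h * dV
    interior = np.zeros(T.shape, dtype=bool)
    interior[1:-1, 1:-1, 1:-1] = True
    return np.where(interior, err, 0.0)
```

For a uniform or linearly varying temperature field with no sources, the residual vanishes in the interior, as a steady state requires.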
Moreover, the estimator does not provide a clear quantitative measure of the error. The absolute values of the heat equation error are much larger in regions with high k than in regions with lower k. This could indicate that the piece-wise linear interpolation used for the approximation of the temperature is not a good estimation of the actual temperature distribution. However, at the current state, it is not possible to use higher order approximations since they would require more voxels per uniform-k region, which in turn conflicts with the limited RAM available on the GPU.
More research in the direction of finding a confidence estimator is clearly needed.

Conclusions
In this article a neural network architecture was proposed to approximate the temperature distribution of a 3D system of electronic components with heat losses and heat transfer through their boundaries. The main goal was to assess the performance and accuracy of neural networks for this task.
The network proposed is able to provide a simulation result in an evaluation time of the order of milliseconds. The relative error achieved, when compared to a standard FEM simulation for the same system, is around 2% averaged over the whole system, with larger errors localised mostly on the surfaces of the components. Limitations of the architecture proposed in its application to systems with different sizes and/or resolutions have also been discussed.
Finally, we motivated a possible criterion for estimating the confidence of a prediction based on an integral version of the heat equation. Even though the criterion proposed does not allow for a quantitative estimate of the error of a given prediction, it may be used to identify regions with large estimated errors.
Author Contributions: Both authors contributed substantially to the conceptualization, methodology, and writing-original draft preparation, review and editing. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been supported by Silicon Austria Labs GmbH (SAL), owned by the Republic of Austria, the Styrian Business Promotion Agency (SFG), the federal state of Carinthia, the Upper Austrian Research (UAR), and the Austrian Association for the Electric and Electronics Industry (FEEI).

The purpose of downsampling is to force the CNN to extract relevant features from an image (e.g. edges), but, in this work, extracting features is not so relevant since they are, to some extent, manually implemented. Nevertheless, some degree of downsampling is necessary to reduce the memory requirements of the network. The inverse operation to downsampling is upsampling. One can think of it as an interpolation procedure, where from one input pixel a higher number of output pixels is generated. In practice, this is achieved by an operation called transposed convolution, which is mathematically very similar to the standard convolution in Eq. A1, with different weights and biases to be learned. Upsampling is necessary if downsampling is used in some of the layers of the network but the output of the CNN must be an image of the same size as the input image, as in our case. CNNs of this type are called fully-convolutional neural networks (FCNs).
Finally, there is a different type of layer called a pooling layer. Pooling layers are parameter-free (non-learnable) layers that apply a fixed operation to their receptive field. Typical examples include max-pooling and average-pooling layers, which output the maximum or the average value, respectively, of the pixel (resp. voxel) values in their receptive field.
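A 2D example makes the fixed, parameter-free nature of pooling concrete (the 3D case works the same way with cubic patches):

```python
import numpy as np

def pool2d(img, size=2, op=np.max):
    """Non-overlapping pooling: apply a fixed reduction (max or mean) over
    size x size patches; there are no learnable parameters."""
    h, w = img.shape
    patches = img[:h - h % size, :w - w % size] \
        .reshape(h // size, size, w // size, size)
    return op(patches, axis=(1, 3))

img = np.array([[1., 2., 5., 6.],
                [3., 4., 7., 8.],
                [0., 0., 1., 1.],
                [0., 4., 1., 1.]])
mx = pool2d(img, 2, np.max)    # max of each 2x2 patch
av = pool2d(img, 2, np.mean)   # average of each 2x2 patch
```

Swapping `op` switches between max-pooling and average-pooling without changing anything else, which is exactly what "parameter-free" means here.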
The training of a CNN consists in finding the optimal values of the parameters w and b of each of the layers. This is achieved by minimising a certain objective function (loss) on a labelled dataset. For this work, as discussed in the main text, the dataset consists of a collection of simplified 3D circuits, each labelled by the result of a standard thermal simulation. The training is performed on a subset of the whole dataset (training set), whilst the rest is used to test the network (test set).