CFDBench: A Large-Scale Benchmark for Machine Learning Methods in Fluid Dynamics

In recent years, applying deep learning to solve physics problems has attracted much attention. Data-driven deep learning methods produce fast neural operators that can learn approximate solutions to whole systems of partial differential equations (PDEs), i.e., surrogate modeling. Although these neural networks may be less accurate than traditional numerical methods, once trained, they are orders of magnitude faster at inference. A crucial requirement for practical use is therefore that these operators generalize to unseen PDE parameters without expensive re-training. In this paper, we construct CFDBench, a benchmark tailored for evaluating the post-training generalization ability of neural operators on computational fluid dynamics (CFD) problems. It features four classic CFD problems: lid-driven cavity flow, laminar boundary layer flow in circular tubes, flow over a dam obstacle, and the periodic Karman vortex street. The data contain a total of 302K frames of velocity and pressure fields, covering 739 cases with different operating condition parameters, generated with numerical methods. We evaluate the effectiveness of popular neural operators, including feed-forward networks, DeepONet, FNO, U-Net, etc., on CFDBench by predicting flows with non-periodic boundary conditions, fluid properties, and flow domain shapes that are not seen during training. We make appropriate modifications to apply these popular neural networks to CFDBench so that they can accommodate the changing inputs. Empirical results on CFDBench show that many baseline models have errors as high as 300% on some problems and suffer severe error accumulation during autoregressive inference. CFDBench facilitates a more comprehensive comparison between neural operators for CFD than existing benchmarks.


Introduction
Recent advances in deep learning have enabled neural networks to approximate highly complex and abstract mappings from large-scale data [26]. This has led to the emergence of surrogate modeling as a technique to learn fast and approximate solvers for partial differential equations (PDEs) using neural networks, which has shown promising results in various domains [25,27,30,3].
One application of PDE solvers is computational fluid dynamics (CFD), a well-studied and important field with many practical applications. Therefore, the last few years have seen many attempts at developing better CFD methods with the help of deep neural networks [24]. These neural models are trained on large-scale data by fitting input-output pairs through gradient descent. There are multiple reasons for adopting deep learning methods over traditional numerical methods. One advantage is mesh-independence. Numerical methods operate on meshes, and the mesh construction process is time-consuming and requires much expert knowledge to ensure convergence and good accuracy. In contrast, some neural operators are mesh-independent, allowing input and output at arbitrary locations. Additionally, although deep neural networks require a long training process, with the help of modern hardware, they can be several orders of magnitude faster than numerical methods at inference. We evaluate some popular neural networks on CFDBench, and show that it is more challenging than many of the simplified problems used in previous works, revealing issues that need to be solved before these operators can replace traditional solvers.
Related Works

Numerical Methods for Solving PDEs
Numerical methods have been widely used to solve CFD problems. The basic idea is to divide the original continuous solution area into many finite interconnected regions, so that a point (called a node) can approximately represent an entire small region. Then, using different discretization methods, the governing equations (which are typically PDEs) can be reduced to algebraic equations called discrete equations. Solving these discrete equations gives the values at the nodes. Common discretization methods include finite difference methods, finite volume methods, finite element methods, spectral methods, and lattice Boltzmann methods (LBMs).
The main idea of the finite difference method (FDM) [43] is to approximate derivatives in PDEs at grid points, often using Taylor series expansions, converting the PDEs into a system of algebraic equations. Although its theory is simple, it struggles to handle complex geometries and irregular boundaries. The finite volume method (FVM) [47] divides the calculation area into non-overlapping control volumes (typically grid cells) and integrates fluxes across control-volume faces, ensuring strict adherence to conservation laws. The finite element method (FEM) [54] is based on the classical variational method (the Ritz method [37] or the Galerkin method [10]): it divides the domain into finite elements and approximates the solution using basis functions within each element, transforming the PDEs into algebraic equations. The advantages of the FVM and FEM are good conservation and good adaptability to complex grids; the disadvantages are high computational cost and convergence that is highly dependent on the quality of the mesh. The spectral method [11] uses the characteristics of the Fourier series to transform nonlinear problems into linear ones. Its advantages include high accuracy and great applicability to problems with periodic BCs, but it has considerable limitations, such as divergence on discontinuous functions. The lattice Boltzmann method (LBM) [6] is a more recent method based on mesoscopic-scale modeling and Boltzmann gas molecular dynamics; it has the advantage of fast solution speed, but often compromises on accuracy. Most numerical methods with relatively high accuracy have very large computational costs. Despite many efforts to reduce these costs, they remain substantial in many industrial and commercial applications. Therefore, deep neural models, which can better exploit modern hardware, emerge as a viable way to reduce this computational cost.
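As a concrete illustration of the finite-difference idea described above, the following minimal sketch (ours, not part of CFDBench) solves the 1D heat equation u_t = α u_xx with an explicit scheme, replacing the spatial derivative with a second-order central difference derived from Taylor expansions:

```python
import numpy as np

def solve_heat_1d(u0, alpha=0.01, dx=0.01, dt=0.001, steps=100):
    """Explicit FDM for u_t = alpha * u_xx on a uniform grid.

    Boundary values are held fixed; r = alpha*dt/dx^2 must be <= 0.5
    for the explicit scheme to be stable.
    """
    u = u0.copy()
    r = alpha * dt / dx**2
    for _ in range(steps):
        # central difference for u_xx at interior points
        u[1:-1] = u[1:-1] + r * (u[2:] - 2 * u[1:-1] + u[:-2])
    return u

x = np.linspace(0, 1, 101)
u0 = np.sin(np.pi * x)   # zero boundaries, sinusoidal initial condition
u = solve_heat_1d(u0)    # the peak amplitude decays over time
```

The stability restriction on the time step is one reason such explicit schemes become expensive on fine meshes, motivating the learned surrogates discussed in this paper.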

Neural Networks
In the last decade, neural networks have demonstrated impressive capabilities in various computer vision and natural language processing tasks [26,17,18,4,8]. A neural network consists of a large number of neurons and can approximate arbitrary mappings by minimizing a loss function that is differentiable with respect to the model parameters. By iterating through a large set of input-output pairs, the model parameters are updated with gradient descent. Common types of neural networks include feed-forward neural networks (FFNs), recurrent neural networks (RNNs) [19], generative adversarial networks (GANs) [15], convolutional neural networks (CNNs) [12], etc.
Regarding CFD problems, we generally want to model a flow field, which can be seen as a kind of conditional generation task. This is a common objective in many applications of deep learning. More concretely, forward propagation, as performed by numerical methods, can be regarded as a conditional image-to-image translation task [21]. Some notable works include [39,38,53]. Of particular relevance are ResNet [18] and U-Net [39]. The former adds a residual connection, which makes the model predict the shift from the input instead of predicting the output directly, empirically improving the performance and stability of image processing. U-Net shrinks the hidden representation in the middle of the network, reducing the number of parameters and improving the globality of feature dependencies.
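The residual idea above can be sketched in a few lines. This toy block (ours, not one of the paper's baseline configurations, with a linear map standing in for a convolution) shows why predicting the shift helps: at zero initialization, the block is exactly the identity, so training only has to learn the change from the input field.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: output = input + learned shift."""
    h = np.maximum(x @ w1, 0.0)  # stand-in for conv + ReLU
    return x + h @ w2            # skip connection adds the input back

x = np.ones((4, 8))
w_zero = np.zeros((8, 8))
y = residual_block(x, w_zero, w_zero)  # identical to x at zero init
```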

Neural Operators for Solving PDEs
There have been a great number of research works on applying neural networks to solve PDEs.These works can be largely classified into two categories: (1) approximating the solution function and (2) approximating the solution operator.
The former category is pioneered by physics-informed neural networks (PINNs) [34], a deep learning framework for solving PDEs in physics. The framework uses an FFN to approximate the solution to a PDE by fitting the training data while minimizing a loss function that enforces constraints based on physical laws. A series of improvements to PINNs have been proposed. These include dividing the solution domain to speed up convergence [23,22,20] and combining numerical derivatives with adaptive backpropagation of derivatives to improve accuracy [7]. Some works improve the neural architecture [32] by adopting convolutional layers instead of fully connected layers, such as PhyGeoNet [13], PhyCRNet [36], etc. However, these methods have limited applicability, and only a few are evaluated on complex flow equations. Moreover, since a PINN approximates a single solution function, it has to be retrained for every new input function or condition.
The second category learns a whole family of solutions by learning the mapping from input functions to output functions. [28] proved that neural operators can approximate arbitrary nonlinear continuous operators. Some notable neural operators include FNO [27], LNO [5], and KNO [50], among others. These operators are forward propagators similar to numerical methods, but learn in other domains to achieve mesh independence. Another line of neural operators starts with DeepONet [30], which encodes the query location and the input functions independently and aggregates them to produce the prediction at the query location. Many improvements based on DeepONet have been proposed [28,29,46,52,16,49,48]. Because the number of existing neural operators is too great for us to evaluate every single one, we select a few representative ones as baselines in this paper. We encourage future works to evaluate on CFDBench and compare against these baselines.

Benchmarking Data-Driven Scientific Modeling
Following the popularity of data-driven methods in scientific modeling with the help of deep learning, several evaluation benchmarks have been proposed. WeatherBench [35] is a benchmark consisting of historical weather data extracted and processed from the ERA5 archive, but weather forecasting is one specific CFD problem where the geometry never changes, and its large-scale historical data does not exist for many other scenarios. PDEBench [44] is a multi-task benchmark on several classic PDE problems, but it only includes two CFD problems, where the BCs are always periodic and the physical properties and geometries are constant. MegaFlow2D [51] is a large-scale CFD-specific benchmark with 2 million snapshots of generated flows. However, this benchmark contains only two problems, and the BCs and physical properties are the same across all snapshots, rendering it unable to measure neural operators' generalization along these dimensions.

CFDBench
In this section, we first give a formal definition of the flow problems. Then, we present the four flow problems included in CFDBench, along with the parameters and various considerations during dataset construction. For each problem, we generate flows with different operating parameters, which is the term we use to refer to the combination of three kinds of conditions: (1) the BCs, (2) the fluid physical properties (PROP), and (3) the geometry of the field (GEO). Each kind of operating parameter corresponds to one subset. In each subset, the corresponding operating conditions are varied while other parameters remain constant. The goal is to evaluate the ability of data-driven deep learning methods to generalize to unseen operating conditions. Figure 1 shows an example snapshot of each problem in our dataset.

The Definition of Flow Problems
The Navier-Stokes equations can be formalized as follows:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0, \qquad \frac{\partial (\rho \mathbf{u})}{\partial t} + \nabla \cdot (\rho \mathbf{u} \otimes \mathbf{u}) = -\nabla p + \nabla \cdot \boldsymbol{\tau} + \rho \mathbf{g},$$

where ρ is the density, µ is the dynamic viscosity, u = (u, v)⊤ is the velocity field, p is the pressure, τ is the viscous stress, and g is the body force.
Suppose the fluid is incompressible (ρ = const) and Newtonian (τ = µ du/dy). Combining the continuum hypothesis and Stokes' law, we get the following equations inside the flow domain, i.e., for (x, y, t) ∈ D:

$$\nabla \cdot \mathbf{u} = 0, \qquad \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\frac{1}{\rho}\nabla p + \frac{\mu}{\rho}\nabla^2 \mathbf{u} + \mathbf{g},$$

and (u, v) are prescribed constants on the boundaries ∂D.
In this work, we consider four important and representative flow problems that can comprehensively evaluate different methods' capabilities: (1) the flow in a lid-driven cavity, (2) the flow into a circular tube, (3) the flow over a breaking dam, and (4) the flow around a cylinder. These flow problems cover most common flow phenomena. They include both open and closed systems and vary in shape. The system boundaries include both moving and stationary walls as well as velocity/pressure inlet and outlet boundaries. They include vertical flows subject to gravity and plane flows without gravity. Their flow characteristics include the formation of a viscous boundary layer, the formation and shedding of vortexes, and the formation of jets. They include both single-phase and two-phase flows, and both laminar and turbulent flows. However, to ensure the cleanliness of the data, that is, to ensure that the data fully satisfy the above equations, we regard the fluid as an incompressible Newtonian fluid, ignoring mass transfer at the two-phase interface and energy dissipation during the flow.
For simplicity, we will refer to the four problems as (1) cavity flow, (2) tube flow, (3) dam flow, and (4) cylinder flow. For each problem, we use different operating parameters and generate the flow fields using numerical methods.

Cavity Flow
Cavity flow refers to the flow in a square container with a moving upper wall (i.e., the lid) and three stationary walls. Due to viscosity, the moving wall drives the fluid in its proximity to move in the same direction until it reaches the stationary wall, forming a jet that impacts the lower wall and then a secondary vortex. On the one hand, lid-driven cavity flow has a wide range of industrial applications, such as the transient (short-dwell) coating process [2] and wind-driven ocean flow. On the other hand, a special property is that the BC is discontinuous [41] at the connection between the moving wall and the stationary side walls, which makes it a good test of the convergence of numerical methods. Thus, it is widely used to verify the accuracy of computational fluid dynamics software or numerical methods [14]. Therefore, constructing a lid-driven cavity flow dataset is beneficial for studying the ability of neural network models to solve flow problems.
In the cavity flow dataset, the baseline conditions are ρ = 1 kg/m³, µ = 10⁻⁵ Pa·s, l = d = 0.01 m, u_top = 10 m/s, where ρ and µ are the density and viscosity of the fluid, l and d are the length and width of the cavity, and u_top is the velocity of the top wall. 50 different cases are generated by varying u_top from 1 m/s to 50 m/s with a constant step size. 84 cases are generated by varying the physical properties of the working fluid, with 12 different values of density and 7 values of viscosity. For the cases with different geometries, we choose different combinations of length and width from {0.01, 0.02, 0.03, 0.04, 0.05} m. To have an appropriate scale of difference between frames, we set the time step size to ∆t = 0.1 s.
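The case enumeration above could be sketched as follows. The specific density and viscosity values are placeholders (ours, not from the paper); only the counts (50 BC cases, 12 × 7 = 84 PROP cases, 5 × 5 = 25 GEO cases) follow the text:

```python
from itertools import product

# BC subset: u_top swept from 1 m/s to 50 m/s with a constant step
bc_cases = [{"u_top": float(u)} for u in range(1, 51)]

# PROP subset: full grid of 12 densities x 7 viscosities (placeholder values)
densities = list(range(12))
viscosities = list(range(7))
prop_cases = [{"rho": r, "mu": m} for r, m in product(densities, viscosities)]

# GEO subset: all (length, width) combinations from the stated set
sizes = [0.01, 0.02, 0.03, 0.04, 0.05]
geo_cases = [{"l": l, "d": d} for l, d in product(sizes, sizes)]
```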

Tube Flow
Tube flow refers to a water-air two-phase flow into a circular tube filled with air. The boundary layer in a circular tube is one of the most common flows: the viscous resistance of the fluid near the wall is greater than that in the bulk flow region. When water flows into a round tube filled with air, the flow is clearly slow near the wall and fast in the center. Therefore, constructing a water-air laminar flow in a circular tube is beneficial for studying the ability of neural network architectures to capture the two-phase interface and to learn laminar boundary layer behavior.
In the tube flow dataset, the baseline conditions are ρ = 100 kg/m³, µ = 0.1 Pa·s, u_in = 1 m/s, d = 0.1 m, l = 1 m, where ρ and µ are the density and viscosity of the fluid, u_in is the inlet velocity (from the left), and d and l are the diameter and length of the circular tube. 50 cases are generated for different BCs, increasing the inlet velocity from 0.1 m/s to 5 m/s with increments of 0.1 m/s. 100 cases with different physical properties of the working fluid are generated; the two-dimensional space of densities and dynamic viscosities is shown in Table 3, where the density increases from 10 kg/m³ to 1000 kg/m³ with increments of 110 kg/m³, and the viscosity increases from 0.01 Pa·s to 1 Pa·s with increments of 0.11 Pa·s. For different geometries, the diameter of the circular tube is taken from {0.01, 0.05, 0.1, 0.3, 0.5} m, and we choose five different ratios of diameter to length, ensuring that the length satisfies 0.1 ≤ l ≤ 10. This results in 25 different geometries. To have an appropriate scale of difference between frames, we set the time step size to ∆t = 0.01 s.

Dam Flow
A dam is a barrier across flowing water that obstructs, directs, or slows down the flow. A sudden, rapid, and uncontrolled release of impounded water quickly causes a dam to burst [1]. To further understand the flow of water over a dam, we simplify it to the flow of water over a vertical obstacle. When the Reynolds number is low, the fluid is dominated by the viscous force and flows vertically down the wall as it passes the dam [33]. As the speed increases, the fluid is more affected by the inertial force, and a jet forms. The fluid then falls to the boundary because of gravity, and the collision with the boundary creates more reverse flow, which hits the dam with a larger velocity than at the inlet. Therefore, the dam flow dataset is helpful for studying a model's ability to learn flows subject to different viscous and inertial forces.
In the dam flow dataset, the baseline conditions are ρ = 100 kg/m³, µ = 0.1 Pa·s, u_in = 1 m/s, h = 0.1 m, w = 0.05 m, where ρ and µ are the density and viscosity of the fluid, u_in is the inlet velocity (from the left), and h and w are the height and width of the dam obstacle. The entire fluid domain is 1.5 m long and 0.4 m high. The inlet velocity boundary is close to the ground, with a total length of 0.1 m, and the 0.3 m above it is the inlet pressure boundary. The barrier is located 0.5 m from the entrance. 70 cases are generated for different BCs, increasing the inlet velocity from 0.05 m/s to 1 m/s with increments of 0.05 m/s, and from 1 m/s to 2 m/s with increments of 0.02 m/s. 100 cases with different physical properties of the working fluid are generated; the two-dimensional space of densities and dynamic viscosities is shown in Table 3, where the density increases from 10 kg/m³ to 1000 kg/m³ with increments of 110 kg/m³, and the viscosity increases from 0.01 Pa·s to 1 Pa·s with increments of 0.11 Pa·s. 50 cases with different geometries are generated, increasing the height of the dam obstacle from 0.11 m to 0.15 m with increments of 0.01 m and its width from 0.01 m to 0.09 m with increments of 0.01 m. To have an appropriate scale of difference between frames, we set the time step size to ∆t = 0.1 s.

Cylinder Flow
Flow around a cylinder is a typical boundary layer flow, commonly seen in industry where water flows past bridge piers, wind blows past towers, etc. [40]. When fluid with a large flow rate passes around the cylinder, the boundary layer separates and forms a recirculation zone due to the combined effect of the adverse pressure gradient and wall viscous retardation. At specific Reynolds numbers, the two sides of the cylinder periodically shed a double row of vortexes with opposite rotational directions, arranged in a regular pattern. Through nonlinear interactions, these vortexes form a Karman vortex street. Therefore, the cylinder flow dataset is important for examining the capability of neural networks in modeling periodic flows with obstacles.
In the cylinder flow dataset, the baseline conditions are ρ = 10 kg/m³, µ = 0.001 Pa·s, u_in = 1 m/s, d = 0.02 m, x₁ = y₁ = y₂ = 0.06 m, x₂ = 0.16 m, where ρ and µ are the density and viscosity of the fluid, u_in is the inlet velocity (from the left), d is the diameter of the cylinder, and x₁, x₂, y₁, y₂ are the distances between the center of the cylinder and the left, right, top, and bottom boundaries, respectively. 50 cases are generated for different BCs, increasing the inlet speed from 0.1 m/s to 5 m/s with increments of 0.1 m/s. 115 cases are generated for different physical properties of the fluid so that the Reynolds numbers are in the range [20, 1000]. Table 4 shows some values of density and viscosity, but not all combinations are used, because some would result in Reynolds numbers outside the target range. For different geometries, the distance from the cylinder to the upper and lower boundaries and the entrance is taken from {0.02, 0.04, 0.06, 0.08, 0.1}, the distance from the cylinder to the exit boundary is taken from {0.12, 0.14, 0.16, 0.18, 0.2}, and the radius of the cylinder is taken from {0.01, 0.02, 0.03, 0.04, 0.05}; 20 cases are generated. To ensure an appropriate scale of difference between frames, we set the time step size to ∆t = 0.001 s.
Table 5: Operating parameters of the subsets in the cylinder flow problem.
Table 6: Breakdown of the number of cases in each problem (the rows) and the corresponding subsets (the columns) in CFDBench. Each problem contains three subsets, each varying one type of operating condition parameter.


Data Generation
All data in this paper are generated by ANSYS Fluent 2021R1. In order to calculate the viscosity term accurately, the laminar model is used for laminar flow and the SST k-ω model for turbulent flow. All solvers used are pressure-based. We choose the Coupled scheme for single-phase flow and SIMPLE for two-phase flow as the pressure-velocity coupling algorithm. The pressure equation uses second-order interpolation (the VOF model uses PRESTO! interpolation), and the momentum equation adopts the second-order upwind scheme. The time term adopts the first-order implicit format, and gradients are interpolated with the least squares method. To capture boundary layer separation at the near-wall surface, the first mesh layer near the wall is refined to 10⁻⁵ m. To ensure the accuracy of the computational model and results, all computational models underwent grid-independence validation.
After discretizing the governing equations, the conservation equation for the general variable Φ_P at grid cell P can be expressed as

a_P Φ_P = Σ_nb a_nb Φ_nb + b,

in which a_P is the coefficient of the node of cell P, a_nb are the coefficients of the neighboring nodes, and b is the contribution from the constant term, source term, and boundary conditions. The globally scaled residual is defined as

R = Σ_P | Σ_nb a_nb Φ_nb + b − a_P Φ_P | / Σ_P | a_P Φ_P |.

The residual represents the relative size of the total unbalance term in the computational domain and is generally used to judge the convergence of the solution: the smaller the residual, the better the convergence. In this paper, the residual convergence criterion for all terms is set to 10⁻⁹, and the residuals of the final results are shown in Figure 2. The residuals of the velocity terms all reach at least 10⁻⁶.
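The globally scaled residual can be computed directly from the discrete-equation coefficients; the following sketch (array names are ours) assumes `flux[i]` already holds the neighbor sum Σ_nb a_nb Φ_nb for cell i:

```python
import numpy as np

def scaled_residual(a_p, phi_p, flux, b):
    """Globally scaled residual over all cells.

    unbalance per cell: |sum_nb(a_nb * phi_nb) + b - a_P * phi_P|,
    normalized by the total magnitude of a_P * phi_P.
    """
    unbalance = np.abs(flux + b - a_p * phi_p)
    return np.sum(unbalance) / np.sum(np.abs(a_p * phi_p))
```

For a perfectly converged field, the per-cell balance holds exactly and the residual is zero.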
All generations are run with 30 solver processes on an AMD Ryzen Threadripper 3990X CPU. The final generated data are interpolated onto a 64 × 64 grid.

Training Subsets
We divide the data of each problem into three subsets: BC, PROP, and GEO. Then, we use the seven different combinations (PROP, BC, GEO, PROP + BC, PROP + GEO, BC + GEO, All) to evaluate the generalization ability of the baseline methods along different dimensions. The number of cases in each subset is shown in Table 6, and each combination is the union of the data of its constituent subsets.

Data Splitting
Each subset of data is split into training, validation, and test sets with a ratio of 8:1:1. To ensure that the operating parameters in the test set are never seen during training, we require that the frames of each case are never distributed among different splits.
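The case-level split described above could be implemented as follows (a sketch with names of our choosing): cases, not frames, are shuffled and assigned to splits, so all frames of one case land in the same split.

```python
import random

def split_cases(case_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split case IDs (not frames) into train/val/test with the 8:1:1 ratio."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```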

Experiments
After generating the benchmark data, we use it to train popular data-driven neural networks that can approximate the solutions to PDEs. To keep the number of experiments manageable, in the following discussions, unless stated otherwise, we have the models predict the velocity field. We believe that modeling other properties or components of the flow should not be too different.
We first define the learning objective of the neural networks. Then, we briefly describe the baselines we experiment on. After that, we explain the loss functions and hyperparameters used in the experiments.

Training Objectives
Most flow problems focus on solving for the distribution of the flow field in the domain. Therefore, the objective of the neural networks is to approximate the following mapping within the domain:

u = G(Σ, Ω),

where Ω = (u_B, ρ, µ, d, l, w) denotes the operating parameters, which include the BC (u_B), the physical properties (ρ, µ), and the geometry (d, l, w). Σ is the input function, which can be either the velocity field at a certain time (in autoregressive modeling) or the spatiotemporal coordinate vector (x, y, t) (in non-autoregressive modeling). u is the output function, which is the velocity field.
When using a neural network f θ with parameters θ to approximate G, there are two approaches: non-autoregressive and autoregressive modeling.

Non-Autoregressive Modeling
In non-autoregressive modeling, the input function Σ is a query location (x, y, t), and the model directly outputs the solution at that position:

û(x, y, t) = f_θ(x, y, t; Ω).

Autoregressive Modeling
Autoregressive modeling, which is similar to traditional numerical methods, learns the mapping of the flow field from the current time step to the next. Therefore, it predicts the distribution of the flow field at each moment following the temporal order:

û(t) = f_θ(u(t − ∆t); Ω) ∈ R^(n×m),

where û(t) is the predicted field at time t, and n and m are the height and width of the domain. In other words, the input function is Σ = u(t − ∆t).
The learning goal is to find one θ * that minimizes the loss function L on the training data T .
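Autoregressive inference feeds the model's own prediction back as the next input, which is why per-step errors accumulate over long rollouts. A minimal sketch (with a toy stand-in for the trained operator f_θ):

```python
import numpy as np

def rollout(f, u0, n_steps):
    """Autoregressive inference: repeatedly apply f to its own output."""
    frames = [u0]
    for _ in range(n_steps):
        frames.append(f(frames[-1]))  # the prediction becomes the next input
    return frames

# toy operator: each step damps the field by 10%, so small per-step
# deviations compound multiplicatively over the rollout
damp = lambda u: 0.9 * u
frames = rollout(damp, np.ones((2, 2)), 3)
```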

Baselines
We evaluate on CFDBench some popular and performant neural networks that have been applied to solving PDEs in existing works. Although CFDBench can be used to evaluate both data-driven and physics-informed methods, our experiments are limited to the former. This is because most physics-informed methods enforce operating conditions through loss functions, requiring retraining on unseen conditions.
We can generally predict the flow in two ways: non-autoregressively or autoregressively. The former directly predicts the output function value at a query location specified in the input. The latter predicts the field at the next time step given the field at the current time step. The two kinds are not directly comparable, so we discuss them separately.
From the perspective of model architecture, we categorize the baselines into three types: (1) FFNs, (2) the DeepONet family, and (3) image-to-image models. Models in the first category simply concatenate all inputs into one vector and map it to the prediction space with an FFN, shown on the left of Figure 3. The second category includes all variants of DeepONet [30], shown in the middle of Figure 3. The essence of this architecture is that the query location is encoded independently by a trunk net. This makes it possible to encode the input functions and other conditions without being limited to the shape or mesh of the output function domain, and to reuse that encoding to query the value of the output function at any location. The third category contains ResNet, U-Net, and FNO, shown on the right of Figure 3. These models accept an n-dimensional array and output another n-dimensional array, the architecture commonly used for image-to-image tasks; thus, we name this category image-to-image models. Table 7 compares all the baselines considered in this paper, and Figure 3 illustrates the types and shapes of the input and output of each model. In Appendix B, we briefly describe the structure of each baseline model.
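The branch/trunk structure of the DeepONet family can be sketched as follows. This is a toy illustration of the idea, not any baseline's actual configuration: a branch net encodes the input function (sampled at fixed sensor points) once, a trunk net encodes each query location, and their dot product gives the prediction at that location, so the branch encoding can be reused across arbitrarily many queries.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two-layer stand-in for the branch/trunk networks."""
    return np.maximum(x @ w1, 0.0) @ w2

p = 16  # latent dimension shared by branch and trunk
w_b1, w_b2 = rng.normal(size=(32, 64)), rng.normal(size=(64, p))
w_t1, w_t2 = rng.normal(size=(3, 64)), rng.normal(size=(64, p))

def deeponet(u_sensors, query_xyz):
    branch = mlp(u_sensors, w_b1, w_b2)  # encode input function once
    trunk = mlp(query_xyz, w_t1, w_t2)   # encode each (x, y, t) query
    return trunk @ branch                # one scalar prediction per query
```

Note that the trunk net places no constraint on where queries lie, which is what makes this family mesh-independent.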

Conditioning on Operating Parameters
Most existing works on neural operators keep the operating parameters (Ω) constant, and the input function, which is the IC, is the only input to the operator. In contrast, CFDBench considers varying the operating parameters while keeping the IC constant. Consequently, we need to make appropriate modifications to existing neural models for PDEs such that the predictions can be conditioned on the operating parameters.
For the autoregressive models, we treat the problem as a conditional image-to-image translation task, where the velocity field at the previous moment u(x, y, t − ∆t) is the input image, the velocity field at the current moment u(x, y, t) is the target image, and the operating condition parameters Ω are the condition. For simplicity, we add Ω to the input as additional channels, one channel for each parameter. In this work, there are 5 parameters in Ω, so the input at position (x, y) is (u(x, y), u_B, ρ, µ, h, w), where h and w are the height and width. For the flow around a cylinder, the model also needs to know the location and shape of the obstacle. To this end, we add a mask channel where 0 indicates an obstacle at that position and 1 indicates no obstacle.
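The channel construction above could be sketched as follows (function and variable names are ours): each scalar in Ω is broadcast to a constant channel and stacked with the velocity field and the obstacle mask.

```python
import numpy as np

def build_input(u_prev, params, mask):
    """Stack velocity channels, one constant channel per operating
    parameter, and the obstacle mask into one model input."""
    h, w = mask.shape
    param_channels = [np.full((h, w), p) for p in params]
    return np.stack(list(u_prev) + param_channels + [mask])

u_prev = np.zeros((2, 64, 64))          # (u, v) at the previous time step
params = (10.0, 1.0, 1e-5, 0.01, 0.01)  # u_B, rho, mu, h, w
mask = np.ones((64, 64))                # 1 = fluid, 0 = obstacle
x = build_input(u_prev, params, mask)   # shape (8, 64, 64)
```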

Loss Functions
During training, we use the normalized mean squared error (NMSE, defined below) as the loss function to ensure that the model prioritizes minimizing the difference for labels with smaller absolute values. For evaluation, we also report the following three kinds of error values for comprehensiveness. We denote the label value by Y and the predicted value by Ŷ.

Mean Square Error: MSE = (1/N) Σᵢ (Yᵢ − Ŷᵢ)²
Normalized Mean Square Error: NMSE = Σᵢ (Yᵢ − Ŷᵢ)² / Σᵢ Yᵢ²
Mean Absolute Error: MAE = (1/N) Σᵢ |Yᵢ − Ŷᵢ|
As we will show with the experiments in Section 5, one method may perform better than another in terms of one metric but worse in terms of another. Therefore, it is important for practitioners to select the metric or metrics that best reflect their interests.
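Written out in code, the three metrics are (a direct transcription, assuming element-wise means over the field; the NMSE normalization by Σ Y² is our reading of the definition):

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def nmse(y, y_hat):
    # normalizing by the label magnitude up-weights small-valued labels
    return np.sum((y - y_hat) ** 2) / np.sum(y ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))
```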

Hyperparameter Search
The performance of the methods depends on hyperparameters such as the learning rate, the number of training epochs, etc. Because our problem setting is significantly different from existing works, the optimal hyperparameters of each baseline model are likely very different from the ones found by their authors. We perform a hyperparameter search for the baseline models using the PROP subset of the cavity flow problem (84 different flows).
A more detailed description of the hyperparameter search process can be found in the Appendix. In summary, to make the methods comparable, we generally keep the number of parameters roughly the same. For ResNet, U-Net, and FNO, we try different depths and numbers of hidden channels. We also experiment with new ways to inject operating parameters. For the FFN and the variants of DeepONet, we try different widths and depths of the hidden linear layers. Additionally, the learning rate is selected individually for each method based on the validation loss, and we always train until convergence. For ResNet, we conducted a hyperparameter search on the depth d (i.e., the number of residual blocks) and the hidden dimension h (i.e., the number of channels of the output of each residual block). We found that ResNet's ability to learn from flow problems is poor, and it quickly becomes unable to converge as d and h increase. The setting with the lowest validation loss is d = 4 and h = 16, which we used to train on the tube flow data; the test loss is shown in Table 8. The result shows that ResNet's performance is generally slightly worse than the identity transformation. One plausible explanation is that ResNet is poor at modeling global dependencies: after one convolution layer with a k × k kernel, the input signal at any point can only spread within its neighboring k × k range. Therefore, we do not consider ResNet in further discussions below.

Other Details
For autoregressive models, we always train on one forward propagation step, while for non-autoregressive models, we train on randomly sampled query points over the entire spatiotemporal domain.
We tune the learning rate on the cavity PROP subset, and always decay it by a factor of 0.9 every 20 epochs, which we empirically found to be effective. One may get better performance by tuning more hyperparameters, such as trying different learning rate schedulers and tuning them on the entire dataset; however, that is prohibitively expensive given the size of the dataset. All methods were implemented with the PyTorch deep learning framework, and all experiments were executed on one local computer with one RTX 3060 GPU. Most results are the average of three runs with different random seeds.
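The decay schedule described above (a multiplicative factor of 0.9 every 20 epochs) is a standard step schedule. A minimal sketch of the resulting learning rate, with the initial rate as an assumed placeholder:

```python
def stepped_lr(initial_lr, epoch, gamma=0.9, step_size=20):
    """Learning rate after `epoch` epochs under a step decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)

# In PyTorch this corresponds to
# torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.9).
lr0 = 1e-3  # placeholder initial learning rate
print(stepped_lr(lr0, 0))    # 1e-3
print(stepped_lr(lr0, 20))   # 9e-4
print(stepped_lr(lr0, 45))   # 1e-3 * 0.9**2 = 8.1e-4
```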

Results
We first analyze the prediction of the flow field at a single time step, and then progress to autoregressive inference over multiple sequential time steps. We compare the predictive capabilities of non-autoregressive and autoregressive models, and additionally compare the computational cost of each model.

Single Step Prediction
Figure 9 and Figure 10 show the predicted velocity fields of all baseline models on the three subsets of the four flow problems in CFDBench. From top to bottom: the first row is the input, the second row is the label, and the following rows are the predictions of the non-autoregressive and autoregressive models. In general, the baseline models perform relatively well on cavity flow and dam flow while struggling on tube flow and cylinder flow, especially the non-autoregressive models.
It is important to recognize the difference between autoregressive and non-autoregressive models when analyzing the results. The task of a non-autoregressive model is to directly produce the value of the output function at a designated query location anywhere in the spatiotemporal domain. This is significantly more difficult than the autoregressive task, which only requires learning the mapping from the field at the previous time frame to the field at the current time frame.
Also, autoregressive models require the input and output functions to be represented on a grid, which limits their flexibility and may lose information in regions where the field changes sharply over small spatial distances. Furthermore, non-autoregressive models have better mesh independence, because they can output the predicted value of the output function at any location, which is significant for problems with complex geometries. In addition, non-autoregressive inference may be much more efficient because it can predict the field at any time frame directly, while autoregressive models must propagate one time step at a time. In summary, the two kinds of models cannot be directly compared against each other: non-autoregressive inference is generally much faster at long-range prediction, but its learning task is significantly more difficult.

Non-Autoregressive Modeling
Figure 5 shows the test NMSE, MSE, and MAE of FFN and DeepONet on the four problems and their corresponding seven subsets in CFDBench. More specific results can be found in Table 11. Contrary to the observations of [30], we find that there is no clear winner between FFN and DeepONet in terms of generalization ability. Moreover, the error is generally quite large compared to the numerical methods used to generate the data, and this order of magnitude is arguably not suitable for many practical applications of fluid simulation. This indicates that purely data-driven neural operators still have large room for improvement before they can replace traditional numerical methods.

Autoregressive Modeling
We also observe that the PROP subset is generally easier than the other subsets. This is likely because physical properties affect the velocity less than other operating parameters do, making the train-test domain gap smaller. With varying BCs and geometries, DeepONet suffers from severe overfitting, producing fields with little resemblance to the labels: with varying BCs, it tends to output the steady-state velocity distribution, while with varying geometries, it tends to behave as an identity transformation.
Figure 6 shows the test NMSE of all autoregressive models and the identity transformation on the four flow problems with 7 subsets of data. Figure 7 shows the test NMSE of the autoregressive models on the four flow problems (with all cases), which serves as a comprehensive summary of the performance of the autoregressive baselines. The complete results of our experiments are listed in Appendix C, which contains the test NMSE, MSE, and MAE of each autoregressive model on each of the seven subsets of the four problems in CFDBench.
In general, Auto-FFN and the autoregressive models from the DeepONet family are at best slightly better than the identity transformation, which means they often learn to output the input as their prediction.
In cavity flow and tube flow, U-Net demonstrates superior performance thanks to its spatial encoder-decoder structure, which captures sharp changes in the velocity field more effectively. On the other hand, the MSE of U-Net and FNO is small while the MAE is large. This is because the velocities generally have small absolute values (u < 1), and the relative error is large when the absolute error is small. In dam flow prediction, the DeepONet family generally prevails, while FNO fails to converge (FNO's result is excluded from the bar chart because its error is too large). The presence of gravity as a dominant physical force in dam flow suggests that the DeepONet family may be more effective at handling PDEs with source terms.
Both image-to-image models perform best on cylinder flow (MSE ∼ 10−5, MAE ∼ 3 × 10−3), and on this dataset, FNO is better than U-Net. We conjecture this is because learning in the frequency domain endows FNO with the ability to extract the characteristics of the periodic vortex more effectively.
For the tube flow problem, U-Net's predictions show horizontal stripe noise, while FNO shows vertical stripe noise at t = 0. For the cylinder flow problem, we can see from the predictions that although FNO's test loss is very low, it produces visible noise. This is because FNO discards high frequencies to reduce computational cost, and as a result it struggles to model flat regions and sharp changes. This also implies that the loss functions we have considered (which are also used in many previous works) may not capture all the artifacts of the various methods.
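The link between mode truncation and visible noise can be reproduced in isolation: low-pass filtering a sharp transition in Fourier space produces ringing (the Gibbs phenomenon). A small sketch (the cutoff of 12 modes matches the FNO setting in Section D.4; everything else is illustrative):

```python
import numpy as np

n = 256
x = np.where(np.arange(n) < n // 2, 1.0, -1.0)  # sharp step, like a field discontinuity

modes = np.fft.rfft(x)
modes[13:] = 0.0                  # keep only the 12 lowest frequencies (plus DC)
x_lp = np.fft.irfft(modes, n=n)   # low-pass reconstruction

# The reconstruction overshoots the true amplitude near the jump (ringing)
# and cannot stay flat on the constant regions.
print(x_lp.max() > 1.0)           # True: Gibbs overshoot
print(np.abs(x_lp - x).max())     # large error concentrated near the discontinuity
```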

Multi-Step Prediction
One important characteristic of traditional numerical methods is that they can extrapolate to any point in time through an arbitrary number of forward propagation steps (provided that the iterative process converges). Consequently, it is desirable for data-driven deep learning methods to generalize to time steps beyond those encountered during training and predict the field at any time step. Non-autoregressive models can simply be queried at points beyond the temporal range of the training distribution, but for autoregressive models, since each prediction depends on the previous one, errors may accumulate over multiple forward propagation steps [3].
Figure 8 illustrates the errors of the baseline models when propagating from the IC (the velocity field at t = 0), as a function of the number of time steps. As expected, autoregressive models exhibit severe error accumulation. One illustrative example is the tube flow problem, where FNO's error increases 1000-fold within just 15 forward propagation steps. Perhaps surprisingly, in some cases such as the tube flow problem, the errors of autoregressive models decrease over time, meaning that a model can produce a more accurate prediction (with respect to the label) when provided a less accurate input function. In other words, some autoregressive models are able to use the operating conditions to correct themselves. Another observation is that models with convolutional layers, i.e., U-Net, FNO, and Auto-DeepONetCNN, are more prone to error accumulation than the other baselines. One possible explanation is that convolutional layers treat the features at different locations identically, while fully connected layers dedicate a separate set of weights to every input point.
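The accumulation mechanism can be illustrated with a toy one-dimensional system: if the learned one-step map differs from the true map by even a small multiplicative factor, the relative error compounds over rollout steps. This sketch is purely illustrative and not one of the paper's models:

```python
import numpy as np

a_true = 0.95          # true one-step decay factor of a stable toy system
a_model = 0.95 * 1.02  # learned map with a 2% one-step error

u_true, u_pred = 1.0, 1.0
rel_errors = []
for step in range(15):
    u_true *= a_true
    u_pred *= a_model   # autoregressive: the prediction is fed back in
    rel_errors.append(abs(u_pred - u_true) / abs(u_true))

# The relative error grows monotonically as (1.02)**t - 1.
print(rel_errors[0], rel_errors[-1])
```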

For non-autoregressive models, the errors are relatively stable with respect to the time step. However, they are outperformed by the autoregressive baselines most of the time. In the worst case, DeepONet is consistently worse than all other baselines, even after 20 steps of error accumulation.
Mitigating error accumulation is an active research direction and out of the scope of this paper. One approach, suggested by [3], is to train multiple models with varying step sizes. An alternative strategy is to impose physical constraints on the model, effectively rendering it "physics-informed".

Computational Cost
For more complex flow problems, traditional numerical methods can be very expensive in terms of computational cost, often requiring days or even months to run a simulation. It has been shown that deep learning methods can be multiple orders of magnitude faster than numerical methods [27,30,3], which is one of the primary advantages of data-driven methods [34].
Unlike traditional numerical methods, deep learning methods also involve a training procedure, which can be very time-consuming, and a set of parameters, which can be very memory-consuming.

Thus, we need to consider these two aspects in addition to the inference time. We measured the computational cost of each baseline model in terms of time and memory usage during training and inference. The results are listed in Table 9. The models are implemented with PyTorch and executed on GPU. The training statistics are measured with a batch size of 32, and for inference we use a batch size of 1. The experiments were conducted on one computer with an i7-12700F CPU and an RTX 3060 GPU.
From the results, we see that different models have very different computational costs, especially during training. Auto-FFN is around 21 times slower than Auto-DeepONet in training, despite having only double the number of parameters and no significant difference in prediction error. This is intuitive because, as mentioned in Section B.1.2, by reusing the output of the branch net within one mini-batch, DeepONet can significantly improve training efficiency. Another important observation is that autoregressive models generally have many more parameters than non-autoregressive models, yet the two kinds of models have comparable training costs. This is because autoregressive baselines predict the entire output function with one forward pass, while non-autoregressive baselines predict the output function one data point at a time.
On the other hand, during inference, the time needed for one forward propagation (or one query, for non-autoregressive models) is very similar across models, all within the range of 5 to 10 ms. This is much faster than the numerical method employed to generate this dataset, which takes around 1 second per frame.

Conclusions
We have introduced CFDBench, a large-scale multi-task benchmark for evaluating the inference-time generalization ability of neural operators in fluid dynamics. CFDBench includes four archetypal flow scenarios: (1) lid-driven cavity flow, (2) laminar flow in circular tubes, (3) dam flow through a step, and (4) the periodic Karman vortex street (cylinder flow). Secondly, we use the constructed dataset to benchmark the ability of mainstream models to predict velocity distributions. The models include two non-autoregressive models, FFN and DeepONet; four autoregressive models, namely the autoregressive versions of FFN, DeepONet, EDeepONet, and DeepONetCNN; and three image-to-image models, namely ResNet, U-Net, and FNO. Before training, we introduced each model in detail, compared the differences between the models, and carried out a hyperparameter search for each model to find suitable hyperparameters for the flow problems. By analyzing the single-step and multi-step predictions of the baselines on CFDBench, we find that U-Net is best for flow problems without a source term (gravity), FNO is best for periodic vortex phenomena, and autoregressive DeepONetCNN is best for the dam flow problem with gravity. The non-autoregressive models, with the advantage of grid independence, perform well on flow problems with relatively small changes in the flow field, such as cavity flow and dam flow, but struggle to converge on the tube flow and cylinder flow problems. In the multi-step inference results, the fully connected neural networks are significantly better than the convolutional neural networks, and non-autoregressive models consistently outperform autoregressive models; the root-mean-square error eventually becomes stable as the extrapolation time extends.
All the results of this article show that although these methods perform well on simple problems, they exhibit limited generalization ability along more challenging dimensions, and thus there is still much room for improvement. We are convinced that our dataset provides an important first step towards better design of data-driven neural operators for CFD.

A Mathematical Notations
For clarity, we list all commonly used mathematical notations and the corresponding definitions in Table 10.
Table 10: Definition of common mathematical notations used in the paper.

∇  The differential operator.
D  The domain of the fluid field.
T  The maximum time step of interest.
∆t  The time difference between two adjacent time frames.
u  The velocity of the fluid.

B Detailed Baseline Models B.1 Non-Autoregressive Baselines
In non-autoregressive modeling, we refer to the operating condition Ω as the input function.

B.1.1 FFN
FFN (feed-forward network) is the simplest form of non-autoregressive modeling. The coordinates of the query location and the input function are simply concatenated into one vector and fed to a chain of fully connected layers, with a non-linear activation after every layer except the last. Thus, the prediction is

û(x, y, t) = FFN(Ω || x || y || t),

where || is the concatenation operator. This model is depicted in Figure 4a, and it can be regarded as a purely data-driven version of PINN [34].
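A minimal sketch of this forward pass (layer sizes, the contents of Ω, and the random initialization are placeholders, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn_predict(omega, x, y, t, weights):
    """Non-autoregressive FFN: concatenate Omega with the query (x, y, t)."""
    h = np.concatenate([omega, [x, y, t]])
    for w, b in weights[:-1]:
        h = np.maximum(w @ h + b, 0.0)   # ReLU after every hidden layer
    w, b = weights[-1]
    return w @ h + b                     # linear output: predicted field value

omega = np.array([1.0, 1000.0, 1e-3])    # e.g. (u_B, rho, mu); placeholder values
dims = [len(omega) + 3, 32, 32, 1]
weights = [(rng.normal(size=(o, i)) * 0.1, np.zeros(o))
           for i, o in zip(dims[:-1], dims[1:])]

print(ffn_predict(omega, 0.5, 0.5, 0.1, weights).shape)  # (1,)
```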

B.1.2 DeepONet
[30] have shown that separating the encoding of the input function and the query location can reduce error. They are encoded by two separate FFNs, the branch net and the trunk net, and the outputs are aggregated by a dot product to produce the final prediction:

û(x, y, t) = f_B(Ω) · f_T(x, y, t) + b,

where f_B and f_T are the branch and trunk nets, and b ∈ R is a trainable scalar that acts as a bias term. In other words, DeepONet is a specific case of FFN in which each linear layer is cut in half, and each neuron sees only the operating parameters Ω or only the query coordinates (x, y, t).
Furthermore, to improve the training speed of DeepONet, we can reuse the output of the branch net within each mini-batch.We sample k = 1000 points in each frame as labels.f B (Ω) is computed once, and each of the 1000 points (x, y, t) are dotted with f B (Ω) before updating the model weights.Figure 4b illustrates the structure of DeepONet.
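The dot-product aggregation and the branch-reuse trick can be sketched as follows (random projections stand in for the branch and trunk MLPs; the latent width p and the placeholder Ω are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 64, 1000                        # latent width, query points per frame

def branch(omega):                     # stand-in for the branch MLP f_B
    return np.tanh(W_b @ omega)

def trunk(coords):                     # stand-in for the trunk MLP f_T, batched over points
    return np.tanh(coords @ W_t.T)

omega = np.array([1.0, 1000.0, 1e-3])  # placeholder operating conditions
W_b = rng.normal(size=(p, omega.size))
W_t = rng.normal(size=(p, 3))

coords = rng.uniform(size=(k, 3))      # k sampled query points (x, y, t)
b = 0.0                                # trainable scalar bias

# The branch output is computed once per mini-batch and dotted with every trunk output.
f_b = branch(omega)
preds = trunk(coords) @ f_b + b        # shape (k,): one prediction per query point

print(preds.shape)  # (1000,)
```

The saving is that `branch(omega)` runs once for the k = 1000 queries instead of once per query.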

B.2 Autoregressive Baselines
Autoregressive is arguably more similar to traditional numerical solvers, where the model predicts the flow state at the next time step given the previous time step, i.e., f θ : (u(t − ∆t), Ω) → u(t).Image-toimage models directly model f θ , and different image-to-image models differ only in the implementation of f θ .

B.2.1 Autoregressive FFN
The autoregressive FFN is similar to the non-autoregressive version. The input field, operating conditions, and query location are all concatenated and fed to an FFN, which predicts the current field at the query location:

û(x, y, t) = FFN(u_sample(t − ∆t) || Ω || x || y),

where u_sample refers to a list of field values sampled from u. This can be seen as a completely data-driven version of PINN [34]. Figure 4a depicts the structure of Auto-FFN.

B.2.2 Autoregressive DeepONet
We also consider modifying DeepONet to generate the solution autoregressively, and we name this model Auto-DeepONet. The structure is shown in Figure 4c. The input to the branch net (i.e., the input function) is (u(t − ∆t), Ω), where u(t − ∆t) is the last predicted velocity field and Ω is the operating condition parameters. The input to the trunk net is the spatial coordinates (x, y) of the query location, while the target output of the model is the value of the velocity field in the next time frame at (x, y), i.e., u(x, y, t). The model is formulated as follows:

û(x, y, t) = f_B(u_sample(t − ∆t) || Ω) · f_T(x, y) + b.

B.2.3 Autoregressive EDeepONet
EDeepONet (Enhanced DeepONet) [45] extends DeepONet's architecture to multiple input functions. EDeepONet has one branch net for encoding each input function independently, and the branch outputs are aggregated by element-wise product. Since, in autoregression, DeepONet conditions on two inputs, u(t − ∆t) and Ω, we also evaluate an autoregressive version, Auto-EDeepONet. The prediction is modeled as follows:

û(x, y, t) = (f_B1(u_sample(t − ∆t)) ⊙ f_B2(Ω)) · f_T(x, y) + b,
where ⊙ denotes the element-wise product.
In other words, EDeepONet is a specific case of DeepONet in which the branch net is split into two parts, each responsible for one input function, and the neural links between the two parts are removed (or deactivated by setting them to zero). This structure is illustrated in Figure 4d.
We do not evaluate a non-autoregressive version of EDeepONet because our preliminary experiments show that splitting Ω has no significant impact on the network's ability. However, in autoregression, the input includes u_sample(t − ∆t), which is much larger than Ω, and simply concatenating the two vectors may cause the network to fail to learn a dependence on Ω.

B.2.4 Autoregressive DeepONetCNN
We also experimented with a CNN as the feature extractor for the input field, called Auto-DeepONetCNN. This is almost the same as Auto-DeepONet, but f_B is implemented with a CNN, because a CNN may be better at extracting features from the lattice of a field. Since a CNN requires a cuboid input, the input to the branch net is u(t − ∆t) instead of u_sample(t − ∆t). Similar to ResNet, U-Net, and FNO, Ω is appended to u(t − ∆t) as additional channels. The formulation is as follows:

û(x, y, t) = f_B(u(t − ∆t) || Ω) · f_T(x, y) + b,

where f_B is a CNN.

B.2.5 ResNet
A residual neural network (ResNet) is a CNN with residual connections proposed by [18], and it has shown excellent performance on many computer vision tasks. Residual connections effectively alleviate the degradation problem of neural networks as depth increases, thus enhancing the learning ability of the model. The model can be formalized as follows:

û(t) = u(t − ∆t) + CNN(u(t − ∆t) || Ω),

where CNN(·) is a convolutional network.
ResNet has many possible ways to put the ResNet blocks together, and this paper uses a string of residual blocks of the same size.

B.2.6 U-Net
U-Net [39] is a CNN with an encoder-decoder structure that performs very well on numerous image segmentation and image-to-image translation tasks. The encoder performs feature extraction and parameter reduction through down-sampling operations such as strided convolution and pooling, and the decoder uses the feature encodings to produce an image through up-sampling and channel concatenation (skip connections), achieving image generation or segmentation. Compared to ResNet, U-Net's down-sampling reduces the number of parameters, and its up-sampling improves the globality of the convolution kernels, because after up-sampling a signal affects a larger region than before. The structure of the U-Net used in this paper is illustrated in Figure 4f.

B.2.7 FNO
Fourier neural operator (FNO) [27] is a neural network that parameterizes the convolution kernel in Fourier space. It can learn mappings in high-dimensional spaces and performs especially well on problems with turbulent fluctuations. FNO first lifts the input function to a high-dimensional space through a shallow fully connected network, and then approximates the target transform through Fourier layers containing the Fourier transform and its inverse. FNO has better globality than an ordinary CNN, because any signal in Fourier space affects the output over the entire spatial domain. Figure 4e shows the structure of FNO, and it can be formalized as follows:

v_0 = P(u(t − ∆t) || Ω),
v_{i+1} = σ(W_i(v_i) + F^{-1}(R_i · F(v_i))),
û(t) = Q(v_L),

where F(·) denotes the Fourier transform, F^{-1}(·) its inverse, R_i a learnable filter in Fourier space, σ a non-linear activation, and P(·), Q(·), and W_i(·) are ordinary convolutional layers. It is worth mentioning that in the original paper of FNO [27], the input includes multiple time steps before the current time step, which provides additional information about the flow's state and may make inference easier. However, this limits the usability of the method. Therefore, in this work, we only consider the scenario where the input contains no more than one frame.

C.1 Non-Autoregressive Baselines
Here, we list the detailed results of the non-autoregressive baselines on the four problems in CFDBench in Table 11.

C.2 Autoregressive Baselines
Here, we list the detailed results of the autoregressive baselines on the four problems in CFDBench in Tables 12, 13, 14, and 15.

C.3 Qualitative Prediction Examples
Here, we list some velocity field prediction results of all baseline models, shown in Figure 9 and Figure 10.

D Hyperparameters of Baseline Neural Networks D.1 DeepONet
For the non-autoregressive DeepONet, we tried three activation functions, Tanh, ReLU, and GELU [9], during the hyperparameter search, and found that the validation NMSE and MSE when using ReLU were significantly and consistently smaller than otherwise. This differs from some previous findings that indicate ReLU is worse at modeling flows [34]. One reasonable explanation is that many existing works only consider periodic BCs and simpler flow problems, which differ from CFDBench. Moreover, our preliminary results show that in cases with circular flow, the model predicted a linear distribution instead, indicating that the activation function does not capture the nonlinear characteristics well. To improve the activation function's ability to model nonlinearity, we propose to normalize the input value of the activation function and to add an activation function on the last layer of the branch net. We find that normalizing the activation input significantly reduces NMSE, and that removing the activation function after the last layer is better. In this paper, unless stated otherwise, we use ReLU with normalized activation inputs and no activation on the last layer's neurons. Additionally, preliminary trials show that DeepONet is unstable for inputs with large absolute values, so we normalize all the operating condition parameters.
The two sub-networks (branch net and trunk net) are feed-forward networks with constant width (output dimension). Therefore, the width and the depth (the number of fully connected layers) determine the number of parameters, and thus the capacity of the model. We conducted a hyperparameter search over the model's width n (following the original DeepONet, the two sub-networks share the same width), the branch net's depth B, and the trunk net's depth T. The results are shown in Figure 11. The model displays severe overfitting when these parameters are too large. In the following sections, we use the parameters with the smallest validation NMSE, i.e., n = 100, B = 12, T = 16, resulting in 263,701 parameters.

D.2 FNN and other models in the DeepONet Family
For the other models in the DeepONet family and the two FFNs, we generally wish to keep their sizes similar to the other models. We start from the width and depth of the two sub-networks in the DeepONet model and simply try different widths and depths of the same magnitude. The hyperparameters concerning the activation functions are the same as in DeepONet.

D.3 U-Net
In U-Net, each max-pooling layer is followed by two convolutional layers that double the number of channels. We search over the hidden dimension of the first convolutional layer (d2) and the means of injecting the operating condition parameters Ω. We can either explicitly include Ω by adding it as additional channels of the input (as described in Section 3.2.3), or implicitly include it in the down-sampled hidden representations, i.e., add a linear layer that projects Ω into the same shape as the encoder's output and add it to that output. When the input explicitly contains Ω, its dimensionality (number of channels) is d1 = 8; the features at every location (x, y) are (u(x, y), v(x, y), mask, u_B, ρ, µ, h, w), where u and v are the velocities along the x and y axes. When implicitly conditioning on Ω, the input contains only the velocity field and the mask, making d1 = 3 (u, v, and the mask). As for d2, it determines the dimensionality of each convolutional layer; the intuition is that a larger d2 means a larger model and stronger learning ability, which is also the trend found by our hyperparameter search. However, a larger model also requires greater computational cost. On the other hand, we observe insignificant differences between the two ways of conditioning on the operating condition parameters. In the subsequent sections, we explicitly include Ω as additional input features (to make it more similar to the FNO structure that we use), with d2 = 12, resulting in 1,095,025 parameters.
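Explicitly including Ω as additional input channels amounts to broadcasting each scalar parameter over the spatial grid and concatenating it with the field, giving the d1 = 8 channels listed above. A sketch (the grid size matches the paper; the parameter values are placeholders):

```python
import numpy as np

H = W = 64
u = np.zeros((H, W))      # x-velocity field
v = np.zeros((H, W))      # y-velocity field
mask = np.ones((H, W))    # geometry mask

# Operating conditions (u_B, rho, mu, h, w); placeholder values.
omega = {"u_B": 1.0, "rho": 1000.0, "mu": 1e-3, "h": 0.1, "w": 0.1}

# Broadcast each scalar to a constant H x W channel and stack with the field.
param_channels = [np.full((H, W), val) for val in omega.values()]
x = np.stack([u, v, mask] + param_channels)   # shape (8, 64, 64): d1 = 8

print(x.shape)
```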

D.4 FNO
Since the structure of FNO is highly similar to ResNet, we search the same hyperparameters, namely the number of FNO blocks (d) and the number of channels of the hidden representations (h). We filter out frequency modes above 12 in the Fourier layers, as in the original paper. The results are shown in Figure 13. We observe that increasing both d and h improves the validation loss, which is intuitive because both imply a greater number of parameters and thus a larger model capacity. To ensure that the numbers of parameters of the baseline models are similar, such that the training and inference costs are similar, we choose d = 4 and h = 32, resulting in 1,188,545 parameters.

E Data Processing E.1 Interpolation Into Grids
Before feeding the data to the neural networks for training, some pre-processing is required. First, all data are discretized and interpolated into grids. For simplicity, we keep the spatial grid at 64 × 64, so for different heights and widths, the sizes of the grid cells (denoted ∆x × ∆y) differ. The value of each grid cell is set to the average of the data points that fall into that cell; if no data point falls in a cell, its value is set to the average of the adjacent cells (for multiple contiguous empty cells, we set their values iteratively, working inward from the boundaries of each empty region). Additionally, for BCs that are constants, we pad the tensor with one extra grid line. For instance, for the tube flow problem, we pad all grids on the top and bottom boundaries with one line of zeros, resulting in a tensor with 66 rows and 64 columns.
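The binning step described above can be sketched as follows (cell averaging via `np.add.at`; the iterative neighbor fill is simplified to repeated averaging passes until no empty cells remain, which is our assumption about the exact fill rule):

```python
import numpy as np

def rasterize(points, values, n=64):
    """Average scattered (x, y) samples in [0, 1)^2 into an n x n grid."""
    ix = np.clip((points[:, 0] * n).astype(int), 0, n - 1)
    iy = np.clip((points[:, 1] * n).astype(int), 0, n - 1)
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    np.add.at(total, (iy, ix), values)   # sum of samples per cell
    np.add.at(count, (iy, ix), 1)        # number of samples per cell
    grid = np.where(count > 0, total / np.maximum(count, 1), np.nan)

    # Fill empty cells from the average of already-filled neighbors,
    # sweeping inward from the boundaries of each empty region.
    while np.isnan(grid).any():
        padded = np.pad(grid, 1, constant_values=np.nan)
        neighbors = np.stack([padded[1:-1, :-2], padded[1:-1, 2:],
                              padded[:-2, 1:-1], padded[2:, 1:-1]])
        fill = np.nanmean(neighbors, axis=0)  # NaN where all neighbors are empty
        grid = np.where(np.isnan(grid), fill, grid)
    return grid

rng = np.random.default_rng(0)
pts = rng.uniform(size=(2000, 2))
grid = rasterize(pts, pts[:, 0])   # field value = x coordinate, for illustration
print(np.isnan(grid).any())        # False: all cells filled
```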

Figure 1 :
Figure 1: Some examples of the velocity field in the four problems in CFDBench.From left to right: cavity flow, tube flow, dam flow, and cylinder flow.

Figure 2 :
Figure 2: The residuals of each flow problem in this paper. (a) cavity flow, (b) tube flow, (c) dam flow, (d) cylinder flow.

Figure 3 :
Figure 3: Overview of the input and output types and shapes of each baseline model.

Figure 4 :
Figure 4: The structure of each baseline model in this paper.


Figure 5 :
Figure 5: The prediction results of 2 non-autoregressive baseline models on 7 subsets of the data set, with the vertical axis representing the average NMSE of all frames in the test set and the horizontal axis representing the data type.


Figure 6 :
Figure 6: The prediction results of 6 autoregressive baseline models on 7 subsets of data set, with the vertical axis representing the average NMSE of all frames in the test set and the horizontal axis representing the data type.

Figure 7 :
Figure 7: Summary of the performance of the autoregressive baseline methods on the four problems (with all cases). FNO's result on the dam flow problem is removed because the error is too large and including it would make the plot less intelligible. The bar chart for cylinder flow (d) is on a logarithmic scale.

Figure 8 :
Figure 8: The error of autoregressive and non-autoregressive baselines as a function of the number of forward propagation steps, given only the operating parameters Ω and the initial conditions, evaluated on all cases in each task. The y-axis is on a logarithmic scale. ResNet is not included because its error is too high and including its line would make the figure less intelligible. Best viewed in color.
u_B  The boundary conditions.
u_sample  The set of query points on the input field function.
u  The x-velocity of the fluid.
v  The y-velocity of the fluid.
τ  The shearing stress of the fluid.
ρ  The density of the fluid.
µ  The viscosity of the fluid.
Ω  The operating condition parameters, (u_B, ρ, µ, S), where S denotes the shape of the spatial domain, which is different for each problem.
Σ  The input function to the PDE solver.
θ  The parameters of a neural network.
f_θ  A neural model parameterized by θ.
L  The training loss function.
T  The training data.
Y  The label values of the training data.
Ŷ  The predicted values of the training data.
||  The concatenation operator.
f_B  The branch net in DeepONet.
f_T  The trunk net in DeepONet.
b  The bias term in DeepONet.

Figure 11 :
Figure 11: The validation loss of DeepONet using different widths and depths of the sub-networks on the cavity problem.

Figure 9 and Figure 10:
Figure 9: Prediction of the velocity field by the baseline models on the cavity and tube flow problems. The results of the autoregressive models are the predictions after one forward propagation step. The input is not given to the non-autoregressive models.

Figure 12 :
Figure 12: The validation loss of U-Net with different hidden dimensions and ways of conditioning on operating parameters on the cavity problem.

Figure 13 :
Figure 13: The validation loss of FNO using different hidden dimensions and number of FNO blocks on the cavity problem.

Table 1 :
Table 1 presents a comparison between the main benchmarks for data-driven PDE modeling and our benchmark.
A comparison between CFDBench (ours) and existing benchmarks in data-driven PDE modeling."Varying x" means that different data examples have different x.

Table 2 :
Operating parameters of the subset in the cavity flow problem.

Table 3 :
Operating parameters of the subset in the tube flow problem.

Table 7 :
Overview of the different baseline models we consider. "Auto." refers to whether the method is autoregressive. u_sample is a list of sampled points from u.

Table 8 :
The validation loss of ResNet and the identity transformation for the 7 subsets (see Section 5) of the cavity flow problem. The better result is highlighted in bold.

Table 9 :
Computational cost of the different baseline models on the cavity flow data, PROP subset. Training time refers to the time required for training before using the model; inference time refers to the time required for one forward propagation (for autoregressive models) or the prediction on one query point (for non-autoregressive models).

Table 11 :
Main test results of non-autoregressive methods (FFN and DeepONet) on each subset of each problem.

Table 12 :
Detailed results of autoregressive baseline models on the Cavity Flow problem on one forward propagation.

Table 13 :
Detailed results of autoregressive baseline models on the Tube Flow problem on one forward propagation.

Table 14 :
Detailed results of autoregressive baselines models on the Dam Flow problem on one forward propagation.

Table 15 :
Detailed results of autoregressive baseline models on the Cylinder Flow problem on one forward propagation.