Fluid-based field representation for collision risk assessment on road scenes

Abstract: Predicting the likely evolution of traffic scenes is challenging because of the high uncertainty of sensing technology and the dynamic environment, and prediction failures can cause planning failures for intelligent agents such as autonomous vehicles. In this paper, we propose a fluid-based physical model to represent the influence of surrounding objects' motion on driving safety. In our pipeline, the input sensor can be LiDAR, a camera, or multi-modal data. We use a Kalman filter to estimate the state space of each detected object, and adopt the properties of stable fluids to build a riskmap based on a density field. The noisy state spaces are then modeled as boundary conditions in the simulation of the advection and diffusion processes. We test our approach on the public KITTI dataset and find that this model can handle short-term prediction in cases of misdetection and tracking failure caused by object occlusion, which shows promise for collision risk assessment on road scenes.


Introduction
Assessing the risk of collision is an essential task for autonomous vehicles driving on the road. It requires predicting the likely evolution of the current traffic situation and assessing how dangerous that future situation might be [1]. A common measure of collision risk is the time-to-collision (TTC), which is insufficient for handling complex situations, e.g., on a curved road. Moreover, once an object fails to be tracked due to hard occlusion, the assessment of collision risk becomes less effective.
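As a point of reference, TTC reduces the situation to a single scalar per object pair. A minimal sketch (function name and units are our own) makes the limitation concrete:

```python
def time_to_collision(gap_m: float, ego_speed: float, lead_speed: float) -> float:
    """Classic longitudinal TTC: remaining gap divided by closing speed.

    Returns infinity when the gap is not closing. This single scalar
    ignores road curvature and lateral motion, which is exactly why TTC
    alone is insufficient for complex situations.
    """
    closing = ego_speed - lead_speed  # m/s; positive means the gap shrinks
    if closing <= 0.0:
        return float("inf")
    return gap_m / closing
```

For example, a 30 m gap closing at 10 m/s yields a TTC of 3 s, regardless of whether the road ahead is straight or curved.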
Collision risk assessment employs information from sensors and uses grid-based representation to estimate potential threats [4]. An occupancy grid of the scene is a typical model by updating with the Bayesian occupancy filter to continuously present the surrounding environments [2,3]. The spatial information of vehicles, cyclists, pedestrians and other obstacles is quantified into a distribution of occupancy by using probability or belief mass. The occupancy grid can be constructed by using multiple sensors like LiDAR, radar, cameras and fusion together.
However, only the occupancy of the visible region is represented; occluded objects, which can be real potential threats, are not. If an object fails to be tracked due to occlusion, it has a high probability of reappearing. Therefore, in order to improve driving safety, we aim to develop a better representation of the dynamic objects around the ego-vehicle, one that can reveal spatio-temporal information even when tracking is lost. Moreover, we expect to improve prediction efficiency with a physical model of the traffic scene.
A physical model is employed to encode the spatio-temporal information of interactions between objects. In 2011, Hamrick et al. [5] performed experiments suggesting that humans use internal physical models to predict how dynamical systems evolve. A recent line of work on long-term prediction based on physical simulators has made progress by fusing in data-driven learning methods. For example, robots can quickly learn manipulation skills when predicting the consequences of physical interactions [6]. Pixel-level video prediction is another topic that has recently achieved great progress; it focuses on predicting future frames by observing generated frames from the past [7,8].
Different from data-driven methods, we explore a fluid-model-driven approach to simulate the evolving distribution of substance around surrounding objects and predict their behaviors. The advection and diffusion processes in the fluid model help reveal hidden spatial information in the presence of missing data. In order to analyze objects' interactions and estimate collision risk, we modify the standard diffusion process, with emphasis on diffusion driven by each object's state space.
In the following sections, we first give a short survey of related work in Sec. 2. Then, the formulation of the fluid model and the construction of the riskmap are presented in Sec. 3. We evaluate our method with real-world public data from the KITTI dataset in Sec. 4, and finally give a discussion and conclusion in Sec. 5.

Related Work
In this section, we review recent studies that have contributed to 3D object detection and tracking, and to risk representation on road scenes. In recent years, with the development of deep neural networks, research on computer vision tasks such as image classification, object detection, and semantic segmentation has achieved significant progress. For object detection in particular, there are two categories of network frameworks: (1) two-stage networks, e.g., R-CNN, Faster R-CNN, and Mask R-CNN; (2) one-stage networks, e.g., YOLO and SSD. These methods show great improvement over traditional methods like SVM and AdaBoost. Some of these ideas have been adapted to 3D object detection on point clouds from LiDAR, RGB-D, and stereo cameras.
3D object detection on point clouds falls mainly into three categories of approaches: (1) projection from the 3D point cloud to a 2D image, such as Complex-YOLO [10] and BirdNet [11]; (2) voxel-feature-based detection, like 3DFCN [12], Vote3Deep [13], and VoxelNet [14]; (3) multi-modal fusion approaches, such as MV3D [15] and AVOD [16]. Recent research in multi-object tracking (MOT) has focused on the tracking-by-detection and learning-to-track principles. Both batch methods [17,18] and online methods [19,20] have explored how to learn a similarity function for data association. More recent studies on MOT have integrated hierarchical features from deep convolutional networks [21,22] and correlation filters [9]. In addition, reinforcement learning algorithms have been proposed to link data in online MOT, and Markov decision processes (MDPs) have proved suitable for dynamic environments [23]. Multiple objects can be modeled as multiple agents, each with its own lifetime to perform certain tasks and maintain certain states.
Prediction of objects' motions and risk assessment in traffic scenes are important issues. Trajectory analysis [24,25] and occupancy grid mapping [2,26] are two major approaches to compactly representing other objects' motions. Trajectory analysis mainly focuses on the temporal characteristics of a vehicle's motion. Sivaraman et al. [24] obtain the trajectories of vehicles in front of the ego-vehicle from a monocular camera and a stereo camera; the behaviors of the observed trajectories are then learned by an unsupervised learning method. In [25], panoramic camera arrays are used to capture a full surround view, and the trajectories of surrounding vehicles are extracted and classified by a hidden Markov model (HMM) into states such as overtaking and lane change. On the other hand, occupancy grid mapping essentially relies on probabilistic spatial information, e.g., using recursive Bayesian filtering [2]. A probabilistic grid-based approach exploits surrounding vehicles' movements to infer the presence and location of occluded road surface, as a complement to existing road detection systems [26].
In addition, field-based approaches to automatic vehicle guidance have a long history. An electric field model interprets the vehicle's motion as that of an electron within an electric field, and the system is transformed into a riskmap reflecting the risk at a given position in the dynamic environment [27]. Wolf and Burdick [28] introduce a vehicle collision avoidance system in a full two-dimensional field with lane, road, car, and velocity potential function components. Recent work [29] assesses collision risk via risk potential modeling of the predicted motion of surrounding vehicles under various driving conditions. Wang et al. [30] propose a hybrid field model to assess driving safety and use it in a pre-collision warning system. We previously proposed a field-based representation of surrounding vehicles from a forward-moving monocular camera [31], but that was only a first attempt, which did not take the scale, orientation, or velocity of objects into account. In this work, we build a fluid-based field that incorporates these object properties, which gives a more expressive representation of collision risk.

Approach
In this paper, the key idea is to use a fluid model to capture the interaction of moving objects in the traffic scene, regardless of the modality of the input data. Our framework accommodates multi-modal sensor data such as LiDAR, radar, and cameras. We employ the 3D LiDAR point cloud data from the public KITTI dataset, which provides accurate object locations and classifications via a state-of-the-art 3D object detector [32]. Our system pipeline consists of two components: (a) 3D object detection and tracking; (b) fluid-based field computation and representation. In the following, we present the details of each step.

3D Object Detection and Tracking
The 3D object detection module provides n 3D bounding boxes D_t = {D_t^1, D_t^2, ..., D_t^n} at frame t, where each detected object contains the 3D coordinates of the object center (x_i, y_i, z_i), the object's size (l_i, w_i, h_i), and the heading angle θ_i. We directly adopt the state-of-the-art 3D detector PointRCNN with its model pre-trained on the KITTI training set to obtain the detection results.
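For concreteness, one possible container for a single detection D_t^i could look like the following sketch (the field names are our own; the paper specifies only the geometric quantities):

```python
from dataclasses import dataclass

@dataclass
class Detection3D:
    """One 3D bounding box D_t^i as produced by the detector."""
    x: float      # 3D coordinates of the object center
    y: float
    z: float
    l: float      # object size: length, width, height
    w: float
    h: float
    theta: float  # heading angle
```

A frame's detections D_t are then simply a list of such records, one per object.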
Inspired by the work [33], we use a Kalman filter to predict the full state spaces of the n detected object trajectories; the motion model of the Kalman filter predicts each trajectory's state at the next frame. Since we do not observe the initial velocities of objects, a covariance matrix P with high uncertainty is assigned at the initial step. Then, a data association module matches the next detections D_{t+1} with the predicted trajectories T_{t+1} using the 3D IoU metric and the Hungarian algorithm [34]. The outputs of the data association module are the matched pairs of detections and predicted trajectories (D_{t+1}, T_{t+1})_match, from which the measurement model of the Kalman filter updates the trajectory states. To manage the birth and death of trajectories, each unmatched detection is considered a potential object entering the field of view, and a new trajectory is created for it. Unmatched trajectories are regarded as potential objects leaving the field of view; once an unmatched trajectory persists beyond several frames, it is deleted, which marks the death of the trajectory.
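The association step can be sketched as follows. For brevity, this sketch uses axis-aligned IoU in the bird's-eye view and a greedy assignment in place of full 3D IoU and the Hungarian algorithm, so it illustrates the matching and birth/death logic rather than the paper's exact implementation:

```python
def bev_iou(a, b):
    """Axis-aligned IoU of two boxes (cx, cy, l, w) in the bird's-eye view.
    Heading is ignored here for brevity; the paper matches with full 3D IoU."""
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay1, ay2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by1, by2 = b[1] - b[3] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def associate(dets, trks, iou_thresh=0.1):
    """Greedy stand-in for the Hungarian assignment: pair detections with
    predicted trajectories in order of descending IoU. Unmatched detections
    spawn new trajectories (births); unmatched trajectories age toward
    deletion (deaths)."""
    pairs = sorted(((bev_iou(d, t), i, j)
                    for i, d in enumerate(dets)
                    for j, t in enumerate(trks)), reverse=True)
    matches, used_d, used_t = [], set(), set()
    for iou, i, j in pairs:
        if iou < iou_thresh:
            break
        if i not in used_d and j not in used_t:
            matches.append((i, j))
            used_d.add(i)
            used_t.add(j)
    unmatched_dets = [i for i in range(len(dets)) if i not in used_d]
    unmatched_trks = [j for j in range(len(trks)) if j not in used_t]
    return matches, unmatched_dets, unmatched_trks
```

Greedy matching can differ from the optimal Hungarian assignment when boxes overlap ambiguously, which is why the paper uses the exact algorithm [34].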
The output of the Kalman filter is the corrected trajectories of the objects based on the matched detections. However, it is still hard to handle occlusions, which cause missing detections. We employ the fluid-based representation module to manage the short-term occlusion problem, thanks to the physical modeling of the fluid.

Fluid-based Field Representation
With the trajectories of the tracked objects, we are able to compute a 2D riskmap in the bird's-eye view based on a fluid-based field representation. In physics, a fluid whose density and temperature are nearly constant is described by a velocity field u and a pressure field p [35]. Given that the velocity and pressure are known at the initial time t = 0, the evolution of these quantities over time is governed by the Navier-Stokes equations [36]:

∇ · u = 0    (2)

∂u/∂t = −(u · ∇)u − (1/ρ)∇p + ν∇²u + f    (3)

where ν is the kinematic viscosity of the fluid, ρ is its density, and f is an external force. The "·" denotes the dot product between vectors, and the vector of spatial partial derivatives is denoted ∇ = (∂/∂x, ∂/∂y) in 2D and ∇ = (∂/∂x, ∂/∂y, ∂/∂z) in 3D, with the shorthand ∇² = ∇ · ∇. In our pipeline, since we only consider 2D fluids, let x_t^i = (x_i, y_i)_t denote the spatial coordinate of the ith object at time t, whose velocity relative to the ego-car at time t is estimated from the tracked trajectory. Note that if an object fails to be tracked at the last frame, we assume its velocity is unchanged.
As stated by the Helmholtz-Hodge decomposition [37], any vector field w can be written uniquely as w = u + ∇q, where ∇ · u = 0 and q is a scalar field. The velocity field is a vector field, and thus it can be projected onto its divergence-free part u = Pw = w − ∇q by an operator P. Eq. 3 can then be transformed into

∂u/∂t = P(−(u · ∇)u + ν∇²u + f)    (4)

which removes the pressure term by using the facts that Pu = u and P∇p = 0. In this module, we compute the distribution of substance around each object as a density field and adopt incompressible flow to model the behavior of the substance. Suppose that at time t we take the trajectories T_t and compute the corresponding density field as a riskmap D(x_t); after D(x_t) is obtained, we advance the computation to the next time step. The input and output are illustrated in Fig. 1, where we also show the ego-car in magenta and the other detected vehicles in black. The process starts from the solution w_0(x) = u(x, t) of the previous step and then sequentially solves each term of Eq. 4. Similar to the work [35], we decompose the computation of the field into an advection stage, a projection stage, and a diffusion stage of a periodic process, without adding the force term f. The density field periodically evolves from advection to diffusion; the effects of advection and diffusion are illustrated in Fig. 2.
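The advection stage of this periodic process can be sketched with Stam-style semi-Lagrangian backtracing on a regular grid; this is a simplified, stable version of the solver in [35], with grid-unit velocities and clamped boundaries as our own simplifications:

```python
import numpy as np

def advect(D, u, v, dt):
    """Semi-Lagrangian advection: trace each grid cell backward along the
    velocity field (u, v) and bilinearly sample the density D there.
    The scheme is unconditionally stable, which is why [35] uses it."""
    n, m = D.shape
    ys, xs = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    # backtraced sample positions, clamped to the grid boundary
    px = np.clip(xs - dt * u, 0, m - 1)
    py = np.clip(ys - dt * v, 0, n - 1)
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    x1 = np.minimum(x0 + 1, m - 1)
    y1 = np.minimum(y0 + 1, n - 1)
    fx, fy = px - x0, py - y0
    # bilinear interpolation of the four neighbouring cells
    return ((1 - fy) * ((1 - fx) * D[y0, x0] + fx * D[y0, x1])
            + fy * ((1 - fx) * D[y1, x0] + fx * D[y1, x1]))
```

A uniform rightward velocity simply shifts a density spike one cell to the right per unit time step, which is the transport behavior the riskmap relies on.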
As objects move, their velocities are modeled as boundary conditions in the advection process. Specifically, we first solve the differential equation

∂u/∂t = −(u · ∇)u    (5)

with u(0) = w_0 and w_1 = u(∆t) for the velocity field u under the boundary condition. Since the substance is incompressible under the advection guided by w_1, the advection of the substance in the density field D is computed by solving

∂D/∂t = −(w_1 · ∇)D    (6)

Diffusion is the other effect of the moving vehicles on the substance. Each vehicle is supposed to continually exert influence on the entire field; in this way, the existence of surrounding vehicles can be persistently observed even with a periodic advection of the density field. Formally, we solve the diffusion equation

∂D/∂t = λ∇²D + s    (7)

where λ controls the rate of diffusion and s measures the strength of the diffusion source at the position of each object. To prevent the space from being fully filled with substance, a damping step is applied at the end of each time step: the density field is damped as D = ωD after the diffusion, where the damping coefficient ω is set to 0.95 in our implementation. The advection and diffusion equations, as well as the projection for incompressibility, are solved by the solver proposed in [35]. The diffusion equation is solved by iteratively applying an isotropic Gaussian filter, as shown in Fig. 3(a); thus, the distribution of the density field around each object is isotropic. To better represent the possible risk, we introduce anisotropy along the velocity of each tracked object in the diffusion step, as illustrated in Fig. 3(b). The diffusion stencil in isotropic diffusion is uniform around each grid cell, i.e., w(i ± 1, j ± 1) = λ. We take the velocity field u into account to modify the stencil.
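One explicit time step of the diffusion equation with the per-object source term and the damping D = ωD can be sketched as follows; the explicit 5-point stencil stands in for the iterated Gaussian filtering, and the default values of λ and s are illustrative, not the paper's tuning:

```python
import numpy as np

def diffuse_step(D, sources, lam=0.1, s=1.0, omega=0.95):
    """One explicit step of dD/dt = lam * laplacian(D) + s at the tracked
    object positions, followed by the damping D <- omega * D."""
    Dp = np.pad(D, 1, mode="edge")
    # 5-point Laplacian stencil over the padded grid
    lap = (Dp[:-2, 1:-1] + Dp[2:, 1:-1]
           + Dp[1:-1, :-2] + Dp[1:-1, 2:] - 4.0 * D)
    out = D + lam * lap
    for (i, j) in sources:  # inject substance at each object's grid cell
        out[i, j] += s
    return omega * out
```

With ω = 1 the interior stencil redistributes substance without creating or destroying it; setting ω < 1 makes the total substance decay each step, which is what keeps the field from filling up.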
The stencil at neighboring grid cells is updated as w(i + sgn(v_x), j) = λ + a · v_x and w(i, j + sgn(v_y)) = λ + a · v_y, where (v_x, v_y) is the velocity at the grid cell (i, j), a is a parameter controlling the anisotropy, and sgn(·) is the sign function. Note that more complex anisotropic operators, such as the Laplace operator, could be used here, but we adopt this simple updating scheme because it is an efficient encoding for the short-term prediction of surrounding objects.
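The stencil update can be sketched as follows. We use |v_x| and |v_y| so that the weight of the neighbour in the direction of motion always increases, and the default values of λ and a are our own illustrative choices:

```python
def aniso_stencil(lam, vx, vy, a=0.05):
    """3x3 anisotropic diffusion stencil: start from the uniform isotropic
    weight lam, then boost the neighbour in the direction of motion,
    (i + sgn(vx), j) and (i, j + sgn(vy)), to lam + a * |v|."""
    def sgn(v):
        return (v > 0) - (v < 0)
    # keys are offsets (di, dj) from the centre cell (i, j)
    w = {(di, dj): lam for di in (-1, 0, 1) for dj in (-1, 0, 1)
         if (di, dj) != (0, 0)}
    if vx != 0:
        w[(sgn(vx), 0)] = lam + a * abs(vx)
    if vy != 0:
        w[(0, sgn(vy))] = lam + a * abs(vy)
    return w
```

The effect is that substance diffuses faster ahead of a moving object than behind it, stretching the density field along the object's predicted direction of travel.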

Experiments
In this section, we evaluate the proposed method on the KITTI tracking dataset, which contains 21 LiDAR data clips and the corresponding image sequences. Currently, we take vehicles as the example for validation and ignore other objects such as pedestrians and cyclists. We directly employ the pre-trained PointRCNN model to obtain 3D object detections from the LiDAR point cloud. Then, the 3D Kalman filter tracks multiple objects in each sequence. All tracked data are illustrated in a bird's-eye view with the fluid model in order to assess collision risk. The fluid-based riskmap is demonstrated in Fig. 4, where we project the 3D multi-object tracking results onto the synchronized image sequence. The fluid modeling takes about 25 milliseconds per frame on a single 2.5 GHz CPU thread. The field is updated frame by frame at a resolution of 512 × 512, corresponding to an area of 80 m × 80 m. Fig. 4(a) shows the 3D tracking results projected onto consecutive image frames. The advection and diffusion process makes the density field of the corresponding objects vary frame by frame until they disappear from the view, as shown in Fig. 4(b). Regions in red represent high collision risk, while blue means no collision risk. When objects leave the ego-car's perception field, the substance of their density field dissipates immediately, meaning there is no more risk from these objects. We find the method relatively stable with respect to noise in the multi-object tracking, especially under tracking failure, as shown in Fig. 5. Consider the area marked by red circles in the riskmap. In frames 122, 123, and 124, the vehicle is successfully detected and tracked (3D bounding box in blue); then, it is fully occluded by another vehicle and no longer tracked in frames 125 to 127. In the last two frames, 128 and 129, the vehicle is detected and tracked again as a new object.
In this sequence, the riskmap can still show the influence field of this object via advection and diffusion, even while it is not detected or tracked.
In cases with no tracking failure, our method presents a riskmap with evolving spatial and temporal information, while conventional risk assessment methods produce only a scalar signal measuring the risk level. The density field presents potential risks as the ego-vehicle moves. Compared with an occupancy grid, which only shows static occupancy information, our method can assess collision risk by dynamic prediction. We test our method in more complex scenarios, and the results are shown in Fig. 6. Regions with potential risk are filled with warmer colors when the ego-vehicle is waiting at a crossroad or making a turn. Vehicles passing around the ego-car show their possible collision regions along their orientations, as shown in Fig. 6(a). Fig. 6(b) illustrates that, when the ego-car turns, the surrounding objects produce a reasonable density-field representation of possible collision risk.

Conclusion
In this paper, we propose a fluid-based approach to collision risk assessment and representation, which can easily be integrated with different data modalities, such as LiDAR point clouds and camera image sequences. We use a state-of-the-art 3D detector and the traditional Kalman filter for multi-object tracking to estimate the state space of surrounding objects (using vehicles as the example). The advection and diffusion processes then evolve the physical density field to build a dynamic riskmap for intelligent agents such as autonomous vehicles. This method enables short-term prediction of the distribution along the trajectory of each object, even when object detection and tracking fail and sensor data are imperfect. Our method can also be used to infer an object's visibility, e.g., when an object becomes occluded. The efficiency of the proposed approach is evaluated on the public KITTI dataset.
Driving assistance and automated driving are high-level tasks that require an understanding of driving semantics and intentions, while our method only takes the estimated trajectories and creates information at the conceptual level. In the future, we will explore how to further combine more road elements, such as lane markings, geometric information, and static obstacles, for planning and decision-making. Although we do not focus on tracking occluded objects in this work, we believe that stable tracking is beneficial for risk assessment. Therefore, a field-based systematic method to deal with the challenges of tracking surrounding vehicles is also part of our future work.