1. Introduction
The current outbreak of the novel coronavirus SARS-CoV-2 has spread to many countries [1]. The virus has infected a wide range of humans and animals, through both human-to-human and animal-to-human transmission [2,3], and the number of infected people is still increasing. Many people show mild symptoms or no symptoms at all [4,5]. Asymptomatic carriers can still transmit the virus, which makes epidemic prevention, control and monitoring more difficult. In the era of globalization, rapid population movement has played a big role in the spread of the virus [6], and crowd gathering and social interaction have accelerated it. At present, the main methods of epidemic prevention and control are restricting interpersonal contact, maintaining social distance and wearing masks, which can limit the spread of the epidemic to a certain extent [7,8] but also have a negative impact on economic development and daily life [9]. Given the long development cycle of vaccines, the limited duration of protection in the human body, and the high rate at which the single-stranded RNA virus acquires mutations, it is foreseeable that the novel coronavirus will continue to affect human activities.
In the information age, digital tracking technology and geographic information systems have played a very important role in the fight against COVID-19 [10,11,12], providing a reference for addressing travel restrictions. The analysis and mining of big data can help control and track the development of the epidemic, and can support prediction and prevention. Close-contact tracing applications have been adopted in many countries [13,14], such as COVIDSafe in Australia, COVIDTracer in New Zealand [15,16], StopCovid in France and CovidRadar in Mexico.
For example, Australia's COVIDSafe uses Bluetooth to record information about people who stayed within a set distance for a set time, and the records are automatically deleted after 21 days [17]. The software is similar to Japan's COCOA [18]. Random identifiers, refreshed every two hours, are exchanged between devices via Bluetooth and stored locally on the device. Only when a user tests positive can the locally encrypted data be uploaded to the cloud platform, using an activation password [16]. This reduces the time needed to find and locate close contacts to a certain extent. However, there are strict restrictions on the duration of stay, and no information is saved about contacts who stay less than 15 minutes.
COVIDTracer in New Zealand works through a QR code generated when a company or organization registers with the application. The official Ministry of Health QR code is displayed in public places; arriving visitors scan the code, which adds the location to their trajectory. When an infected person has been present, an alert is issued to the users and to the place. However, there are problems with user scan fatigue and corporate information overload. User trajectories may also be incomplete, and public transportation is not considered [15].
In addition, Apple and Google jointly developed an exposure notification API that can be used across borders. The API uses Bluetooth and applies different hashing algorithms to generate different keys within a certain period of time, and infected users upload only their keys from the past 14 days [19]. Its use and implementation depend largely on the corresponding public health department, and the authorities must follow development standards covering privacy and security. Moreover, there is a delay in using the keys to obtain contact information, and it remains difficult to prevent community transmission and to identify asymptomatic carriers.
In China, a health code can be obtained through the WeChat and Alipay mini programs. The health code is a digital pass and health certificate, and the movement of the holder is restricted according to its color [20], which achieves a degree of isolation. However, health codes are not interoperable between provinces, the data obtained is inaccurate, and there is no uniform national standard [21]. For this reason, a communication travel code was issued: by sending a short message to the mobile operator, the holder of a mobile terminal obtains the provinces, cities and regions visited over a recent period [16]. However, this data is also imprecise. It does not directly provide details such as the source of infection and close contacts, and it has no real-time alarm function.
The problem, then, is that existing technology cannot accurately locate the source of infection, and the epidemic prevention and control measures built on it are inadequate. Taking all this into account, this article presents a method and system for locating the source of infection based on big data.
The system uses smart mobile terminals and big data to locate the source of infection precisely, and to record and trace in real time the people in close contact with it. It also displays the risk levels of multiple users and the location of the source of infection in real time. The system uses GPS and Bluetooth technology; the technical background is introduced in the second section. The third section introduces the algorithms adopted by this method. The fourth section briefly introduces the system architecture and functional design. The system does not rely strongly on infectious disease monitoring: when no monitoring is available, the method can still calculate and update the regional risk level of each location and the risk level of each person at each time from big data on the flow of people. The method of locating the source of infection can be applied to suppress the transmission of infectious diseases and thus to control or prevent epidemics. It can also be used to generate epidemic investigation reports automatically to guide epidemic prevention and control.
2. Technical background
The system requires the Bluetooth module of a smartphone mobile terminal and a server. We have applied for a patent on the design of this system [22].
Figure 1 is a schematic diagram of the overall system principle, covering data collection and transmission. User 01 denotes a person inside the short-range circle of the target person's Bluetooth transmission; User 02 denotes a person inside the critical transmission distance circle; and User 03 denotes a person outside the critical transmission distance circle. Users in different Bluetooth propagation circles affect the target user differently, and the probability of virus transmission drops sharply beyond the safety range. The algorithm therefore records only the people in the proximity circle. The software installed on the smartphone records the information of the user's close contacts and uploads it to the server over the network for unified processing. The safety range is measured with low-power Bluetooth technology. These components are described in detail below.
2.1. Smart phone
Data collection is completed on the smart device. Mobile devices can be smartphones, smartwatches and smart bands. The implementation of our system takes a smartphone as an example.
Some studies have shown that many people carry their smartphones with them [23], which makes it possible to track close contacts through smartphones. We therefore run a background service on the smartphone that continuously performs Bluetooth scans and broadcasts, collecting the data we need from each user. The smartphone stores the collected data in a local database and uploads it to the server under certain circumstances. Data on the phone is given an expiry time and is periodically deleted from the local database.
Specifically, the software installed on the target person's mobile device requests Bluetooth and GPS permissions and tracks the movement trajectories of the target person's close contacts. With Bluetooth turned on, the target person's device exchanges individual identity information, personal risk level and Bluetooth signal strength with the devices of other people inside the Bluetooth critical transmission circle. The personal risk level is then updated regularly on the smartphone. The information transmitted by smartphones includes the wireless received signal strength, the target person's risk level, and personal protection information.
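The local storage and periodic expiry described in this section can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module rather than the system's actual client code; the table layout, the function names and the 14-day retention window are assumptions made for the example (the text only states that data expires and is deleted periodically).

```python
import sqlite3
import time

# Assumed retention window; the article does not fix the expiry period.
RETENTION_SECONDS = 14 * 24 * 3600


def open_store(path=":memory:"):
    """Open the local database and create the (hypothetical) contacts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts ("
        "peer_id TEXT NOT NULL, rssi INTEGER NOT NULL, seen_at REAL NOT NULL)"
    )
    return conn


def record_scan(conn, peer_id, rssi, seen_at=None):
    """Store one Bluetooth sighting of a nearby device."""
    seen_at = time.time() if seen_at is None else seen_at
    conn.execute(
        "INSERT INTO contacts (peer_id, rssi, seen_at) VALUES (?, ?, ?)",
        (peer_id, rssi, seen_at),
    )
    conn.commit()


def purge_expired(conn, now=None, retention=RETENTION_SECONDS):
    """Delete sightings older than the retention window; return rows removed."""
    now = time.time() if now is None else now
    cur = conn.execute("DELETE FROM contacts WHERE seen_at < ?", (now - retention,))
    conn.commit()
    return cur.rowcount
```

In such a design `purge_expired` would be called from the same background service that performs the scans, so that the local database never holds more than one retention window of records.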
2.2. The server
During data transmission and processing, the mobile terminal sends the collected user identity, location, protection and risk level information to the database of the cloud platform over the data network. The cloud platform is a general framework that provides data storage, computing and network transmission functions. The large amount of data collected from clients calls for a cloud-centric database as a powerful storage platform; a self-built cloud service platform is also possible. The server stores and analyzes the data passed from the users' local databases, maintains the server-side database, and updates the risk levels. When a user's risk exceeds the threshold, a warning is sent to the client. When a user's status changes to infected, the server asks the user to upload the scan records from the local database. After comparing the time stamps and unique identifiers, the server returns the list of close contacts.
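The matching step on the server, in which uploaded scan records are compared by time stamp and unique identifier, might look like the sketch below. The record type, the field names and the `min_sightings` parameter are hypothetical; the article does not specify the matching rule beyond the comparison itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRecord:
    peer_id: str    # unique identifier of the device that was seen
    seen_at: float  # time stamp of the sighting (Unix seconds)


def close_contacts(uploaded, window_start, window_end, min_sightings=1):
    """Return identifiers seen at least `min_sightings` times inside the window.

    `uploaded` is the scan log sent by the infected user's phone. The time
    window and the sighting count are illustrative parameters.
    """
    counts = {}
    for rec in uploaded:
        if window_start <= rec.seen_at <= window_end:
            counts[rec.peer_id] = counts.get(rec.peer_id, 0) + 1
    return sorted(pid for pid, n in counts.items() if n >= min_sightings)
```

Requiring more than one sighting is one simple way to suppress fleeting encounters, at the cost of missing very short contacts.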
The server analyzes the data with the big data algorithm and updates the personal risk information database, the regional risk information database, and the map library in real time. The relevant risk information is then pushed to the smart mobile terminal over the data network. The user's location can be visualized on the map, and an alarm can be raised according to the set threshold. The server can also generate risk reports based on big data, which assists in the prevention and control of the epidemic. In this way, real-time interaction with users is achieved.
When a person is found to be infected and is identified or tagged as an infected person, the system can quickly retrieve that person's information. Log files can be used to obtain the places the person has recently visited and to screen out all close contacts. These close contacts receive a warning message as soon as possible; in cooperation with mobile phone operators, SMS notifications can be sent at the same time. The risk level information is then updated promptly. The system thus helps control the further spread of the epidemic. At the same time, the warning function reduces the possibility of exposure to dangerous environments and lessens the risk of spreading infections.
Compared with the health code, the advantages of this system are as follows: it offers a convenient way to query close contacts, it does not rely strongly on the monitoring system, and it can predict the risk level and raise alarms based on it.
2.3. Low power Bluetooth technology
When the target person is indoors, environmental interference makes accurate GPS positioning difficult. This article therefore combines GPS and Bluetooth positioning to achieve precise positioning. The advantage of Bluetooth positioning is low power consumption [24]. In the working state, a beacon node can automatically send information to users in its coverage area, obtain the location of the user, and convey the required information based on that location. In this system, Bluetooth is used to obtain the distance between two devices and to exchange information between different mobile devices.
Bluetooth Low Energy (BLE) has a wide range of applications in proximity detection and interaction, and both Google and Apple have introduced proximity-based applications [25]. Low-power Bluetooth technology is now heavily used in mobile terminals of all kinds. With the spread of smart devices, Bluetooth is increasingly left turned on in daily use, which makes the practical application of this article possible. In low-power Bluetooth broadcast mode, smart devices take the role of observer or broadcaster [26,27]. The Bluetooth protocol stack does not restrict the roles a device may take: BLE devices are peers, and a smartphone can act as both broadcaster and observer. There are two types of broadcast, non-connectable and connectable. Connectable broadcasts allow another device to request a connection. In this article, we use the non-connectable broadcast mode, in which the device does not establish a connection, so neighboring devices cannot access private information. To track contacts, we configure the smartphone to broadcast advertising packets periodically in the non-connectable advertising mode. When a nearby smartphone receives a packet, it can measure the received signal strength.
2.3.1. Advertising Package
In the non-connectable advertising mode, the smartphone broadcasts advertising packets periodically according to the advertising interval, which defines the broadcast frequency of the message. As shown in Figure 2, a broadcast message is up to 47 bytes long. Of these, 16 bytes are used for the preamble (1 byte), access address (4 bytes), header (2 bytes), MAC address (6 bytes) and CRC (3 bytes), which leaves 31 bytes for the information to be transmitted. Both broadcast and scan response data are carried in multiple advertising (AD) data segments, and each AD data segment must consist of a length field and data. The broadcast packets carry the personal risk level and other necessary identifying information, which is encrypted and then sent as AD data.
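The byte budget above can be checked with a short sketch that packs one AD data segment. The layout of the body (a one-byte risk level followed by an already-encrypted token) is an assumption made for illustration and is not the article's wire format; the AD type 0xFF (Manufacturer Specific Data) and the 0xFFFF test company identifier are used here only as placeholders.

```python
import struct

PACKET_BYTES = 47
OVERHEAD_BYTES = 1 + 4 + 2 + 6 + 3   # preamble, access address, header, MAC, CRC
MAX_PAYLOAD = PACKET_BYTES - OVERHEAD_BYTES  # 31 bytes left for AD data

AD_TYPE_MANUFACTURER = 0xFF  # "Manufacturer Specific Data" AD type
PLACEHOLDER_COMPANY_ID = 0xFFFF  # placeholder identifier, not a real vendor


def build_ad_segment(ad_type, data):
    """One AD segment: length byte (covers type + data), type byte, data."""
    if len(data) + 1 > 255:
        raise ValueError("AD data too long for a one-byte length field")
    return bytes([len(data) + 1, ad_type]) + data


def build_payload(risk_level, encrypted_token):
    """Pack a hypothetical body: company id, risk level, encrypted token."""
    body = struct.pack("<HB", PLACEHOLDER_COMPANY_ID, risk_level) + encrypted_token
    payload = build_ad_segment(AD_TYPE_MANUFACTURER, body)
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds the 31-byte advertising limit")
    return payload
```

With this layout, two bytes go to the segment header, two to the company identifier and one to the risk level, so at most 26 bytes remain for the encrypted token.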
2.3.2. RSSI
Distance estimation relies mainly on the RSSI (Received Signal Strength Indication) of the received Bluetooth signal, which has a definite mathematical relationship with distance. From the signal strengths measured at the transmitting end and the receiving end, the propagation loss of the signal is calculated and converted into a distance through a fitted regression function [28]. The distance Dist between two smart mobile terminals is calculated from the following quantities: the signal strength from the Bluetooth port of the second mobile terminal to the first one; the signal strength from the Bluetooth port of the first mobile terminal to the second one; the Bluetooth channel attenuation coefficient at the standard distance of 1 meter; the environmental attenuation factor; and the environmental correction parameter. According to the inverse-square law, the received signal power is inversely proportional to the square of the distance.
In a specific example, Dist is the estimated distance between two smart mobile terminals. Under normal circumstances, depending on the surrounding environment, the 1-meter attenuation coefficient can be set to 59, the environmental attenuation factor to 2, and the environmental correction parameter to 0.2.
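As an illustration of how these parameters can turn RSSI into a distance, the sketch below uses the widely used log-distance path-loss form. It is an assumed form, not necessarily the article's exact formula: in particular, averaging the two measured strengths and adding the correction parameter to the result are choices made for this example. The default values 59, 2 and 0.2 are the ones quoted in the text.

```python
def estimate_distance(rssi_ab, rssi_ba, a_1m=59.0, n_env=2.0, correction=0.2):
    """Estimate the distance in meters between two devices from their RSSI.

    rssi_ab, rssi_ba: signal strengths in dBm measured in each direction.
    a_1m:       attenuation at the 1 m reference distance (assumed |RSSI| at 1 m).
    n_env:      environmental attenuation factor (2 matches free space).
    correction: environmental correction, added to the result (assumption).
    """
    # Assumption: combine the two directions by averaging their magnitudes.
    mean_strength = (abs(rssi_ab) + abs(rssi_ba)) / 2.0
    return 10 ** ((mean_strength - a_1m) / (10.0 * n_env)) + correction
```

Under these assumptions a symmetric reading of -59 dBm maps to about 1.2 m, and every additional 20 dB of attenuation multiplies the uncorrected distance by ten.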
Different environments affect the variation of RSSI differently, even when the distance between two devices is the same. Environmental factors therefore need to be considered when applying the path loss model to estimate the distance for a given RSSI. A comparative analysis of filtering algorithms found the Kalman filter to be the most effective solution [29]. The Kalman filter is a linear filter whose basic idea is to take the minimum mean square error as the criterion for the best estimate, and to estimate the state variables from the estimate at the previous moment and the observation at the current moment [30].
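Therefore, as shown in Figure 3, we preprocess the RSSI of the collected broadcast packets to obtain corrected RSSI values.
The Kalman filter model includes the following equations, written here in their standard form.
The Kalman filter time update equations are
$$\hat{x}_k^- = A\hat{x}_{k-1} + Bu_{k-1}, \qquad P_k^- = AP_{k-1}A^T + Q.$$
The measurement update equations are
$$K_k = P_k^- H^T \left(HP_k^- H^T + R\right)^{-1}, \qquad \hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H\hat{x}_k^-\right), \qquad P_k = \left(I - K_k H\right)P_k^-.$$
Here $\hat{x}_k^-$ and $\hat{x}_k$ are the a priori and a posteriori state estimates, $P_k^-$ and $P_k$ are the corresponding error covariances, $K_k$ is the Kalman gain, $z_k$ is the measurement, $A$, $B$ and $H$ are the state transition, control and observation matrices, and $Q$ and $R$ are the process and measurement noise covariances.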
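For a single RSSI stream the time update and measurement update reduce to scalar arithmetic. The sketch below is a minimal one-dimensional Kalman filter that treats the true RSSI as a slowly varying constant; the noise variances and the initial error are illustrative values, not parameters taken from the article.

```python
def kalman_smooth(rssi_values, process_var=0.01, measurement_var=4.0):
    """Smooth a sequence of RSSI readings with a scalar Kalman filter.

    The state is the true RSSI, modeled as constant between readings
    (state transition 1, no control input, observation 1).
    process_var and measurement_var are assumed tuning values.
    """
    if not rssi_values:
        return []
    estimate = float(rssi_values[0])  # initialize with the first reading
    error = 1.0                       # assumed initial error variance
    smoothed = [estimate]
    for z in rssi_values[1:]:
        # Time update: predict the state and its error variance.
        prior = estimate
        prior_error = error + process_var
        # Measurement update: blend prediction and new reading.
        gain = prior_error / (prior_error + measurement_var)
        estimate = prior + gain * (z - prior)
        error = (1.0 - gain) * prior_error
        smoothed.append(estimate)
    return smoothed
```

A larger `measurement_var` makes the filter trust individual readings less, which damps isolated spikes more strongly but also slows the response to a genuine change in distance.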
3. Risk Level Algorithm
In this system, the probability of infection is expressed as a risk level, and alarms are raised according to the risk level classification. As core functions, this section discusses the algorithms for real-time update and query of the risk level and the algorithm for early warning.
3.1. The Personal risk Level Query
First, the system provides real-time personal risk level update and query. Due to the mobility of people, changes in the detection circumstances, the situation of contact people, and the characteristics of the contact area, the personal risk level needs to be updated in real time.
Taking into account the transmission characteristics of COVID-19 [31], the risk level of each target person at each moment is related to the following factors:
The personal risk level at the previous moment of the first current moment.
The personal risk level determined from the detection information.
The infection transmission probability at the previous moment of the first current moment.
The regional risk level at the previous moment of the first current moment.
The probability of infection from the environment to which the target person is exposed.
Here, the first current moment and the previous moment of the first current moment are separated by a first time period, and the initial personal risk level of each target person is preset.
The initial personal risk level of each target person may be preset according to initial information, which includes the initial position and the initial protective measures. Suppose that the first current moment is recorded as $t$, the first time period as $\Delta t$, and the previous moment of the first current moment as $t - \Delta t$. The personal risk level of each target person at each moment is then calculated and updated from the following quantities. The target person's own risk level enters through its value at the previous moment. D(t) is the risk level of the person determined from the detection information at moment $t$. A further quantity is the risk level of each other person, obtained by the target person's smartphone through Bluetooth interconnection within the set distance; these persons are indexed by person number, and the total number of persons other than the target person is n. Each of them is associated with an infection transmission probability, which is determined by the relative position of the infected person's device and each other device, and by the protective measures of the infected person. The regional risk level is that of the location of the target person. Finally, the probability of the target person being infected by the environment is determined by the target person's protective measures and stay time in the environment.
Through the above formula, the personal risk level can be obtained in an iterative manner. The iterative process is shown in Figure 4.
The personal risk level D(t), determined from the detection information after detection, is an important influencing factor. The risk level can be cleared when the source of infection no longer exists in the monitoring environment. As the number of infected people increases and the environmental risk level rises, the personal risk level increases. The probability of infection transmission is a small value. The risk level of the target person is affected by factors such as the target person's protective measures, the infection rate of other people within the set range, and the environmental infection rate.
Infected people in the area affect the risk level of the target person. Different infection transmission probabilities are assigned to infected people with different risk levels, according to their relative position and protective measures. The levels of the infected people within the set range are added according to these weights and affect the risk level of the target person at the current moment. If a person in the range is not infected, the infection transmission probability is 0 and that person does not affect the target person's level. Before the impact is superimposed, the third term on the right side of the formula queries the database to determine whether the people currently in range are frequent contacts. If a person has been a frequent contact within the last 14 days, that person's impact is not superimposed.
To account for the impact of the environment on the personal risk level, the impact factor is determined by the protective measures of the target person and the stay time in the environment. If the target person stays in a high-risk environment for a long time, the environmental infection probability increases accordingly, and so does the person's risk level.
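One possible reading of this iterative update is sketched below. The additive combination, the cap at 1.0 and all names are assumptions made for illustration, not the article's formula; the sketch only mirrors the qualitative behavior described above: infected neighbors contribute in proportion to their transmission probability, frequent contacts within 14 days are not superimposed, and the environment contributes through the regional risk level.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    risk_level: float          # risk level received over Bluetooth
    transmission_prob: float   # 0.0 for a person who is not infected
    frequent_contact: bool     # frequent contact within the last 14 days


def update_personal_risk(prev_level, detection_level, neighbors,
                         region_risk, env_infection_prob, cap=1.0):
    """Return the new personal risk level under an assumed additive model."""
    # Neighbor term: weighted sum, skipping frequent contacts (not superimposed).
    contact_term = sum(
        n.transmission_prob * n.risk_level
        for n in neighbors
        if not n.frequent_contact
    )
    # Environment term: regional risk scaled by the environmental probability.
    env_term = env_infection_prob * region_risk
    level = prev_level + detection_level + contact_term + env_term
    # Keep the level inside the assumed range [0, cap].
    return min(cap, max(0.0, level))
```

Calling the function once per time period, with the previous output as `prev_level`, reproduces the iterative character of the update.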
3.2. The Regional risk Level Query
Second, the system provides a real-time regional risk level query. The regional risk level must be updated because of contamination by people, disinfection measures, and the dissipation of the virus over time.
The regional risk level of each target location at each moment is related to the following factors:
The regional risk level of the target location at the previous moment of the second current moment.
The disinfection coefficient at the previous moment of the second current moment.
The personal risk level at the previous moment of the second current moment.
The transmission probability from the infected person to the environment, and the infection source dissipation coefficient.
Here, the second current moment and the previous moment of the second current moment are separated by a second time period, and the initial regional risk level of each target location is preset.
The initial regional risk level of each target location can be preset according to the relevant data provided by the disease control department. The regional risk level of each target location at the second current moment is then calculated from the following quantities. The first is the regional risk level of the location at the previous moment of the second current moment. The disinfection coefficient describes the set ratio in which the virus already present in the area is eliminated after disinfection measures are taken. The personal risk level of each person i at the previous moment is also included, together with the transmission probability from an infected person to the environment, which is related to the protection measures and the risk level of the infected person. Finally, the dissipation factor of the source of infection depends on an attenuation coefficient and on the residence time of the source of infection; the attenuation coefficient is related to the environment and the surrounding materials.
From the above formula, it can be seen that the regional risk level can also be obtained in an iterative manner. The flow chart of the iteration is shown in Figure 5.
The regional risk level depends on the disinfection measures and on the people staying in the environment. The more thorough the disinfection, the lower the regional risk level and the safer the area. If infected persons gather in the area, then the longer they stay, the higher the regional risk level becomes under the influence of the second term of the above formula.
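The two effects described here, decay through disinfection and growth through the presence of infected persons, can be illustrated with the following sketch. The two-term structure follows the description, but the exact functional forms are assumptions: in particular, the dissipation factor is modeled as 1 - exp(-attenuation × residence time) only because that form grows with residence time, as the text requires.

```python
import math


def dissipation_factor(attenuation, residence_time):
    """Assumed form: saturating growth with the residence time of the source."""
    return 1.0 - math.exp(-attenuation * residence_time)


def update_regional_risk(prev_level, disinfection_coeff, visitors, attenuation):
    """Return the new regional risk level under an assumed two-term model.

    prev_level:         regional risk level at the previous moment.
    disinfection_coeff: fraction of existing virus removed by disinfection (0..1).
    visitors:           iterable of (personal_risk, env_transmission_prob,
                        residence_time) tuples, one per person present.
    attenuation:        attenuation coefficient of the environment.
    """
    # First term: what remains of the earlier risk after disinfection.
    residual = (1.0 - disinfection_coeff) * prev_level
    # Second term: contamination deposited by the people present.
    deposited = sum(
        risk * prob * dissipation_factor(attenuation, stay)
        for risk, prob, stay in visitors
    )
    return residual + deposited
```

With complete disinfection and no visitors the level drops to zero, while a longer stay by the same infected person yields a higher level, which matches the qualitative behavior stated above.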
3.3. The Early Warning
Third, the system provides early warning. The alarm function has two aspects. An alarm is sent to smart mobile terminals located in an area whose regional risk level exceeds the set regional risk level threshold. In addition, an alarm is raised when a smart mobile terminal whose owner's personal risk level exceeds the set personal risk level threshold enters the set Bluetooth interconnection range. The set Bluetooth interconnection range may be 30 meters. Distance measurements can be obtained with the RSSI distance conversion algorithm of the previous section. In practice, areas with a regional risk level above the regional threshold are designated high-risk areas, and individuals with a personal risk level above the personal threshold are designated high-risk persons. This article does not fix the exact thresholds or classification standard; epidemiological research and data from system deployment will play an important role in determining them.
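The two alarm conditions can be expressed compactly as below. The function and message names are hypothetical, and the thresholds are left as required parameters because the article deliberately does not fix them; only the 30-meter range is taken from the text.

```python
def check_alarms(region_risk, nearby, region_threshold, person_threshold,
                 range_m=30.0):
    """Return the list of alarm messages for one terminal.

    region_risk: regional risk level at the terminal's current location.
    nearby:      list of (peer_id, personal_risk, distance_m) tuples, with the
                 distance estimated from RSSI.
    """
    messages = []
    # Aspect 1: the terminal is inside a high-risk area.
    if region_risk > region_threshold:
        messages.append("high-risk area")
    # Aspect 2: a high-risk person is inside the Bluetooth interconnection range.
    for peer_id, risk, distance_m in nearby:
        if risk > person_threshold and distance_m <= range_m:
            messages.append(f"high-risk person nearby: {peer_id}")
    return messages
```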
3.3.1. High-risk Person Warning
When a high-risk person appears in the iterative process, the server side of the cloud platform sends alarm information to the high-risk person and to the relevant management department, which processes the alarm and arranges nucleic acid testing and screening. The system backtracks the person's contacts over the previous 14 days, immediately sends a warning notice to these people, and updates their risk levels. High-risk persons are shown on the map with a red circle marking their range, and an alarm is sent to each smart mobile terminal when a high-risk person enters its set Bluetooth interconnection range.
3.3.2. High-risk Areas Warning
At present, areas with medium and high risk levels have to be looked up in official announcements, which is inconvenient. Users are unlikely to know all the places to avoid, so the area alarm is a great help. In the alarm function provided by this system, high-risk areas are marked red on the map. Early warning is achieved by having the terminal regularly send a regional risk level request, carrying its current location, to the server. When a user is near or enters a high-risk area, the server sends a warning message to the smart terminal, and protective tips are shown in a pop-up window.
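The server-side check behind such a location-carrying request could be sketched as follows. The area record layout, the alert radius and the function names are assumptions; the great-circle distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_high_risk_areas(lat, lon, areas, risk_threshold, alert_radius_m):
    """Return names of high-risk areas within the alert radius of the user.

    areas: list of (name, lat, lon, regional_risk) tuples (hypothetical layout).
    """
    return [
        name
        for name, a_lat, a_lon, risk in areas
        if risk > risk_threshold
        and haversine_m(lat, lon, a_lat, a_lon) <= alert_radius_m
    ]
```

The terminal would repeat the request at a regular interval and show the protective tips whenever the returned list is not empty.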