Securing Fine-grained Spatio-temporal Top-k Query in TMWSNs: a Novel Scheme

To ensure the security of spatial-temporal Top-k queries in two-tiered wireless sensor networks, many schemes have been proposed in the literature over the past decade. However, most of them consider only the scenario in which sensor nodes are static, and they cannot achieve the security goal for spatial-temporal Top-k queries in mobile sensor networks, because the mobility of the sensor nodes affects the spatial-temporal relationships of the sensory data items they generate. Although we have proposed some schemes for two-tiered mobile wireless sensor networks (TMWSNs) in our previous work, there is still considerable room to improve their performance. In this paper, we propose a novel scheme named STQ-TMWSN for secure fine-grained spatial-temporal Top-k queries in TMWSNs, based on virtual-grid construction and size-order encryption binding. Theoretical analysis shows that STQ-TMWSN achieves low computational complexity and high security. Simulation results indicate that STQ-TMWSN incurs much lower communication cost than the state-of-the-art schemes for securing Top-k queries in TMWSNs.


Introduction
As an important component of the Internet of Things (IoT) [1], wireless sensor networks (WSNs) [2] can be used in many application scenarios, and they are still being studied [3] by many researchers even though extensive research has been carried out on WSNs over the past two decades. In recent years, two novel variants of WSNs, namely two-tiered wireless sensor networks (TWSNs) [4] and two-tiered mobile wireless sensor networks (TMWSNs) [5], have attracted increasing attention from both industry and the research community, since they perform much better than traditional WSNs in scalability, flexibility, and robustness. These advantages stem mainly from the two-tiered architecture of TMWSNs. Specifically, the lower tier of a TMWSN is composed of many mobile sensor nodes, while the upper tier consists of a number of storage nodes. The mobile sensor nodes at the lower tier are responsible for monitoring the surrounding physical environment and generating sensory data items, which can be transmitted directly to nearby storage nodes at the upper tier to be stored or processed. Users can retrieve the sensory data items they are interested in by issuing queries, such as spatial-temporal Top-k queries, to the storage nodes, which process the queries and send the results back to the users through on-demand wireless links.

... sensor nodes in TMWSNs. Once a storage node is compromised, all the sensory data items stored on it can be disclosed, and query processing on it is no longer trustworthy either.

In this paper, we focus on the problem of secure spatial-temporal Top-k queries in TMWSNs, considering that storage nodes may be compromised. A spatial-temporal Top-k query is defined as a query that aims to find the qualified top k sensory data items generated in the queried region during the queried time interval [5].
Our aims are to preserve the privacy of the data items stored on the storage nodes and to protect the integrity of the spatial-temporal Top-k query results.

To the best of our knowledge, only a few works have studied the problem of securing spatial-temporal Top-k queries in TMWSNs so far. Most of the existing secure Top-k query processing schemes are proposed for cloud computing [6,7] and TWSNs [8,9]. Those proposed for cloud computing are not suitable for TMWSNs for the following reasons: Top-k queries in the cloud are generally processed securely over data that are outsourced to cloud servers by a single data owner. In cloud computing, the data owner knows all of its outsourced data and thus can build a tree-based index (e.g., IR-tree [10]), a binary heap [11], or another tree-like structure over the whole data set to facilitate Top-k queries without losing data privacy. In TMWSNs, by contrast, apart from the storage nodes, which are not fully trusted, there is no data owner who knows all the sensory data generated by all the sensor nodes, so a privacy-preserving index cannot be constructed easily. ... In addition, the secure Top-k query processing schemes proposed for TWSNs are not suitable for TMWSNs either, because they cannot preserve the integrity of spatial-temporal Top-k query results in TMWSNs. In fact, attackers can launch much more covert attacks in TMWSNs than in TWSNs. When a mobile sensor node travels from the queried region to other regions, or vice versa, during the queried time interval, some of the sensory data it generates may lie in the queried region while the rest may not. Obviously, the sensory data generated outside the queried region by the traveling sensor node do not satisfy the requirements of the spatial-temporal Top-k query.
However, few of the secure Top-k query schemes proposed for TWSNs take this into account, which leaves loopholes for attackers to launch new kinds of covert attacks. For example, attackers may replace the data items generated inside the queried region by a sensor node with those produced outside the queried region by the same sensor node.

The above-mentioned reasons motivate us to study the security of spatial-temporal Top-k queries in TMWSNs in depth. In summary, the main contributions of this paper are threefold:

• It proposes a novel scheme named STQ-TMWSN (STQ is short for spatial-temporal Top-k query) to preserve the privacy of the data stored on storage nodes and protect ... proved in the paper that STQ-TMWSN is not only able to preserve the privacy of the sensory data items and their corresponding scores, but also able to detect incomplete query results for spatial-temporal Top-k queries under the security model presented in this paper.

• Extensive simulations were conducted, and the results show that STQ-TMWSN is much more efficient than the related state-of-the-art schemes.

... item according to [12].

Besides the MAC-based technique, other methods have also been proposed to ensure the privacy of the sensory data and the completeness of Top-k query results in TWSNs, such as inserting digital watermarks or dummy readings into the normal ones [17] and constructing data aggregation trees [18,19] ... [5], for securing spatial-temporal Top-k queries in TMWSNs. However, one of the encryption technologies used in the two schemes is OPES [23], which was proposed more than ... as get the query results from the storage nodes through the on-demand wireless links [12].

In this subsection, we introduce the notation and define some of the terminology used in this paper. We use $D^t_{i,j,1}, D^t_{i,j,2}, \ldots, D^t_{i,j,\mu^t_{i,j}}$ to denote the sensory data items generated by sensor node $S_i$ at its $j$-th target location in the $t$-th epoch $T_t$, where $\mu^t_{i,j}$ is the total number of sensory data items generated by $S_i$ at its $j$-th target location in $T_t$. For any sensory data item $D^t_{i,j,x}$, its corresponding data score $d^t_{i,j,x}$ can be computed using a public scoring function $f(\cdot)$ [24], namely $d^t_{i,j,x} = f(D^t_{i,j,x})$.

186
Without loss of generality, we assume different sensory data items have distinct scores.

Moreover, to simplify the presentation, we assume that the ranking order of the sensory data items generated by any sensor node at a target location is consistent with their ... where $i$ and $j$ are the node ID and the target location ID of $S_i$, respectively. The meanings of the notations used in this paper are listed in Table 1.

Table 1

Symbol | Meaning
$S_i$ | The sensor node whose ID is $i$ ($0 < i \le N$)
$N$ | Total number of sensor nodes in one cell
$Loc^t_{i,j}$ | The $j$-th target location of $S_i$ during $T_t$
$\mu^t_{i,j}$ | Total number of data items generated by $S_i$ at $Loc^t_{i,j}$ in $T_t$
$n^t_{i,j}$ | Total number of qualified Top-k data items generated by $S_i$ at $Loc^t_{i,j}$ in $T_t$
$Q_t$ | A spatial-temporal Top-k query
$R^t$ | The query result of $Q_t$
$I_{Q_t}$ | The ID of $Q_t$
$I_C$ | The ID of a given cell $C$
$QR_{I_C}$ | The queried region in cell $I_C$
$Key^t_i$ | The pairwise key shared by $S_i$ and the network owner in $T_t$
$RT^t_{S_i}$ | The data report generated by $S_i$ in $T_t$
$E_{Key^t_i}\{\cdot\}$ | Symmetric encryption with $Key^t_i$ based on [25]
$E_{OPE}\{\cdot\}$ | Encryption based on the OPE scheme [26]
$RST^t_{S_i}$ | The processed result of $RT^t_{S_i}$
... | Total number of sensory data items encrypted in $DPP^t_{i,j}$
$R^t_{pk}$ | Set of the qualified Top-k data items extracted from $R^t$

We define the terminology used in this paper as follows:

• Fine-grained Spatial-temporal Top-k Query: Given a cell with ID $I_C$ in a TMWSN, an epoch $T_t$, and a parameter $k$, a fine-grained spatial-temporal Top-k query is a query that seeks the top $k$ sensory data items with the largest (or smallest) scores among all the sensory data items generated in $QR_{I_C}$ during $T_t$, where $QR_{I_C}$ is a sub-region of the cell with ID $I_C$. A fine-grained spatial-temporal Top-k query $Q_t$ is represented as in Eq. (1):

$$Q_t = \{I_{Q_t}, T_t, k, I_C, QR_{I_C}\} \qquad (1)$$
• Queried Node and Queried Location: Given a spatial-temporal Top-k query $Q_t = \{I_{Q_t}, T_t, k, I_C, QR_{I_C}\}$, if any target location of a sensor node in epoch $T_t$ falls in $QR_{I_C}$, that location is called a queried location, and the corresponding sensor node is called a queried node.

• Qualified Top-k Data Items: Given a spatial-temporal Top-k query $Q_t = \{I_{Q_t}, T_t, k, I_C, QR_{I_C}\}$, a sensory data item $D^t_{qualified}$ is called a qualified Top-k data item of $Q_t$ if it satisfies the following two conditions: 1) $D^t_{qualified}$ was generated in $QR_{I_C}$ during $T_t$; 2) among all the sensory data items generated in $QR_{I_C}$ during $T_t$, there are at least $N_{Q_t} - k$ data items whose scores are smaller (or larger) than the score of $D^t_{qualified}$, where $N_{Q_t}$ is the total number of sensory data items generated in $QR_{I_C}$ during $T_t$.

... is short for 'Order-preserving Encryption' [26]) as well as some proof information generated by $S_i$ at $Loc^t_{i,j}$ during $T_t$. The specific contents of $DPP^t_{i,j}$ are shown in Algorithm 1 in Section IV.

... to the cases in real applications. Specifically, we assume that a curious storage node will try its best to disclose the sensory data items as well as the data scores computed with the public scoring function, and that a malicious storage node will do its best to undermine the completeness of the spatial-temporal Top-k query results. To mount such an attack, a compromised storage node may put none or only part of the qualified top $k$ data items into the Top-k query result, and it may also put fabricated data items and/or real-but-unqualified ones into the query result when processing a spatial-temporal Top-k query. For example, suppose the complete query result should be $\{D^t_1, D^t_2, D^t_3\}$.

Then an incomplete query result may be {$D^t$ ... real-but-unqualified sensory data item and $D^t_{fabricated}$ is a fabricated data item. The ... Other information, such as the spatial-temporal Top-k query and the generation locations of the sensory data items, will be leaked to the storage nodes. It is hard to enable storage nodes to process spatial-temporal Top-k queries smoothly and successfully without such leaks. Fortunately, the leaked information poses little threat to the security of the system.
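To make the definitions of $Q_t$, queried locations, and qualified Top-k data items concrete, the following Python sketch computes the plaintext reference answer that a complete result must match. It deliberately ignores encryption; the `Item` record, the identity scoring function, and the circular queried region are illustrative assumptions, not part of STQ-TMWSN, and only the largest-score variant is shown. The usage at the end reproduces the covert attack described in the introduction, in which a mobile node's in-region item is replaced by one it produced outside the region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    node_id: int     # i in S_i
    location: tuple  # (x, y) position where the item was generated
    epoch: int       # t in T_t
    value: int       # the raw sensory reading D


def score(item: Item) -> int:
    # Hypothetical public scoring function f(*): the reading itself.
    return item.value


def in_region(loc, center, radius) -> bool:
    # QR_{I_C} modelled as a circle, as in the simulation settings.
    dx, dy = loc[0] - center[0], loc[1] - center[1]
    return dx * dx + dy * dy <= radius * radius


def qualified_top_k(items, epoch, k, center, radius):
    """Plaintext reference answer: the k largest scores among the items
    generated inside the queried region during the queried epoch."""
    candidates = [it for it in items
                  if it.epoch == epoch and in_region(it.location, center, radius)]
    return sorted(candidates, key=score, reverse=True)[:k]


def is_complete(result, items, epoch, k, center, radius) -> bool:
    # A result is complete only if it equals the set of qualified items.
    return set(result) == set(qualified_top_k(items, epoch, k, center, radius))


items = [
    Item(1, (10, 10), 7, 9),     # S_1 inside the queried region
    Item(1, (200, 200), 7, 99),  # S_1 after moving out of the region
    Item(2, (20, -5), 7, 5),
    Item(3, (0, 30), 7, 7),
    Item(3, (0, 30), 8, 50),     # right place, wrong epoch
]
top = qualified_top_k(items, epoch=7, k=2, center=(0, 0), radius=50)
# S_1's in-region item swapped for its out-of-region one (the covert attack).
tampered = [items[1], items[3]]
```

Here `top` holds the items with scores 9 and 7, while `tampered` is rejected even though every item in it is real, because one of them was not generated in the queried region.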

Moreover, we assume that each mobile sensor node is equipped with tamper-proof hardware, so that adversaries cannot disclose the encryption materials stored in the hardware even if they capture the sensor nodes [9]. Under the system and security models described above, the problem tackled in this paper can be stated succinctly as follows: Given a spatial-temporal Top-k query ...

Specifically, our design goal is to propose a novel scheme that enables efficient, privacy-preserving, and integrity-verifiable processing of spatial-temporal Top-k queries in TMWSNs. To this end, the following three objectives should be achieved.

• The privacy-preservation goal: Our proposed scheme should preserve the privacy of the sensory data items collected from the mobile sensor nodes and of their scores.

While retaining the ability to process spatial-temporal Top-k queries, the storage nodes in the system must not be able to disclose the sensory data items or their scores. ... As assumed in [12], we also assume that each sensor node is pre-loaded with a ... is a one-way hash function.

In STQ-TMWSN, the sensor deployment field is divided into many small virtual ...

This subsection describes how each sensor node generates, from its own sensory data items and under the privacy and integrity preservation requirements, the data report that it uploads to the corresponding storage node at the end of each epoch.

Specifically, for any sensor node $S_i$ ($0 < i \le N$), the data report generation procedure in STQ-TMWSN is shown in Algorithm 1.

In the protocol, $S_i$ first computes the score of each sensory data item it generated using the public scoring function; then it works out $DPP^t_{i,j}$ for each target location it has moved to during epoch $T_t$. To do this, three cases are considered: ... item was generated by $S_i$ at $Loc^t_{i,j}$ in epoch $T_t$, and it also needs to include both the pairwise-key-encrypted score and the OPE-encrypted score of that single data item. The former is used as part of the proof information for integrity verification, and the latter is used by storage nodes to process spatial-temporal Top-k queries smoothly.
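The following Python sketch shows the general shape of a per-location package under stated assumptions; it does not reproduce Algorithm 1 or its three cases. Everything named here (`grid_id`, `toy_ope`, `bind_tag`, `build_package`, the 10 m grid side) is hypothetical: the target location is mapped to a virtual-grid ID, items are ranked by score and given sequence numbers, each score gets an order-preserving value that a storage node could compare, and an HMAC tag stands in for the pairwise-key-protected proof information. `toy_ope` is only a monotone stand-in for the OPE scheme of [26], not a secure construction, and no confidentiality encryption is applied.

```python
import hashlib
import hmac


def grid_id(x, y, cell_width=400.0, grid_side=10.0):
    # Hypothetical virtual grid: square grids of side grid_side, numbered row by row.
    cols = int(cell_width // grid_side)
    return int(y // grid_side) * cols + int(x // grid_side)


def toy_ope(key: bytes, score: int, spread: int = 1 << 16) -> int:
    """Toy order-preserving mapping: score * spread + a keyed offset < spread.
    It is strictly increasing in score but is NOT a secure OPE scheme."""
    mac = hmac.new(key, score.to_bytes(4, "big"), hashlib.sha256).digest()
    return score * spread + int.from_bytes(mac[:8], "big") % spread


def bind_tag(key: bytes, loc_id: int, seq: int, score: int) -> bytes:
    # Binds a score to its target location and its rank (sequence number).
    msg = f"{loc_id}|{seq}|{score}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def build_package(pair_key, ope_key, position, readings, f):
    """Hypothetical per-location package: items ranked by score, each with a
    sequence number, an OPE-style score for the storage node, and a keyed tag."""
    loc_id = grid_id(*position)
    ranked = sorted(readings, key=f, reverse=True)  # scores assumed distinct
    return [{
        "loc": loc_id,
        "seq": seq,
        "reading": r,  # a real scheme would carry E_{Key^t_i}{...} here
        "ope_score": toy_ope(ope_key, f(r)),
        "tag": bind_tag(pair_key, loc_id, seq, f(r)),
    } for seq, r in enumerate(ranked, start=1)]


pkg = build_package(b"Key_i^t", b"ope-key", (15.0, 25.0), [23, 57, 41], f=lambda r: r)
```

With a 10 m grid side, a 400×400 m cell has 1600 grid IDs, so each ID fits comfortably in a 16-bit field; whether this matches the scheme's actual virtual-grid construction is an assumption.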

The single sensory data item should also be encrypted using the pairwise key and included ...

if $\mu^t_{i,j} = 1$ then
7: if $Loc^t_{i,j}$ is in $QR_{I_C}$ then
...
corresponding to the spatial-temporal Top-k query $Q_t$.

The main idea of Algorithm 3 for verifying the completeness of $R^t$ is to find the minimal data score of the qualified Top-k data items and the maximal score of the un- ... Top-k data items. If at least one of those qualified Top-k data items was deleted from $DPP^t_{i,j}$ by the storage node when producing $RST^t_{S_i}$ of $R^t$, which is the query result of the spatial-temporal Top-k query $Q_t = \{I_{Q_t}, T_t, k, I_C, QR_{I_C}\}$, then, under the security model described in Section III, the incomplete $R^t$ must be detected by the network owner with a 100% success rate in TMWSNs aided by our scheme STQ-TMWSN.

Proof. Since the storage node does not know $Key^t_i$, if it inserts into $DPP^t_{i,j}$ ($\forall i \in [1, N], \forall j \in [1, \lambda^t_i]$) sensory data items encrypted with keys other than $Key^t_i$, the incomplete $R^t$ must be detected by the network owner according to lines 6∼9 of Algorithm 3. Moreover, according to lines 33∼35 of Algorithm 3, $R^t$ is also considered incomplete if the storage node puts into $DPP^t_{i,j}$ any encrypted data item that was generated by $S_i$ in $T_t$ at a location other than $Loc^t_{i,j}$. Thus, in the rest of this proof, we need only consider the situation in which all the encrypted sensory data items left in $DPP^t_{i,j}$ after being processed by the storage node are real ones generated by $S_i$ ($\forall i \in [1, N]$) at $Loc^t_{i,j}$ in $T_t$ (although some or all of them may not be qualified). Then, if at least one qualified sensory data item generated by $S_i$ at $Loc^t_{i,j}$ in $T_t$ is discarded by the storage node, one of the following two cases must occur: 1) the storage node has deleted all the sensory data items from $DPP^t_{i,j}$ when producing $RST^t_{S_i}$ of $R^t$; 2) the storage node has discarded only part of the sensory data items from $DPP^t_{i,j}$, and the discarded data items include one or more qualified ones.
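The two checks this paragraph relies on (lines 6∼9 and 33∼35 of Algorithm 3) can be illustrated with keyed tags. The Python sketch below is an assumption-laden stand-in, not Algorithm 3 itself: it uses HMAC-SHA256 from the standard library in place of the pairwise-key encryption of [25], and the `bind_tag` and `records_authentic` helpers and the location IDs are hypothetical. Without $Key^t_i$ the storage node cannot produce a valid tag, and a real item moved from another location fails because the location ID is bound into the tag.

```python
import hashlib
import hmac


def bind_tag(key: bytes, loc_id: int, seq: int, score: int) -> bytes:
    # Hypothetical proof tag binding (location, sequence number, score) to Key^t_i.
    return hmac.new(key, f"{loc_id}|{seq}|{score}".encode(), hashlib.sha256).digest()


def records_authentic(key: bytes, loc_id: int, records) -> bool:
    """True only if every (seq, score, tag) record was produced with `key`
    for this location. An empty list passes trivially, so deletions must
    be caught by a separate completeness check."""
    return all(hmac.compare_digest(bind_tag(key, loc_id, seq, s), t)
               for seq, s, t in records)


key = b"Key_i^t"  # hypothetical pairwise key of S_i in T_t
honest = [(1, 57, bind_tag(key, 81, 1, 57)), (2, 41, bind_tag(key, 81, 2, 41))]
relocated = [(1, 90, bind_tag(key, 82, 1, 90))]    # real item, other location
forged = [(1, 99, bind_tag(b"guess", 81, 1, 99))]  # made without Key_i^t
```

The fact that an empty record list is accepted mirrors why the proof goes on to treat deletions as separate cases.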

First, consider the case in which the storage node has deleted all the sensory data items from $DPP^t_{i,j}$. In this case, the storage node should leave $E_{Key^t_i}$ ... sensory data items generated in $QR_{I_C}$ and $T_t$ must be put into $R^t_{pk}$ to make the number of elements in $R^t_{pk}$ equal to $k$, according to lines 53∼55 of Algorithm 3. If the discarded sensory data items include one or more qualified ones, $d^t_{i,j,1}$ must be the score of a qualified Top-k data item. Then $f(MIN(R^t_{pk}))$ must be smaller than $MAX(V_{nonTop})$, because, assuming all data scores are distinct, the score of any qualified Top-k data item must be larger than that of any real but unqualified item generated in $QR_{I_C}$ during $T_t$.

Thus, according to lines 56∼58 of Algorithm 3, the incomplete $R^t$ must be detected by the network owner.
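The test used in this step (lines 53∼58 of Algorithm 3) reduces to a count check plus a min/max comparison. The Python sketch below assumes the network owner has already decrypted and authenticated the returned scores and split them into the Top-k set $R^t_{pk}$ and the set $V_{nonTop}$; the function name and this split are illustrative assumptions, and only the largest-score variant of the query is handled.

```python
def passes_completeness_check(top_scores, non_top_scores, k):
    """Hypothetical owner-side test mirroring the argument above:
    exactly k Top-k scores must be returned, and the smallest of them
    must exceed the largest score in V_nonTop (scores are distinct)."""
    if len(top_scores) != k:
        return False
    if non_top_scores and min(top_scores) < max(non_top_scores):
        return False
    return True


# Scores in the queried region and epoch: 9, 7, 5, 3; k = 2.
honest = passes_completeness_check([9, 7], [5, 3], k=2)
# The location holding 9 is emptied; its leftover top score 9 ends up in
# V_nonTop, and the unqualified 5 is pushed into R_pk to keep k elements.
attacked = passes_completeness_check([7, 5], [9, 3], k=2)
```

`honest` evaluates to `True` and `attacked` to `False`, which is exactly the situation $f(MIN(R^t_{pk})) < MAX(V_{nonTop})$ described in the proof.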

Next, consider the case in which the storage node deletes part of the sensory data items from $DPP^t_{i,j}$, and the deleted data items include one or more qualified ones. In this case, two situations should be discussed. In the first, all the sensory data items encrypted with sequence order numbers are deleted from $DPP^t_{i,j}$; in the second, at least one sensory data item encrypted with a sequence number remains in $DPP^t_{i,j}$ after processing. In the first situation, $E_{Key^t_i}$ ... If the storage node discards the sensory data item(s) in set $\Phi_1$ or $\Phi_2$ from $DPP^t$ ...

This section analyzes the computational complexity of the three algorithms presented above.

First, the computational complexity of Algorithm 1 is analyzed as follows. Since ... and suitable for resource-limited sensor nodes [28,29], let alone the storage nodes, which are much more powerful than the sensor nodes. Moreover, OPE also has low computational complexity according to [26]. For each $DPP^t_{i,j}$ ($0 < i \le N$, $0 < j \le \lambda^t_i$), the length of the data to be encrypted varies with $\mu^t_{i,j}$, the total number of sensory data items generated by $S_i$ at $Loc^t_{i,j}$ in $T_t$. Let $l_D$ and $l_d$ denote the bit lengths of a sensory data item and of a data score, respectively, let $l_n$ denote both the bit length of a sequence number and that of $\mu^t_{i,j}$, and let $l_{Loc}$ refer to the bit length ... (3) and (4), respectively, according to Algorithm 1. Finally, consider Algorithm 3, which mainly consists of an outer "for" loop whose body contains an inner "for" loop. In the body of the outer loop, the ...

Parameters | Default value
$N$ | 300
$T$ (Length of each epoch) | 100 s
$T_{mobile}$ (Period for which a sensor node keeps moving) | 5 s
$T_{static}$ (Period for which a sensor node stays static) | 5 s
$m_{speed}$ (Moving speed of each mobile sensor node) | 5 m/s
$r_{mobile}$ (Ratio of mobile sensor nodes to all sensor nodes) | 100%
$C_{size}$ (Cell size) | 400 × 400 m²
$R$ (Sensor communication radius) | 50 m
$r_D$ (Data generation rate of each sensor node) | 2 items/s
$q_{period}$ (Period at which the network owner launches a query) | 5 s
$q_{radius}$ (Radius of the circular queried region) | 50 m
$l_D$ (Length of a sensory data item) | 400 bits
$l_d$ (Length of a data score) | 20 bits
$l_n$ (Length of a sequence number) | 10 bits
$l_{id}$ (Length of an ID number) | 10 bits
$l_t$ (Length of a time value) | 32 bits
$l_{Loc}$ (Length of each two-dimensional location) | 128 bits
$l_{VLoc}$ (Length of each target location) | 16 bits
$e_{send}$ (Energy cost of sending one bit of data) | 1 mJ
$e_{receive}$ (Energy cost of receiving one bit of data) | 1 mJ
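For a rough sense of scale, the following Python arithmetic uses the default values above to compute only the raw payload (data items plus plain scores) that one node would upload per epoch. It does not reproduce the scheme's cost formulas (3) and (4), it ignores encryption, proof information, sequence numbers and location fields, and it assumes that data are generated throughout the whole epoch.

```python
# Default values from the simulation parameter table.
N = 300                # sensor nodes per cell
T = 100                # epoch length, s
r_D = 2                # data generation rate, items/s per node
l_D, l_d = 400, 20     # bits per sensory data item / per data score
e_send_mJ = 1          # energy cost of sending one bit, mJ

# Assumes each node generates data throughout the whole epoch.
items_per_node = r_D * T
raw_bits_per_node = items_per_node * (l_D + l_d)
raw_bits_per_cell = N * raw_bits_per_node
raw_send_energy_J = raw_bits_per_node * e_send_mJ / 1000

print(items_per_node, raw_bits_per_node, raw_bits_per_cell, raw_send_energy_J)
```

This prints `200 84000 25200000 84.0`, i.e. 200 items and 84,000 bits per node per epoch before any security overhead.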

This subsection presents the simulation results of C cell and R vs with different set-