Submitted: 15 January 2024
Posted: 15 January 2024
Abstract
Keywords:
1. Introduction
1.1. Our approach and goals
1.2. The QoS and the QoIS of a WSN
1.3. Data delivery models of WSNs and their QoIS
- Event-driven model
- Query-driven model
- Continuous delivery model
1.3.1. Event-driven
1.3.2. Query-driven
1.3.3. Continuous delivery
2. QoIS Assessment Criteria
- Raw data is collected by the sensor nodes deployed in the area of interest where the phenomenon occurs (example: a numeric temperature value).
- Information is the aggregation of the raw data sensed by the nodes across the network. The goal of the aggregation process is to extract meaningful information from raw data via one or more functions that we call "aggregation functions" (example: the mean temperature in an area, where the aggregation function is the mathematical mean).
- Knowledge is retrieved from the information gathered by a sensor network throughout a long enough period of time to draw conclusions based on this collected information and other contextual information. These conclusions are what we call "knowledge" (example: the mean temperature of a sensed area at day/night in each season; the contextual information here is the date/time at which the data was collected).
2.1. Raw data
- Accuracy: refers to the degree to which the information provided by a WSN reflects the true state of the environment being monitored [16] .
- Relevance: refers to whether the information provided by a WSN is useful and applicable for its intended purpose [16]. Irrelevant data can waste resources and reduce overall system performance.
- Redundancy: refers to the information overlap among different sensor nodes, and can be measured via investigating the similarity of their sensing results [18]. Data provided by a sensor node that is similar to the one provided by another node is likely to be redundant.
- Consistency: refers to whether the information provided by a WSN is internally consistent and free from contradictions or errors [16]. Inconsistent data can lead to incorrect conclusions or decisions.
- Trustworthiness: refers to whether the information provided by a WSN can be trusted by its intended recipients [16]. This includes factors such as data integrity, security, and privacy.
2.2. Aggregated information
- Shannon entropy: is a measure of the amount of uncertainty or randomness in a system. It was introduced by Claude Shannon in 1948 as a way to quantify the amount of information contained in a message or signal [21]. The entropy of a system is defined as the average amount of information needed to describe the state of the system. In other words, it is a measure of the amount of "surprise" or "uncertainty" associated with the system. Shannon entropy has applications in a wide range of fields, including information theory, cryptography, statistical physics, and computer science.
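To make the definition concrete, the entropy of the empirical distribution of a node's readings can be computed directly from the classical formula H = -Σ pᵢ log₂ pᵢ. The following minimal Python sketch (function name and usage are illustrative, not from the cited article) does exactly that:

```python
import math
from collections import Counter

def shannon_entropy(readings):
    """Entropy (in bits) of the empirical distribution of sensor readings."""
    counts = Counter(readings)
    n = len(readings)
    # H = -sum(p * log2(p)) over the observed relative frequencies
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A constant stream carries zero entropy (no surprise), while readings spread uniformly over 2^k distinct values carry k bits each.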
- User requirements responding: refers to the information needs of users in a WSN. These requirements are based on a set of attributes that define the Quality of Information (QoI) that users expect from the system [16]. The article also suggests that user requirements can be thought of as measured information based on a specific set of attributes, and notes that users in a WSN may not necessarily be human, but could also be other systems or applications that rely on the data provided by the WSN.
- Error probability: is defined as a measure of the likelihood that an error will occur in the dissemination of information acquired/extracted/transmitted in a wireless sensor network (WSN).
- Path Weakness: is a game-theoretic metric used to measure the qualitative performance of different routing mechanisms in WSNs. The approach uses qualitative performance as a QoI characteristic and adopts a sensor-centric concept. According to the article, path weakness is calculated by considering the probability that an attacker can successfully launch an attack on a given path in the network. The higher the path weakness, the more vulnerable the path is to attacks and, therefore, the lower its QoI.
- Transient Information level: is a key metric used to define the QoI assigned to a message in a WSN. According to the article, the transient information level is defined as the product of the information and the projected physical distance of that information from the destination node. In other words, it takes into account both the amount of information being transmitted and how far it needs to travel to reach its destination. This approach is relevant to the QoI information-transport block, since attributes related to information transport, such as timeliness of information, are used.
- Peak Signal-to-Noise Ratio (PSNR): is a metric used to measure the quality of a reconstructed signal compared to its original form. It is commonly used in image and video processing applications to evaluate the fidelity of the reconstructed signal. However, this approach only focuses on accuracy and does not consider other attributes such as timeliness for timely arrival of information for decision making.
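The PSNR definition can be made concrete with a short sketch (illustrative, not from the cited work; the flat signal layout and the peak value are assumptions):

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio, in dB, between two equal-length signals."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical signals: infinite PSNR
    # PSNR = 10 * log10(MAX^2 / MSE)
    return 10 * math.log10(max_value ** 2 / mse)
```

For 8-bit image samples, `max_value = 255` is the conventional peak.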
2.3. Knowledge
- Recall: is a metric used to evaluate the performance of a classification model in machine learning. It measures the proportion of actual positive cases that were correctly identified by the model, out of all positive cases in the data set.
- False-positive Rate: is another metric used to evaluate the performance of a classification model in machine learning. It measures the proportion of negative cases that were incorrectly classified as positive by the model, out of all negative cases in the data set.
- Overall Success Rate: combines aspects of both Recall and the False-positive Rate. It is the proportion of all cases, positive and negative, that the model classified correctly: negative cases correctly classified as negative together with positive cases correctly identified as positive.
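The three knowledge-level metrics above are all derived from a confusion matrix; the sketch below is an illustration of their standard definitions (names are ours, not from a cited article):

```python
def classification_metrics(actual, predicted):
    """actual/predicted: equal-length iterables of booleans (True = positive)."""
    tp = sum(a and p for a, p in zip(actual, predicted))          # true positives
    fp = sum((not a) and p for a, p in zip(actual, predicted))    # false positives
    fn = sum(a and (not p) for a, p in zip(actual, predicted))    # false negatives
    tn = sum((not a) and (not p) for a, p in zip(actual, predicted))  # true negatives
    recall = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    overall_success_rate = (tp + tn) / (tp + tn + fp + fn)
    return recall, false_positive_rate, overall_success_rate
```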
3. QoIS Improvement techniques
3.1. Data Cleaning
- Identifying and measuring quality indicators and metrics. This article identifies four quality indicators: data volume, completeness, accuracy, and consistency. For each indicator, the article proposes a set of metrics that can be used to measure its value. For example, the data volume indicator can be measured by counting the number of data points collected by each node in the network.
- Applying a cleaning strategy based on these indicators, taking into consideration the relationships between them. In this step, the article proposes a cleaning strategy consisting of three main steps:
  - Identifying errors in the data using the quality metrics.
  - Repairing or removing errors using appropriate techniques such as interpolation or outlier detection.
  - Validating the cleaned data using additional quality metrics.
- Completeness is positively correlated with data volume.
- Time-related indicator is positively correlated with correctness.
- Calculate the volume indicator of data set D.
- If the volume indicator is larger than a given threshold, then:
  - Clean the data set by the completeness indicator: identify missing or lost data points, then use appropriate techniques such as interpolation or extrapolation to fill in the missing values. The article notes that different interpolation techniques can be used depending on the characteristics of the data set, such as linear interpolation, cubic spline interpolation, or kriging.
  - Clean the data set by the time-related indicator: the algorithm is based on the assumption that the time interval between two consecutive data collections is constant. It identifies and removes outliers by comparing the time interval between two consecutive data collections with a predefined threshold value.
  - Clean the data set by the correctness indicator: identify and remove outliers based on statistical analysis of the data set.
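A minimal sketch of this cleaning flow, assuming a time-ordered list of readings in which `None` marks lost samples. The threshold values, the neighbour-interpolation choice for completeness, and the z-score test for correctness are illustrative assumptions, not the article's exact parameters:

```python
import statistics

def clean_dataset(values, volume_threshold=10, z_max=3.0):
    """Illustrative indicator-driven cleaning of a time-ordered reading list.

    `values` may contain None for lost samples (at least one real value is
    assumed when the volume threshold is exceeded)."""
    # Volume indicator: only clean if enough data was collected.
    if len(values) <= volume_threshold:
        return values
    # Completeness indicator: fill missing samples from their neighbours.
    filled = list(values)
    for i, v in enumerate(filled):
        if v is None:
            left = next((filled[j] for j in range(i - 1, -1, -1)
                         if filled[j] is not None), None)
            right = next((filled[j] for j in range(i + 1, len(filled))
                          if filled[j] is not None), None)
            filled[i] = (left if right is None else
                         right if left is None else (left + right) / 2)
    # Correctness indicator: drop statistical outliers via a z-score test.
    mean = statistics.fmean(filled)
    stdev = statistics.pstdev(filled)
    return [v for v in filled if stdev == 0 or abs(v - mean) / stdev <= z_max]
```

With too few points the data is returned untouched; otherwise missing values are interpolated and extreme outliers removed.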
3.2. Data Selection and Transmission
3.3. Data Approximation
3.4. Data Compression
- Discrete Cosine Transform (DCT): is a mathematical technique that converts a signal or image from its spatial domain into its frequency domain. It is similar to the Fourier Transform, but it uses only real numbers and cosine functions instead of complex numbers and sine/cosine functions. The DCT is widely used in image and video compression algorithms, such as JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group), because it can efficiently represent signals with smooth variations in their frequency content [33]. The DCT coefficients can be quantized and encoded using lossy or lossless compression techniques to reduce the amount of data needed to represent an image or a video stream without significant loss in visual quality.
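To illustrate the energy-compaction property, here is a naive, unnormalized DCT-II in pure Python (an O(N²) sketch; real deployments would use an optimized library routine, and the function name is ours):

```python
import math

def dct2(x):
    """Naive, unnormalized DCT-II of a real-valued signal."""
    N = len(x)
    # X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]
```

For a smooth (here constant) signal, all higher-frequency coefficients vanish, which is exactly what lets them be quantized away cheaply.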
- Discrete wavelet transform (DWT): is a mathematical technique that decomposes a signal or an image into a set of wavelets, which are small waves that can capture localized features or details in the signal. Unlike the Fourier Transform, which uses only sine and cosine functions to represent signals in their frequency domain, wavelets can be designed to have different shapes and scales that can better capture different types of features in a signal [33]. The DWT is widely used in image and video compression algorithms, such as JPEG2000, because it can efficiently represent signals with both smooth variations and sharp edges or details in their frequency content.
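The simplest wavelet is the Haar wavelet. A single-level Haar decomposition (a sketch assuming an even-length signal, not the filter banks actually used by JPEG2000) splits the signal into pairwise averages (approximation) and pairwise differences (detail):

```python
def haar_dwt(x):
    """One-level Haar transform of an even-length signal."""
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]  # smooth part
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]  # local edges
    return approx, detail

def haar_idwt(approx, detail):
    """Exact inverse of the one-level Haar transform."""
    x = []
    for a, d in zip(approx, detail):
        x += [a + d, a - d]
    return x
```

In compression, small detail coefficients can be discarded while the approximation retains the overall shape of the signal.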
- Differential Encoding: is a simple data compression technique that is commonly used in WSNs. The basic idea behind differential encoding is to encode the difference between consecutive data samples instead of encoding the actual data values themselves [34]. This can be useful in cases where the data being transmitted has a high degree of correlation or similarity between consecutive samples. In differential encoding, the first data sample is transmitted as-is, while subsequent samples are encoded as the difference between the current sample and the previous one. This difference value can be represented using fewer bits than the original sample value, resulting in a reduction in the amount of data that needs to be transmitted. At the receiver end, the original data can be reconstructed by adding up all of the difference values starting from the first sample. Differential encoding can be particularly effective for applications where small changes in sensor readings are more important than absolute values, such as temperature or humidity monitoring. However, it may not be suitable for applications where large changes in sensor readings need to be accurately captured and transmitted.
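A minimal encoder/decoder pair for this scheme (illustrative names; a non-empty sample list is assumed):

```python
def diff_encode(samples):
    """First sample as-is, then successive differences."""
    return [samples[0]] + [samples[i] - samples[i - 1]
                           for i in range(1, len(samples))]

def diff_decode(encoded):
    """Rebuild the original samples by cumulative summation."""
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out
```

For slowly varying readings such as temperature, the differences are small integers that fit in far fewer bits than the absolute values.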
- Run-Length Encoding (RLE): is a data compression technique used in WSNs. The basic idea behind RLE is to represent long sequences of repeated data values with a single symbol or code. This can be useful in cases where the data being transmitted has long runs of identical or similar values [35]. In RLE, the data stream is scanned for runs of consecutive identical values. Each run is then replaced with a code that represents the length of the run and the value being repeated. For example, if a sequence of 10 zeros is encountered, it can be replaced with a code that represents "repeat 0 ten times". This can result in significant reductions in the amount of data that needs to be transmitted, especially for applications where long runs of identical or similar values are common. At the receiver end, the original data can be reconstructed by decoding each run code and repeating the corresponding value for the specified number of times. RLE is a simple and efficient compression technique that can be implemented using minimal computational resources, making it well-suited for use in resource-constrained WSNs. However, it may not be as effective for compressing more complex or diverse types of data.
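A run-length encoder/decoder can be written in a few lines (an illustrative sketch; the (value, count) pair encoding is one of several possible layouts):

```python
def rle_encode(data):
    """Compress a sequence into a list of (value, run_length) pairs."""
    runs = []
    for v in data:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))              # start a new run
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]
```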
- Huffman Coding: is a popular data compression technique that is commonly used in WSNs. The basic idea behind Huffman coding is to assign variable-length codes to different symbols based on their frequency of occurrence in the data stream [36] [37]. Symbols that occur more frequently are assigned shorter codes, while symbols that occur less frequently are assigned longer codes. In Huffman coding, a binary tree is constructed based on the frequency of occurrence of each symbol in the data stream. The most frequent symbols are placed near the root of the tree, while less frequent symbols are placed further away from the root. Each symbol is then assigned a unique binary code based on its position in the tree. The resulting code is a prefix code, which means that no code word is a prefix of any other code word. At the receiver end, the original data can be reconstructed by decoding each code word using the same binary tree used for encoding. Huffman coding can be very effective for compressing data streams with non-uniform symbol frequencies, such as text or image data. However, constructing an optimal Huffman tree requires knowledge of the frequency distribution of symbols in advance, which may not always be available in real-world applications. The authors of [37] show that given general knowledge of the parameters that must be monitored, conventional Huffman coding can be used to compress the data collected by the sensor nodes. When the data consists of integer measurements, the Huffman dictionary computed using statistics inferred from public datasets often approaches the entropy of the data. This allows for efficient compression of data with minimal loss of information, making it a suitable method for use in WSNs.
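A compact Huffman-code construction using a min-heap (an illustrative sketch assuming at least two distinct symbols; heap entries carry a tie-breaker so that equal frequencies never compare the code dictionaries):

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Map each symbol to its prefix-free bit string."""
    freq = Counter(symbols)
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}  # left branch
        merged.update({s: "1" + c for s, c in c2.items()})  # right branch
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]
```

The most frequent symbol receives the shortest code, and no code is a prefix of another, so the bit stream can be decoded unambiguously.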
- Lempel-Ziv-Welch (LZW) Compression: considered as a lossless data compression algorithm, it is commonly used in WSNs. The basic idea behind LZW compression is to replace frequently occurring sequences of symbols with shorter codes, thereby reducing the overall size of the data stream [38]. In LZW compression, a dictionary of symbol sequences and their corresponding codes is built up as the data stream is processed. Initially, the dictionary contains all possible single symbols in the data stream. As the data stream is processed, frequently occurring symbol sequences are added to the dictionary and assigned shorter codes. At the encoder end, each symbol sequence in the data stream is replaced with its corresponding code from the dictionary. The resulting code sequence can be transmitted using fewer bits than would be required to represent each symbol individually. At the receiver end, the original data can be reconstructed by decoding each code and using the same dictionary used for encoding. LZW compression can be very effective for compressing text or image data streams with repetitive patterns or long runs of identical symbols. However, it requires more computational resources than some other compression techniques and may not always be suitable for use in resource-constrained WSNs.
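A textbook LZW encoder/decoder over character strings (an illustrative sketch; the initial dictionary holds the 256 single-byte characters):

```python
def lzw_encode(data):
    """Encode a string into a list of dictionary indices."""
    dictionary = {chr(i): i for i in range(256)}
    w, out = "", []
    for c in data:
        if w + c in dictionary:
            w += c                          # grow the current phrase
        else:
            out.append(dictionary[w])       # emit code for the known phrase
            dictionary[w + c] = len(dictionary)  # learn the new phrase
            w = c
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes):
    """Rebuild the string, reconstructing the same dictionary on the fly."""
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for k in codes[1:]:
        # k may reference the phrase being defined right now (classic LZW case).
        entry = dictionary[k] if k in dictionary else w + w[0]
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]
        w = entry
    return "".join(out)
```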
- Arithmetic Coding: the basic idea behind arithmetic coding is to represent a sequence of symbols as a single fractional value between 0 and 1, which can then be encoded using fewer bits than the original sequence [39]. In arithmetic coding, each symbol in the data stream is assigned a probability based on its frequency of occurrence. A cumulative probability distribution is then constructed by adding up the probabilities of all symbols up to and including the current symbol. The entire data stream can be represented as a single fractional value within the range defined by the cumulative probability distribution. At the encoder end, the fractional value representing the entire data stream is then converted into a binary code using fewer bits than would be required to represent each symbol individually. At the receiver end, the original data can be reconstructed by decoding each binary code and using the same probability distribution used for encoding. Arithmetic coding can be very effective for compressing data streams with non-uniform symbol frequencies, such as text or image data. However, it requires more computational resources than some other compression techniques and may not always be suitable for use in resource-constrained WSNs.
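A toy arithmetic coder that makes the interval-narrowing idea concrete (an illustrative sketch: plain floating point limits it to short sequences, whereas practical coders use integer arithmetic with renormalization and emit actual bits rather than a float):

```python
def arithmetic_encode(symbols, probs):
    """Narrow [0, 1) once per symbol; return a value inside the final interval."""
    cum, lo = {}, 0.0
    for s, p in probs.items():            # cumulative sub-intervals per symbol
        cum[s] = (lo, lo + p)
        lo += p
    low, high = 0.0, 1.0
    for s in symbols:
        span = high - low
        s_lo, s_hi = cum[s]
        high = low + span * s_hi          # shrink to the symbol's sub-interval
        low = low + span * s_lo
    return (low + high) / 2               # any value in [low, high) identifies the message

def arithmetic_decode(value, probs, n):
    """Recover n symbols by repeatedly locating `value` in the sub-intervals."""
    cum, lo = {}, 0.0
    for s, p in probs.items():
        cum[s] = (lo, lo + p)
        lo += p
    out = []
    for _ in range(n):
        for s, (s_lo, s_hi) in cum.items():
            if s_lo <= value < s_hi:
                out.append(s)
                value = (value - s_lo) / (s_hi - s_lo)  # rescale for the next symbol
                break
    return out
```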
- Burrows-Wheeler Transform (BWT): is a lossless data compression technique that is commonly used in WSNs. The basic idea behind BWT is to rearrange the symbols in a data stream to create a new sequence that has more repeated patterns, which can then be compressed more effectively [40]. In BWT, the original data stream is first transformed into a matrix of all possible cyclic permutations of the symbols in the stream. The rows of this matrix are then sorted lexicographically to create a new matrix. The last column of this new matrix is then extracted and used as the transformed data stream. At the encoder end, this transformed data stream is then compressed using techniques such as run-length encoding or Huffman coding. At the receiver end, the original data can be reconstructed by reversing the BWT process. BWT can be very effective for compressing text or image data streams with repetitive patterns or long runs of identical symbols. However, it requires more computational resources than some other compression techniques and may not always be suitable for use in resource-constrained WSNs.
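A naive BWT and its inversion (an illustrative sketch using an explicit sentinel character; production implementations use suffix arrays instead of materializing every rotation):

```python
def bwt(s, sentinel="\0"):
    """Forward transform: last column of the sorted rotation matrix.

    The sentinel must not occur in s; it marks the original row for inversion."""
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def inverse_bwt(last, sentinel="\0"):
    """Invert by repeatedly prepending the last column and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    return next(row for row in table if row.endswith(sentinel))[:-1]
```

The transformed string groups identical characters together ("banana" becomes "annb\0aa"), which is what makes a follow-up RLE or Huffman pass effective.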
- Prediction-Based Compression: is a data compression technique that is commonly used in WSNs. The basic idea behind prediction-based compression is to use the correlation between adjacent data samples to predict the value of the next sample, and then encode the difference between the predicted value and the actual value [41]. In prediction-based compression, a model is first created to predict the value of each sample based on its previous samples. This model can be as simple as a linear predictor that uses a weighted sum of the previous samples, or it can be more complex, such as an auto-regressive model that uses a linear combination of past samples and past prediction errors. At the encoder end, each sample in the data stream is predicted using this model, and then the difference between the predicted value and the actual value is encoded using techniques such as Huffman coding or arithmetic coding. At the receiver end, these differences are decoded and added back to the predicted values to reconstruct the original data stream. Prediction-based compression can be very effective for compressing data streams with high temporal correlation, such as audio or video data. However, it requires more computational resources than some other compression techniques and may not always be suitable for use in resource-constrained WSNs.
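A minimal sketch of the idea using a fixed linear predictor, here the mean of the last `order` samples (the predictor choice and the verbatim warm-up samples are illustrative assumptions, not a specific published scheme):

```python
def predictive_encode(samples, order=2):
    """Emit warm-up samples verbatim, then residuals of a mean predictor."""
    residuals = list(samples[:order])     # warm-up: sent as-is
    for i in range(order, len(samples)):
        prediction = sum(samples[i - order:i]) / order
        residuals.append(samples[i] - prediction)  # small if the signal is smooth
    return residuals

def predictive_decode(residuals, order=2):
    """Rebuild samples by re-running the same predictor and adding residuals."""
    out = list(residuals[:order])
    for r in residuals[order:]:
        prediction = sum(out[-order:]) / order
        out.append(prediction + r)
    return out
```

For a highly correlated stream the residuals cluster near zero, so a follow-up entropy coder (Huffman or arithmetic) compresses them well.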
3.5. Data Prediction
- Single Prediction Scheme: in this category, prediction is made at a single point of the network (sensor node or cluster head/sink). For example, the Cluster Head has the capability to anticipate the data obtained from sensor nodes and independently determine the optimal timing for acquiring additional measurements, taking into account the reliability of the predictions. Alternatively, sensor nodes can forecast alterations in their environment, enabling them to avoid unnecessary measurements (and consequently unnecessary transmissions).
  - Model generation in the Cluster Heads/Sink: given that Cluster Heads (CHs)/Sinks possess greater computational power and energy resources, they can formulate sophisticated prediction models and make crucial decisions regarding the WSN's operation without compromising the QoI delivered by the measurements. In this case, the sensor nodes are tasked solely with their fundamental functions, which involve measuring environmental parameters and transmitting the raw data gathered by their sensors. In environmental monitoring, the data sensed by nodes typically exhibit spatio-temporal correlations, facilitating the creation of probabilistic models. This allows for the approximation of data to well-known distributions and the assignment of confidence levels to predictions. As a result, the number of transmissions is minimized, as CHs predict measurements and locally verify whether the necessary QoI constraints are met. Examples of this model generation scheme include adaptive sampling, topology control, and clustering.
  - Model generation in the Sensor Nodes: the prediction is made at the level of the sensor nodes. Considering the potential constraints on the computing power of sensor nodes, decisions regarding predictions can be supported by the data from their neighboring nodes, making the process distributed. For example, rather than transmitting every measurement to the Cluster Head (CH), a sensor node may autonomously decide not to transmit if it observes that the measurements from its neighbors are adequate for accurately monitoring its region.
- Dual Prediction Scheme: in this category, predictions are made in CHs and sensor nodes simultaneously. The underlying concept of such mechanisms is that sensor nodes possess the capability to generate the same "a priori" knowledge as CHs. However, sensor nodes can independently verify the accuracy of predictions locally, thereby avoiding unnecessary transmissions.
  - Model generation in the Cluster Heads/Sink: this scheme capitalizes on the asymmetric distribution of computational power in WSNs: Cluster Heads (CHs)/Sinks typically have more abundant resources, including cheaper energy sources as well as more memory and processing power, compared to ordinary sensor nodes primarily used for measuring and reporting environmental data. Sensor nodes transmit their current measurements to the CHs, enabling them to generate, locally and based on the received values, new prediction models for each sensor node, and to update and transmit new model parameters and error acceptance levels to their sensor nodes. Decisions can be made at the level of the CH; for example, the Dual Kalman Filter can be employed, leveraging spatial correlations among measurements from various sensor nodes. Subsequently, the CHs assess their capacity to compute multiple prediction models for each sensor node and opt for the most suitable one based on the received measurements. Alternatively, decisions are made locally within the sensor nodes, using methods such as Gaussian Process (GP) Regression. In this context, each sensor must forecast the information it is about to sample and adapt its sampling schedule in accordance with energy constraints. The goal is to maximize the information collected during a specific time interval.
  - Independent model generation: this scheme relies on an "initialization phase," a designated period during which sensor nodes report all the data they generate to the CHs. The purpose of the initialization phase is to ensure that the CHs possess comprehensive information about the environment before any prediction model is generated. Following initialization, the CHs can generate prediction models similar to those in the sensor nodes without requiring additional transmissions. Subsequently, both entities begin predicting values, with the advantage that sensor nodes can locally verify the accuracy of the predictions. If a prediction is deemed inaccurate, the sensor nodes may transmit the actual measurement as needed. Consequently, sensor nodes have the option to regularly report data to the CHs in cases of prediction inaccuracies, or to refrain from reporting any sensor reading if the predictions are deemed sufficiently accurate. Decisions can be made in the CH/Sink: CHs have the capability to adjust the operation of sensor nodes based on the potential savings that predictions may bring. CHs assess the cost-effectiveness of making predictions in sensor nodes by calculating the relationship between prediction accuracy, the correlation between measurements, and the error tolerance acceptable to the user; this evaluation guides CHs in determining whether it is advantageous to implement predictions in sensor nodes. Otherwise, decisions can be made in the sensor nodes: sensor nodes may make additional decisions contingent on the accuracy of predictions. They can autonomously choose to aggregate data received from neighboring nodes by excluding measurements that fall within their confidence interval, rather than forwarding them to the CHs.
  - Model generation in the sensor nodes: sensor nodes can generate prediction models using knowledge distributed in the WSN. Such knowledge can be disseminated from one node to its neighbors; this additional information about the surroundings enhances each node's capacity to compute more accurate prediction models before transmitting their parameters to the CH/Sink.
- Time Series methods: a time series is a series of data points, typically comprised of observations recorded over a specific time interval and arranged in chronological order. A time series prediction model takes a time series as input to generate predictions. These predictions are expressed as a function of past observations and their corresponding time. Typically, this function is defined by parameters calculated from previous observations. These parameters should be updated over time, since the environment may evolve and change. There are many time series methods; the most popular are the naive approaches, the Auto-Regressive (AR) model, the Moving Average (MA) model, Exponential Smoothing (ES), and the Auto-Regressive Integrated Moving Average (ARIMA) [43].
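As a concrete instance of the Exponential Smoothing method mentioned above (a minimal sketch; the smoothing factor and the initialization with the first observation are assumptions):

```python
def exponential_smoothing(series, alpha=0.5):
    """One-step smoothed forecasts: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    forecast = series[0]          # initialize with the first observation
    forecasts = [forecast]
    for x in series[1:]:
        forecast = alpha * x + (1 - alpha) * forecast
        forecasts.append(forecast)
    return forecasts
```

A sensor node can transmit a reading only when it deviates from the smoothed forecast by more than the application's error tolerance.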
- Regression methods: regression methods take a different approach compared to time series methods. Rather than relying solely on past values for predictions, they also predict measurements based on different types of measurements. For example, when a value is observed by one sensor node, a regression model can be employed to predict the value that would be observed by another sensor node. The main regression methods used in the WSN data reduction paradigm are Linear Regression, Kernel Regression, Gaussian Process Regression, and Principal Component Analysis (PCA) [44].
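A minimal linear-regression sketch for the cross-sensor case described above, fitting readings from one node against those of another (illustrative; ordinary least squares in one variable):

```python
def fit_linear(x, y):
    """Least-squares slope and intercept for paired sensor readings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def predict(model, x):
    """Predict node B's reading from node A's reading x."""
    slope, intercept = model
    return slope * x + intercept
```

Once the model is fitted, node B's readings need not be transmitted as long as they stay within the accepted error of the prediction.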
- Machine Learning techniques: Machine Learning techniques have found applications in various solutions for WSNs at different levels, including routing, medium access control, and event detection. Among these solutions, Artificial Neural Networks stand out as the only ones specifically applied to reduce the number of transmissions in a WSN [45]. Different neural networks can be employed to reduce transmissions in WSNs and consequently reduce energy consumption [46]; examples include Self-Organizing Maps (SOM), Back Propagation (BP), and Radial Basis Function (RBF) networks. Other data prediction methods exist in the literature, such as the multi-node multi-feature (MNMF) method [47], based on a bidirectional long short-term memory (LSTM) network for WSNs. The proposed method considers the spatio-temporal correlation among sensory data and aims to improve data quality and reduce unnecessary data transmission. The authors use the quartile method and the wavelet-threshold de-noising method to improve data quality, and then use the bidirectional LSTM neural network to learn prediction features. Experimental results are provided in [47] to demonstrate the effectiveness of the proposed MNMF method in improving prediction accuracy compared to other existing methods.
3.6. Smart Sampling
4. Conclusions and Future work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kalsoom, T.; Ahmed, S.; Rafi-ul Shan, P.M.; Azmat, M.; Akhtar, P.; Pervez, Z.; Imran, M.A.; Ur-Rehman, M. Impact of IoT on Manufacturing Industry 4.0: A New Triangular Systematic Review. Sustainability 2020, 12, 8465.
- Mashayekhy, Y.; Babaei, A.; Yuan, X.M.; Xue, A. Impact of Internet of Things (IoT) on Inventory Management: A Literature Survey. Logistics 2020, 6, 33.
- Dash, S.P. The Impact of IoT in Healthcare: Global Technological Change & The Roadmap to a Networked Architecture in India. J. Indian Inst. Sci. 2020, 100, 773–785.
- Khanna, A.; Kaur, S. Evolution of Internet of Things (IoT) and its significant impact in the field of Precision Agriculture. Computers and Electronics in Agriculture 2019, 157, 218–231.
- Weiser, M. Some Computer Science Issues in Ubiquitous Computing. Communications of the ACM 1993, 36, 75–84.
- Atzori, L.; Iera, A.; Morabito, G. The internet of things: A survey. Computer Networks 2010, 54, 2787–2805.
- Akyildiz, I.F.; Su, W.; Sankarasubramaniam, Y.; Cayirci, E. Wireless sensor networks: a survey. Computer Networks 2002, 38, 393–422.
- Dey, A.K.; Abowd, G.D. Towards a Better Understanding of Context and Context-Awareness. In Proceedings of the Workshop on The What, Who, Where, When, and How of Context-Awareness, part of the 2000 Conference on Human Factors in Computing Systems (CHI 2000), The Hague, The Netherlands, 3 April 2000.
- Perera, C.; Zaslavsky, A.; Christen, P.; Georgakopoulos, D. Context Aware Computing for The Internet of Things: A Survey. IEEE Communications Surveys & Tutorials 2014, 16, 414–454.
- O’Donoghue, J.; Herbert, J. Data Management within MHealth Environments: Patient Sensors, Mobile Devices, and Databases. J. Data and Information Quality 2012, 4.
- Attar, H. Joint IoT/ML Platforms for Smart Societies and Environments: A Review on Multimodal Information-Based Learning for Safety and Security. J. Data and Information Quality 2023, 15.
- Badis, H.; Munaretto, A.; Al Agha, K.; Pujolle, G. QoS for Ad hoc Networking Based on Multiple Metrics: Bandwidth and Delay. In Proceedings of IEEE MWCN 2002; IEEE, 2003; pp. 1–6.
- Chen, D.; Varshney, P.K. QoS support in wireless sensor networks: a survey. Wireless Communications and Mobile Computing 2004, 4, 907–932.
- Meguerdichian, S.; Koushanfar, F.; Potkonjak, M.; Srivastava, M.B. Coverage problems in wireless ad-hoc sensor networks. ACM SIGMOBILE Mobile Computing and Communications Review 2001, 5, 97–101.
- Meguerdichian, S.; Koushanfar, F.; Qu, G.; Potkonjak, M. Exposure in wireless ad-hoc sensor networks. In Proceedings of the 8th Annual International Conference on Mobile Computing and Networking; 2001; pp. 139–150.
- Sachidananda, V.; Khelil, A.; Suri, N. Quality of Information in Wireless Sensor Networks: A Survey. Dependable, Embedded Systems and Software Group 2010, 10.
- Lehner, W.; Klein, A. Representing Data Quality in Sensor Data Streaming Environments. J. Data Inf. Qual. 2009, 1, 1–28.
- Su, L.; Hu, S.; Li, S.; Liang, F.; Gao, J.; Abdelzaher, T.F.; Han, J. Quality of information based data selection and transmission in wireless sensor networks. In Proceedings of the 12th ACM Conference on Embedded Network Sensor Systems; 2014; pp. 1–14.
- Cheng, H.; Feng, D.; Shi, X.; Chen, C. Data quality analysis and cleaning strategy for wireless sensor networks. Research Open Access 2015, 1, 1–10.
- Li, F.; Nastic, S.; Dustdar, S. Data Quality Observation in Pervasive Environments. IEEE Transactions on Mobile Computing 2012, XX, 602–602.
- Lesne, A. Shannon entropy: a rigorous notion at the crossroads between probability, information theory, dynamical systems and statistical physics. Mathematical Structures in Computer Science 2014, 24, e240311.
- Ghorbel, O.; Ayedi, W.; Snoussi, H.; Abid, M. Fast and Efficient Outlier Detection Method in Wireless Sensor Networks. IEEE Sensors Journal 2015, 15, 3403–3411.
- Zhuang, Y.; Chen, L. In-network outlier cleaning for data collection in sensor networks. In Proceedings of the 2006 International Conference on Wireless Communications and Mobile Computing; ACM, 2006; pp. 1057–1062.
- Hamrani, A.; Belaidi, I.; Monteiro, E.; Lorong, P. On the Factors Affecting the Accuracy and Robustness of Smoothed-Radial Point Interpolation Method. Advances in Applied Mathematics and Mechanics 2017, 9, 43–72.
- Rahm, E.; Do, H.H. Data cleaning: problems and current approaches. Bulletin of the Technical Committee on Data Engineering 2000, 23, 3–13.
- Osborne, M.A.; Garnett, R.; Roberts, S.J. Active data selection for sensor networks with faults and changepoints. IEEE Transactions on Signal Processing 2008, 56, 5457–5467.
- MacKay, D.J. Information-Based Objective Functions for Active Data Selection, 1992.
- Seo, S.; Wallat, M.; Graepel, T.; Obermayer, K. Gaussian process regression: Active data selection and test point rejection. Advances in Neural Information Processing Systems 2000, 12, 610–616.
- Meliou, A.; Guestrin, C.; Hellerstein, J.M. Approximating sensor network queries using in-network summaries. In Proceedings of the 8th International Conference on Information Processing in Sensor Networks (IPSN 2009), San Francisco, California, USA, 13–16 April 2009; pp. 229–240.
- Kamal, A.R.M.; Hamid, M.A. Reliable data approximation in wireless sensor network. Procedia Computer Science 2013, 19, 1046–1051.
- Kamal, A.R.M.; Razzaque, M.A.R.; Nixon, P. 2PDA: two-phase data approximation in wireless sensor network. In Proceedings of the 7th ACM Workshop on Performance Evaluation of Wireless Ad Hoc, Sensor, & Ubiquitous Networks; ACM, 2010; pp. 1–8.
- Srisooksai, T.; Keamarungsi, K.; Lamsrichan, P.; Araki, K. Practical data compression in wireless sensor networks: A survey. ICT Express 2011, 1, 59–63.
- Sheltami, T.; Musaddiq, M.; Shakshuki, E. Data compression techniques in Wireless Sensor Networks. Future Generation Computer Systems 2016, 64, 151–162.
- Aquino, J.F.; Nakamura, E.F.; Loureiro, A.A.; Endler, M. A differential coding algorithm for wireless sensor networks. In Proceedings of the 2008 IEEE 19th International Symposium on Personal, Indoor and Mobile Radio Communications; IEEE, 2008; pp. 1–5.
- Liew, S.C.; Liew, S.W.; Zain, J.M. Reversible Medical Image Watermarking For Tamper Detection And Recovery With Run Length Encoding Compression. World Academy of Science, Engineering and Technology 2010, 4, 674–679. [Google Scholar]
- Huffman, D.A. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE 1952, 40, 1098–1101. [Google Scholar] [CrossRef]
- Medeiros, H.P.; Maciel, M.C.; Souza, R.D.; Pellenz, M.E. Lightweight Data Compression in Wireless Sensor Networks Using Huffman Coding. Sensors 2015, 15, 29089–29108. [Google Scholar] [CrossRef]
- Apostolico, A. Fast gapped variants for Lempel–Ziv–Welch compression. Information and Computation 2007, 205, 1012–1026. [Google Scholar] [CrossRef]
- Witten, I.H.; Neal, R.M.; Cleary, J.G. Arithmetic coding for data compression. Communications of the ACM 1987, 30, 520–540. [Google Scholar] [CrossRef]
- Manzini, G. An Analysis of the Burrows-Wheeler Transform. Dipartimento di Informatica, Università del Piemonte Orientale, Italy 2001.
- Chen, C.; Zhang, L.; Tiong, R.L.K. A new lossy compression algorithm for wireless sensor networks using Bayesian predictive coding. Wireless Networks 2020, 26, 5535–5547. [Google Scholar] [CrossRef]
- Dias, G.M.; Bellalta, B.; Oechsner, S. A Survey about Prediction-based Data Reduction in Wireless Sensor Networks. ACM Computing Surveys (CSUR) 2016, V, A. [Google Scholar] [CrossRef]
- Fildes, R.; Hibon, M.; Makridakis, S.; Meade, N. Generalising about univariate forecasting methods: further empirical evidence. International Journal of Forecasting 1998, 14, 339–358. [Google Scholar] [CrossRef]
- Jacquot, A.; Chanet, J.P.; Hou, K.M.; Diao, X.; Li, J.J. LiveNCM: A new wireless management tool. In Proceedings of the IEEE AFRICON 2009, Nairobi, Kenya; 2009; p. 6, ZSCC: 0000007. [Google Scholar]
- Alsheikh, M.A.; Lin, S.; Niyato, D.; Tan, H.P. Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications. arXiv 2015, arXiv:1405.4463. [Google Scholar] [CrossRef]
- Enami, N.; Moghadam, R.A.; Dadashtabar, K.; Hoseini, M. Neural Network Based Energy Efficiency in Wireless Sensor Networks: A Survey. International Journal of Computer Science & Engineering Survey 2010, 1, 39–55. [Google Scholar]
- Cheng, H.; Xie, Z.; Wu, L.; Yu, Z.; Li, R. Data prediction model in wireless sensor networks based on bidirectional LSTM. EURASIP Journal on Wireless Communications and Networking 2019, 2019, 203. [Google Scholar] [CrossRef]
- Mesin, L.; Aram, S.; Pasero, E. A neural data-driven algorithm for smart sampling in wireless sensor networks. EURASIP Journal on Wireless Communications and Networking 2014, 2014, 23. [Google Scholar] [CrossRef]
| Data delivery model | Perception | Transmission |
|---|---|---|
| Event-driven | High | Low |
| Query-driven | Low | Low |
| Continuous delivery | High | High |
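The perception/transmission trade-offs summarized above can be sketched in code. This is an illustrative sketch, not from the article: the temperature readings, the 30.0 °C event threshold, and the query indices are all hypothetical choices made here for demonstration.

```python
def event_driven(readings, threshold=30.0):
    """High perception, low transmission: every sample is sensed,
    but only samples that cross the event threshold are reported."""
    return [r for r in readings if r > threshold]

def query_driven(readings, query_indices):
    """Low perception, low transmission: the node senses and reports
    only when the sink issues a query for a given sample."""
    return [readings[i] for i in query_indices]

def continuous(readings):
    """High perception, high transmission: every sensed sample
    is forwarded to the sink."""
    return list(readings)

readings = [24.1, 25.0, 31.2, 29.8, 33.5]
events = event_driven(readings)          # only the above-threshold samples
answers = query_driven(readings, [0, 4]) # only the queried samples
stream = continuous(readings)            # the full stream
```

The event-driven model transmits two of five samples here, while the continuous model transmits all five, which is the transmission-cost difference the table records.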
| Method | Improved criteria | Advantages | Disadvantages | References |
|---|---|---|---|---|
| Data Cleaning | Depends on the cleaning scheme: completeness, correctness, volume, availability, effectiveness, accuracy, timeliness, etc. | | | [19,22,23,24,25] |
| Selection and Transmission | Reliability and redundancy | | | [18,26,27,28] |
| Approximation | Accuracy, reliability and energy efficiency | Extends the lifetime of sensor nodes by reducing the amount of data transmitted while maintaining a high level of accuracy. | | [29,30,31] |
| Compression | Entropy, probability of error and energy efficiency | | | [32,33,34,35,36,37,38,39,40,41] |
| Prediction | Accuracy, timeliness and energy efficiency | | | [42,43,45,46,47] |
| Smart Sampling | Energy efficiency | | | [48] |
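As a concrete illustration of the Prediction row above, a common prediction-based reduction scheme has the node and the sink run the same predictor, with the node transmitting only the samples whose prediction error exceeds a tolerance. The sketch below uses a simple last-value predictor; the function name, the sample values, and the `eps` tolerance are our own illustrative choices, not taken from the surveyed works.

```python
def dual_prediction_reduction(samples, eps=0.5):
    """Transmit a sample only when the shared last-value predictor
    errs by more than eps; the sink reconstructs the rest from the
    predictor, bounding the reconstruction error by eps."""
    transmitted = []    # (index, value) pairs actually sent over the radio
    last = None         # predictor state, kept identical on node and sink
    reconstructed = []  # the stream as the sink sees it
    for i, x in enumerate(samples):
        if last is None or abs(x - last) > eps:
            transmitted.append((i, x))  # prediction failed: send the sample
            last = x                    # both sides update the predictor
        reconstructed.append(last)      # sink uses the predicted value
    return transmitted, reconstructed

samples = [20.0, 20.1, 20.2, 22.0, 22.1]
sent, rebuilt = dual_prediction_reduction(samples, eps=0.5)
```

For this slowly varying stream only two of five samples are transmitted, while every reconstructed value stays within `eps` of the true reading, which is the accuracy/energy trade-off the table attributes to prediction-based methods.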
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
