1. Introduction
Indoor localization is a key enabler for location-based services, smart homes, and ambient assisted living [2,3]. Device-free localization (DFL) aims to estimate the position of a person who does not carry any radio device, by inferring location from changes in radio propagation [6,7]. Among the various technologies, Wi-Fi-based DFL using Channel State Information (CSI) is especially appealing due to the widespread deployment of Wi-Fi and the rich spatial information encoded in CSI [1,2].
Most existing CSI-based DFL systems rely on commodity NICs and offline radio-map construction with extensive data collection [2,6,7]. Furthermore, they often assume static environments and use relatively heavy models. In contrast, low-cost ESP32 boards with CSI support offer a compact platform for practical DFL, but systematic studies targeting ESP32-based DFL with formal modeling are limited [1,10,12].
1.1. Contributions
This work makes the following contributions:
1. Formal fingerprinting model: We define a grid-based CSI fingerprint model where each grid cell corresponds to a location-dependent distribution of CSI features across multiple ESP32 links. The localization task is posed as both classification over grid cells and regression to continuous coordinates.
2. Feature extraction and multi-link fusion: We derive a feature mapping from complex CSI to amplitude and phase descriptors and propose a simple multi-link fusion strategy that is compatible with ESP32 hardware constraints.
3. Lightweight neural localization models: We design and implement compact fully connected and convolutional architectures for grid classification and coordinate regression and express their learning objectives mathematically.
4. Experimental ESP32 testbed: We deploy a three-link ESP32-based DFL testbed, collect over 1.2 million CSI snapshots across 35 grid points, and demonstrate sub-meter median localization error with real-time inference on a Raspberry Pi edge node.
2. Related Work
2.1. CSI-Based Device-Free Localization
CSI-based DFL techniques broadly fall into model-based and fingerprinting categories [6]. Model-based approaches use Fresnel zone analysis or propagation models to infer location from RSS/CSI variations [6]. Fingerprinting approaches construct a radio map of CSI features at discrete locations and train a model to map from features to positions [2,4,7]. Recent work leverages deep learning and multi-feature fusion (amplitude and phase) to improve accuracy and robustness [2,3,5].
2.2. ESP32 and CSI Tools
ESP32-based CSI tools such as Wi-ESP and ESP32-CSI-Tool provide access to subcarrier-level CSI for device-free Wi-Fi sensing [1,10,11]. These tools have been used for gesture recognition, presence detection, and human identification [1,13]. Beyond localization and kinematics, recent studies have also begun to exploit ESP32 CSI for environmental sensing, demonstrating joint occupancy, humidity, and temperature monitoring [8], as well as non-intrusive 2D thermal tomography [9]. Espressif’s esp-csi repository demonstrates indoor positioning and human detection examples, but does not provide a complete, formalized DFL framework [12].
3. Problem Formulation
3.1. Grid and Coordinate Definitions
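Consider a rectangular indoor area $\mathcal{A} \subset \mathbb{R}^2$, e.g., a 6 m × 4 m room. We discretize $\mathcal{A}$ into a grid of $M$ cells $\{\mathcal{C}_m\}_{m=1}^{M}$, each associated with a representative coordinate $\mathbf{c}_m = (x_m, y_m)$. In our experimental setup, $M = 35$ grid points are arranged in a 7 × 5 lattice with spacing $\Delta$.
Let $\mathbf{p} = (x, y) \in \mathcal{A}$ denote the true 2D position of the person. The DFL problem is:
Classification: Given CSI-derived features $\mathbf{f}$, estimate the most likely grid cell index $\hat{m} \in \{1, \dots, M\}$ such that $\mathbf{p} \in \mathcal{C}_{\hat{m}}$.
Regression: Given $\mathbf{f}$, estimate continuous coordinates $\hat{\mathbf{p}} = (\hat{x}, \hat{y})$.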
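As an illustration of the grid definition, the following minimal sketch builds a 7 × 5 lattice of representative coordinates and maps a position to its nearest grid point. The 1 m spacing, the origin at a room corner, and the helper names `build_grid` and `nearest_cell` are assumptions made for illustration, not values taken from the deployment.

```python
import math

def build_grid(nx=7, ny=5, spacing=1.0):
    """Return representative grid coordinates, row-major, origin at (0, 0).

    spacing=1.0 m is an assumed value chosen so that a 7 x 5 lattice
    spans a 6 m x 4 m area; the actual spacing may differ.
    """
    return [(ix * spacing, iy * spacing) for iy in range(ny) for ix in range(nx)]

def nearest_cell(p, grid):
    """Index of the grid point closest to position p = (x, y)."""
    return min(range(len(grid)), key=lambda m: math.dist(p, grid[m]))

grid = build_grid()
m_hat = nearest_cell((2.2, 3.1), grid)
print(len(grid), m_hat, grid[m_hat])
```

Under these assumptions, a cell index predicted by a classifier can be converted back to a coordinate with a simple lookup, `grid[m_hat]`.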
3.2. Multi-Link CSI Measurement Model
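We employ $L$ Wi-Fi links formed by ESP32 AP–STA pairs located at fixed positions. For link $\ell$, the complex CSI at subcarrier $k$ and time index $n$ is
$$H_\ell[k, n] = \sum_{i=1}^{P_\ell} \alpha_{\ell,i}\, e^{-j 2\pi f_k \tau_{\ell,i}(\mathbf{p}, n)} + w_\ell[k, n],$$
where $P_\ell$ is the number of multipath components, $\alpha_{\ell,i}$ are complex gains, $f_k$ is the frequency of subcarrier $k$, $\tau_{\ell,i}(\mathbf{p}, n)$ are delays depending on person position $\mathbf{p}$ and time $n$, and $w_\ell[k, n]$ is noise [2,4].
For static or quasi-static positions, we can neglect fast dynamics and represent link $\ell$ at position $\mathbf{p}$ by a random vector $\mathbf{h}_\ell(\mathbf{p}) \in \mathbb{C}^K$ collecting CSI across $K$ subcarriers:
$$\mathbf{h}_\ell(\mathbf{p}) = \boldsymbol{\mu}_\ell(\mathbf{p}) + \boldsymbol{\epsilon}_\ell,$$
where $\boldsymbol{\mu}_\ell(\mathbf{p})$ is the mean CSI vector at $\mathbf{p}$ and $\boldsymbol{\epsilon}_\ell$ is a zero-mean perturbation capturing small-scale fading and noise [4].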
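To make the multipath model concrete, the sketch below synthesizes one CSI snapshot as a sum of delayed, attenuated paths plus complex Gaussian noise. The path gains, delays, subcarrier count, centre frequency, and noise level are illustrative assumptions rather than measured parameters; the 312.5 kHz subcarrier spacing is the standard value for 20 MHz 802.11n channels.

```python
import numpy as np

def simulate_csi(gains, delays, freqs, noise_std=0.0, rng=None):
    """One CSI snapshot: sum_i a_i * exp(-j 2 pi f_k tau_i) + noise, for all k."""
    gains = np.asarray(gains, dtype=complex)    # (P,) complex path gains
    delays = np.asarray(delays, dtype=float)    # (P,) path delays in seconds
    freqs = np.asarray(freqs, dtype=float)      # (K,) subcarrier frequencies in Hz
    steering = np.exp(-2j * np.pi * np.outer(freqs, delays))   # (K, P)
    h = steering @ gains                                        # (K,)
    if noise_std > 0:
        if rng is None:
            rng = np.random.default_rng()
        noise = rng.normal(size=h.shape) + 1j * rng.normal(size=h.shape)
        h = h + noise_std / np.sqrt(2) * noise
    return h

K = 64                                                 # assumed subcarrier count
freqs = 2.437e9 + 312.5e3 * (np.arange(K) - K // 2)    # assumed: Wi-Fi channel 6
h = simulate_csi(gains=[1.0, 0.4 * np.exp(1j * 0.3)],
                 delays=[20e-9, 45e-9],                # direct path and one reflection
                 freqs=freqs, noise_std=0.01,
                 rng=np.random.default_rng(0))
print(h.shape, np.abs(h[:3]))
```

Changing a delay by a few nanoseconds, as a person moving near a reflection path would, shifts the amplitude ripple across subcarriers; this is the location dependence that fingerprinting exploits.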
3.3. CSI Fingerprints and Feature Mapping
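For a given person position $\mathbf{p}$ in cell $\mathcal{C}_m$, we collect a window of $N$ CSI packets per link $\ell$, yielding $\{H_\ell[k, n]\}$ for $k = 1, \dots, K$ and $n = 1, \dots, N$. We define amplitude and phase
$$A_\ell[k, n] = |H_\ell[k, n]|, \qquad \phi_\ell[k, n] = \angle H_\ell[k, n].$$
Phase is sanitized via linear detrending over subcarriers for each packet [1,18].
For each link $\ell$, we compute per-subcarrier amplitude statistics over $n = 1, \dots, N$:
$$\mu^{A}_{\ell}[k] = \frac{1}{N} \sum_{n=1}^{N} A_\ell[k, n], \qquad \big(\sigma^{A}_{\ell}[k]\big)^2 = \frac{1}{N} \sum_{n=1}^{N} \big(A_\ell[k, n] - \mu^{A}_{\ell}[k]\big)^2.$$
We aggregate across subcarriers:
$$\bar{\mu}^{A}_{\ell} = \frac{1}{K} \sum_{k=1}^{K} \mu^{A}_{\ell}[k], \qquad \bar{v}^{A}_{\ell} = \frac{1}{K} \sum_{k=1}^{K} \big(\sigma^{A}_{\ell}[k]\big)^2.$$
Similarly, we compute phase difference statistics (e.g., between adjacent antennas or subcarriers) and optionally time–frequency features (STFT energy in low-frequency bands) to capture small motions [3,4].
Concatenating features across links, we obtain a fingerprint vector:
$$\mathbf{f} = \Phi\big(\{H_\ell[k, n]\}_{\ell, k, n}\big) = \big[\mathbf{f}_1^\top, \dots, \mathbf{f}_L^\top\big]^\top \in \mathbb{R}^D,$$
where $\mathbf{f}_\ell$ collects the features of link $\ell$ and $\Phi$ denotes the feature mapping.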
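A minimal sketch of the amplitude part of this feature mapping follows. It assumes each link's window is stored as a complex array of shape (N, K); the function names `link_features` and `fingerprint`, the choice of exactly two aggregated features per link, and the random test data are hypothetical.

```python
import numpy as np

def link_features(csi_window):
    """Aggregate amplitude statistics for one link.

    csi_window: complex array of shape (N, K), N packets by K subcarriers.
    Returns [mean over subcarriers of the per-subcarrier mean amplitude,
             mean over subcarriers of the per-subcarrier amplitude variance].
    """
    amp = np.abs(csi_window)      # (N, K) amplitudes
    mu_k = amp.mean(axis=0)       # per-subcarrier mean over time
    var_k = amp.var(axis=0)       # per-subcarrier (population) variance over time
    return np.array([mu_k.mean(), var_k.mean()])

def fingerprint(windows):
    """Concatenate per-link features into one fingerprint vector."""
    return np.concatenate([link_features(w) for w in windows])

rng = np.random.default_rng(0)
windows = [rng.normal(size=(100, 64)) + 1j * rng.normal(size=(100, 64))
           for _ in range(3)]     # three links, synthetic CSI
f = fingerprint(windows)
print(f.shape)   # (6,)
```

Phase and time–frequency descriptors would be appended to each link's feature vector in the same way, which increases the fingerprint dimension accordingly.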
3.4. Fingerprint Distributions and Learning Objectives
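For each grid cell $\mathcal{C}_m$, we model fingerprint vectors as samples from an unknown distribution $P_m$ over $\mathbb{R}^D$:
$$\mathbf{f} \mid (\mathbf{p} \in \mathcal{C}_m) \sim P_m.$$
DFL can then be viewed as learning a mapping from $\mathbf{f}$ to $m$ (classification) or to $\mathbf{p}$ (regression).
Classification. We seek a classifier $g_\theta : \mathbb{R}^D \to \{1, \dots, M\}$ approximating
$$m^\star(\mathbf{f}) = \arg\max_{m} \Pr(m \mid \mathbf{f}).$$
Given labeled fingerprints $\{(\mathbf{f}_i, m_i)\}_{i=1}^{N_{\mathrm{tr}}}$, we train a neural network with softmax outputs $\hat{\pi}_m(\mathbf{f}; \theta)$ by minimizing the cross-entropy loss
$$\mathcal{L}_{\mathrm{CE}}(\theta) = -\frac{1}{N_{\mathrm{tr}}} \sum_{i=1}^{N_{\mathrm{tr}}} \log \hat{\pi}_{m_i}(\mathbf{f}_i; \theta).$$
Regression. We seek a regressor $r_\psi : \mathbb{R}^D \to \mathbb{R}^2$ approximating the mapping $\mathbf{f} \mapsto \mathbf{p}$. Given pairs $\{(\mathbf{f}_i, \mathbf{p}_i)\}_{i=1}^{N_{\mathrm{tr}}}$, we minimize the mean squared error (MSE)
$$\mathcal{L}_{\mathrm{MSE}}(\psi) = \frac{1}{N_{\mathrm{tr}}} \sum_{i=1}^{N_{\mathrm{tr}}} \big\| r_\psi(\mathbf{f}_i) - \mathbf{p}_i \big\|_2^2.$$
We evaluate localization accuracy via the Euclidean distance error
$$e = \| \hat{\mathbf{p}} - \mathbf{p} \|_2$$
and report the mean distance error (MDE) and empirical CDF of $e$.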
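The two objectives and the distance error can be written in a few lines of NumPy. This is a sketch for checking the definitions on small arrays, not the training code used in the experiments; the function names and the toy inputs are made up for illustration.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (B, M), integer labels (B,)."""
    z = logits - logits.max(axis=1, keepdims=True)     # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def mse(pred, target):
    """Mean squared Euclidean error between predicted and true coordinates (B, 2)."""
    return np.mean(np.sum((pred - target) ** 2, axis=1))

def distance_error(pred, target):
    """Per-sample Euclidean distance error in metres."""
    return np.linalg.norm(pred - target, axis=1)

logits = np.zeros((2, 35))              # uniform prediction over 35 cells
labels = np.array([0, 34])
print(cross_entropy(logits, labels))    # log(35), about 3.555

pred = np.array([[1.0, 1.0], [3.0, 4.0]])
true = np.array([[1.0, 1.0], [0.0, 0.0]])
print(mse(pred, true), distance_error(pred, true))
```

The uniform-prediction case is a useful sanity check: an untrained classifier over 35 cells should start near a loss of log(35).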
4. Methods
4.1. Hardware and Deployment
We deploy three ESP32-WROOM-32 modules as Wi-Fi AP–STA pairs at fixed locations around a 6 m × 4 m laboratory. Each ESP32 runs CSI-enabled firmware (based on Wi-ESP and ESP32-CSI-Tool) in 2.4 GHz, 20 MHz mode and reports CSI for $K$ subcarriers at approximately 100 Hz [1,10,12]. Links are configured such that their Fresnel zones cover the area of interest [6].
CSI packets are streamed via UDP over Wi-Fi to a Raspberry Pi 4 edge node for feature extraction and inference.
4.2. Data Collection Protocol
We define a grid of positions in the room. For each position, a participant stands still facing a random direction while CSI is recorded for 30 s on all links. This process is repeated for two participants and multiple sessions, resulting in more than 1.2 million CSI snapshots (after combining links). The ground-truth coordinates are measured relative to a reference origin.
CSI is segmented into overlapping windows of $N$ packets (e.g., 100 samples per window) with a hop size corresponding to 0.5 s. Each window yields a fingerprint vector $\mathbf{f}$ and an associated cell index $m$ and coordinate $\mathbf{p}$.
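The segmentation step can be sketched as follows, assuming a 100 Hz packet rate so that a 0.5 s hop corresponds to 50 packets. The function `sliding_windows` and the integer list standing in for CSI packets are illustrative only.

```python
def sliding_windows(packets, window=100, hop=50):
    """Yield overlapping windows of `window` packets, advancing by `hop` packets.

    With 100 packets/s (assumed), window=100 spans 1 s and hop=50 spans 0.5 s.
    A trailing partial window is dropped.
    """
    for start in range(0, len(packets) - window + 1, hop):
        yield packets[start:start + window]

recording = list(range(3000))            # 30 s at an assumed 100 Hz
windows = list(sliding_windows(recording))
print(len(windows))                      # 59
```

Under these assumptions each 30 s recording yields 59 windows per link, and consecutive windows share half of their packets.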
4.3. Feature Extraction
For each link $\ell$ and window, we perform:
1. Phase sanitization of the CSI phase $\phi_\ell[k, n]$ via linear fitting across subcarriers and subtraction [1,18].
2. Amplitude normalization by the mean amplitude across all subcarriers within the window.
3. Computation of amplitude and phase statistics (means, variances) across time and subcarriers.
4. Optional STFT-based low-frequency energy features for detecting micro-motions [2].
Features from all links are concatenated to form $\mathbf{f} \in \mathbb{R}^D$, with $D$ on the order of tens to low hundreds.
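The first two steps of the procedure above (phase sanitization and amplitude normalization) can be sketched as below. Phase is unwrapped along the subcarrier axis and a least-squares line is subtracted per packet; amplitude is divided by its window mean. The (N, K) array layout and the function names are assumptions, and real ESP32 CSI may need extra handling (for example, null or pilot subcarriers) that is not shown.

```python
import numpy as np

def sanitize_phase(csi_window):
    """Remove the linear phase trend across subcarriers for each packet.

    csi_window: complex array (N, K). Returns a real (N, K) array of residual phase.
    """
    phase = np.unwrap(np.angle(csi_window), axis=1)    # unwrap along subcarriers
    k = np.arange(phase.shape[1])
    # Fit phase[n, :] ~ slope * k + offset for every packet in one call.
    slope, offset = np.polyfit(k, phase.T, deg=1)      # each of shape (N,)
    return phase - (slope[:, None] * k[None, :] + offset[:, None])

def normalize_amplitude(csi_window):
    """Divide amplitudes by the mean amplitude over the whole window."""
    amp = np.abs(csi_window)
    return amp / amp.mean()

# Synthetic check: a pure linear phase ramp should sanitize to approximately zero.
k = np.arange(64)
ramp = np.exp(1j * (0.2 * k + 1.0))[None, :].repeat(5, axis=0)
print(np.abs(sanitize_phase(ramp)).max())
```

The residual phase and normalized amplitude can then be summarized with the same mean and variance statistics before concatenation across links.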
4.4. Localization Models
4.4.1. Classification Network
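We employ a compact fully connected neural network for grid-cell classification. Let $\mathbf{f} \in \mathbb{R}^D$ be the input. The network computes:
$$\mathbf{h}_0 = \mathbf{f}, \qquad \mathbf{h}_j = \sigma(\mathbf{W}_j \mathbf{h}_{j-1} + \mathbf{b}_j), \quad j = 1, \dots, J, \qquad \mathbf{z} = \mathbf{W}_{J+1} \mathbf{h}_J + \mathbf{b}_{J+1},$$
where $\sigma$ is ReLU, $J$ is the number of hidden layers, and $\mathbf{W}_j, \mathbf{b}_j$ are learned weights and biases. Softmax outputs are
$$\hat{\pi}_m = \frac{\exp(z_m)}{\sum_{m'=1}^{M} \exp(z_{m'})}, \qquad m = 1, \dots, M,$$
and the loss is $\mathcal{L}_{\mathrm{CE}}$ as defined earlier.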
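A forward pass of such a network can be sketched in NumPy as follows. The layer widths (64 and 32), the input dimension of 48, and the random weights are placeholders, since the actual architecture sizes are not specified here.

```python
import numpy as np

def init_mlp(sizes, rng):
    """He-initialised weights and zero biases for consecutive layer sizes."""
    return [(rng.normal(size=(n_out, n_in)) * np.sqrt(2.0 / n_in), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, f):
    """ReLU hidden layers followed by a linear output layer (logits)."""
    h = f
    for W, b in params[:-1]:
        h = np.maximum(0.0, W @ h + b)
    W, b = params[-1]
    return W @ h + b

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
params = init_mlp([48, 64, 32, 35], rng)    # D=48 and hidden widths are assumed
probs = softmax(forward(params, rng.normal(size=48)))
print(probs.shape, probs.sum(), int(probs.argmax()))
```

Replacing the 35-way output with a 2-unit linear output and skipping the softmax gives the corresponding coordinate regressor.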
4.4.2. Regression Network
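For coordinate regression, we use a similar network with two hidden layers and a linear output layer:
$$\hat{\mathbf{p}} = \mathbf{W}_3\, \sigma\big(\mathbf{W}_2\, \sigma(\mathbf{W}_1 \mathbf{f} + \mathbf{b}_1) + \mathbf{b}_2\big) + \mathbf{b}_3.$$
Parameters are trained by minimizing $\mathcal{L}_{\mathrm{MSE}}$.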
4.4.3. kNN Baseline
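We implement a k-nearest neighbors baseline. For a test fingerprint $\mathbf{f}$, we find the $k$ nearest fingerprints in the training set (under Euclidean distance) and combine their labels: for classification, the predicted cell is the majority cell index among the neighbors; for regression, the predicted coordinate is the mean of the neighbors' coordinates.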
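A minimal kNN sketch under Euclidean distance is shown below, using a majority vote for the cell index and the neighbor mean for the coordinate. These two combination rules are common choices assumed here for illustration, and the toy training set is invented.

```python
import math
from collections import Counter

def knn_predict(f, train, k=3):
    """Return (cell index, coordinate) for fingerprint f.

    train: list of (fingerprint, cell_index, (x, y)) tuples.
    """
    neighbours = sorted(train, key=lambda s: math.dist(f, s[0]))[:k]
    cell = Counter(m for _, m, _ in neighbours).most_common(1)[0][0]
    x = sum(p[0] for _, _, p in neighbours) / len(neighbours)
    y = sum(p[1] for _, _, p in neighbours) / len(neighbours)
    return cell, (x, y)

train = [
    ((0.0, 0.0), 0, (0.0, 0.0)),
    ((0.1, 0.0), 0, (0.0, 0.0)),
    ((1.0, 1.0), 1, (1.0, 0.0)),
    ((5.0, 5.0), 2, (2.0, 0.0)),
]
print(knn_predict((0.2, 0.1), train, k=3))
```

Sorting the whole training set is fine for a sketch; with many fingerprints a spatial index or a partial sort would be more appropriate.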
4.5. Training and Evaluation
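Data are split into training, validation, and test sets by sessions to assess generalization. We report grid-cell classification accuracy, the median and mean distance error (MDE), the 90th-percentile error, and the empirical CDF of the distance error.
Inference latency is measured on the Raspberry Pi edge node.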
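The session-wise split and the error statistics can be sketched as below. The session identifiers, the helper names, and the sample error values are hypothetical; the percentile uses simple nearest-rank selection rather than interpolation.

```python
import math
import statistics

def split_by_session(samples, test_sessions):
    """Hold out whole sessions so test windows never share a session with training."""
    train = [s for s in samples if s["session"] not in test_sessions]
    test = [s for s in samples if s["session"] in test_sessions]
    return train, test

def percentile(errors, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(errors)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def empirical_cdf(errors, threshold):
    """Fraction of errors at or below the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)

errors = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 2.0]   # metres, made up
print(statistics.median(errors), statistics.fmean(errors))
print(percentile(errors, 90))         # 0.9
print(empirical_cdf(errors, 0.6))     # 0.6
```

Splitting by session rather than by window matters because overlapping windows from one recording are highly correlated, so a random window-level split would overstate accuracy.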
5. Results
5.1. Classification Performance
The classification network achieves an average grid-cell accuracy of approximately 93% across the 35 cells, with most misclassifications occurring between adjacent cells. The kNN baseline achieves about 88% accuracy. Accuracy tends to be higher in the central region of the room and slightly lower near walls, consistent with previous fingerprinting studies [7].
5.2. Regression Performance
The regression network achieves a median distance error of 0.45 m and mean distance error (MDE) of approximately 0.55 m on the test set. The 90th percentile error is below 0.9 m. The kNN regressor yields a median error of 0.60 m and MDE of 0.70 m under the same settings.
The empirical CDF shows that about 80% of estimates have error below 0.6 m and 95% below 1.0 m. Increasing the grid resolution (i.e., adding more cells) increases the classification difficulty but can reduce quantization error for regression.
5.3. Impact of Number of Links
We evaluate performance using one, two, and three ESP32 links. With one link, median error is approximately 1.0 m; with two links, 0.65 m; and with three links, 0.45 m. This demonstrates the value of multi-link fusion, which increases fingerprint distinctiveness and reduces spatial ambiguity [3,5].
5.4. Latency and Resource Usage
On a Raspberry Pi 4, feature extraction and forward pass through the regression network for a single fingerprint take about 6–8 ms, supporting localization update rates of over 50 Hz per person. The model size is on the order of a few hundred kilobytes, making it suitable for deployment in embedded edge devices.
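As a back-of-the-envelope check on these figures, the sketch below converts per-fingerprint latency into an upper bound on update rate and counts parameters for a small fully connected network. The layer sizes are hypothetical, chosen only to show how a model of this kind can land in the few-hundred-kilobyte range with 32-bit weights.

```python
def max_update_rate_hz(latency_ms):
    """Upper bound on updates per second if each update costs latency_ms of compute."""
    return 1000.0 / latency_ms

def mlp_param_count(sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

print(max_update_rate_hz(8.0))    # 125.0
print(max_update_rate_hz(6.0))

sizes = [96, 256, 128, 2]         # assumed: D=96 input, two hidden layers, (x, y) output
n_params = mlp_param_count(sizes)
print(n_params, n_params * 4 / 1024)   # parameter count, size in KiB at 4 bytes each
```

With these assumed sizes the network has 57,986 parameters (about 227 KiB), and 6–8 ms of compute per fingerprint corresponds to roughly 125 to 167 updates per second. In practice the rate at which new windows arrive may be the tighter limit.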
6. Discussion
The results indicate that ESP32-based CSI DFL can achieve sub-meter median localization accuracy in a single room using only three links and a moderate number of grid points. Compared to RSSI-based ESP32 localization, which typically suffers from meter-level errors even with trilateration [14,15], CSI fingerprints provide finer spatial resolution [2,4].
Limitations include reliance on a single room environment, quasi-static positions, and a limited number of occupants. Dynamic scenarios with walking and multiple people introduce additional complexity and require temporal modeling or tracking [16,17]. Future work will explore online domain adaptation, fusion with inertial sensors, and leveraging automated label generation frameworks such as LoFi to reduce calibration effort [17].
7. Conclusion
We presented a formalized and experimentally validated framework for device-free indoor localization using Wi-Fi CSI fingerprints collected by ESP32 devices. By modeling location-dependent CSI feature distributions on a grid, defining clear learning objectives for classification and regression, and implementing lightweight neural models, we achieved sub-meter median localization error in a real laboratory deployment with only three links. These findings suggest that CSI-based DFL with ESP32 hardware is a practical option for smart home, healthcare, and industrial applications where device-free tracking is desirable.
Acknowledgments
The authors thank XZent Solutions Pvt Ltd for hardware support and colleagues who helped with data collection.
Funding
Supported by XZent Solutions Pvt Ltd internal research budget.
Conflicts of Interest
The authors declare no competing interests.
Institutional Review Board Statement
Not applicable (non-identifiable position labels only).
Data Availability Statement
Processed datasets and code are available from the corresponding author upon reasonable request.
References
1. Atif, M. Wi-ESP: A tool for CSI-based Device-Free Wi-Fi Sensing (DFWS). Journal of Computational Design and Engineering 2020, 7(5), 644–657.
2. Zhang, Y.; et al. A novel device-free Wi-Fi indoor localization using a convolutional neural network based on residual attention. PeerJ Computer Science 2024, 10, e2471.
3. Li, Y.; et al. MFFALoc: CSI-Based Multifeatures Fusion Adaptive Device-Free Passive Indoor Fingerprinting Localization. IEEE Internet of Things Journal 2024, 11(13), 21853–21868.
4. Grishin, A.; et al. Device-Free Indoor Localization of a Person Based on Channel State Information. In Proc. IEEE; 2024.
5. Sun, Y.; et al. A Novel Adaptive Device-Free Passive Indoor Fingerprinting Localization Under Dynamic Environment. IEEE Transactions on Mobile Computing 2024.
6. Seifeldin, M.; et al. MFDL: A Multicarrier Fresnel Penetration Model based Device-Free Localization System leveraging Commodity Wi-Fi Cards. arXiv 2017, arXiv:1707.07514.
7. Morshed, M. A. A. B.; et al. An Improved CSI Based Device Free Indoor Localization Using Machine Learning. In Proc. EUSIPCO; 2018.
8. Chaudhari, S.; Pise, K.; Fukate, D.; et al. Joint Contactless Temperature, Humidity, and Occupancy Sensing via Wi-Fi Channel State Information on ESP32 Nodes. Research Square 2026.
9. Chaudhari, S.; Pise, K.; Fukate, D.; Gawande, S. Wi-Fi CSI Thermal Tomography with ESP32 Arrays: Contactless 2D Indoor Temperature Field Mapping for Smart Buildings. Preprints 2026, 2026011934.
10. Hernandez, S. M. ESP32-CSI-Tool: Extract Channel State Information from WiFi-enabled ESP32 Microcontroller. GitHub repository, 2019.
11. Hernandez, S. M. ESP32 CSI Toolkit. 2021. Available online: https://stevenmhernandez.github.io/ESP32-CSI-Tool/
12. Espressif Systems. esp-csi: Applications based on Wi-Fi CSI. GitHub repository, 2021.
13. Di Lascio, E.; et al. Wi-Fi Sensing for Human Identification Through ESP32 Devices: An Experimental Study. In Proc. IEEE; 2024.
14. Czigany, P.; Fodor, G. Impact of Antenna Orientation on Localization Accuracy Using RSSI-based Trilateration. In Analecta Technica Szegedinensia; 2024.
15. Nigam, A. RSSI-based Indoor Localization using ESP32. GitHub repository, 2019.
16. Wang, J.; et al. Leveraging Online Learning for Domain-Adaptation in Wi-Fi-Based Device-Free Localization. IEEE Internet of Things Journal 2025.
17. Zhang, Z.; et al. LoFi: Vision-Aided Label Generator for Wi-Fi Localization and Tracking. arXiv 2025, arXiv:2412.05074.
18. Gong, T.; et al. Optimal preprocessing of WiFi CSI for sensing applications. arXiv 2023, arXiv:2307.12126.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).