1. Introduction
Biometric identification has received considerable attention and is increasingly becoming a standard practice. Systems in this domain are typically divided into two main categories: unimodal and multimodal. A unimodal system utilizes a single trait to identify and recognize users, analyzing unique biological or behavioral characteristics. A multimodal system employs a combination of various biometric traits to enhance both accuracy and security. Combining data from multiple sources can significantly improve the accuracy of such systems [
1].
Biometric identification holds a significant role in the domain of person recognition and provides a crucial security measure for preventing a variety of fraudulent activities. Document identification remains heavily reliant on handwritten signatures, which are utilized extensively in the financial, administrative, and other sectors. Numerous research studies focus on handwriting analysis for verification purposes, an approach studied far more widely than person recognition tasks. The process and movements involved in creating handwritten signatures are unique to each person, making signatures a reliable biometric for identity verification [
2]. Signature identification is effective due to the difficulty in replicating these personalized characteristics, which poses a significant challenge for forgers [
3]. To address different verification needs, two primary categories are commonly used: offline (static) and online (dynamic) [
4].
Offline verification methods have previously attracted particular interest, predominantly relying on images of signatures without additional attributes. A conventional method for acquiring signatures involves scanning various documents, including bank cheques, contracts, invoices, and financial agreements. The system proceeds to detect, extract, and analyze distinctive characteristics from the acquired image. Typical analysis involves examining the geometric characteristics of the signature, such as its shape and size ratios. In offline circumstances, however, verifying signatures is particularly challenging due to the absence of a trace of the signing procedure [
5].
As devices equipped with an expanding array of integrated sensors become more widespread, research in this domain has increasingly focused on the field of online signature verification (or identification). These methods facilitate the acquisition of dynamic signature characteristics, a task challenging to achieve solely through conventional pen-and-paper means. Here, the system utilizes a variety of devices and sensors to acquire additional data and extract more features during the signing procedure. The data encompass various aspects such as duration, pressure level, pen movements (up and down), pen tilt, signing velocity, writing trajectory, and stroke order [
6]. Together, they provide valuable insights into the intricate dynamics of the signing process. The use of dynamic features notably enhances identification reliability and yields a more effective verification outcome [
7].
In contemporary contexts, tablet computers with corresponding styluses are frequently employed for the acquisition of signatures. Moreover, devices like smartphones and smartwatches are becoming more accessible and are equipped with various types of sensors. Inspired by this observation and the findings of others, we initially developed a proof-of-concept apparatus to capture various biometric characteristics during writing on a touch-sensitive display. The potential for using this particular kind of handwriting as a method of identification arises from its connection to several characteristics, including shape, movement, and variation. These attributes exhibit distinct patterns that are unique to each individual, making the signature and the act of signing itself a representation of a person’s identity.
Research presented in this manuscript focuses on a unimodal online solution for person recognition, utilizing a single behavioral attribute, specifically handwriting. It implements the sensor-fusion concept, distinguishing it from comparable studies that mostly depend on a single type of sensor or device. Our goal was to introduce a larger number of sensor readings that predominantly come from standard smart devices (rather than specifically designed hardware) into the analysis of handwriting dynamics. Also, in contrast to related studies primarily focused on gathering user signatures, our approach involves the collection of diverse handwriting samples, including signatures, short sentences, words, and individual letters. Accordingly, this study investigates the feasibility of recognizing a person through various forms of handwriting, without relying exclusively on signatures. It expands the analysis to general handwriting, as signatures are learned and practiced, while natural handwriting reflects more spontaneous and unique traits [
8,
9,
10]. In addition, data on handwriting dynamics is collected not only for stylus input but also for finger input, which is still the predominant way of interacting with touchscreen devices. In this way, the research was extended to the analysis of the proposed solution with regard to the input modality.
To provide a brief summary, the following contributions are included in this work:
The paper introduces a sensor-fusion-based method for user identification via touchscreen handwriting that goes beyond conventional signature-based systems. It incorporates additional handwriting forms for person recognition, namely sentences, words, and individual letters. It thereby investigates whether a person can be successfully identified based on handwriting when the given input pattern is the same for all users.
A comprehensive dataset is obtained by conducting a controlled experiment with 60 participants, featuring dynamic handwriting data coming from both stylus and finger inputs. The dataset includes details across multiple sensor types, making it a valuable resource for future research in handwriting-based person recognition.
A machine learning model based on a convolutional neural network (CNN) is designed for both feature extraction and classification of handwriting. This model achieved high accuracies across different forms of handwriting. Additionally, the accuracy of the model was analyzed with regard to the effect of input modality (stylus vs. finger) and train set size.
The remaining sections of this work are structured as follows. The related work is outlined in
Section 2. An overview of the apparatus, experiment procedure, obtained dataset, and the CNN model development is provided in
Section 3. The research findings, along with the corresponding discussion, are presented within
Section 4. Finally, in
Section 5, we present the conclusions and outline directions for future work.
2. Related Work
This section first presents an overview of solutions, methods, and approaches that leverage one or more distinct sensors for signature verification, distinguishing between genuine and forged signatures. It then continues with a review of studies related to person recognition through handwriting.
2.1. Handwriting Verification
mmSign [
11] and mmHSV [
12] represent online handwritten signature verification platforms using commercial mmWave radar to capture in-air hand motions. mmSign focuses on noise reduction, feature extraction, and meta-learning strategy for model adaptation using a custom dataset. mmHSV evaluates robustness against various attack scenarios by combining hand shape and writing process features with custom-collected data. PPGSign [
13] uses photoplethysmography (PPG) sensors to capture blood flow changes during hand movements for signature verification. Due to limited access to raw PPG data from commercial devices, the authors developed the PPGSign Band as a proof-of-concept and introduced a proprietary dataset of raw PPG data.
ASSV [
14] and SilentSign [
15] both use acoustic sensing to capture pen movements during the signing, with device speakers and microphones detecting signal variations. ASSV requires specific phone positioning and distance from the writing surface, while SilentSign offers more flexibility with a larger signature area. SilentSign uses traditional classifiers like logistic naive Bayes (NB), linear regression (LR), random forest (RF), and support vector machine (SVM), whereas ASSV utilizes deep learning with CNN for binary classification. Sadak et al. [
16] used pen-paper friction sounds for signature verification via mobile device microphones, converting them into features for dynamic time warping (DTW) comparison of genuine and forged signatures. Like ASSV, this solution requires no external hardware but has limitations with microphone placement and pen type. SonarSign [
17] broadcasts inaudible acoustic signals and analyzes their echoes through channel impulse response (CIR), implementing an attentional multi-modal Siamese network for verification. SVSV [
18] combines sound captured via a smartphone’s microphone and vibration data from inertial sensors (accelerometer and gyroscope) for signature verification, using a CNN-based one-class classifier.
The growing popularity of wrist-wearable devices makes them valuable for handwriting verification research. For instance, Ramachandra et al. [
19] analyzed writing patterns using smartwatch accelerometer data, processed with 2D continuous wavelet transform (CWT) and pre-trained ResNet50 features, across two different smartwatches and three writing scenarios. Li and Sato [
20] used Siamese recurrent neural network (RNN) to analyze motion sensor data from smartwatches during signing, finding that combined accelerometer and gyroscope data yields the best results. Similarly, Realme [
21] employs a MetaWear sensor with a gyroscope and accelerometer for signature verification on paper.
Multiple works explore in-air signature verification through different approaches. Guo and Sato [
22] evaluated CNN architectures for classifying 3D reconstructions of in-air signatures, with ResNet and Adagrad optimizer yielding the best performance. RNNs have also been successfully utilized to analyze gyroscope and accelerometer data [
23,
24]. AirSign [
25] combines sound and motion sensors, using smartphone components (speakers, microphones, accelerometers, and gyroscopes) to capture hand geometry, signature, and motion data. Atreya et al. [
26] developed a method using the 3D pen trajectory and a 2D spatial-temporal CNN, evaluated with the LeapMotionAirSignature [
27], SVC-2004 [
28], and custom T3AAS-v1 datasets.
Various studies focus on digital pen-based signature verification, using both commercial and custom sensor-embedded stylus systems. Seki [
29] employed the Anoto pen system, a digital tool similar to a ballpoint pen, to capture dynamic data through a built-in camera and pressure sensor. Subedi et al. proposed two sensorized stylus systems for signature verification [
30,
31]. The first uses two accelerometers to classify signatures based on grip, stroke patterns, and speed. The second employs a low-cost, 3D-printed stylus with accelerometers to capture kinematic data, classified by a multilayer perceptron (MLP). Lech and Czyżewski [
32] developed a wireless biometric pen with a three-axis accelerometer, gyroscope, surface pressure sensor, and two touch pressure sensors. It mimics a ballpoint pen, with signature verification based on six dynamic measures derived from the DTW method. Zhou et al. [
33] combined offline and online verification by collecting physical signatures and digital data with a smart pen equipped with a camera and pressure sensor.
The following paragraphs discuss additional relevant studies focusing on CNN-based approaches for signature verification, exploring various datasets and architectural configurations. Zhang et al. [
34] proposed a federated learning framework with 1-D CNN, while Ashwin Shenoy et al. [
35] employed CNNs for digital cheque clearance. Singh and Viriri [
36] employed CNNs as feature extractors with RNN classifiers, using the publicly available SigComp 2009 [
37] and SVC-2004 [
28] datasets. The model’s low validation scores were due to the limited sample sizes. Roszczewska and Niewiadomska-Szynkiewicz [
38] explored CNN-based mobile signature verification using VGG-16, SigNet, and the lightweight SigNetExt. Trained on the MobiBits dataset [
39] with data augmentation, SigNetExt showed strong robustness against skilled forgeries, proving effective for mobile-based verification. Leghari et al. [
40] proposed a deep learning approach using CNNs, applying data augmentation and leveraging a deep aggregated convolutional feature representation on a custom-built online signature repository. Zheng et al. [
41] employed a depth-wise separable convolutional neural network (DWSCNN) for signature verification, achieving similar performance to traditional CNNs while reducing parameters, resource usage, and execution time, utilizing the MCYT-100 [
42] and SVC-2004 datasets. OSVConTramer [
43] is a solution with a hybrid architecture combining CNN and Transformer for online signature verification, evaluated on the MCYT-100, SVC-2004, and SUSIG [
44] datasets, focusing on one-shot learning scenarios. Ji et al. [
45] introduced a multi-path feature fusion network combining static and dynamic handwriting data for online signature verification, using ResNet18 and a 1-D CNN, evaluated on the BiosecurID [
46] and custom CIEHD datasets.
2.2. Handwriting Identification
Considering person recognition via signature or handwriting, various studies employ CNN models (both pre-trained and custom). Li et al. [
47] used a deep CNN with ResNet-18 and a feature pyramid network (FPN) for multi-scale fusion, with the MCTOP algorithm for enhancing the recognition of small signature samples. Çiftçi and Tekin [
48] demonstrated the effectiveness of fine-tuning pre-trained CNNs for biometric signature recognition by testing five architectures. Among them, GoogLeNet and Inception-V3 achieved the highest accuracy, EfficientNet-B0 balanced efficiency and accuracy, while MobileNet-V3 Large was ideal for lightweight applications. Pokharel et al. [
49] proposed a deep learning approach for signature identification using pre-trained GoogleNet with backpropagation, automating feature extraction to reduce training time. Calik et al. [
50] proposed LS2Net, a CNN for signature recognition that improves training with batch normalization and simplifies classification with a class center-based classifier (C3), outperforming models like VGG-16 with limited training data. Culqui-Culqui et al. [
51] developed CNN-GC, a CNN algorithm designed for portable devices that use convolutional and pooling layers with ReLU activation, outperforming two baseline CNNs. Rahim et al. [
52] developed a CNN-BiLSTM model for identifying individuals through Bengali script handwriting, using the novel BHW dataset. The model outperformed standalone CNN and BiLSTM approaches by combining CNN for spatial and BiLSTM for sequential features. Hasan et al. [
53] introduced a hybrid model combining CNN, BiGRU, and BiLSTM layers for person recognition using 10 specific keywords. Poddar et al. [
54] proposed a unique integration of CNN with traditional feature extraction methods (Crest-Trough, Harris Corner Detection, and Speeded-Up Robust Features (SURF)). Gumusbas and Yildirim [
55] proposed capsule networks, outperforming CNNs in person identification tasks, particularly with lower input resolutions and limited data.
As for non-Latin scripts, Rexit et al. [
56,
57] developed a multilingual offline signature recognition system for Uyghur, Chinese, Kazakh, and Han languages, using feature fusion, principal component analysis (PCA) for dimensionality reduction, and classification with RF and k-nearest neighbors (k-NN). Mo et al. [
58] focused on Uyghur and Kirgiz signatures, classifying fused features using SVM with a radial basis function kernel. Al-Shamaileh et al. [
59] introduced an Arabic handwriting dataset for writer authentication using k-NN with DTW and SVM classifiers, while Dargan et al. [
60] applied a similar classification approach for Devanagari writers.
Regarding contactless alternatives to traditional methods, Chuen et al. [
61] introduced a contactless signature recognition system using a custom multi-scale CNN architecture with the Microsoft Kinect sensor. Khoh et al. [
62] proposed a similar approach with Kinect depth images, offering another contactless solution where SVM classification showed the best results. Ghosh et al. [
63] introduced a spatio-temporal Siamese neural network (ST-SNN) for identifying 3D signatures, combining CNN for spatial and LSTM for temporal features to enhance forgery resistance.
Several studies have explored hybrid approaches that combine feature extraction techniques and machine learning classifiers for handwriting recognition, focusing on both offline and online methods. Kette et al. [
64] found that local binary patterns (LBP) work well for small datasets but struggle with larger ones. Hezil et al. [
65] found that combining binary statistical image features (BSIF) and LBP features, along with overlapping zoning techniques, balances feature complexity and performance. Sriwathsan et al. [
66] showed that SURF features outperform scale-invariant feature transform (SIFT) for signature recognition, achieving better results with SVM and K-means clustering. Inan and Sekeroglu [
67] developed a backpropagation neural network (BPNN) with a custom signature database, testing robustness against noise and achieving notable recognition rates with Salt-and-Pepper noise. For online handwriting recognition, Leghari et al. [
68] evaluated five machine learning classifiers and found that ensemble methods, such as random forest and extra tree, outperform traditional classifiers in signature identification tasks. Akash et al. [
69] and Begum et al. [
70] collected real-time handwritten data, extracting six features from predefined keywords and phrases. They evaluated classifiers like SVM, LR, RF, and linear discriminant analysis (LDA), achieving the highest accuracies with SVM and LR.
Table 1 summarizes recent person identification studies through handwriting, detailing the different devices or sensors utilized, the handwriting forms examined, and the reported accuracy. The table emphasizes studies that, like ours, focus on person recognition (identification) through handwriting, rather than verification. While these studies typically utilize limited sensor setups centered on signatures, our research employs a broader approach through sensor fusion, capturing diverse biometric characteristics from various handwriting forms using multiple sensors.
2.3. Research Gap
Most of the reviewed works in the field of handwriting biometrics are centered on verification systems, which involve distinguishing between genuine and forged handwriting samples. Such systems represent a two-class classification problem, where a decision is made on whether a sample is authentic or forged. While this is a crucial aspect of biometric security, these studies often overlook the broader context of user recognition across various handwriting forms. Our study addresses this research gap by broadening the focus from verification to a wider range of handwriting identification tasks, a general process of detecting the handwriting owner that involves multi-class classification. Specifically, our approach extends beyond signatures for handwriting-based person recognition to include various handwriting forms, such as sentences, words, and letters. We thereby extend the research by investigating whether a person can be successfully identified based on handwriting when the given input is the same for all users. This expansion has resulted in the introduction of a newly curated dataset, consisting of measurements from different types of sensors during writing with both a stylus and a finger.
Furthermore, many of the reviewed studies rely on specialized hardware, such as custom-designed pens or complex sensor systems, which are often limited and costly. In contrast, our approach utilizes readily available off-the-shelf components and standard smart devices. This not only simplifies the experimental process but also enhances the reproducibility of the findings. Unlike most existing solutions that are mainly based on the use of a single sensor (or a rather small number of sensors), our approach implements a sensor fusion concept, with the aim of collecting and utilizing a much larger amount of handwriting dynamics information.
The proposed handwriting-based biometric system has several potential practical applications in secure authentication and user recognition. One potential application is digital identity identification in banking and financial transactions, where handwriting dynamics can provide an additional layer of security against fraud. Another potential use case is access control and device security, where handwriting recognition can serve as an alternative biometric authentication method for unlocking smartphones, tablets, or secure workstations.
This system offers an alternative or supplement to traditional authentication methods such as passwords, PINs, or fingerprint scanning. For comparison, fingerprint systems can be compromised using latent fingerprint residues left on sensors or by artificial fingerprint replication. In contrast, handwriting recognition systems analyze dynamic behavioral traits such as writing speed, pressure, and stroke order, which are more difficult to replicate or forge. Additionally, handwriting-based recognition does not suffer from the challenges associated with fingerprint wear and tear. Individuals with worn fingerprints due to aging, manual labor, or skin conditions often experience difficulty with fingerprint scanners. Since handwriting is a learned and practiced behavior, it remains relatively stable over time, providing a more consistent identification method for such users.
While signatures have traditionally been used for identifying individuals, for example in various legal and administrative systems, they usually consist of brief, well-practiced movements. This makes them particularly susceptible to forgery, especially when attackers deliberately train to replicate them. While each individual’s signature is unique, analyzing different forms of handwriting, such as sentences, provides greater insight into a person’s dynamic writing characteristics. Additionally, signatures are commonly found on paper or scanned documents, making them more accessible to attackers. In contrast, longer handwriting samples, like complete sentences or words, are less readily available and more difficult for an attacker to obtain. This supports the use of various handwriting types instead of signatures, as employing other handwriting forms as a "signature pattern" would make forgery attempts more challenging by requiring notably more effort to produce a convincing imitation.
3. Materials and Methods
3.1. Apparatus Description
The experimental apparatus within the proposed solution features a variety of sensors from different devices [
71]. These include the touch sensor and magnetometer from the tablet, the accelerometer and gyroscope from the smartwatch, the camera from the smartphone, and two external piezoelectric sensors. The components were securely mounted on a perforated plate using screws, brackets, and holders to ensure consistent positioning during the experiment.
Figure 1 shows the experiment setup wherein a participant, wearing a smartwatch, uses a stylus on a touch-sensitive screen. Two external piezoelectric sensors are located beneath the tablet – one connected to the tablet and the other to the smartphone. Applications for Android and Tizen OS were developed to streamline device synchronization and ensure reliable data acquisition.
The primary component of the apparatus is the Samsung Galaxy Tab S6 tablet, featuring a touchscreen suitable for handwriting input via stylus or finger. The device includes the S Pen stylus, capable of detecting 4096 pressure levels based on the applied force while writing. Due to the constraints of the S Pen Remote SDK, tilt data cannot be obtained by default.
The tablet incorporates various sensors, among which the magnetometer is rarely used in similar research. For this reason, permanent magnets were included in the apparatus. Namely, a pair of neodymium magnets of equal remanence is utilized to disrupt the magnetic field around the tablet device when writing with a stylus or finger, thus creating a unique magnetic trail. Custom-designed 3D-printed holders secure the magnets, including a ring-shaped holder for finger-writing and a specialized holder for the stylus. Additionally, 3D-printed mounts were created for ArUco markers, square patterns that cameras detect to provide precise 3D position data [
72]. An ArUco marker was placed at the top of the stylus to acquire tilt data during writing. The output includes the X and Y coordinates of the ArUco marker readings obtained from videos. From this data, rotation and translation vectors can be obtained to determine the angle at which the user holds the stylus.
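As an illustration, the sketch below shows how per-frame marker pose could be recovered from such recordings with OpenCV (assuming opencv-contrib-python 4.7 or newer for the ArucoDetector API); the marker dictionary, marker side length, camera intrinsics, and file name are placeholders rather than the values used in our setup.

```python
# Hypothetical sketch of per-frame ArUco pose estimation with OpenCV.
import cv2
import numpy as np

camera_matrix = np.array([[1000.0, 0.0, 960.0],
                          [0.0, 1000.0, 540.0],
                          [0.0, 0.0, 1.0]])      # placeholder intrinsics
dist_coeffs = np.zeros(5)                        # placeholder distortion coefficients
side = 0.02                                      # assumed marker side length in metres
obj_pts = np.array([[-side / 2,  side / 2, 0], [side / 2,  side / 2, 0],
                    [side / 2, -side / 2, 0], [-side / 2, -side / 2, 0]],
                   dtype=np.float32)

detector = cv2.aruco.ArucoDetector(
    cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50),
    cv2.aruco.DetectorParameters())

cap = cv2.VideoCapture("writing_clip.mp4")       # hypothetical trimmed recording
while True:
    ok, frame = cap.read()
    if not ok:
        break
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is not None:
        # Rotation and translation vectors of the first detected marker.
        _, rvec, tvec = cv2.solvePnP(obj_pts, corners[0].reshape(4, 2),
                                     camera_matrix, dist_coeffs)
        print(rvec.ravel(), tvec.ravel())
cap.release()
```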
Regarding the tablet touch sensor, it operates beneath a display with a pixel density of approximately 287 pixels per inch (ppi). It can detect inputs from both the stylus and the user’s finger, collecting various data during writing, including pressure, size, velocity, rotation, and the precise X and Y coordinates corresponding to each touch event.
When it comes to external sensory components, the tablet was equipped with a piezoelectric sensor, acting as a contact microphone, to collect more data on the pressure exerted on the screen during writing. It captures acoustic oscillations generated by the movement of the stylus or finger, providing information on sliding actions, touches, and pressure applied to the screen or the sensor itself. The audio signal is obtained through the tablet’s microphone input using an external USB sound card and a USB-C to USB-A adapter, as the tablet lacks a physical 3.5 mm audio connector.
The experimental setup additionally includes a Samsung Galaxy smartphone. This device was essential because the tablet was unable to establish a connection with the smartwatch due to restrictions imposed by the manufacturer. In contrast, the smartphone enables communication with the smartwatch, allowing for the acquisition of gyroscope and accelerometer data. Thus, the smartphone plays a critical role in facilitating this connectivity. Furthermore, the smartphone’s camera sensor records the handwriting process and allows stylus tilt data to be extracted from the recorded video via ArUco marker tracking.
In addition to using the camera sensor, the smartphone complements touchscreen pressure acquisition with a second, smaller piezoelectric sensor placed beneath the tablet. This sensor is positioned on the opposite side of the first piezo sensor. The advantage of this smartphone model lies in its provision of a physical 3.5 mm audio connector, a feature lacking in the majority of contemporary devices. Consequently, it was considered appropriate to utilize this feature for the connection of an additional piezoelectric sensor of a different shape, dimensions, and characteristics.
Finally, the Samsung Gear S3 Frontier smartwatch was integrated into the apparatus to collect additional data from its built-in sensors. It features an accelerometer and a gyroscope, used to capture the unique dynamics of a user’s writing style. The purpose of the smartwatch is to acquire the motion-based biometric details of the user while they are actively engaged in writing on the tablet screen. Participants were required to wear the smartwatch on their dominant wrist as part of the experimental protocol, ensuring consistency in the data collection process.
3.2. Experiment Procedure
Prior to the initiation of the experiment, all participants were asked to sign an informed consent form. Each participant was given the opportunity to examine a document detailing a concise overview of the experiment, its objectives, the data to be collected, and the research goals. Afterward, the participants were engaged in the system familiarization process to understand its operational procedures. Data collection was not initiated during this phase. Following their familiarity with the apparatus, participants were presented with detailed explanations of all experiment methods.
In the beginning, participants were asked to position the smartwatch on their dominant wrist. They were asked to write on the tablet screen utilizing a stylus initially and then using their index finger. During finger-writing, participants wore a custom 3D-printed ring fitted with a neodymium magnet on their index finger. Its purpose was to interfere with the magnetic field values captured by the magnetometer. To ensure accurate measurements, the magnetometer was calibrated at the start of each trial to account for environmental noise.
Considering that handwritten signatures represent a pre-learned action for each individual, this study explores the possibility of user recognition through alternative handwriting forms. The objective is to examine the extent to which, besides signatures, the process of writing sentences, words, or letters varies from person to person, as well as its level of uniqueness for each individual. Hence, aside from obtaining handwritten signatures, the process of collecting data from participants required the writing of short sentences, words, and individual letters.
In greater detail, each respondent was tasked with reproducing their signature 25 times. Regarding sentences, 10 were used in total. They were primarily composed of a few short words, deliberately selected to encompass all the letters of the alphabet. Participants were instructed to write each of the sentences a total of 5 times.
Furthermore, 10 words were selected following a methodology similar to the one previously described. In this instance as well, each word was required to be written 5 times. Concerning the letters, all characters of the English alphabet were included, covering both uppercase and lowercase forms. In contrast to sentences and words, each letter was written out 3 times.
Each handwriting form (signature, sentence, word, and letter) was collected using two input modalities: first with a stylus and then with a finger. Participants had a designated writing space on the tablet screen, varying in size based on the handwriting form currently being obtained. Throughout the process, the application displayed textual labels to indicate the current sentence, word, or letter being collected, ensuring accurate input. To begin data collection, participants pressed the start button, which activated all sensors within the fusion-based setup. After finishing the writing activity, selecting the stop button halted the sensor readings and stored the data. This method ensured data collection occurred only between these two events, allowing participants to take breaks as needed. The average duration of the experiment for each participant was between 70 and 90 minutes, including breaks.
In a controlled environment, 60 participants, 12 females and 48 males, took part in the trial. Most were young individuals aged 20 to 24, primarily students, who reported regular use of modern touchscreen devices. Notably, 9 individuals were left-handed. The experiment and data collection were approved by the Ethics Committee of the University of Rijeka, Faculty of Engineering (approval no. 640-08/23-01/3). Due to privacy concerns, and following contemporary practices regarding biometric datasets, it was decided in advance that the obtained dataset would not be publicly available for uncontrolled download. Instead, it could be made available upon request, subject to a signed confidentiality agreement to ensure the protection of participant data. This agreement should specify the terms and conditions for data usage, so the dataset could be securely shared with researchers who meet these requirements.
3.3. Dataset
The obtained dataset comprised data collected from the various sensors described in the previous sections during the handwriting process using either a finger or a stylus. The acquired measurements were represented in multiple formats due to the use of different sensor types.
Regarding the tablet device, the touchscreen sensor collected diverse information using the MotionEvent object in Android to report stylus or finger movement events. This included data such as touch position, contact area size and orientation, pressure level, and touch event type (start of touch, end of touch, and movement during touch). Velocity data was obtained through the VelocityTracker API, allowing measurement of writing direction and velocity of movement along the X and Y axes. For each touchscreen interaction event, supplementary information, such as the timestamp of the event, was recorded and stored. The JavaScript Object Notation (JSON) format was used for storing the aforementioned data, particularly for structuring data relevant to touch events.
A screenshot was taken at the conclusion of each writing session, specifically capturing the handwriting produced by the user. The image of the input was saved in the Portable Network Graphics (PNG) format, ensuring clarity and quality preservation for future reference and analysis.
Within this framework, it was essential to ensure that the sampling rate of the additional sensors matched that of the touchscreen, as it had the lowest sampling rate. Readings from other sensors were considered only when synchronized with the touch event. For example, magnetic field readings (along the X, Y, and Z axes) from the tablet’s built-in magnetometer were stored in JSON format only when relevant touchscreen input was detected. Although the magnetometer produced significantly higher read rates than the touchscreen, only data pertinent to the specific touch event during writing was recorded. This principle was applied to all other sensors in the apparatus.
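As an illustration of this synchronization principle, the following sketch (with hypothetical field names and millisecond timestamps) keeps, for each recorded touch event, only the high-rate sensor reading closest in time to it:

```python
# Illustrative alignment of a high-rate sensor stream (e.g., magnetometer
# readings) to the lower-rate touch events; sensor_readings are assumed to be
# sorted by timestamp.
import bisect

def align_to_touch_events(touch_events, sensor_readings):
    sensor_times = [r["timestamp"] for r in sensor_readings]
    aligned = []
    for event in touch_events:
        i = bisect.bisect_left(sensor_times, event["timestamp"])
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_readings)]
        # Keep the neighbouring reading closest to the touch timestamp.
        best = min(candidates, key=lambda j: abs(sensor_times[j] - event["timestamp"]))
        aligned.append({**event, "sensor": sensor_readings[best]})
    return aligned
```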
Two piezoelectric sensors were positioned beneath the tablet, each on opposite edges, to collect vibrations and pressure changes. Functioning as audio capture devices similar to standard microphones, they produced signals that were stored as separate audio files in the waveform audio file format (WAVE, or WAV). The captured signal contained details regarding gliding, tapping, and applying pressure to the touchscreen, and was thereby synchronized with the recorded handwriting data.
Concerning the smartwatch, the integrated accelerometer was utilized to measure acceleration forces, while the gyroscope captured the rate of rotation around all three axes. In this scenario, commands signaling the initiation and termination of sensor recording were relayed from the smartphone to the smartwatch. Consequently, the data was continuously acquired from the very beginning until the conclusion of the writing process. The readings were captured as distinct textual files in the internal memory of the device and later transferred to a computer using our custom script.
Regarding the smartphone device, its camera sensor captured the stylus movement during the handwriting process. Due to the limitations of the S Pen Remote SDK, tilt data could not be obtained directly, so the recorded videos served this purpose. The recordings were trimmed according to the exact timestamps written in a separate text file, based on the precise moments at the start and conclusion of the writing process. The outcome of this process consisted of MP4 video assets corresponding to individual instances of written signatures, sentences, words, or letters. These files were further utilized for ArUco marker detection using a custom Python script. Finally, the 3D translation and rotation vectors of the detected markers were separately stored in text files to determine the stylus’s inclination.
The obtained dataset is divided into train and test sets, where the test set is further split into gallery and query subsets. The train set size represents all biometric samples (recordings) from 10, 20, or 30 randomly selected participants. The size of the test set, comprising the gallery and query subsets, depends on the train set, as it is derived from the remaining participants (50, 40, or 30, respectively). Depending on the input modality (stylus or finger), handwriting form (signature, sentence, word, letter), and train size, datasets were generated and observed independently. For the sentence and word handwriting forms, the gallery subset comprised the second and fourth samples, while the query subset contained the first, third, and fifth samples. For the letter handwriting form, the second sample was included in the gallery subset, while the first and third samples were assigned to the query subset. Since there were 25 samples for signatures, the assignment to gallery and query subsets was determined using their ordinal positions after applying a modulo operation with the value 5. This facilitated the same selection pattern applied to the sentence and word samples, ensuring a consistent and systematic distribution across the gallery and query subsets.
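A minimal sketch of this gallery/query split is given below; the exact mapping of ordinal positions after the modulo operation to the two subsets is our interpretation of the scheme described above, with samples assumed to be numbered from 1 in writing order.

```python
# Sketch of the gallery/query assignment for one participant and one
# handwriting form; positions 2 and 4 of a 5-sample cycle go to the gallery,
# the rest to the query subset; for letters only position 2 forms the gallery.
def split_gallery_query(samples, form):
    gallery, query = [], []
    for idx, sample in enumerate(samples, start=1):
        # Signatures (25 samples) are mapped onto a 1..5 cycle via modulo 5.
        pos = ((idx - 1) % 5) + 1 if form == "signature" else idx
        if form == "letter":
            (gallery if pos == 2 else query).append(sample)
        else:
            (gallery if pos in (2, 4) else query).append(sample)
    return gallery, query
```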
The sensor data was organized into 24 distinct vectors, each containing specific measurements from a sensor during handwriting sessions. As an illustration, readings from the magnetometer along the X, Y, and Z axes formed three vectors, while six vectors were created from the translation and rotation values of ArUco markers along the same axes. Furthermore, the 24 vectors were organized into 6 distinct categories, referred to as sensor subsets in the text. This categorization was introduced to simplify the analysis and was based on the type of data and the sensors used to obtain them. Each category includes specific measurement parameters, as follows (an illustrative grouping is sketched after the list):
touchscreen: Includes touch positions (X, Y) and touch velocities (in X and Y directions) while writing.
magnetometer: Measurements of magnetic field along the X, Y, and Z axes.
input specific: Tracks stylus tilt, stylus pressure, and finger touch size.
piezos: Piezoelectric data from the two piezoelectric sensors connected to the tablet and smartphone.
smartwatch: Gathers rotational data and acceleration data across three axes from the gyroscope and accelerometer.
visual tracking: Monitors translation and rotation of a stylus using an ArUco marker.
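For illustration, one possible grouping of the 24 vectors into these six subsets is shown below; the vector names are hypothetical labels rather than the dataset's actual field names.

```python
# Hypothetical labels for the 24 measurement vectors, grouped into the six
# sensor subsets used in the analysis.
SENSOR_SUBSETS = {
    "touchscreen":     ["touch_x", "touch_y", "velocity_x", "velocity_y"],
    "magnetometer":    ["mag_x", "mag_y", "mag_z"],
    "input_specific":  ["stylus_tilt", "stylus_pressure", "finger_touch_size"],
    "piezos":          ["piezo_tablet", "piezo_smartphone"],
    "smartwatch":      ["acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z"],
    "visual_tracking": ["aruco_tx", "aruco_ty", "aruco_tz",
                        "aruco_rx", "aruco_ry", "aruco_rz"],
}
assert sum(len(v) for v in SENSOR_SUBSETS.values()) == 24
```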
3.4. Data Preprocessing
Handwriting samples were collected while users wrote on the tablet screen, represented as a series of data points. Since multiple sensors are utilized, each data point corresponds to measurements from different internal and external sensors collected during the writing process. Consequently, every point within the handwriting sample has associated sensor readings that capture a range of writing characteristics, including pressure, velocity, magnetometer readings, gyroscope data, and accelerometer measurements, among others.
The initial data preprocessing step involved loading data for the 24 different vectors in the dataset. The length of data (i.e., the number of samples) for each vector varied due to factors like writing speed, stroke complexity, and hardware sampling rate. Therefore, the next step was to resample the vectors so that each one has a consistent number of samples, which simplifies feeding the data into machine learning models later. This target length is determined by selecting the minimum number of samples obtained across all sensors; in this specific context, it corresponds to the number of samples acquired from the tablet’s touchscreen. The maximum vector lengths per collected handwriting form are as follows: signature 613, sentence 777, word 607, and letter 130. Based on the number of samples in the dataset, a predefined maximum size (MAX_SIZE) of 1024 samples was selected, and each feature vector was resized to the same number of data points.
Since different sensors generate data in varying ranges, the vectors were normalized to standardize the values. Min-max normalization of the data was conducted, where minimum and maximum values were specified for each vector, according to

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where $x_{\min}$ and $x_{\max}$ denote the per-vector minimum and maximum values.
In the last phase, right padding is applied to ensure all vectors have the same length. This involves expanding the vectors by adding zeros to the right side of the samples until the maximum size is reached.
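A compact sketch of this per-vector preprocessing chain (resampling, min-max normalization, and right zero-padding) is given below, assuming NumPy arrays and the MAX_SIZE of 1024 mentioned above.

```python
import numpy as np

MAX_SIZE = 1024  # predefined maximum vector length

def preprocess_vector(values, target_len, v_min, v_max):
    values = np.asarray(values, dtype=np.float32)
    # Resample to a consistent number of points via linear interpolation.
    resampled = np.interp(np.linspace(0, len(values) - 1, target_len),
                          np.arange(len(values)), values)
    # Min-max normalization with the per-vector bounds.
    normalized = (resampled - v_min) / (v_max - v_min)
    # Right-pad with zeros up to MAX_SIZE.
    return np.pad(normalized, (0, MAX_SIZE - target_len))
```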
3.5. Feature Extraction
The training process for feature extraction focuses on the train set to extract pertinent features. For each of the 24 specified vectors, a distinct feature extractor is trained.
As illustrated in
Figure 2, the structure consists of three convolutional layers, each with a kernel size of 3 and with both stride and padding set to 1. Each convolutional layer is followed by a rectified linear unit (ReLU) activation layer, a batch normalization layer, and a dropout layer. These regularization techniques were applied to the CNN model to prevent overfitting. Finally, a fully connected linear layer reduces the number of features from 1024 to 128.
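A minimal PyTorch sketch of this feature extractor under the stated layer configuration is shown below; the channel width (kept at one here) and the dropout probability are placeholders rather than the tuned values used in the paper.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, dropout_p=0.3):                  # placeholder dropout probability
        super().__init__()
        def block():
            return nn.Sequential(
                nn.Conv1d(1, 1, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
                nn.BatchNorm1d(1),
                nn.Dropout(dropout_p),
            )
        self.conv = nn.Sequential(block(), block(), block())
        self.fc = nn.Linear(1024, 128)                  # reduces 1024 samples to 128 features

    def forward(self, x):                               # x: (batch, 1, 1024), one preprocessed vector
        x = self.conv(x)
        return self.fc(x.flatten(1))                    # (batch, 128) feature embedding
```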
In the training phase, a simple classifier was utilized with a single fully connected linear layer, whose output size corresponds to the train set size. This linear layer was followed by a softmax layer that converts a vector of arbitrary real values into a probability distribution, enabling the classification of handwriting samples.
Also, during the training phase, cross-entropy loss was used with the adaptive moment estimation (Adam) optimizer to effectively classify handwriting, minimize classification errors, and optimize model parameters [73]. The learning rate and weight decay values were determined by empirical fine-tuning. The training was performed for 1000 epochs with a batch size of 32. To facilitate the training process, an Nvidia RTX 4090 GPU with 24 GB of VRAM was utilized.
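The sketch below outlines this training step for a single feature extractor, reusing the FeatureExtractor above together with a single-linear-layer classification head (the softmax is applied implicitly inside the cross-entropy loss); the learning rate and weight decay shown are placeholders, not the empirically tuned values.

```python
import torch
import torch.nn as nn

def train_extractor(extractor, train_loader, n_classes, epochs=1000, device="cuda"):
    head = nn.Linear(128, n_classes).to(device)      # output size = train set size
    extractor.to(device)
    optimiser = torch.optim.Adam(
        list(extractor.parameters()) + list(head.parameters()),
        lr=1e-3, weight_decay=1e-4)                  # placeholder hyperparameters
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for vectors, labels in train_loader:         # vectors: (batch, 1, 1024)
            optimiser.zero_grad()
            logits = head(extractor(vectors.to(device)))
            loss = criterion(logits, labels.to(device))
            loss.backward()
            optimiser.step()
    return extractor, head
```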
3.6. Classification
Following the training of the feature extractors, we proceeded to freeze all layers of each respective feature extractor. By employing this approach, the weights within the feature extractor network remain unchanged during the classifier training process. In the context of classifier training, we utilize the gallery subset and subsequently assess its performance on the query subset.
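A minimal sketch of this freezing step is given below; setting the extractor to evaluation mode additionally fixes its batch-normalization statistics and disables dropout.

```python
def freeze(extractor):
    extractor.eval()                      # fix batch-norm statistics, disable dropout
    for param in extractor.parameters():
        param.requires_grad = False       # keep weights unchanged during classifier training
    return extractor
```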
Regarding classifier models, the feature vectors derived from each of the N feature extractors are concatenated vertically to form a matrix with dimensions of N × 128. The matrix in question consists of N rows, each pertaining to an individual feature extractor, and 128 columns designated for every feature vector sample. In this context, the number N depends on the number of sensors considered; when all sensors are taken into account, N equals 24.
Figure 3 illustrates the classifier model employed within the scope of our research. It consists of a single convolutional layer with a kernel size of 3 and with both stride and padding set to 1. After the convolutional layer, a linear layer of size 128 is sequentially followed by a ReLU activation layer, a dropout layer, and a batch normalization layer. Finally, the architecture concludes with a fully connected linear layer, whose outputs are passed to the loss function, specifically the softmax function for classification.
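A minimal PyTorch sketch of this fusion classifier is given below; the number of convolution channels and the dropout probability are assumptions, and the softmax is applied by the downstream cross-entropy loss.

```python
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, n_sensors=24, n_classes=30, dropout_p=0.3):
        super().__init__()
        self.conv = nn.Conv1d(n_sensors, n_sensors, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(n_sensors * 128, 128)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(dropout_p)             # placeholder probability
        self.norm = nn.BatchNorm1d(128)
        self.out = nn.Linear(128, n_classes)

    def forward(self, x):                             # x: (batch, N, 128) stacked features
        x = self.conv(x)
        x = self.norm(self.drop(self.act(self.fc1(x.flatten(1)))))
        return self.out(x)                            # class logits for softmax / cross-entropy
```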
4. Results and Discussion
To address the inherently non-deterministic nature of neural networks and their training process, both the feature extractor and classifier were trained and evaluated across 30 repeated runs. The recorded accuracies from these 30 runs were then averaged, and their standard deviation was calculated.
The mean values and standard deviations for model accuracies are depicted in
Figure 4. The highest scores were predictably achieved for signatures, whether written with a stylus or a finger, even when using the smallest train set size. This can be attributed to the unique and distinctive characteristics of signatures as a form of handwriting [
74,
75,
76]. The repeated strokes and patterns in signatures help the model learn and identify these unique features more effectively. On the other hand, all participants wrote the same sentences, words, and letters, whereas each person’s signature was unique, creating noticeable inter-person differences; this explains why accuracies were highest specifically for signatures.
In all instances, higher accuracies were attained when participants wrote words as opposed to sentences. Although writing sentences provides more dynamic information than words, intra-person variability in sentence writing introduces noise, making it harder to distinguish between individuals. This could be attributed to possible variations in spacing between words in a sentence, a variability absent when single words are written. Since writing sentences takes longer and requires more space, it is reasonable to expect some variation between different attempts by the same participant. When focusing on words, people could experience less cognitive load [
77], allowing them to concentrate better on accurately forming each word. Additionally, writing shorter words repetitively could lead to better consistency in execution compared to writing longer, more complex sentences that might require greater mental processing and planning [
78].
The lowest accuracies were observed in relation to letters across both input modalities. This can be attributed to how writing individual letters takes noticeably less time than writing sentences and words, resulting in a smaller amount of data collected for analysis. The limited information collected when writing individual letters makes distinguishing between individuals more difficult. With less data, finding enough distinguishing features for the model becomes challenging, especially with the less precise finger input.
The increased accuracy of using a stylus over a finger can be attributed to the greater precision and control it provides. A stylus enables finer motor control, leading to more accurate and consistent writing [
79]. Writing with a pen-like tool feels more natural for many individuals, as it simulates the experience of writing on paper. Additionally, using a stylus can reduce screen smudging and enhance visibility, which can be an issue when writing with fingers. The lower accuracy associated with finger input can be attributed to the reduced precision inherent in using a finger on a touchscreen (a well-known fat-finger syndrome). In contrast to a stylus, fingers are wider and less stable, which results in less consistent and more variable input [
80].
4.1. Main Effects
A three-way (2 × 3 × 4) repeated measures ANOVA was employed to investigate the effects of the following within-subjects factors on the model’s accuracy: input modality (stylus, finger), train set size (10, 20, 30), and collected handwriting form (signatures, short sentences, words, and individual letters). The Greenhouse-Geisser adjustment was applied in cases where violations of sphericity had been observed. In cases where a statistically significant effect was found, post hoc pairwise comparisons were performed using the Bonferroni correction.
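For reference, a three-way repeated-measures ANOVA of this kind could be run as sketched below, assuming a long-format table with hypothetical column names; sphericity checks and the Greenhouse-Geisser correction would be handled separately.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical file: one row per run and factor combination
# (columns: run, modality, train_size, form, accuracy).
df = pd.read_csv("accuracy_runs.csv")
result = AnovaRM(df, depvar="accuracy", subject="run",
                 within=["modality", "train_size", "form"]).fit()
print(result)
```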
Mauchly’s test indicated that the assumption of sphericity had been violated solely for the variable related to the collected handwriting form.
The results of the RM ANOVA test revealed significant main effects as follows:
Input modality – The model’s mean accuracy differs statistically significantly between the two input modalities.
Train set size – The size of the train set significantly affects the accuracy of the model.
Handwriting form – A significant main effect was also observed for the collected handwriting form.
Hence, regarding input modality, stylus-based handwriting results in higher model performance than finger-based input, with a statistically significant difference. Writing with a stylus seems to result in more consistent handwriting strokes and patterns across repeated writing attempts. In contrast, finger-writing on a touchscreen appears to introduce greater variability, possibly due to less precision and control over the writing movements.
When it comes to the train set size, post hoc analysis revealed that the accuracy of the model increases significantly with a larger train set size. Specifically, the model proved to be significantly more accurate when a train set size of 30 was utilized, compared to the cases with train set sizes of 20 and 10, in both pairwise comparisons. The difference in accuracy between train set sizes of 10 and 20 also proved to be statistically significant. This aligns with initial expectations, as a larger dataset helps mitigate overfitting by preventing the model from memorizing specific examples in the train set.
With regard to the collected handwriting form, the model exhibits the best accuracy when signatures are employed, followed by other handwriting forms in this particular order: words, sentences, and individual letters. Post hoc analysis revealed a significant difference in all pairwise comparisons observed herein.
In terms of comparison with other studies,
Table 1 in
Section 2 presents a list of relevant studies in this domain. Instead of focusing on verification systems like most studies in the related work, the table highlights research that, like ours, identifies individuals through handwriting. Alongside the compilation of utilized sensors, the table displays the handwriting forms used for user recognition, as well as the reported accuracy from the related studies. The comparison might be more meaningful when limited to studies involving online (dynamic) handwriting, specifically those using smartphones or tablets. In relation to similar research, our solution justifies the introduction of a greater number of sensors for person recognition via handwriting. The proposed solution attained high accuracy for signature recognition, which is adequately comparable to the other studies examined. In addition, it demonstrated high accuracy levels in recognizing words and short sentences, with lower accuracy for individual letters. The obtained results support the idea of utilizing various handwriting forms, thus going beyond signatures only, as possible alternatives for user identification. We also provide an analysis in relation to the input modality, by comparing handwriting recognition with regard to input with a stylus or a finger. This approach enables a more comprehensive assessment of an individual’s writing style on a touchscreen device.
4.2. Ablation Study
The aim of the ablation study is to examine the impact of each of the six sensor subsets on the accuracy of handwriting-based person recognition. This process involves removing one sensor subset at a time to assess its influence on the accuracy of the model with the remaining subsets. For instance, if the exclusion of a particular sensor subset results in an increase in model accuracy, it suggests that this subset has a negative impact on user identification in the observed context. Hence, the model training is additionally conducted for six setups independently, each of which had five active sensor subsets. Results concerning the attained accuracies are presented in
Table 2. In the respective procedure, the train set size was fixed to 30, as this set size was shown to produce the highest accuracies. Obtained accuracies are provided for different input modalities (stylus or finger) and distinct handwriting forms (signatures, sentences, words, letters). Each row corresponds to scenarios in which a specific sensor subset was disabled, except for the last row, which includes results when all subsets within the fusion were active. From the table data, it is evident that the exclusion of certain sensor subsets impacts accuracy differently across the observed categories.
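The ablation procedure can be sketched as a simple leave-one-subset-out loop, reusing the hypothetical SENSOR_SUBSETS grouping from Section 3.3; train_and_evaluate stands in for the full extractor-plus-classifier pipeline described above.

```python
def run_ablation(sensor_subsets, train_and_evaluate):
    results = {}
    for excluded in sensor_subsets:
        # Train and evaluate with all vectors except those of one subset.
        active = [v for name, vectors in sensor_subsets.items()
                  if name != excluded for v in vectors]
        results[excluded] = train_and_evaluate(active)
    # Reference run with the full fusion (all subsets active).
    results["full_fusion"] = train_and_evaluate(
        [v for vectors in sensor_subsets.values() for v in vectors])
    return results
```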
For recognizing a signature entered with a stylus, the model achieves its highest accuracy when either the touchscreen or piezo sensors are excluded. This outcome suggests that these sensors do not enhance signature recognition when using a stylus, possibly due to the characteristics of signature dynamics, which involve a pre-learned action. In contrast, when finger input is employed, the model reaches peak accuracy when the piezo sensor is excluded, indicating its negative impact on accuracy in this scenario. When all sensors are included in the fusion, the model’s accuracy slightly decreases for both stylus and finger inputs, highlighting that the touchscreen and piezo sensors may introduce noise or redundancy rather than improve recognition. Regarding the impact of magnetic field measurements, the exclusion of the magnetometer results in a slight decrease in accuracy for both stylus and finger inputs. This indicates that the magnetometer captures distinct patterns during the signing process, irrespective of whether the permanent magnet is mounted on the stylus or the finger during writing.
In stylus-based sentence writing tasks, the model achieves its best accuracy when the full fusion setup is active. This outcome suggests that a multi-sensor approach contributes valuable information for sentence recognition with stylus input, which involves more complex and extended strokes compared to signatures. Conversely, when relying on finger input data only, the model records its highest accuracy when input-specific sensors are excluded. The reason for this may be the lack of data in the input-specific sensor subset when writing with a finger. Specifically, while stylus input involves data on tilt and pressure level, these are not available for finger input and are substituted with zeros. Consequently, this introduces noise and variations into the analysis and contributes to reducing the model’s accuracy. Interestingly, in the sentence-based finger input context, excluding the magnetometer measurements notably reduces the model’s accuracy, underscoring its importance in capturing valuable handwriting dynamics data.
The recognition of written words shows a similar trend, with the model reaching its highest accuracy for stylus input when all sensors are active. This outcome reaffirms the importance of sensor fusion, this time for word recognition tasks. In comparison, the model based on finger input achieves its best performance when the readings from the piezo sensor are excluded. As the same observation also applies to finger-based signatures, we can assume that the piezo sensor introduces noise. This could potentially reduce its effectiveness in capturing the tactile details of finger movements and, consequently, lead to inconsistencies in finger-based word recognition. Similarly to sentence writing, excluding the magnetometer or smartwatch sensors (accelerometer and gyroscope) notably reduces the model’s accuracies, particularly for finger input, highlighting their importance in word classification tasks.
Letter recognition shows a slightly different pattern, with the model reaching its highest accuracy for stylus input when all sensors are included. Here, the fusion of sensors also demonstrates its benefits, indicating that detailed multi-sensor data is essential for interpreting finer handwriting details. In contrast, the finger-input-based model achieves its peak accuracy when the visual tracking sensor is excluded. These findings are expected, as the ArUco marker was only utilized during stylus writing to detect tilt, whereas this was not the case for finger input. Consequently, missing samples in such a scenario can introduce noise and inconsistencies in the analysis. Notably, accuracies for letter classification remain relatively lower across most sensor subsets, indicating that letters are inherently more challenging to classify accurately. This is possibly due to the very subtle differences when writing certain letters, the short writing time, and the similarity of the shapes used in forming certain letters.
Overall, the obtained results suggest that the best sensor fusion subset is highly dependent on both the handwriting form and the input modality. When all sensors are included, the model’s accuracy improves for recognizing sentences, words, and letters that are entered using a stylus. This highlights the benefit of utilizing a comprehensive multi-sensor setup. Hence, it can be concluded that including additional sensors can certainly enhance the accuracy of the model, thereby justifying their integration into the experiment apparatus.
5. Conclusions
In a controlled experiment focused on the acquisition of biometric characteristics of touchscreen handwriting, a total of 60 participants were involved. As opposed to similar studies which usually involve only one sensor or specifically designed electronic device, our approach utilizes an apparatus able to collect data based on the sensor fusion concept. Additionally, rather than concentrating on verification like most related studies, this research centers on identifying individuals through various forms of handwriting. Besides signatures, the focus was also on gathering handwritten sentences, words, and individual letters. The goal was to investigate alternative methods for person recognition beyond relying solely on signatures, typically considered pre-learned actions. Signatures may be susceptible to forgery, especially when skilled attackers practice replicating them. They are also commonly found on paper or scanned documents, making them easily accessible to an attacker. In contrast, longer handwriting samples, like sentences, are less available and more difficult to obtain. This supports using various handwriting types instead of signatures, as analyzing different forms of handwriting provides greater insight into a person’s dynamic writing characteristics. Employing these as an alternative "signature pattern" would make forgery attempts more challenging, requiring notably more effort to create a convincing imitation. The research was therefore extended to explore the possibilities of identification by conventional handwriting. The outcome of the conducted experiment is an original dataset comprising readings from various sensors (camera, accelerometer, gyroscope, magnetometer, touchscreen, and two piezo sensors), obtained during both a stylus and a finger input.
To facilitate handwriting recognition, a CNN-based model was developed. The accuracy of the model was analyzed with regard to the influence of three different factors: train set size, input modality, and handwriting form. During the model training process, train set sizes of 10, 20, and 30 were utilized. The obtained results demonstrate a statistically significant effect of set size on the model’s accuracy, with the highest accuracies achieved with the largest train set size.
The statistical analysis also revealed a significant difference in the model’s accuracy when writing with a stylus compared to a finger. In line with [81], finger-writing prioritizes large but less accurate motions, while stylus usage enhances precision and ensures consistency. Following the outcomes of our research, higher accuracy was consistently achieved for recognizing stylus-based handwriting. Experiment participants confirmed that the stylus provides greater precision compared to the finger, offering a writing experience similar to a conventional pen-and-paper context.
Among the different handwriting forms observed, the highest accuracy was achieved for signatures. This outcome is expected, given that each signature represents a pre-learned action unique to a particular participant. However, recognition accuracy was shown to be at a very good level for other handwriting forms as well, especially for words. The model demonstrated a statistically significant advantage in accuracy when words were entered, compared to both sentences and letters. This is because sentences typically have inconsistent spacing and demand more time and cognitive effort to compose. Writing shorter words repetitively enhances muscle memory, reducing mental processing and planning, which improves focus on forming words consistently. The model’s lowest accuracy, achieved for recognizing individual letters, can be attributed to the relatively small number of strokes involved in writing letters, consequently resulting in a significantly lower amount of collected sensor data.
To address the limitations of the S Pen Remote SDK and the lack of direct connectivity between the tablet and smartwatch, alternative devices could be utilized that allow access to stylus data and enable direct connections. This would also reduce increased battery consumption from using the smartphone’s camera for visual tracking purposes.
Future work may focus on the consistency of biometric features, by conducting a longitudinal study to investigate how an individual’s handwriting characteristics evolve over time. Such research would examine how factors such as aging, changes in writing habits, and physical conditions impact handwriting dynamics. Future studies could also explore handwriting biometrics for different languages and scripts. Collecting samples from unique character sets would help assess the universality of handwriting recognition techniques and determine if adjustments are needed for non-Latin scripts.
Additionally, it is possible to investigate the effectiveness of the proposed solution when introducing new sensors or devices to the existing fusion-based apparatus. For instance, incorporating motion capture devices or electromyography (EMG) sensors could facilitate a deeper analysis of muscle activity during the handwriting process. Optical or infrared sensors can also be used to detect hand movements. These sensors may capture subtle biometric markers that could possibly improve the model’s ability to distinguish between individuals. Moreover, it is possible to experiment with different placements of piezoelectric sensors to identify the best positions for capturing pressure and vibrations on the tablet screen. Also, various models of piezoelectric sensors could be included to test their performance in this context.
Finally, another aspect worth considering is the potential practical application of the proposed solution in different scenarios that, from a security point of view, either support or require handwriting-based authentication. For instance, it could be used for digital identity identification in banking and financial transactions, adding an extra layer of protection against fraud. Additionally, it can serve as an alternative biometric method for device security, such as unlocking smartphones, tablets, or secure workstations.