ARTICLE | doi:10.20944/preprints202001.0207.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: federated learning; federated averaging; mutual information; correlation
Online: 19 January 2020 (04:37:27 CET)
Federated learning is a decentralized topology of deep learning that trains a shared model on data distributed across clients (such as mobile phones and wearable devices), protecting data privacy by keeping raw data out of the data center (server). After each client computes new model parameters by stochastic gradient descent (SGD) on its own local data, all locally computed parameters are aggregated on the server to generate an updated global model. Almost all current studies directly average the client-computed parameters by default, but none explains why averaging parameters is a good approach. In this paper, we treat each client-computed parameter as a random vector because of the stochastic properties of SGD, and estimate the mutual information between two client-computed parameters at different training phases using two methods in two learning tasks. The results confirm the correlation between different clients and show that mutual information increases with training iterations. However, when we further compute the distance between client-computed parameters, we find that the parameters become more correlated without getting closer. This phenomenon suggests that averaging parameters may not be the optimal way of aggregating trained parameters.
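The aggregation step and the "correlated but not close" observation in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the sample-count weighting, the toy parameter vectors, and the use of Pearson correlation as a simple stand-in for the mutual information estimators are all assumptions made for illustration.

```python
import math


def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts (assumed weighting)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def euclidean_distance(a, b):
    """Euclidean distance between two parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation, used here only as a simple proxy for statistical dependence."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy (hypothetical) parameters from two clients: perfectly correlated, yet far apart.
clients = [[1.0, 2.0], [3.0, 6.0]]
sizes = [1, 3]

global_params = federated_average(clients, sizes)
distance = euclidean_distance(clients[0], clients[1])
correlation = pearson_correlation(clients[0], clients[1])
```

In this toy case the two vectors have correlation 1 while their distance is about 4.47, so high dependence between client parameters does not by itself imply that the parameters are close.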
ARTICLE | doi:10.20944/preprints202301.0092.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Federated Learning; Learning Analytics
Online: 5 January 2023 (02:39:04 CET)
Federated learning techniques aim to train and build machine learning models based on distributed datasets across multiple devices, avoiding data leakage. The main idea is to perform training on remote devices or isolated data centers without transferring data to centralized repositories, thus mitigating privacy risks. Data analytics in education, in particular learning analytics, is a promising scenario in which to apply this approach to address the legal and ethical issues related to processing sensitive data. Indeed, given the nature of the data to be studied (personal data, educational outcomes, data concerning minors), it is essential that the conduct of these studies and the publication of their results provide the necessary guarantees to protect the privacy of the individuals involved and their data. In addition, quantitative techniques that exploit data on the use of educational platforms, student performance, use of devices, etc., can address educational problems such as the determination of user profiles, personalized learning trajectories, or early dropout indicators and alerts, among others. This paper presents the application of federated learning techniques to two learning analytics problems: dropout prediction and unsupervised student classification. The experiments allow us to conclude that the proposed solutions achieve performance comparable to that of the centralized versions while avoiding centralizing the data for training the models.
ARTICLE | doi:10.20944/preprints202209.0435.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Federated Learning Strategies; Relational-Regularized Autoencoder; Time-Series Classification
Online: 28 September 2022 (09:06:58 CEST)
The growing amount of data measured in the context of smart cities can be used to develop new and innovative business models to increase efficiency and the value of life. A time-series classification algorithm can help automate many different processes, such as forecasting services. In order to ensure data security and privacy, Federated Learning trains a global model collaboratively on multiple clients. When data distributions and data quantities differ across participating clients, neural networks suffer from slow convergence and overfitting. Based on different data distributions, data quantities and numbers of clients, we develop and evaluate different data-clustering strategies to update global model weights in comparison to the state of the art. We use public time-series data, generate various synthetic datasets and train a Relational-Regularized Autoencoder for classification purposes. Our results show an improvement in model performance with respect to generalization.
ARTICLE | doi:10.20944/preprints202209.0176.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: Differential privacy; Federated learning; Vertically partitioned data
Online: 13 September 2022 (10:57:53 CEST)
We present a differentially private extension of the block coordinate descent based on objective perturbation. The algorithm iteratively performs linear regression in a federated setting on vertically partitioned data. In addition to a privacy guarantee, the algorithm also offers a utility guarantee; a tolerance parameter indicates how much the differentially private regression may deviate from an analysis without differential privacy. The algorithm’s performance is compared with the standard block coordinate descent algorithm and the trade-off between utility and privacy is studied. The performance is studied using both artificial test data and the forest fires data set. We find that the algorithm is fast and able to generate practical predictions with single-digit privacy budgets, albeit with some accuracy loss.
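For orientation, the standard (non-private) block coordinate descent baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes each party holds exactly one feature column, shares only the current residual, and omits the objective perturbation that provides differential privacy. The data values are made up.

```python
def block_coordinate_descent(blocks, y, sweeps=100):
    """Least-squares regression where party k holds only feature column blocks[k].

    Assumption for this sketch: one column per party, so each block update
    is an exact one-dimensional least-squares solve on the shared residual.
    """
    coefs = [0.0] * len(blocks)
    residual = list(y)  # y minus the current prediction from all parties
    for _ in range(sweeps):
        for k, col in enumerate(blocks):
            # Remove party k's current contribution from the prediction.
            partial = [r + coefs[k] * x for r, x in zip(residual, col)]
            # Party k refits its own coefficient against what is left.
            coefs[k] = sum(x * p for x, p in zip(col, partial)) / sum(x * x for x in col)
            # Publish the updated residual to the other parties.
            residual = [p - coefs[k] * x for p, x in zip(partial, col)]
    return coefs


# Hypothetical vertically partitioned data: y = 2 * x1 + 3 * x2.
party_1 = [1.0, 0.0, 1.0, 2.0]
party_2 = [0.0, 1.0, 1.0, 1.0]
target = [2.0, 3.0, 5.0, 7.0]

coefficients = block_coordinate_descent([party_1, party_2], target)
```

Each party only ever sees its own column and the shared residual, which is what makes the scheme suitable for vertically partitioned data; a private variant would additionally perturb each party's objective.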
ARTICLE | doi:10.20944/preprints201907.0310.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: EU law; federated search; document repository; network diagram
Online: 28 July 2019 (12:29:27 CEST)
We have developed an application for federated search of EU and Hungarian legislation and jurisdiction. It now contains more than 1 million documents, with daily updates. The database holds documents downloaded from the EU sources EUR-Lex and Curia Online as well as public jurisdiction documents from the Constitutional Court of Hungary and The National Office for The Judiciary. The application is called Justeus. Justeus provides comprehensible search options. Besides free-text and metadata (dropdown list) searches, it features hierarchical data structures (concept hierarchy trees) of directory codes and classification as well as subject terms. Justeus collects all links from a particular document to other documents (court judgements citing other case law documents as well as legislation, national court decisions referring to EU regulation, etc.) as tables and directed graph networks. When a document is chosen, its relations to other documents are visualized in real time as a network. Network graphs help in identifying key documents that influence or are referred to by many other documents (legislative and/or jurisdictive) and sets of documents predominantly referring to each other (citation networks).
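The idea of finding key documents in a directed citation network can be illustrated with a small sketch. The document identifiers below are hypothetical, and counting incoming links is only one simple way to rank documents; the abstract does not say which measure Justeus uses.

```python
from collections import Counter

# Hypothetical directed links: (citing document, cited document).
citations = [
    ("case_A", "regulation_1"),
    ("case_B", "regulation_1"),
    ("case_B", "case_A"),
    ("case_C", "regulation_1"),
]

# In-degree: how many other documents refer to each document.
in_degree = Counter(cited for _, cited in citations)

# The most frequently cited document and its citation count.
most_cited, count = in_degree.most_common(1)[0]
```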
ARTICLE | doi:10.20944/preprints202212.0426.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Speech Recognition; Keyword Spotting; Child abuse; Federated Learning; Whisper; Wav2vec2.0
Online: 22 December 2022 (09:27:37 CET)
The growth in online child exploitation material is a significant challenge for European Law Enforcement Agencies (LEAs). One of the most important sources of such online information is audio material, which needs to be analyzed to find evidence in a timely and practical manner. That is why LEAs require a next-generation AI-powered platform to process audio data from online sources. We propose the use of speech recognition and keyword spotting to transcribe audiovisual data and to detect the presence of keywords related to child abuse. The considered models are based on two of the most accurate neural architectures to date: Wav2vec2.0 and Whisper. The systems are tested under an extensive set of scenarios in different languages. Additionally, keeping in mind that obtaining data from LEAs is very sensitive, we explore the use of federated learning to obtain more robust systems for the addressed application while keeping the data private to the LEAs. The considered models achieved a word error rate between 11% and 25%, depending on the language. In addition, the systems are able to recognize a set of spotted words with true positive rates between 82% and 98%, depending on the language. Finally, federated learning strategies are shown to maintain and even improve the performance of the systems when compared to centrally trained models. The proposed systems lay the basis for an AI-powered platform for automatic analysis of audio in the context of forensic applications within child abuse. The use of federated learning is also promising for the addressed scenario, where data privacy is an important issue to be managed.
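The word error rate reported in this abstract is conventionally defined as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below shows that standard definition; the example sentences are made up, and the abstract does not state which scoring tool the authors used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```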
ARTICLE | doi:10.20944/preprints201805.0079.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: decentralized access control; Internet of Things (IoT); blockchain protocol; smart contract; federated delegation; capability-based access control
Online: 3 May 2018 (13:06:09 CEST)
While Internet of Things (IoT) technology has been widely recognized as an essential part of Smart Cities, it also brings new challenges in terms of privacy and security. Access control (AC) is among the top security concerns and is critical to resource and information protection on IoT devices. Traditional access control approaches, like Access Control Lists (ACL), Role-based Access Control (RBAC) and Attribute-based Access Control (ABAC), are not able to provide a scalable, manageable and efficient mechanism to meet the requirements of IoT systems. Another weakness in today's AC is the centralized authorization server, which can be a performance bottleneck or a single point of failure. Inspired by smart contracts on top of a blockchain protocol, this paper proposes BlendCAC, a decentralized, federated capability-based AC mechanism that enables effective protection of devices, services and information in large-scale IoT systems. A federated capability-based delegation model (FCDM) is introduced to support hierarchical and multi-hop delegation. The mechanism for delegate authorization and revocation is explored. A robust identity-based capability token management strategy is proposed, which takes advantage of the smart contract for registering, propagating and revoking access authorization. A proof-of-concept prototype has been implemented on both resource-constrained devices (i.e., Raspberry Pi nodes) and more powerful computing devices (i.e., laptops), and tested on a local private blockchain network. The experimental results demonstrate the feasibility of BlendCAC as a decentralized, scalable, lightweight and fine-grained AC solution for IoT systems.
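To make multi-hop delegation and revocation concrete, here is a minimal in-memory sketch of a generic capability-token registry. It is not BlendCAC or its smart contract: the class name, method names, and the rules that a delegated token may carry only a subset of its parent's rights and that revoking a token invalidates everything delegated from it are assumptions chosen for illustration.

```python
class CapabilityRegistry:
    """Hypothetical registry of capability tokens forming a delegation tree."""

    def __init__(self):
        self.tokens = {}      # token id -> holder, rights, parent token id
        self.revoked = set()  # ids of revoked tokens
        self._next_id = 0

    def issue(self, holder, rights, parent=None):
        token_id = self._next_id
        self._next_id += 1
        self.tokens[token_id] = {
            "holder": holder,
            "rights": frozenset(rights),
            "parent": parent,
        }
        return token_id

    def delegate(self, parent_id, new_holder, rights):
        # A delegated token may only narrow the rights of a valid parent.
        if not self.is_valid(parent_id):
            raise PermissionError("parent token is not valid")
        if not set(rights) <= self.tokens[parent_id]["rights"]:
            raise PermissionError("cannot delegate rights the parent does not hold")
        return self.issue(new_holder, rights, parent=parent_id)

    def revoke(self, token_id):
        self.revoked.add(token_id)

    def is_valid(self, token_id):
        # Walk up the delegation chain; any revoked or unknown ancestor invalidates it.
        while token_id is not None:
            if token_id in self.revoked or token_id not in self.tokens:
                return False
            token_id = self.tokens[token_id]["parent"]
        return True

    def allows(self, token_id, holder, right):
        token = self.tokens.get(token_id)
        return (
            token is not None
            and self.is_valid(token_id)
            and token["holder"] == holder
            and right in token["rights"]
        )


registry = CapabilityRegistry()
root = registry.issue("owner", {"read", "write"})
child = registry.delegate(root, "alice", {"read"})
grandchild = registry.delegate(child, "bob", {"read"})

allowed_before = registry.allows(grandchild, "bob", "read")
registry.revoke(child)  # revoking the middle hop cuts off bob as well
allowed_after = registry.allows(grandchild, "bob", "read")
```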
ARTICLE | doi:10.20944/preprints202003.0442.v1
Subject: Engineering, General Engineering Keywords: Terrain Referenced Navigation (TRN); Federated Filter; Interferometric Radar Altimeter (IRA); Batch Processing; Auxiliary Particle Filter; Digital Elevation Model (DEM); Captive Flight Test
Online: 31 March 2020 (04:29:41 CEST)
Autonomous unmanned aerial vehicles (UAVs) require highly reliable navigation information. Generally, navigation systems that combine an inertial navigation system (INS) with a global navigation satellite system (GNSS) have been widely used. However, GNSS is vulnerable to jamming and spoofing. The terrain referenced navigation (TRN) technique can be used to solve this problem. In this study, to obtain reliable navigation information even if GNSS is not available or the degree of terrain roughness is not determined, we propose a federated-filter-based INS/GNSS/TRN integrated navigation system. We also introduce a TRN system that combines batch processing and an auxiliary particle filter to ensure stable flight of UAVs even in a long-term GNSS-denied environment. As the altimeter sensor for the TRN system, we use an interferometric radar altimeter (IRA) to obtain reliable navigation accuracy in high-altitude flight. In addition, a parallel computing technique with general-purpose computing on graphics processing units (GPGPU) is applied to process a high-resolution terrain database and a nonlinear filter on board in real time. Finally, we verify the performance of the proposed system through software-in-the-loop (SIL) tests and captive flight tests in a GNSS-unavailable environment.
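In a federated filter, a master filter typically combines the estimates of several local filters by weighting each with its inverse covariance. The scalar sketch below shows that fusion rule only; the numbers are made up, the local estimates are assumed independent, and nothing here reflects the specific filter design or information-sharing scheme of the paper.

```python
def fuse_estimates(estimates):
    """Fuse independent scalar estimates given as (value, variance) pairs.

    Information-weighted fusion: the fused inverse variance is the sum of
    the local inverse variances, and each value is weighted by its own
    inverse variance.
    """
    information = sum(1.0 / variance for _, variance in estimates)
    fused_variance = 1.0 / information
    fused_value = fused_variance * sum(value / variance for value, variance in estimates)
    return fused_value, fused_variance


# Hypothetical altitude estimates (metres, variance in square metres)
# from two local filters, e.g. one GNSS-aided and one TRN-aided.
fused_value, fused_variance = fuse_estimates([(100.0, 4.0), (104.0, 4.0)])
```

With equal variances the fused value is the plain mean and the variance halves; a local filter with a large variance (for example, one whose sensor is degraded) contributes correspondingly little to the fused result.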