Computer Science and Mathematics

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Soushi Futamura, Tomohiro Fukuda

Abstract: In built environmental design, incorporating building user participation and verifying indoor thermal performance at early design stages have become increasingly important. Although Computational Fluid Dynamics (CFD) analysis is widely used to predict indoor thermal environments, its results are difficult for non-expert stakeholders to interpret, even when visualized using Mixed Reality (MR). Interpreting CFD visualizations in MR requires quantitative reasoning that explicitly cross-references visual features with legend information, rather than relying on prior color–value associations learned from natural images. This study investigates the capability of Vision–Language Models (VLMs) to interpret MR visualizations of CFD results and respond to user queries. We focus on indoor temperature distributions and airflow velocities visualized in MR. A novel dataset was constructed, consisting of MR images with CFD results superimposed onto real indoor spaces, paired with domain-specific question–answer annotations requiring legend-based reasoning. Using this dataset, a general-purpose VLM (Qwen2.5-VL) was fine-tuned. Experimental results show that the baseline model achieved less than 30% accuracy, whereas fine-tuning improved accuracy to over 60% across all categories while largely preserving general reasoning performance. These results demonstrate that domain adaptation enables VLMs to quantitatively interpret physical information embedded in MR visualizations, supporting non-expert understanding in built environmental design.

Article
Computer Science and Mathematics
Computer Vision and Graphics

Yongqi Shi, Ruopeng Yang, Changsheng Yin, Yiwei Lu, Bo Huang, Yongqi Wen, Yihao Zhong, Zhaoyang Gu

Abstract: Unsupervised change detection (UCD) from heterogeneous bitemporal optical–SAR imagery is challenging due to modality discrepancy, speckle/illumination variations, and the absence of change annotations. We propose MV-S2CD, a vision foundation model (VFM)-based framework that learns a modality-bridged latent space and produces dense change maps in a fully unsupervised manner. To robustly adapt pretrained VFM priors to heterogeneous inputs with minimal task-specific parameters, MV-S2CD incorporates lightweight modality-specific adapters and parameter-efficient low-rank adaptation (LoRA) in high-level layers. A shared projector embeds the two observations into a common geometry, enabling consistent cross-modal comparison and reducing sensor-induced domain shift. Building on the bridged representation, we design a dual-branch change reasoning module that decouples structure-sensitive cues from semantic-consistency cues: a structure pathway preserves fine boundaries and local variations, while a semantic-consistency pathway employs reliability gating and multi-scale context aggregation to suppress pseudo-changes caused by modality-specific nuisances and residual misregistration. For label-free optimization, we develop a difference-centric self-supervision scheme with two perturbation views and reliability-guided pseudo partitioning, jointly enforcing pseudo-unchanged invariance, pseudo-changed/unchanged separability, and sparsity and edge-preserving regularization. Experiments on three heterogeneous optical–SAR benchmarks demonstrate that MV-S2CD consistently improves the precision–recall trade-off and achieves state-of-the-art performance among unsupervised baselines, while remaining backbone-flexible and efficient.
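The parameter efficiency that motivates LoRA in this abstract can be made concrete with a quick count. The sketch below is a generic illustration of low-rank adaptation bookkeeping, not the MV-S2CD implementation; the layer dimensions are hypothetical.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for a full d_in x d_out weight update versus
    a rank-r LoRA update (B: d_out x r, A: r x d_in)."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

# Hypothetical 1024x1024 projection layer adapted at rank 8.
full, lora = lora_param_counts(1024, 1024, 8)
```

At rank 8 the adapter trains roughly 1/64 of the parameters of a full fine-tune of that layer, which is why LoRA pairs well with frozen foundation-model backbones.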

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Oraya Sooknit, Jakkarin Suksawatchon, Ureerat Suksawatchon

Abstract: Next Point-of-Interest (POI) recommendation aims to predict a user’s next location based on historical check-in data. However, real-world check-in records often contain uncertain check-ins, in which ambiguous spatial, temporal, or behavioral information obscures true mobility patterns and degrades prediction accuracy. To mitigate this issue, this study first learns user preferences from historical trajectories and adjusts transition importance based on temporal and spatial proximity, before modeling transition relationships using three complementary features: category, spatial area, and routine/non-routine behavior patterns. Based on transition probability analysis, feature-level dependencies in user mobility are systematically examined. The results indicate that these transition features contribute unequally to prediction performance, with area-based transitions being the most effective when considered individually. Nevertheless, their integration consistently yields the highest accuracy, highlighting the importance of transition-aware modeling. Experiments on two real-world datasets demonstrate that the proposed framework outperforms state-of-the-art methods in terms of Recall and NDCG, confirming the effectiveness of the proposed approach.
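As a rough illustration of the transition-probability analysis the abstract describes (a minimal sketch with a made-up check-in sequence, not the authors' framework), category-level transition probabilities can be estimated by counting consecutive pairs:

```python
from collections import Counter, defaultdict

def transition_probabilities(checkins):
    """Estimate P(next category | current category) from an ordered
    list of check-in categories."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(checkins, checkins[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

# Hypothetical one-user trajectory over POI categories.
seq = ["home", "cafe", "work", "cafe", "work", "home"]
probs = transition_probabilities(seq)
```

The same counting scheme extends to the paper's other transition features (spatial area, routine/non-routine) by swapping the category labels for the corresponding feature values.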

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

WoonGi Bin, SangHyuk An, WooZoo Chung

Abstract: In this paper, we present a deep neural network–based approach for computing radar cross section (RCS) over a wide frequency band and a broad range of incident angles. The proposed network, termed WBRCS-Net, is designed to converge to the solution of the method of moments (MoM) formulation by minimizing a mean-squared residual loss without explicitly solving the MoM linear system, thereby avoiding the numerical instabilities commonly encountered in conventional iterative solvers. Moreover, by using only the frequency and incident angle as inputs, WBRCS-Net enables wideband RCS prediction over a broad range of incident angles while substantially simplifying the network architecture. The performance of WBRCS-Net is evaluated on perfectly electrically conducting (PEC) spheres and cubes and compared with the Maehly approximation based on Chebyshev polynomials. Experimental results show that, once trained, WBRCS-Net provides accurate and stable wideband RCS computations over a wide range of incident angles with instantaneous inference speed, highlighting a key advantage of the neural network–based approach.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Aniket Deroy

Abstract: The scarcity of high-quality, labeled audio data for legal proceedings remains a significant barrier to developing robust speech-to-text and speaker diarization systems for the judiciary. This paper introduces Deepcounsel, a high-fidelity synthetic speech dataset simulating courtroom environments. Utilizing a multi-agent system powered by the Gemini 2.5 Pro model, we orchestrated complex interactions between eleven distinct roles, including judges, attorneys, witnesses, and court staff. By leveraging native multimodal generation, Deepcounsel provides a diverse range of legal terminology, emotional prosody, and multi-speaker overlaps. Our results demonstrate that synthetic datasets generated via multi-agent Large Language Models (LLMs) can serve as a viable proxy for training specialized legal AI models where real-world data is restricted by privacy laws.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Thiago Q. Oliveira, Leandro A. Carvalho, Flávio R. C. Sousa, João B. F. Filho, Khalil F. Oliveira, Daniel A. B. Tavares

Abstract: Background: Sepsis remains a leading cause of mortality in Intensive Care Units (ICUs) worldwide. Machine learning models for clinical prediction must be accurate, fair, transparent, and reliable to ensure that physicians feel confident in their decision-making process. Methods: We used the MIMIC-IV, version 3.1, database to evaluate several machine learning architectures, including Logistic Regression, XGBoost, LightGBM, LSTM (Long Short-Term Memory) networks, and Transformer models. We predicted three main clinical targets: hospital mortality, length of stay, and septic shock onset. Model interpretability was assessed using Shapley Additive Explanations (SHAP). Results: The XGBoost model demonstrated superior performance in prediction tasks, particularly for hospital mortality (AUROC 0.874), outperforming traditional LSTM networks, Transformers, and linear baselines. Importance analysis of the variables confirmed the clinical relevance of the model. Conclusions: While XGBoost and ensemble algorithms demonstrate superior predictive power for sepsis prognosis, their clinical adoption necessitates robust explainability mechanisms to gain physicians' trust.
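The headline metric here is AUROC, which has a clean rank-based reading: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The toy computation below illustrates the metric itself (it is not the authors' evaluation pipeline, and the labels/scores are invented):

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked
    correctly, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: higher scores should indicate higher mortality risk.
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
```

On this toy data, three of the four positive/negative pairs are ranked correctly, giving an AUROC of 0.75; a score of 0.874, as reported for XGBoost, means roughly 87% of such pairs are ordered correctly.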

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Linghui Ye, Qingbing Sang, Zhiyong Xiao

Abstract: Reliable visual characterization of food composition is a fundamental prerequisite for image-based dietary assessment and health-oriented food analysis. In fine-grained food recognition, models often suffer from large intra-class variation and small inter-class differences, where visually similar dishes exhibit subtle yet discriminative differences in ingredient compositions, spatial distribution, and structural organization, which are closely associated with different nutritional characteristics and health relevance. Capturing such composition-related visual structures in a non-invasive manner remains challenging. In this work, we propose a fine-grained food classification framework that enhances spatial relation modeling and key-region awareness to improve discriminative feature representation. The proposed approach strengthens sensitivity to composition-related visual cues while effectively suppressing background interference. A lightweight multi-branch fusion strategy is further introduced for stable integration of heterogeneous features. Moreover, to support reliable classification under large intra-class variation, a token-aware subcenter-based classification head is designed. The proposed framework is evaluated on the public FoodX-251 and UEC Food-256 datasets, achieving accuracies of 82.28% and 82.64%, respectively. Beyond benchmark performance, the framework is designed to support practical image-based dietary analysis under real-world dining conditions, where variations in appearance, viewpoint, and background are common. By enabling stable recognition of the same food category across diverse acquisition conditions and accurate discrimination among visually similar dishes with different ingredient compositions, the proposed approach provides reliable food characterization for dietary interpretation, thereby supporting practical dietary monitoring and health-oriented food analysis applications.

Article
Computer Science and Mathematics
Computer Networks and Communications

Saio Alusine Marrah, Jiahao Wang, Koroma Abu Bakarr, Gibrilla Deen Kamara, Ryvel Timothy Stamber, Ologun Sodiq Babatunde, Mabel Ernestine Cole

Abstract: This paper presents a deep learning-based adaptive sensor fusion framework for real-time control and fault-tolerant automation in Industrial IoT systems. The core of the framework is an attention-based CNN-Transformer model that dynamically fuses heterogeneous sensor streams; its interpretable weighting signals are leveraged directly for fault detection and to inform a supervisory control policy. By dynamically weighting multiple heterogeneous sensor streams using an attention-based CNN-Transformer architecture, the proposed method reduces estimation error under noisy and fault-prone conditions, and seamlessly integrates with a closed-loop controller that adjusts to detected faults through a stability-aware supervisory policy. Experiments on synthetic IIoT data with injected transient faults demonstrate significant improvements in fusion accuracy (RMSE: 0.049 ± 0.003 vs 0.118 ± 0.008 for Kalman filter, p < 0.001), faster fault detection (F1-score: 0.89 ± 0.02) and recovery (1.1 ± 0.2 seconds), and hard real-time performance suitable for edge deployment (99th percentile latency: 58 ms). The results show that the proposed approach outperforms classical baselines in terms of RMSE, detection F1-score, recovery time, and latency trade-offs. This work contributes to more reliable, adaptive automation in industrial settings with minimal manual tuning and empirical stability validation.

Technical Note
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Walter Chen, Kieu Anh Nguyen

Abstract: Root mean squared error (RMSE) and mean absolute error (MAE) are among the most widely used performance metrics in machine learning and scientific modeling. Although their mathematical relationship is well established, misunderstandings and misapplications of these metrics continue to appear in the literature. This technical note revisits the fundamental bounds relating RMSE and MAE and identifies a systematic error in a recently published paper in Artificial Intelligence Review, in which RMSE values are numerically smaller than the corresponding MAE values, a relationship that is mathematically impossible. Notably, these incorrect RMSE and MAE values are reported alongside other cited results within the same study that correctly satisfy the inequality RMSE greater than or equal to MAE. In addition, supplementary experiments using two common and straightforward machine learning models, Random Forest and XGBoost, demonstrate that comparable or superior performance can be achieved in several of the same datasets used in the aforementioned paper without resorting to highly complex optimization frameworks. Collectively, these findings underscore the importance of verifying the correctness of basic performance metrics and of contextualizing claimed performance gains through transparent baseline comparisons in machine learning evaluation.
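The bound the note turns on is easy to check empirically. The snippet below is a minimal sketch (random synthetic errors, not the note's datasets) verifying MAE ≤ RMSE ≤ √n · MAE, where the lower bound follows from Jensen's inequality applied to the squaring function:

```python
import math
import random

def mae(errors):
    """Mean absolute error of a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# MAE <= RMSE always holds (Jensen); RMSE <= sqrt(n) * MAE is the
# matching upper bound, tight when a single residual dominates.
random.seed(0)
errs = [random.uniform(-5, 5) for _ in range(1000)]
assert mae(errs) <= rmse(errs) <= math.sqrt(len(errs)) * mae(errs)
```

Equality RMSE = MAE holds only when all residuals share the same magnitude, so a reported RMSE strictly below the paired MAE is immediately impossible, which is the inconsistency the note flags.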

Article
Computer Science and Mathematics
Computer Science

Chimeng Ly

Abstract: Pathfinding and target selection algorithms play a critical role in real-time strategy and mobile games, directly influencing gameplay balance, fairness, and player skill expression. Unlike traditional shortest-path algorithms such as A*, many commercial games intentionally employ simplified or constrained pathfinding to preserve strategic depth. This paper presents a modeling and experimental analysis of troop pathfinding and target selection behavior inspired by Clash of Clans, with a focus on Clan War attack scenarios involving troop movement toward Town Halls and defensive structures. Since the internal implementation of Clash of Clans is proprietary, this study proposes a behavioral approximation model based on observable in-game mechanics. A hybrid algorithm combining greedy nearest-target selection, local obstacle-aware movement, and priority-based cost functions is designed and evaluated. Multiple simulated base layouts with varying densities and wall configurations are tested. Results show that intentionally non-optimal pathfinding enhances game balance, prevents deterministic outcomes, and promotes strategic base design. The study follows the IMRAD structure and applies standard experimental game AI research methodologies.
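The greedy nearest-target selection with priority-based cost functions described above can be sketched in a few lines. This is a hypothetical approximation in the spirit of the paper's behavioral model, not game code; the target kinds, positions, and priority weights are invented:

```python
import math

def pick_target(unit_pos, targets, priority):
    """Greedy target selection: minimize Euclidean distance scaled by
    an inverse priority weight (hypothetical cost function)."""
    def cost(t):
        d = math.dist(unit_pos, t["pos"])
        return d / priority.get(t["kind"], 1.0)
    return min(targets, key=cost)

targets = [
    {"kind": "wall", "pos": (1.0, 0.0)},
    {"kind": "town_hall", "pos": (4.0, 0.0)},
]
# A high Town Hall priority can outweigh a nearer wall.
choice = pick_target((0.0, 0.0), targets, {"town_hall": 5.0, "wall": 1.0})
```

With uniform priorities the nearer wall wins, so the priority weights are what let a troop "commit" to a distant high-value structure, which is exactly the non-optimal-but-strategic behavior the paper argues improves game balance.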

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Pamela Cuce

Abstract: We introduce Subsumption Pattern Learning (SPL), a hierarchical multi-agent framework that transforms collections of autonomous AI agents into a self-distilling swarm intelligence through shared collective memory. SPL adapts Brooks’ subsumption architecture from behavioral robotics to foundation model economics, implementing a formally-defined three-layer hierarchy (Reactive, Tactical, Deliberative) where learned patterns are distilled into a centralized Shared State via explicit inhibition signals. We provide a complete mathematical formalization of the pattern distillation process, defining state transitions from deliberative reasoning to tactical reflexes through confidence-bounded suppression logic. Our framework unifies three previously disparate research streams: subsumption control from robotics, social learning theory from cognitive science, and swarm intelligence from distributed systems. We present rigorous empirical evaluation on a benchmark of 100,000 heterogeneous enterprise tasks, demonstrating 5–15× cost reduction per agent with an additional 40% reduction in foundation model escalations across coordinated multi-agent networks. Ablation studies confirm that cost savings preserve accuracy within 1.3% of baseline. We formalize the intelligence compounding phenomenon, proving that collective competency grows logarithmically with processed requests under mild assumptions. SPL provides a principled path toward AI systems that grow more intelligent with every transaction while maintaining robustness through decentralized resilience.
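The core loop of pattern distillation with confidence-bounded inhibition can be sketched as a two-path dispatcher: a cached pattern whose confidence clears a floor inhibits escalation; otherwise the task goes to the deliberative layer and the result is distilled back into shared state. This is a toy reconstruction of the idea as described, not the SPL implementation; the data shapes and confidence values are invented:

```python
def dispatch(task, pattern_store, solve_deliberatively, confidence_floor=0.8):
    """Reactive-vs-deliberative dispatch with confidence-bounded
    suppression of foundation-model escalation."""
    entry = pattern_store.get(task)
    if entry and entry["confidence"] >= confidence_floor:
        return entry["answer"], "reactive"      # learned pattern inhibits escalation
    answer = solve_deliberatively(task)          # expensive deliberative call
    pattern_store[task] = {"answer": answer, "confidence": 0.9}  # distill
    return answer, "deliberative"

store = {}  # shared collective memory across agents
first = dispatch("classify ticket", store, lambda t: "billing")
second = dispatch("classify ticket", store, lambda t: "billing")
```

Because `store` is shared, one agent's deliberative solve becomes every agent's reactive reflex, which is the mechanism behind the claimed per-agent cost reduction.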

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Vanesa Gómez-Martínez, David Chushig-Muzo, Cristina Soguero-Ruiz

Abstract: Deep Learning (DL) models have demonstrated strong performance in dermatological applications, particularly when trained on dermoscopic images. In contrast, tabular clinical data—such as patient metadata and lesion-level descriptors—are difficult to integrate into DL-based pipelines due to their heterogeneous, non-spatial, and often low-dimensional nature. As a result, these data are commonly handled using separate classical machine learning (ML) models. In this work, we present a proof-of-concept study that investigates whether dermatological tabular data can be transformed into two-dimensional image representations to enable convolutional neural network (CNN)-based learning. To this end, we employ the Low Mixed-Image Generator for Tabular Data (LM-IGTD), a framework designed to transform low-dimensional and heterogeneous tabular data into two-dimensional image representations, through type-aware encoding and controlled feature augmentation. Using this approach, we encode low-dimensional clinical metadata, high-dimensional lesion-level statistical features extracted from dermoscopic images, as well as their feature-level fusion, into grayscale image representations. The resulting image representations serve as input to CNNs, and the performance is compared with ML models trained on tabular data. Experiments conducted on the Derm7pt and PH2 datasets show that traditional ML models generally achieve the highest Area Under the Curve values, while LM-IGTD-based representations provide comparable performance and enable the use of CNNs on structured clinical data used in dermatology.

Article
Computer Science and Mathematics
Algebra and Number Theory

Kurmet Sultan

Abstract: A simple proof of Fermat’s Last Theorem (FLT) for the cube is obtained using the binomial expansion formula. It is shown that the difference between two natural numbers raised to the same natural power must be represented by an incomplete binomial formula. It is proven that the cube of a natural number cannot be represented as an incomplete binomial, thereby yielding a simple proof of FLT for n = 3.
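To make the abstract's central object concrete (as an illustration of the terminology, not a reproduction of the paper's argument): writing the larger base as $a = b + c$ with $c = a - b \ge 1$, the binomial theorem gives

\[
a^3 - b^3 = (b + c)^3 - b^3 = 3b^2c + 3bc^2 + c^3,
\]

that is, the full expansion of $(b + c)^3$ with its leading term $b^3$ removed, which is what the abstract terms an incomplete binomial formula.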

Article
Computer Science and Mathematics
Software

Iosif Iulian Petrila

Abstract: The augmented assembly language @Asm is proposed in order to transcend the fragmentation of architecture-specific dialects, to provide a unified framework for diverse processing paradigms as a universal assembly language and to function as a self-compiling bootstrap instrument adaptable to any processor system. The language augmentations include: flexible machine-language descriptions, general memory and data management directives, custom lexical identification through regular expressions, parsing facilities, generalized macroprocessing, flexible assembly control instructions, customizable encoding and code generation features, compiler-oriented abstraction mechanisms at the language level. The native abstraction augmentations enable expressive and concise high-level descriptions within assembly language for any present, emerging, or future systems.

Article
Computer Science and Mathematics
Mathematics

Mohamed Haj Yousef

Abstract: We develop a dual-time topological framework for the mathematical description of non-equilibrium systems, aimed at reconciling time-reversible microscopic dynamics with irreversible macroscopic behavior. The formulation introduces two independent but coupled temporal parameters: a reversible time associated with microscopic or generative dynamics, and an irreversible time governing dissipation, entropy production, and macroscopic evolution. Physical states are defined on a bi-temporal manifold, allowing reversible and irreversible processes to be treated within a unified geometric setting. Temporal evolution is described using independent temporal connections and their associated curvature. We show that nonvanishing temporal curvature induces path dependence in temporal evolution, providing a geometric origin for memory effects, non-Markovian dynamics, and aging phenomena. Temporal asymmetry emerges dynamically through symmetry breaking between the temporal sectors and through projection from the bi-temporal domain onto a single observable time parameter. The relationship between the dual-time formalism and conventional single-time non-equilibrium models is analyzed. Standard evolution equations are recovered in integrable or decoupling limits, demonstrating that the proposed framework constitutes a genuine generalization compatible with established approaches. By encoding irreversibility in the geometry and topology of temporal evolution, this work provides a mathematically consistent framework for the emergence of the arrow of time in non-equilibrium theoretical physics. Unlike conventional approaches in which irreversibility and memory are encoded phenomenologically at the level of effective equations, the present framework derives non-Markovian dynamics and temporal asymmetry from the geometry and topology of coupled temporal evolution. In particular, a representation theorem is established showing that a broad class of convolution-type non-Markovian equations arise as projections of local dual-time dynamics.

Concept Paper
Computer Science and Mathematics
Data Structures, Algorithms and Complexity

José Vicente Quiles Feliu

Abstract: We present Model G, a mathematical formalization of information spaces where coherence is an intrinsic property guaranteed by algebraic construction. We define the global space G through a triaxial structure (Attribute, Key, Connection) and a coherence operator Φ that filters the managed universe Ω. Four fundamental axioms establish existence by coherence, location uniqueness, acyclicity of the dependency graph, and determinism through the propagation vector Π and the determinant δ. We extend relational normal forms with five semantic-temporal normal forms (SRGD-FN1 to FN5). The SRGD implementation materializes the model through a three-layer stateless architecture. Experimental validation confirms the impossibility of incoherent states and O(|Π|) complexity in operations. This work was initiated in December 2025 and the initial version was published on January 6, 2026, temporally coinciding with independent advances such as DeepSeek’s Engram (January 12, 2026).

Article
Computer Science and Mathematics
Computer Vision and Graphics

Shuxin Mo, Bowen Lou

Abstract: Image-to-PointCloud place recognition is vital for autonomous systems, yet faces challenges from the inherent modality gap and drastic environmental variations. We propose Cross-Modal Invariant Representation Learning (CMIRL) to learn highly invariant cross-modal global descriptors. CMIRL introduces an Adaptive Cross-Modal Alignment (ACMA) module, which dynamically projects point clouds based on image semantics to generate view-optimized dense depth maps. A Dual-Stream Invariant Feature Encoder, featuring a Transformer-based Cross-Modal Attention Fusion (CMAF) module, then explicitly learns and emphasizes features shared across modalities and insensitive to environmental perturbations. These fused local features are subsequently aggregated into a robust global descriptor using an enhanced multi-scale NetVLAD network. Extensive experiments on the challenging KITTI dataset demonstrate that CMIRL significantly outperforms state-of-the-art methods in terms of top-one recall and overall recall. An ablation study validates the effectiveness of each proposed module, and qualitative analysis confirms enhanced robustness under adverse conditions, including low light, heavy shadows, simulated weather, and significant viewpoint changes. Strong generalization capabilities on an unseen dataset and competitive computational efficiency further highlight CMIRL's potential for reliable long-term autonomous localization.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Liu Linsong, Yu Gu, He Zhang, Shuang Wang, Chenyu Li, Quan Ande, Haixiang Lin

Abstract: With the rapid advancement of speech emotion recognition, the transition from unimodal to multimodal approaches has become inevitable. However, multimodal methods introduce new challenges, particularly classification ambiguity in complex samples when compared to unimodal approaches. To address this, we propose a Mutual Refinement Distillation (MRD) method, which incorporates three key components: (1) Modal Interaction Calibration, enhancing classification accuracy for complex samples; (2) Interactive Learning Constraints, mitigating overfitting; and (3) Reverse Curriculum Learning, further improving model robustness. Experiments with the MELD and IEMOCAP datasets demonstrate that our approach outperforms state-of-the-art methods in emotion recognition, achieving a notable 6.07% improvement over the baseline on IEMOCAP.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Manikandan Chandran, Vimal Shanmuganathan

Abstract: Tariff unpredictability and logistics uncertainty pose growing challenges to supply chain planners evaluating reshoring options. Traditional evaluation methods using spreadsheet programs treat tariff and logistics costs as constant inputs and do not capture nonlinear interactions between component structures, routing decisions, and assembly capacity. To formulate reshoring assessment as a digital twin-driven decision system, this paper presents a stochastic process optimization framework. The architecture combines automated tariff classification, stochastic landed cost simulation, and mixed-integer linear programming (MILP) to enable repeatable and auditable decision-making. Bills of materials are represented by dependency graphs, which allow one to reason at the process level about alternative assembly configurations. Operational uncertainties, such as variation in transportation, labor throughput, and volatility in tariffs, are factored into the optimization process through Monte Carlo simulation. With a synthetic yet realistic product scenario, experimental assessment shows that a cost reduction of about 9–16% and a major improvement in robustness are obtained over static estimation methods. The findings establish that explicitly modeling reshoring evaluation as a stochastic decision process improves scalability and resilience. The suggested framework offers a solid basis of decision support in adaptive supply chain systems.
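The stochastic landed-cost simulation step can be sketched as a simple Monte Carlo loop over tariff scenarios and freight variability. The distributions, rates, and base cost below are illustrative assumptions, not the paper's model:

```python
import random
import statistics

def landed_cost_samples(base_cost, tariff_rates, freight_mu, freight_sigma,
                        n=10000, seed=42):
    """Monte Carlo sketch of landed cost: unit cost inflated by a
    randomly drawn tariff-rate scenario plus log-normal freight."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        tariff = rng.choice(tariff_rates)                 # tariff scenario
        freight = rng.lognormvariate(freight_mu, freight_sigma)
        samples.append(base_cost * (1 + tariff) + freight)
    return samples

# Hypothetical component: $100 base cost, three tariff scenarios.
costs = landed_cost_samples(100.0, [0.0, 0.10, 0.25],
                            freight_mu=2.0, freight_sigma=0.3)
mean_cost = statistics.fmean(costs)
```

Feeding the resulting cost distribution (rather than a single point estimate) into the MILP stage is what lets the framework rank reshoring options by robustness as well as expected cost.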

Article
Computer Science and Mathematics
Computer Vision and Graphics

Zihan Pu, Linyu Bian

Abstract: Current multimodal artificial intelligence suffers from fragmentation, with models typically optimized for single tasks, impeding efficient and uniform handling of diverse tasks like Text-to-Image (T2I), Image-to-Text (I2T), and Visual Question Answering (VQA) within a single framework. To address this, we propose the Synergistic Multimodal Diffusion Transformer (SyMDit), a novel unified discrete diffusion model. SyMDit integrates an Adaptive Cross-Modal Transformer (ACMT) with a Synergistic Attention Module (SAM) for dynamic interaction, alongside Hierarchical Semantic Visual Tokenization (HSVT) for multi-scale visual understanding and Context-Aware Text Embedding with special tokens for nuanced textual representation. Trained under a unified discrete diffusion paradigm, SyMDit employs a multi-stage strategy, including advanced data augmentation and selective masking. Our extensive evaluations demonstrate that SyMDit consistently achieves superior performance across T2I, I2T, and VQA tasks, outperforming existing baselines. Furthermore, SyMDit significantly enhances inference efficiency, offering substantial speedups compared to autoregressive and prior discrete diffusion methods. This work presents a significant step towards truly unified and efficient multimodal AI, offering a robust framework for general-purpose multimodal intelligence.



Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated