Computer Science and Mathematics

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Emanuel Shirbint, Alexander Rybalov

Abstract: Large language models embedded in person-modelling systems, and human beings reflecting on themselves, share a structural vulnerability: both can generate fluent, narratively persuasive, yet abductively unsound accounts of a person. Building on a governed abductive architecture developed for medical patient digital twins, this paper argues that personality should be modelled neither as a stable self-description nor as a free variable of circumstance, but as a layered system in which surface self-narrative, role-specific manifestations, and recurrent attractor structures are architecturally separated. The central claim is that pressure does not create personality; it changes the evidentiary conditions under which a person is observed, revealing which self-descriptions are structurally supported and which remain conditional on comfort, low cost, or the absence of threat. We identify six recurrent modes of unsound person-modelling — missing-premise neglect, weak-mechanism support, counter-evidence discounting, narrative essentialism, contextual overfitting, and premature identity closure — each mapped to an architectural absence and a corresponding control. We specify a seven-contour governed architecture and operationalise its distinctive elements as a Pressure Diagnostic Runtime, which annotates naturally occurring or ethically consented pressure events as evidence, and an Attractor Registry, which stores recurrent if-then behavioural signatures rather than trait labels. Integrity is formalised as a bounded operational contour; self-knowledge as a discordance-detection function comparing self-report against behaviour under load; transformation as governed ontology-revision rather than re-narration; and the witness as a strictly functional, non-generating governance layer. 
The paper draws out implications for AI systems that model persons — provenance and staleness labels on inferred self-attributes, role-code separation, user contestability, refusal of premature identity closure, and a prohibition on covert pressure engineering. The argument is conceptual: it proposes a model and a research programme, not a diagnostic tool, a therapeutic intervention, or a metaphysical claim about the existence of a self.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Zhengke Gao, Yan Fu, Bing Ye, Le Chang, Qiyuan Zhu, Yancheng Liu, Alex Mihailidis

Abstract: With the accelerating pace of global population aging, emotion-aware technologies have become increasingly important for improving the quality of life and psychological well-being of older adults. However, most facial expression recognition (FER) systems exhibit substantial performance degradation among elderly users due to the lack of age-diverse data and inadequate model adaptation. This study investigates age-related bias in FER and proposes a subgroup-aware data augmentation framework to enhance recognition robustness for older populations. We first retrain a ResNet-50–based age estimation model using the UTK-Face dataset to provide reliable age annotations for three benchmark FER datasets: RAF-DB, AffectNet, and ExpW. Subsequently, we introduce an age-adaptive augmentation strategy that applies stronger transformations to elderly facial images while maintaining moderate augmentation for younger ones. Experimental results demonstrate that the proposed approach significantly improves recognition accuracy and generalization in elderly subgroups without sacrificing performance in younger populations. This work provides a practical and scalable pathway toward age-inclusive affective computing, highlighting the importance of integrating demographic priors into data processing pipelines for fair and trustworthy emotion recognition systems.
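The age-adaptive augmentation idea in this abstract can be illustrated with a minimal sketch. The age threshold, the jitter magnitudes, and the function names below are assumptions made for illustration only; the abstract does not state which transformations or strengths the authors use.

```python
import random

# Hypothetical cut-off and strengths; the abstract gives no concrete values.
ELDERLY_AGE = 60


def augmentation_strength(age, elderly_strength=0.4, default_strength=0.15):
    """Return a larger jitter magnitude for elderly subjects."""
    return elderly_strength if age >= ELDERLY_AGE else default_strength


def augment(pixels, age, rng):
    """Apply one brightness shift whose range depends on the estimated age.

    pixels is a list of floats in [0, 1], a stand-in for a real image.
    """
    strength = augmentation_strength(age)
    shift = rng.uniform(-strength, strength)
    # Clamp so the result stays a valid intensity.
    return [min(1.0, max(0.0, p + shift)) for p in pixels]


rng = random.Random(0)
image = [0.2, 0.5, 0.8]
young = augment(image, age=25, rng=rng)
old = augment(image, age=72, rng=rng)
```

In a real pipeline the age would come from the age estimation model and the jitter would be replaced by the image transformations actually used.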

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Penyo Georgiev

Abstract: Social service professionals operate in legally sensitive, administratively intensive, and context-dependent environments in which decision-making requires the simultaneous interpretation of regulatory norms, institutional procedures, and individual case circumstances. This paper proposes a conceptual model of a Personal Legal and Social Artificial Intelligence (AI) Assistant intended to support professional decision-making in social services, and demonstrates its functionality through a working prototype. The model is formulated as a domain-specific retrieval-augmented generation (RAG) framework in which a controlled legal and social document corpus is processed through text extraction, chunking, semantic indexing via SentenceTransformer embeddings, top-k retrieval through cosine similarity, and bounded large-language-model reasoning to produce grounded and explainable responses. The proposed framework is informed by three successive prototype versions and by observed sensitivity to corpus scope, document prioritization, and prompt constraints. The current prototype version operates on a prioritized corpus of sixteen Bulgarian normative acts complemented by three supplementary resources, comprising 883 indexed fragments, and uses DeepSeek as the reasoning model accessed through the OpenRouter API. The functionality of the model is validated through a representative use case concerning child protection, in which the prototype identifies the applicable legal provisions, exposes the retrieved documentary evidence, and generates a four-part structured analysis comprising legal qualification, applicable provisions, legal consequences, and recommendations for action. The main contribution lies in the formalization and prototype-level demonstration of a domain-specific AI assistant that combines legal grounding, social-context awareness, and bounded language-model reasoning for trustworthy decision support in regulated social-service practice.
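The retrieval stage described in this abstract (chunking, embedding, top-k selection by cosine similarity) can be sketched as follows. This is a simplified illustration, not the prototype's code: the bag-of-words `embed` function is a stand-in for SentenceTransformer embeddings, and the vocabulary, chunk size, and sample text are hypothetical.

```python
import math


def chunk(text, size=8):
    """Split text into fixed-size word chunks (a simplification)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text, vocab):
    """Toy bag-of-words vector; a placeholder for a learned sentence embedding."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, vocab, k=2):
    """Return the k (score, chunk) pairs most similar to the query."""
    q = embed(query, vocab)
    scored = [(cosine(q, embed(c, vocab)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical two-sentence corpus and vocabulary.
corpus = ("child protection measures are ordered by the court "
          "social assistance is granted on application")
vocab = ["child", "protection", "assistance", "court"]
chunks = chunk(corpus, size=8)
best = top_k("child protection", chunks, vocab, k=1)
```

The retrieved chunks would then be placed in the prompt of the reasoning model, which is the step that bounds the generated answer to the documentary evidence.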

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Trien Phat Tran, Fareed Ud Din, Ljiljana Brankovic, Cesar Sanin, Susan M. Hester

Abstract: Smartphone-based plant identification increasingly serves as the edge tier of agricultural Internet of Things (IoT) systems, where models must adapt to crowdsourced data under bandwidth, memory, and energy constraints. No prior work has systematically investigated continual learning at the scale of thousands of fine-grained medicinal plant species, nor how retraining frequency affects the cost–performance trade-off in an IoT model-lifecycle setting. We evaluate three continual learning strategies—naïve fine-tuning, experience replay, and Learning without Forgetting—under periodic retraining schedules (updating every K increments), tested on 2,719 species (≥25 images each) from the Viet Medi Species 2026 dataset (310,647 images; 4,799 species total). All three strategies exhibit negative forgetting (performance improvement rather than degradation) in the instance-incremental setting, with naïve fine-tuning and LwF showing the strongest gains. Periodic retraining with K=2 reduces retraining operations by approximately 50% while maintaining performance. A baseline MobileNetV2 model achieves 54.07% top-10 accuracy across 2,719 species and has been deployed via TensorFlow Lite (FP16, ∼11.5 MB) in the Med Herb Lens Android application. Naïve fine-tuning is recommended as the practical default for instance-incremental agricultural IoT deployments.
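The claim that retraining every K=2 increments roughly halves the number of retraining operations follows from simple counting, sketched below. The assumptions here are that retraining is triggered after every K-th increment and that data from skipped increments is buffered until then; the number of increments (10) is a hypothetical value, not one taken from the paper.

```python
def retraining_steps(num_increments, k):
    """Return the 1-based increments after which the model is retrained.

    Assumes retraining fires after every k-th increment.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [t for t in range(1, num_increments + 1) if t % k == 0]


def reduction(num_increments, k):
    """Fraction of retraining operations saved relative to retraining every increment."""
    baseline = len(retraining_steps(num_increments, 1))
    periodic = len(retraining_steps(num_increments, k))
    return 1.0 - periodic / baseline


schedule = retraining_steps(10, 2)
saved = reduction(10, 2)
```

When the number of increments is odd, the saving is slightly above one half, which is consistent with the abstract's "approximately 50%".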

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Dong Li, Yanchi Liu, Xujiang Zhao, Xintao Wu, Baoluo Meng, Yufei Han, Zhong Chen, Rui Meng, Haifeng Chen, Chen Zhao

Abstract: The rapid rise of Large Language Model (LLM) agents is driving a fundamental paradigm shift in Multi-Agent Systems (MAS) research, moving from manually orchestrated static architectures toward automated configuration and optimization. Despite its significant potential, this frontier lacks a systematic and rigorous survey with clearly defined operational boundaries. To address this gap, this paper provides a comprehensive review of Automated MAS Optimization, formally anchoring it as the P4 paradigm within a six-stage evolutionary framework spanning from Foundation LLMs (P0) to Agentic Swarms (P5). We introduce precise mathematical definitions for core concepts, establishing a unified MAS configuration space that encompasses agent-level, system-level, and underlying components, and formulate the optimization objective as a holistic system-utility maximization problem. Furthermore, we partition P4 into three operationally distinct sub-paradigms based on the orthogonal dimensions of optimization timing and effect persistence: Design-Time Adaptive MAS, Test-Time Adaptive MAS, and Self-Evolving MAS. Guided by this taxonomy, we systematically review over 200 state-of-the-art works, covering both general methodologies and domain-specific applications. Beyond algorithmic perspectives, we critically examine key supporting issues including benchmarking, evaluation, and safety, while analyzing the evolutionary trajectory toward decentralized, emergent P5 Agentic Swarms. Finally, we identify core open challenges and propose future research directions centered on holistic configuration co-optimization, life-cycle evaluation, endogenous safety mechanisms, and the controllable transition from P4 to P5. This survey aims to provide a rigorous theoretical foundation and strategic navigation for researchers and practitioners in this rapidly evolving field.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Sebastian Raubitzek, Krzysztof Werner, Georg Goldenits, Sebastian Schrittwieser, Kamil Wereszczyński, Krzysztof A. Cyran, Kevin Mallinger

Abstract: This paper studies neural network layers that use learnable Lie-group actions as structured feature-space transformations. Instead of treating Lie groups only as input-domain symmetry constraints, the proposed approach embeds real-valued features into local vector banks, learns coordinates in a Lie algebra, maps these coordinates to group elements through the matrix exponential, and applies the resulting matrices to intermediate feature vectors. The framework supports groups such as SO(3), SU(2), and SU(3), and can be used either as a standalone structured backbone or as a component inside conventional neural architectures. The experimental evaluation covers several settings: tabular classification, tabular regression, synthetic signal denoising, generative adversarial learning, and recursive time-series forecasting. The classification and regression studies compare dense neural baselines, MLP–Lie hybrids, deeper Lie-group architectures, CatBoost, and ExtraTrees across repeated train-validation-test splits. The denoising experiment compares a classical autoencoder with an SU(3)-based autoencoder on synthetic oscillatory signals. The GAN experiment inserts an SU(3) layer into the discriminator and compares it with a standard convolutional GAN on MNIST digit generation. The time-series experiments compare a regular Transformer, a hybrid Transformer with one Lie-group layer, a Lie-group Transformer, and CatBoost under recursive holdout forecasting. The results show that Lie-group feature transformations are useful in selected settings, but they are not uniformly superior across all tasks. In classification, the structured models improve over the dense baseline on several datasets, while tree-based methods remain strongest on others. In regression, MLP–Lie models are competitive on some tasks, but CatBoost and ExtraTrees are often stronger. 
The clearest improvement is observed in signal denoising, where the Lie-group autoencoder reduces reconstruction error and improves signal-to-noise ratio. In the GAN experiment, the Lie-group discriminator gives moderate improvements in stability and discriminator metrics. In time-series forecasting, Lie-group Transformer variants improve over the regular Transformer on some series, while CatBoost remains a strong rolling-window baseline. Overall, the results support a dataset-dependent interpretation. Lie-group layers can act as useful structured feature mixers, especially when local vector structure or oscillatory behavior is relevant. At the same time, their benefit depends on the task, architecture, and computational cost. The framework therefore provides a practical basis for studying when algebraic feature-space transformations improve learning and when simpler baselines are sufficient.
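The core operation of such a layer (Lie-algebra coordinates mapped to a group element through the matrix exponential, then applied to a feature vector) can be sketched for SO(3). This sketch uses Rodrigues' formula, which is the closed form of the matrix exponential on so(3), and treats the coordinates as fixed numbers; in the paper they are learnable parameters, and the helper names here are illustrative assumptions.

```python
import math


def hat(w):
    """Map coordinates w in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    x, y, z = w
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def exp_so3(w):
    """Rodrigues' formula: exp(K) = I + (sin t / t) K + ((1 - cos t) / t^2) K^2."""
    theta = math.sqrt(sum(c * c for c in w))
    k1 = hat(w)
    k2 = matmul(k1, k1)
    if theta < 1e-12:
        a, b = 1.0, 0.5  # limits of the two coefficients as theta -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[(1.0 if i == j else 0.0) + a * k1[i][j] + b * k2[i][j]
             for j in range(3)] for i in range(3)]


def apply(rotation, v):
    """Apply a 3x3 matrix to a length-3 feature vector."""
    return [sum(rotation[i][j] * v[j] for j in range(3)) for i in range(3)]


# A quarter turn about the z-axis applied to the unit x-vector.
rotated = apply(exp_so3([0.0, 0.0, math.pi / 2]), [1.0, 0.0, 0.0])
```

Because the resulting matrix is a rotation, the norm of each local feature vector is preserved, which is one way such a layer differs from an unconstrained dense mixing matrix.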

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Umakant Singh, Punit Kumar Chaubey

Abstract: Precision medicine focuses on customizing diagnostic, prevention and treatment approaches by accounting for the individual characteristics of each patient. This personalization draws on diverse sources of information including clinical records, genomic data, medical imaging, lifestyle patterns and environmental factors. As the volume and complexity of such multimodal healthcare data continue to expand, machine learning (ML) and deep learning (DL) techniques have become crucial for identifying complex patterns, estimating disease risk, and supporting personalized treatment decisions. Despite their efficiency, many of these models function as opaque systems, generating forecasts without clearly indicating the reasoning behind them. This lack of transparency can undermine clinician confidence, hinder adoption in clinical practice, and raise ethical as well as regulatory concerns, particularly in healthcare contexts where decisions must be explainable and defensible. Explainable Artificial Intelligence (XAI) addresses these challenges by providing methods that make model behaviour more transparent and interpretable. Techniques such as SHAP, LIME, saliency and attention-based visualizations, counterfactual analysis, and rule-based explanations enable clinicians to inspect the rationale behind predictions, evaluate alignment with established medical knowledge, and identify potential sources of bias within data or algorithms. From a patient perspective, explainability improves communication, supports informed consent, and strengthens trust in AI-supported care. Regulatory authorities also depend on transparent and interpretable systems to ensure accountability, traceability and compliance with clinical safety requirements. This paper offers a comprehensive examination of explainable AI in the context of precision medicine. 
It introduces fundamental XAI concepts, organizes key methodological approaches, and reviews applications spanning genomics, medical imaging, and electronic health record (EHR) analytics. The chapter also discusses methods for assessing explanation quality, highlights the role of human-centred design, and addresses critical ethical and legal considerations. It concludes by outlining ongoing challenges and future research directions aimed at developing reliable, interpretable AI systems that can be effectively integrated into advanced personalized healthcare.

Review
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Yifei Wang, Ziteng Wang, Yuling Shi, Silin Chen, Xinrui Wang, Yueqi Wang, Beijun Shen, Linjing Li, Xiaodong Gu, Julian McAuley, +1 author

Abstract: As Large Language Models (LLMs) evolve into autonomous agents for long-horizon tasks, managing unbounded interaction trajectories under fixed context budgets becomes a core systems challenge. Unlike standard long-context documents, agent trajectories are heterogeneous and interleave observations, reasoning traces, and tool executions, so compression must preserve temporal dependencies, actionable state, and structural fidelity. Yet existing methods remain fragmented, making it difficult to compare design choices and reason about their reliability implications. This survey introduces a unified taxonomy of agent context compression along three dimensions: compression target (what is compressed), compression mechanism (how it is transformed and retained), and control policy (who decides when compression is triggered). We further organize recurring failures in compressed execution into F1: Pre-compression Decision Error, F2: In-compression Information Loss, and F3: Post-compression Access Failure, and examine domain-specific trade-offs in software engineering, web navigation, and deep research. By unifying the design space, failure taxonomy, and evaluation perspective, this survey provides a foundation for building scalable and recoverable LLM agents. A collection of papers is available at https://github.com/YerbaPage/Awesome-Context-Compression.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Gregor Herbert Wegener

Abstract: Advanced artificial intelligence systems increasingly exhibit behaviors that are not adequately captured by component-local metrics, benchmark scores, or layer-specific monitoring. Such behaviors arise across coupling surfaces, control regimes, deployment boundaries, and emergent interaction patterns, indicating that the relevant analytical object is the composed system rather than the isolated component. This article introduces SORT-AI as a Level-0 structural assessment architecture for advanced AI systems and as the canonical domain reference within the SORT-AI research line. The framework organizes the AI domain along four main axes: Domain as the problem space, Cluster as the structural problem class, Application as a recurrent structural problem form, and Structural Dimensions V1 to V4 as the diagnostic grammar linking observed phenomena to structural causes, effect spaces, and decision surfaces. Below the application level, the architecture admits a further diagnostic decomposition into Scenario Classes, Metric Sets, and a Regime Classification that distinguishes core, boundary, and overlap regimes. Applications are therefore treated not only as recurrent structural problem forms, but also as structured regime spaces. The current AI domain comprises 52 applications distributed across five clusters: Coupling, Learning, Control, Emergence, and Evidence. To make the domain paper self-contained at the level of AI-domain interpretation, a compact mathematical basis is provided using a closed set of 22 idempotent operators, a global consistency projector, a calibrated projection kernel, and a structured projection space in which AI systems are read as operator chains on structured execution states. 
Within this architecture, the Core-3 applications serve as three complementary structural coupling axes: AI.01 expresses physical/interconnect coupling, AI.04 logical/runtime-control coupling, and AI.13 semantic/agentic coupling. Runtime Control Coherence, represented by AI.04, is used as the canonical example to illustrate how locally correct control mechanisms can generate globally incoherent behavior under scale. The paper further incorporates SORT-Sovereign as a meta-domain that projects technical structural findings into strategic, regulatory, and state decision spaces. In this form, SORT-AI is positioned as a reusable Level-0 structural assessment foundation for subsequent domain-specific analyses and application-level studies across the AI domain.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

George Melville, Julian Yeomans

Abstract: Previous research has shown sector-conditional asymmetry in implied volatility levels and in option returns. However, no prior work has parameterized that asymmetry at the effective-theta layer in a form that fires a non-discretionary rule trigger. This study supplies that parameterization, its formulation, the first observation, and the corpus evidence. An effective theta is defined as $\Theta_e = \alpha_{s,r} \cdot \Theta_{BS}$, where $\Theta_{BS}$ is the standard Black-Scholes (BS) theta and $\alpha_{s,r}$ is a sector- and regime-conditional scaling factor. A SIMDEC decomposition is used to partition the input space and to determine the corner where $\alpha$ matters most. The use of SIMDEC renders all AI-created solutions free of hallucination and fully explainable. A “first observation” arising from a three-position long-call cohort traversing terminal decay is deployed using eleven intraday snapshots tracked on the trajectory at primary-source resolution. The cohort behaviour matches the $\alpha$ parameterisation to existing market conditions. To empirically evaluate the effectiveness of the approach, a SIMDEC L2 corpus from the same deployment supplies population-level support across 12 sectors and a three-tier quality stratification. The L2 corpus is the output of the THETA AI/ML pipeline – a multi-architecture deep-learning inference system that treats SIMDEC joint-state partitioning and Sobol variance decomposition as complementary interpretability inputs, with the regime classifier carrying the labels and the composite quality scorer carrying the stratification. The mathematical formulation and overall analysis of the asymmetry in the effective-theta provides a “next level” contribution to traditional option methodology.
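The effective-theta definition in this abstract, the Black-Scholes theta multiplied by a sector- and regime-conditional factor, can be illustrated with a short sketch. The theta formula is the standard one for a European call without dividends; the sector names, regime names, and scaling values in the table are hypothetical placeholders, since the abstract does not report the fitted factors.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_theta(spot, strike, maturity, rate, vol):
    """Black-Scholes theta of a European call, per year, no dividends."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    decay = -spot * norm_pdf(d1) * vol / (2.0 * sqrt_t)
    carry = -rate * strike * math.exp(-rate * maturity) * norm_cdf(d2)
    return decay + carry


# Hypothetical scaling factors keyed by (sector, regime); not values from the paper.
ALPHA = {("technology", "high_vol"): 1.3, ("utilities", "calm"): 0.9}


def effective_theta(theta_bs, sector, regime, alpha_table=ALPHA):
    """Scale the Black-Scholes theta by the sector- and regime-conditional factor."""
    return alpha_table[(sector, regime)] * theta_bs


theta_bs = bs_call_theta(spot=100.0, strike=100.0, maturity=1.0, rate=0.05, vol=0.2)
theta_eff = effective_theta(theta_bs, "technology", "high_vol")
```

A factor above 1 means the position is treated as decaying faster than the Black-Scholes model implies, and a factor below 1 means it is treated as decaying more slowly.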

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Khrystyna Lipianina-Honcharenko, Pavlo Bykovyy, Myroslav Komar, Andriy Krysovatyy, Borys Yazlyuk

Abstract: Large language models (LLMs) increasingly require robust evaluation under realistic instruction-following conditions, particularly for fine-tuned task-specific adapters operating in multilingual environments. This study proposes a scenario-adaptive evaluation framework for assessing the reliability of fine-tuned text models across two application regimes: misinformation detection (disinfo) and knowledge-grounded factual biography generation (heroes). The framework integrates automated generation of balanced risk-oriented scenarios, bilingual evaluation in English and Ukrainian, the LLM-as-a-Judge paradigm, and multidimensional robustness analysis through the Alignment Robustness Index (ARI). Six LoRA-adapted models based on Qwen2.5-3B-Instruct, SmolLM2-1.7B-Instruct, and TinyLlama-1.1B-Chat-v1.0 were evaluated. The implemented pipeline generated 2052 scenarios and 6156 model responses, producing a final bilingual analytical subset of 4104 judged records. Experimental results show that task-specific adaptation produces task-dependent robustness profiles. In the disinfo case, Qwen2.5-3B achieved the strongest overall performance, combining the highest safety and classification accuracy. In contrast, the heroes case revealed a more compressed and multidimensional vulnerability space without a single dominant model. The results further demonstrate the importance of multilingual evaluation, as weaker adapters exhibited substantially larger cross-lingual safety gaps. Overall, the proposed framework provides a reproducible and practically applicable methodology for auditing fine-tuned language models under imperfect instructions.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Doru Constantin, Costel Bălcău

Abstract: Spiking Neural P systems provide a rule-based model of distributed computation inspired by membrane computing, while kernel P systems use guarded transformations and structured control of rule applicability. This paper introduces Convolutive Kernel-Guarded Spiking Neural P systems (CKSNP systems), a formal and trainable framework in which spike-rule applicability may depend on local kernel responses computed over ordered neighborhoods of spike multiplicities. The proposed model provides a general mechanism for local feature computation, combining explicit operational semantics with kernel-based predicates that can be fixed, selected, or embedded in trainable realizations. We define the syntax and transition semantics of the model, relate the construction to delay-free extended Spiking Neural P systems and kernel P systems under stated assumptions, and present a reproducible instantiation for electrocardiographic beat classification under a patient-independent protocol. The empirical study illustrates how CKSNP local responses can be combined with RR, Gaussian, and Fourier descriptors and evaluated with classical and neural classifiers. Overall, the study clarifies both the formal role of guarded local computation and its practical use as an interpretable feature-generation mechanism.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Rahid Zahid Alekberli, Hikmat Karimov

Abstract: Background. Deploying large language models (LLMs) at the edge introduces distributional output drift that existing monitoring approaches cannot detect within the latency and resource constraints of safety-critical autonomous systems [3,4]. The Kerimov–Alekberli (K–A) information-geometric framework proposes a First-Passage Time (FPT) criterion grounded in the Fisher Information Metric (FIM) to detect such drift [5,6]. No multi-run, statistically characterised empirical validation of K–A on edge hardware has previously been reported. Methods. We present a Phase 1 proxy-KL validation of the K–A proxy-gated token-budget criterion across five open-source LLMs (2.0–17.4 GB, Q4_K_M quantisation) deployed via Ollama v0.23.2 on an Apple M5 unified-memory workstation (32 GB, macOS 26.0). A response-level proxy instability score $\hat{D}_{KL}(r) = \max\bigl(0.004,\ 0.016 + h(r)\cdot 0.015 + 0.10/(w(r)+1)\bigr)$ is computed on a completed baseline response; if it exceeds $\tau_{FIM} = 0.065$ (above-FIM), a separate capped-regeneration call with $N_{ka} = \lfloor N_{base}/2 \rfloor$ provides a counterfactual token-budget estimate. Energy is proxy-estimated via $\hat{P}_m = P_{base} + \beta S_{GB}$ ($R^2 = 0.97$). Results. After exclusion of 14 degenerate evaluations (6.4% of 220 above-FIM cases), Pearson r = 0.806 and Spearman ρ = 0.728 (n = 28, p<0.001) between FPT trigger rate and token saving confirm implementation consistency. Bootstrap 95% CIs: llama3.2 34.0 ± 4.0% [31.9, 36.3] (n = 12); gemma3:latest 34.6 ± 2.9% [32.5, 36.6] (n = 6); gemma3:27b 30.8 ± 5.7% [27.4, 34.8] (n = 8). Supplementary controlled validation (370 stored-response evaluations) confirms 100% exact-match quality for factual prompts, and reveals zero proxy-FPT triggers under deterministic and fixed-seed decoding. Conclusions. The K–A surface-proxy, proxy-gated criterion produces statistically characterised token reductions across three model families under stochastic decoding. 
A key limitation: the surface proxy requires stochastic response-length variation to trigger; it does not detect geometric distributional instability. Phase 2 must replace the surface proxy with direct logit-level DKL computation.
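The gating rule in this abstract can be sketched in a few lines, reading the score as max(0.004, 0.016 + 0.015·h(r) + 0.10/(w(r)+1)) with the threshold 0.065 and the halved token budget stated in the text. The abstract does not define h(r) or w(r), so they are treated here as plain non-negative numeric inputs, and the function names are illustrative assumptions.

```python
TAU_FIM = 0.065  # threshold stated in the abstract


def proxy_score(h, w):
    """Response-level proxy instability score.

    h and w stand for h(r) and w(r); their definitions are not given in the
    abstract, so they are passed in as numbers (w is assumed to be >= 0).
    """
    return max(0.004, 0.016 + h * 0.015 + 0.10 / (w + 1))


def capped_budget(n_base):
    """Token budget for the capped regeneration call: floor(n_base / 2)."""
    return n_base // 2


def gate(h, w, n_base):
    """Return (score, token budget); the budget is halved only above the threshold."""
    score = proxy_score(h, w)
    if score > TAU_FIM:
        return score, capped_budget(n_base)
    return score, n_base


triggered = gate(h=0.0, w=1.0, n_base=101)
not_triggered = gate(h=0.0, w=99.0, n_base=101)
```

Because the score falls as w grows, inputs that do not vary between runs produce the same score every time, which is consistent with the abstract's observation of zero triggers under deterministic decoding.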

Technical Note
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Gregor Wegener

Abstract: This technical note introduces a reproducible kernel-damping evidence protocol for the SORT-AI Core-3 applications AI.01 (Interconnect Stability Control), AI.04 (Runtime Control Coherence), and AI.13 (Agentic System Stability). These applications span complementary structural coupling regimes in advanced AI systems: physical/interconnect coupling, logical/runtime-control coupling, and semantic/agentic coupling. The protocol evaluates whether declared structural risk-transition scenarios admit a Gaussian kernel-damping reconstruction under the declared canonical SORT scale parameter $\sigma_0 = 0.00190643$. The analysis is restricted to the structural analysis layer and does not claim production deployment, vendor-specific measurement, empirical benchmarking, runtime optimization, or execution by MOCK v4. MOCK v4 is treated as the frozen structural reference architecture, not as a runtime engine. The accompanying archived evidence release contains machine-readable scenario inputs, declared risk-transformation rules, executable scripts, expected outputs, generated outputs, and a reproduction manifest sufficient to reproduce all reported κ, ξ, scenario-level means, sample dispersions, and coefficients of variation. The contribution is methodological: the note formalizes a reproducibility protocol through which SORT-AI Core-3 applications can be tested as structurally defined damping regimes without converting MOCK v4 into an execution environment or introducing a new MOCK version.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Qingyun Sun, Haonan Yuan, Yi Huang, Ziwei Zhang, Xingcheng Fu, Ruijie Wang, Haoyi Zhou, Jia Wu, Jianxin Li, Philip S Yu

Abstract: Foundation models have emerged as a dominant paradigm in machine learning, enabling broad generalization and efficient adaptation across diverse tasks and domains. While this paradigm has achieved remarkable success in language and vision data, its extension to structured data remains far less understood. Foundation models for structured data are an emerging yet highly impactful research area with a rapidly growing body of literature. In this survey, we provide a systematic analysis of foundation models for structured data, focusing on tabular, time series, and graph data, covering over 150 representative methods. We analyze the intrinsic properties and inductive biases of structured data, clarify the core concepts of foundation models, and conduct an in-depth analysis of the key challenges that hinder the development of foundation models for structured data. Building on these insights, we organize existing approaches into a coherent taxonomy based on tokenization, architectures, pre-training objectives, and adaptation strategies. Finally, we discuss emerging research directions and open problems, aiming to provide guidance toward more principled and scalable foundation models for structured data.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Chang Liu, Haibo Jin

Abstract: Recently, Mamba based on State Space Models (SSMs) has shown great potential for hyperspectral image (HSI) classification due to its long-range modeling capability and linear complexity. However, existing Mamba-based methods usually employ fixed and limited scanning directions, restricting anisotropic spatial modeling. Moreover, full-pixel scanning introduces substantial computational redundancy. To address these issues, this paper proposes DESDA-Mamba, a direction-adaptive Mamba network with diagonal-enabled strided scanning for HSI classification. Specifically, a lightweight direction adaptation module is designed to implicitly predict suitable scanning directions from learned direction-sensitive feature-channel responses and perform batch-level unified direction aggregation, revealing that finer patch-level direction routing does not necessarily improve performance. In addition, a strided scanning strategy is introduced to skip redundant adjacent pixels during sequence serialization, reducing computational cost while enlarging the effective receptive field. Furthermore, two diagonal scanning modes, namely main-diagonal and anti-diagonal scanning, are proposed to improve the modeling of oblique spatial structures. Efficient diagonal scanning is implemented through coordinate-sequence indexing and caching mechanisms, enabling flexible diagonal strided scanning. Extensive comparison, ablation, and model-variant experiments on four public HSI datasets demonstrate that DESDA-Mamba achieves superior classification performance with competitive efficiency. The source code is available at https://github.com/ll-netizen/DESDA-MAMBA.
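Diagonal strided scanning through coordinate-sequence indexing and caching can be pictured with the sketch below. It is only one plausible reading of the abstract: the traversal order within and across diagonals, the way the stride is applied, and all function names are assumptions, and the linked repository should be consulted for the authors' actual scheme.

```python
from functools import lru_cache


def main_diagonal_order(height, width):
    """Visit pixels diagonal by diagonal, where (col - row) is constant on each."""
    coords = []
    for offset in range(-(height - 1), width):
        for row in range(height):
            col = row + offset
            if 0 <= col < width:
                coords.append((row, col))
    return coords


def anti_diagonal_order(height, width):
    """Visit pixels diagonal by diagonal, where (row + col) is constant on each."""
    coords = []
    for total in range(height + width - 1):
        for row in range(height):
            col = total - row
            if 0 <= col < width:
                coords.append((row, col))
    return coords


@lru_cache(maxsize=None)
def scan_indices(height, width, mode, stride):
    """Cached flat indices for a strided diagonal scan of a height x width patch."""
    if mode == "main":
        order = main_diagonal_order(height, width)
    elif mode == "anti":
        order = anti_diagonal_order(height, width)
    else:
        raise ValueError("mode must be 'main' or 'anti'")
    # Keep every stride-th position of the serialized sequence.
    return tuple(row * width + col for row, col in order[::stride])


main_scan = scan_indices(2, 2, "main", 1)
anti_scan_strided = scan_indices(2, 2, "anti", 2)
```

Caching the index tuple means the coordinate sequence is built once per patch size, mode, and stride, after which serialization reduces to a gather over the flattened patch.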

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Wenbin Meng, Ming Xu

Abstract: Precise semantic matching between natural language queries and unconstrained videos remains a fundamental yet unresolved challenge in multimedia retrieval. Although recent transformer-based dual encoders and CLIP-style contrastive frameworks have improved global text–video alignment, they still struggle in complex scenes where (i) spatiotemporal cues are highly entangled among objects, motion patterns, and background context, and (ii) cross-modal interactions are easily biased by spurious correlations, resulting in brittle retrieval performance under compositional or ambiguous language. To overcome these limitations, we propose a unified framework that enhances text–video correspondence through three closely coupled components: Query-adaptive Semantic Routing (QSR), Counterfactual Bi-directional Alignment (CBA), and Temporal Causal Regularization (TCR). QSR introduces a query-conditioned routing mechanism that decomposes video representations into multiple semantic experts and dynamically assigns token-level relevance, allowing the model to selectively emphasize appearance, motion, and contextual cues according to the textual query. Based on the routed representations, CBA performs reciprocal attention in both text-to-video and video-to-text directions, while introducing a counterfactual alignment branch to suppress background-driven shortcuts; this encourages robust matching based on causal evidence rather than incidental correlations. Finally, TCR imposes temporal causality-aware consistency by penalizing alignment instability under lightweight temporal perturbations, thereby improving motion sensitivity without requiring dense frame sampling. For scalable deployment, we further incorporate parameter sharing across experts and quantization-friendly projections, achieving a favorable accuracy–latency trade-off. 
Experiments on MSR-VTT, MSVD, and VATEX demonstrate consistent improvements over strong baselines, achieving Recall@1 scores of 55.0%, 60.3%, and 68.5%, respectively, while maintaining high inference efficiency.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Paolo Pagliuca

Abstract: (1) Background: Evolutionary Strategies (ESs) are optimization metaheuristics widely adopted in Evolutionary Computation (EC). Since their introduction in the early 1970s, researchers in the field have attempted to improve the efficacy of these algorithms. The most advanced ESs, such as Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) and Exponential Natural Evolution Strategies (xNES), make use of covariance matrices storing relationships between the parameters to be optimized, which enable the algorithms to speed up the search in the solution space. However, the computational cost of calculating covariance matrices linearly scales with the number of parameters. Recently, OpenAI Evolutionary Strategy (OpenAI-ES) emerged as an effective ES in different domains, thanks to the parameter information stored in two momentum vectors. Furthermore, OpenAI-ES gains an advantage from the use of symmetric sampling and weight decay techniques. (2) Methods: In this work, we delve into the application of symmetric sampling and weight decay to CMA-ES, xNES and Separable Natural Evolution Strategies (sNES), with the aim of improving their performance in domains in which they get stuck in local minima. Specifically, we propose three novel variants for each ES and verify their efficacy with respect to the Pybullet halfcheetah and hopper robot locomotion problems, and two collective tasks (i.e., swarm aggregation and swarm foraging). (3) Results: Our findings reveal that symmetric sampling produces performance enhancements in all the domains, whereas the effect of weight decay varies across the considered problems. Furthermore, symmetric sampling allows ESs to keep parameter size limited, which is paramount in these scenarios. (4) Conclusions: This research identifies techniques enhancing the success of modern ESs, proposes several ES variants, and discusses the relationship between algorithmic performance and task properties.
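As a rough illustration of the two techniques named in the abstract, the sketch below combines symmetric (mirrored) sampling with weight decay in a bare-bones evolution strategy. It is not the paper's method and not OpenAI-ES itself: it uses plain gradient ascent instead of momentum-based updates, and the names `es_step`, `shifted_sphere`, and `run`, the toy objective, and all hyperparameter values are assumptions chosen for the example.

```python
import random


def es_step(theta, fitness, rng, pairs=20, sigma=0.1, lr=0.05, weight_decay=0.01):
    """One update of a minimal ES with symmetric sampling and weight decay."""
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Symmetric sampling: evaluate the perturbation and its mirror image.
        plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        for i in range(dim):
            grad[i] += (plus - minus) * eps[i]
    scale = 1.0 / (2.0 * pairs * sigma)
    # Gradient ascent on fitness; weight decay pulls parameters toward zero.
    return [t + lr * (scale * g - weight_decay * t) for t, g in zip(theta, grad)]


def shifted_sphere(x):
    """Toy fitness (assumed for the demo): maximal at x = (1, ..., 1)."""
    return -sum((xi - 1.0) ** 2 for xi in x)


def run(steps=200, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        theta = es_step(theta, shifted_sphere, rng)
    return theta


if __name__ == "__main__":
    final = run()
    print(final, shifted_sphere(final))
```

Evaluating each perturbation together with its mirror cancels the even-order terms of the fitness around the current parameters, which typically lowers the variance of the search-direction estimate. The decay term shrinks the parameters slightly at every step, so in this toy problem the solution settles marginally below the unregularized optimum.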

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Tao Jingchu, Abdul Salam Shah, Aisha Farooq

Abstract: This paper presents an end-to-end architecture for grayscale clothing image classification using a lightweight Convolutional Neural Network (CNN) on the Fashion-MNIST dataset. The architecture consists of three convolutional layers with batch normalization to stabilize training, dropout to reduce overfitting, max pooling to reduce spatial dimensions, and data augmentation (random rotation, shifting, zooming, flipping) to increase the effective training set. An early-stopping callback terminated training when validation performance leveled off. The model obtained 88.63% test accuracy, which indicates that a purpose-built lightweight CNN can perform competitively on Fashion-MNIST without resorting to complex heavyweight architectures. Precision and F1-scores were high for categories with distinct visual characteristics (trousers, sandals, bags), whereas categories with similar textures and outlines (T-shirts, pullovers, coats) were more likely to be misclassified. The paper also places these findings in the context of the development of CNN architectures from LeNet-5 to AlexNet and VGGNet, and explains the implications of the results for the effective use of AI in resource-restricted settings.

Article
Computer Science and Mathematics
Artificial Intelligence and Machine Learning

Low Hong Yi, Abdul Salam Shah, Manzoor Hussain

Abstract: This paper describes a CNN model for multi-class image classification on the Fashion-MNIST dataset. The model achieved a test accuracy of 92.44% and a test loss of 0.2533, the highest accuracy compared with similar studies using similar architectures. The architecture has three convolutional-pooling blocks, a dense layer with dropout regularization (0.3), and a softmax output layer. Analysis of the training and validation curves shows mild overfitting in the later epochs, where the validation loss starts to rise even though the training loss continues to decrease. In-depth analysis using the confusion matrix and classification report identifies patterns of misclassification between visually similar categories. The paper also discusses implications for batch normalization, data augmentation, and Vision Transformer architectures.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated